v6.12.0: Deterministic entities.json synthesis for legacy session backfill #148
Merged
…backfill

Transparent 4-tier deterministic synthesizer inside /api/admin/sessions/:key/rebuild-kg. When entities.json is absent from report_artifacts, synthesizes from data already structured in the DB before running the existing 10-phase KG build. Zero LLM, zero new endpoints, zero frontend changes, zero schema.

Tiers (all fail-soft, skip rather than misclassify):
1. Parse ## DEAL_METADATA from orchestrator-state markdown
2. Static map: research agent report_keys → regulator catalog
3. Union with kg_nodes WHERE node_type='regulator' (session-specific)
4. Mine fact_node.fact_name for narrow entity-keyword patterns

Dedup case-insensitive on canonical_name; higher-tier wins, loser's match_patterns merge into winner. 50-cap enforced at synthesis + Zod.

Operator UX unchanged: existing "Rebuild KG" button gains capability. Legacy sessions get entities_source: "synthesized" + per-tier audit; fresh v6.11.0+ sessions continue entities_source: "native". Failure of synthesis is logged but never blocks rebuild; LEGACY fallback still kicks in.

Expected SpaceX backfill yield: ~15-18 entities. Phase 9 recovery target: 267 → ~1,500 edges.

Files:
  src/utils/entitySynthesis.js (NEW, ~280 LoC)
  src/server/adminRouter.js (+35 LoC pre-step)
  test/sdk/entity-synthesis.test.js (NEW, 34 tests, 9 groups)
  CHANGELOG.md + super-legal-mcp-refactored/CHANGELOG.md

Tests: 85/85 passing across 4 entities-ecosystem files (entity-synthesis, entities-json-schema, fact-validator-entities, kg-phase6-entities).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
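The dedup rule in the commit message above (case-insensitive on canonical_name, higher tier wins, loser's match_patterns merge into the winner, capped at 50) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in entitySynthesis.js: the function name `mergeTiers`, the `tier` field, the candidate shape, and the sample inputs are all hypothetical.

```javascript
// Hypothetical sketch: names and the candidate shape are assumptions,
// not the actual entitySynthesis.js API.
const MAX_ENTITIES = 50; // the 50-cap named in the commit message

/**
 * Merge per-tier candidate lists into one deduplicated list.
 * `tiers` is ordered highest priority first (Tier 1 ... Tier 4).
 * Each candidate: { canonical_name: string, match_patterns: string[] }.
 */
function mergeTiers(tiers) {
  const byKey = new Map(); // lowercased canonical_name -> winning entity

  tiers.forEach((candidates, index) => {
    for (const candidate of candidates) {
      // Fail-soft: skip malformed candidates rather than guess.
      if (!candidate || typeof candidate.canonical_name !== "string") continue;
      const name = candidate.canonical_name.trim();
      if (name === "") continue;

      const patterns = Array.isArray(candidate.match_patterns)
        ? candidate.match_patterns
        : [];
      const winner = byKey.get(name.toLowerCase());
      if (winner === undefined) {
        byKey.set(name.toLowerCase(), {
          canonical_name: name,
          match_patterns: [...new Set(patterns)],
          tier: index + 1,
        });
      } else {
        // A higher tier already won; fold in the loser's patterns.
        winner.match_patterns = [
          ...new Set([...winner.match_patterns, ...patterns]),
        ];
      }
    }
  });

  // Map preserves insertion order, so higher-tier entities survive the cap.
  return [...byKey.values()].slice(0, MAX_ENTITIES);
}

// Made-up inputs for illustration only.
const merged = mergeTiers([
  [{ canonical_name: "Example Target Inc.", match_patterns: ["Example Target"] }],
  [{ canonical_name: "JFTC", match_patterns: ["JFTC"] }],
  [{ canonical_name: "jftc", match_patterns: ["Japan Fair Trade Commission"] }],
]);
console.log(merged);
```

Because tiers are processed highest first and a `Map` keeps insertion order, truncating at 50 drops the lowest-tier entities first. Whether the real synthesizer truncates in that order is not stated in the PR.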
Tracking issue: #149
Summary
Transparent 4-tier deterministic synthesizer inside the existing `/api/admin/sessions/:key/rebuild-kg` admin endpoint. When `entities.json` is absent from `report_artifacts` (any pre-v6.11.0 session, or v6.11.0+ sessions where fact-validator skipped emission), the rebuild now synthesizes a deterministic `entities.json` from data already structured in the DB before running the existing 10-phase KG build. Zero LLM, zero new endpoints, zero frontend changes, zero DB schema changes.

Architectural decisions (user-led)
User-led 8-question architectural review rejected four less-elegant paths, including:
- `/rebuild-entities` endpoint + button

Principle surfaced: "Don't re-derive what's already structured. Read the structured source directly."
Tier composition
1. Parse the `## DEAL_METADATA` table from `orchestrator-state` markdown (target, acquirer, underwriters, key_person)
2. Static `AGENT_REGULATOR_MAP` keyed by `report_key` (architectural knowledge: which agents imply which regulators)
3. Union with `SELECT FROM kg_nodes WHERE node_type='regulator'` (catches session-specific regulators like JFTC)
4. Mine `fact_node.properties.fact_name` for narrow entity-keyword patterns (Lead Bookrunners, Controlling Shareholder, etc.)

All tiers fail-soft on unknown inputs. Cross-tier dedup is case-insensitive on `canonical_name`; the higher tier wins and the loser's `match_patterns` merge into the winner. The 50-entity cap is enforced at synthesis and via the Zod schema.

Operator experience
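From the operator's side, the only visible difference is which `entities_source` a rebuild reports. A minimal sketch of the branch that could produce that field is shown here. It is not the `adminRouter.js` code: the function name `resolveEntities`, the `file_path`/`content` row shape, the injected `synthesize` callback, and the `null` values returned on failure are all assumptions, and the sketch is synchronous for brevity.

```javascript
// Hypothetical sketch of the /rebuild-kg pre-step. Function names, the
// artifact row shape, and the failure return value are assumptions.
function resolveEntities(artifacts, synthesize, log = console) {
  // Assumes report_artifacts rows expose file_path and content fields.
  const native = artifacts.find((row) => row.file_path === "entities.json");
  if (native) {
    // entities.json already exists: rebuild behavior is unchanged.
    return { entities_source: "native", entities: JSON.parse(native.content) };
  }

  try {
    // synthesize() stands in for the 4-tier synthesizer and is assumed to
    // return { entities, audit }, where audit holds per-tier counts.
    const { entities, audit } = synthesize();
    return { entities_source: "synthesized", entities, entities_audit: audit };
  } catch (err) {
    // Fail-soft: log and let the caller fall through to its existing fallback.
    log.warn(`entity synthesis failed: ${err.message}`);
    return { entities_source: null, entities: null };
  }
}

// Made-up legacy session: no entities.json artifact, synthesis succeeds.
const legacy = resolveEntities([], () => ({
  entities: [{ canonical_name: "JFTC", match_patterns: ["JFTC"] }],
  audit: { tier3: 1, final_count: 1 },
}));
console.log(legacy.entities_source); // "synthesized"
```

Passing the synthesizer in as a callback keeps the branch testable without a database; the real pre-step presumably reads from the DB directly.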
`Rebuild KG` button (existing) gains capability without UX change:
- Fresh v6.11.0+ sessions: `entities_source: "native"`
- Legacy sessions: `entities_source: "synthesized"` + per-tier audit counts

Test plan
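The manual `/rebuild-kg` checks in this test plan could be scripted with a small response checker along these lines. This is a sketch, not part of the PR: `checkRebuildResponse` is a hypothetical helper, it assumes `entities_source` and `entities_audit.final_count` sit at the top level of the JSON response, and it treats the approximate 15-18 target as a hard range purely for illustration.

```javascript
// Hypothetical checker for the manual test-plan steps. Only the field names
// entities_source and entities_audit.final_count come from the PR text.
function checkRebuildResponse(body, expected) {
  const problems = [];
  if (body.entities_source !== expected.source) {
    problems.push(
      `entities_source: got ${JSON.stringify(body.entities_source)}, ` +
        `want ${JSON.stringify(expected.source)}`
    );
  }
  if (expected.countRange !== undefined) {
    const [min, max] = expected.countRange;
    const count = body.entities_audit
      ? body.entities_audit.final_count
      : undefined;
    if (typeof count !== "number" || count < min || count > max) {
      problems.push(`final_count: got ${count}, want ${min}-${max}`);
    }
  }
  return problems; // an empty array means every check passed
}

// Made-up response bodies for illustration.
const legacyProblems = checkRebuildResponse(
  { entities_source: "synthesized", entities_audit: { final_count: 16 } },
  { source: "synthesized", countRange: [15, 18] }
);
const freshProblems = checkRebuildResponse(
  { entities_source: "native" },
  { source: "native" }
);
console.log(legacyProblems.length, freshProblems.length); // 0 0
```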
- 34 unit tests across 9 groups (`test/sdk/entity-synthesis.test.js`)
- `/rebuild-kg` against SpaceX session `2026-05-16-1778951162` → expect `entities_source: "synthesized"`, `entities_audit.final_count` ≈ 15-18, Phase 9 edge count recovery from 267 → ~1,200-1,800
- `/rebuild-kg` against a v6.11.0+ fresh session → expect `entities_source: "native"`, no synthesis log lines

Files
Runtime:
- `super-legal-mcp-refactored/src/utils/entitySynthesis.js` (NEW, ~280 LoC)
- `super-legal-mcp-refactored/src/server/adminRouter.js` (+35 LoC pre-step in `/rebuild-kg` handler)

Tests:
- `super-legal-mcp-refactored/test/sdk/entity-synthesis.test.js` (NEW, 34 tests across 9 groups)

Docs:
- `CHANGELOG.md` (root, v6.12.0 entry)
- `super-legal-mcp-refactored/CHANGELOG.md` (canonical detailed entry)

Risk
3/10. Additive code, fail-soft throughout, behind `requireAdmin` auth, idempotent via `ON CONFLICT (session_id, file_path) DO UPDATE`. Does not modify any existing rebuild behavior when `entities.json` already exists. Does not modify the v6.11.0 producer/consumer path. Synthesis failure logs and falls through to existing `LEGACY_DIGITALBRIDGE_FALLBACK` (i.e., v6.11.0-without-synthesis behavior).

Rollback
Revert the `adminRouter.js` pre-step block (~35 LoC) → `/rebuild-kg` returns to v6.11.0 behavior. `entitySynthesis.js` becomes dead code, harmless. ~5 min.

🤖 Generated with Claude Code