
v6.12.0: Deterministic entities.json synthesis for legacy session backfill #149

Description

@Number531

Status

Shipped as PR #148, merged as eec9f49f. Deploy be0sf2dr7 in flight (2026-05-17).

Problem statement

After v6.11.0 (PR #147) shipped the entities.json producer/consumer chain, two gaps remained:

  1. Pre-v6.11.0 sessions (all historical sessions, including the motivating SpaceX-IPO case 2026-05-16-1778951162) lack the entities.json artifact and cannot use the new Phase 6 path. Rebuilds fall back to LEGACY_DIGITALBRIDGE_FALLBACK → 0 useful entity nodes for non-DigitalBridge memos → Phase 9 edge collapse.
  2. v6.11.0+ sessions in which fact-validator skips entities.json emission have no recovery mechanism. Model instruction-following failures still happen: we verified that the pre-v6.11.0 fact-validator prompt asked for a ## Entity Names table, yet the SpaceX session's fact-registry has no such section.

Solution: deterministic 4-tier synthesizer inside the existing admin endpoint

A user-led architectural discussion considered and rejected:

  • LLM re-invocation of fact-validator (~$0.05–$1/session, nondeterministic)
  • §II.C "Entity Names" markdown parser (section absent in most real sessions)
  • New `/rebuild-entities` endpoint + frontend button (two-click operator UX)
  • Replacing v6.11.0 fact-validator emission entirely (premature only 4 hours after v6.11.0 shipped)

Final architecture: a transparent pre-step inside the existing `POST /api/admin/sessions/:key/rebuild-kg` endpoint. When entities.json is absent from `report_artifacts`, the endpoint synthesizes it from data already structured in the DB, then runs the existing 10-phase KG build.
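
The pre-step's control flow can be sketched as below. This is a minimal illustration, not the adminRouter.js code: `ensureEntities`, `loadArtifact`, `synthesizeEntities`, `saveArtifact`, and the `"legacy_fallback"` label are hypothetical names injected or chosen for the example. It shows the three outcomes: a native entities.json is used as-is, a missing one is synthesized and persisted, and a synthesis error is logged while the rebuild continues on the legacy fallback.

```javascript
// Hypothetical sketch of the /rebuild-kg pre-step. Helper names are
// illustrative assumptions, not the actual adminRouter.js API.
async function ensureEntities(sessionKey, deps) {
  const { loadArtifact, synthesizeEntities, saveArtifact, log } = deps;

  // Native path: fact-validator already emitted entities.json for this session.
  const existing = await loadArtifact(sessionKey, "entities.json");
  if (existing) {
    return { entities: existing.entities, entities_source: "native" };
  }

  try {
    // Synthesis path: build entities from data already structured in the DB.
    const { entities, audit } = await synthesizeEntities(sessionKey);
    // Persist for later rebuilds; assumed to be an upsert
    // (ON CONFLICT DO UPDATE) so repeated runs stay idempotent.
    await saveArtifact(sessionKey, "entities.json", { entities });
    log(`entities synthesized: final_count=${audit.final_count}`);
    return { entities, entities_source: "synthesized", entities_audit: audit };
  } catch (err) {
    // Fail-soft: log and let the KG build proceed via the legacy fallback.
    // The "legacy_fallback" label is an assumption for this sketch.
    log(`entity synthesis failed: ${err.message}`);
    return { entities: null, entities_source: "legacy_fallback" };
  }
}
```

Injecting the helpers keeps the flow testable without a database; in the real endpoint they would presumably be backed by `report_artifacts` queries and the entitySynthesis.js module.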

Principle: Don't re-derive what's already structured. Read the structured source directly.

Tier composition

| Tier | Source | Yields |
|---|---|---|
| 1 | Parse `## DEAL_METADATA` table from `orchestrator-state` markdown | target, acquirer, underwriters (comma-split), co_investor, key_person |
| 2 | Static `AGENT_REGULATOR_MAP` keyed by `report_key` | Regulators implied by which research agents ran |
| 3 | `SELECT FROM kg_nodes WHERE node_type='regulator'` | Session-specific regulators no static map predicts (e.g., JFTC) |
| 4 | Mine `fact_node.properties.fact_name` for narrow entity-keyword patterns | Lead Bookrunners, Controlling Shareholder, co-investor lists |
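
To illustrate Tier 1, the sketch below parses a `## DEAL_METADATA` section. It assumes the section is a two-column `| field | value |` markdown table that ends at the next heading, and that placeholder values such as `N/A` should be skipped rather than turned into entities. The function name `parseDealMetadata` and the placeholder list are hypothetical; the real orchestrator-state layout may differ.

```javascript
// Hypothetical Tier 1 sketch: assumes DEAL_METADATA is a two-column
// markdown table (| field | value |) under a "## DEAL_METADATA" heading.
const TIER1_FIELDS = ["target", "acquirer", "underwriters", "co_investor", "key_person"];
const PLACEHOLDERS = new Set(["", "n/a", "na", "none", "tbd", "-"]); // assumed list

function parseDealMetadata(markdown) {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => /^##\s+DEAL_METADATA\s*$/.test(l.trim()));
  if (start === -1) return []; // fail-soft: no section means no Tier 1 entities

  const entities = [];
  for (const line of lines.slice(start + 1)) {
    const trimmed = line.trim();
    if (trimmed.startsWith("#")) break; // next heading ends the section
    if (!trimmed.startsWith("|")) continue; // skip blank lines and prose
    // "| a | b |" -> ["a", "b"]; rows without a trailing pipe are skipped below.
    const cells = trimmed.split("|").slice(1, -1).map((c) => c.trim());
    if (cells.length < 2) continue;
    const field = cells[0].toLowerCase();
    if (!TIER1_FIELDS.includes(field)) continue; // header, separator, unknown keys

    // Underwriters is comma-split; every other field is a single value.
    const values = field === "underwriters" ? cells[1].split(",") : [cells[1]];
    for (const raw of values) {
      const name = raw.trim();
      if (PLACEHOLDERS.has(name.toLowerCase())) continue; // skip, don't misclassify
      entities.push({ canonical_name: name, role: field, tier: 1 });
    }
  }
  return entities;
}
```

Unknown keys and malformed rows are silently skipped, which matches the issue's "skip rather than misclassify" rule for all tiers.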

All tiers are fail-soft: they skip rather than misclassify (the PR #130 mitigation). Dedup is case-insensitive on canonical_name, and the higher tier wins on collision. The entity count is capped at 50, enforced both at synthesis and in Zod validation.
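
The dedup-and-cap rule can be sketched as follows. This assumes each tier returns `{ canonical_name, role, tier }` objects and reads "higher tier wins" as Tier 1 taking precedence over Tier 4. `mergeTiers`, its input shape, and the audit fields other than `final_count` are illustrative assumptions, and the Zod re-check of the cap is omitted.

```javascript
// Hypothetical merge step: a lower tier number (Tier 1) is assumed to mean
// higher precedence.
const MAX_ENTITIES = 50;

function mergeTiers(tierResults) {
  // Visit tiers in precedence order so the first entity seen for a key wins.
  const ordered = [...tierResults].sort((a, b) => a.tier - b.tier);
  const byKey = new Map();
  const audit = { per_tier: {}, duplicates_dropped: 0, final_count: 0 };

  for (const { tier, entities } of ordered) {
    audit.per_tier[tier] = entities.length;
    for (const e of entities) {
      const key = (e.canonical_name || "").trim().toLowerCase(); // case-insensitive
      if (!key) continue; // fail-soft: ignore nameless entries
      if (byKey.has(key)) {
        audit.duplicates_dropped += 1; // lower-precedence duplicate loses
        continue;
      }
      byKey.set(key, e);
    }
  }

  // Map preserves insertion order, so the cap trims lowest-precedence tiers first.
  const entities = [...byKey.values()].slice(0, MAX_ENTITIES);
  audit.final_count = entities.length;
  return { entities, audit };
}
```

Sorting once and keeping the first occurrence turns "higher tier wins" into a property of iteration order instead of a comparison at every collision.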

Files

Runtime:

  • `src/utils/entitySynthesis.js` (NEW, ~280 LoC)
  • `src/server/adminRouter.js` (+35 LoC pre-step)

Tests:

  • `test/sdk/entity-synthesis.test.js` (NEW, 34 tests across 9 groups)

Docs:

  • Root + service CHANGELOG.md (v6.12.0 entries)

Expected impact

| Session class | Behavior change |
|---|---|
| Pre-v6.11.0 (incl. SpaceX) | First /rebuild-kg synthesizes ~15-18 entities; Phase 9 recovers 267 → ~1,200-1,800 edges |
| v6.11.0+ with native entities.json | No change (entities_source: "native") |
| v6.11.0+ where fact-validator skipped emission | Falls into the synthesis path on the next /rebuild-kg |

Validation gauntlet (post-deploy)

  • Trigger /rebuild-kg for SpaceX session `2026-05-16-1778951162` → expect `entities_source: "synthesized"`, `entities_audit.final_count` ≈ 15-18
  • Confirm Phase 9 edge count recovery (267 → ~1,200-1,800)
  • Trigger /rebuild-kg for a fresh v6.11.0+ session → expect `entities_source: "native"`, no synthesis log lines
  • Confirm tier audit numbers in server logs match expected tier yields
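
The response-level checks in this list can be scripted. A minimal sketch, assuming the /rebuild-kg JSON body exposes `entities_source` and `entities_audit.final_count` at the top level (the field names come from the checklist; their nesting in the response is an assumption). `checkRebuildResponse` is a hypothetical helper; edge counts and log lines still need manual review.

```javascript
// Hypothetical post-deploy check against a parsed /rebuild-kg response body.
function checkRebuildResponse(body, expected) {
  const problems = [];
  if (body.entities_source !== expected.source) {
    problems.push(`entities_source=${body.entities_source}, expected ${expected.source}`);
  }
  if (expected.source === "synthesized") {
    const n = body.entities_audit?.final_count;
    const [lo, hi] = expected.finalCountRange;
    if (typeof n !== "number" || n < lo || n > hi) {
      problems.push(`final_count=${n}, expected ${lo}-${hi}`);
    }
  }
  return problems; // an empty array means the check passed
}
```

For the SpaceX session this would be called with `{ source: "synthesized", finalCountRange: [15, 18] }`, and with `{ source: "native" }` for a fresh v6.11.0+ session. Because ≈15-18 is an estimate, a slightly wider range may be more practical.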

Properties

  • Zero LLM dependency (pure code + SQL + Zod)
  • Zero new endpoints (extends existing /rebuild-kg)
  • Zero frontend changes (existing "Rebuild KG" button gains capability)
  • Zero DB schema changes (uses existing report_artifacts shape)
  • Idempotent (ON CONFLICT DO UPDATE)
  • Fail-soft (synthesis errors logged, KG rebuild proceeds via LEGACY fallback)

Risk

3/10: additive code, fail-soft throughout, gated behind requireAdmin auth.

Rollback

Revert the adminRouter.js pre-step block (~35 LoC) and /rebuild-kg returns to v6.11.0 behavior; entitySynthesis.js becomes harmless dead code. Estimated time: ~5 min.

Related

🤖 Generated with Claude Code
