Status
Shipped as PR #148, merged eec9f49f. Deploy in flight as be0sf2dr7 (2026-05-17).
Problem statement
After v6.11.0 (PR #147) shipped the entities.json producer/consumer chain, two gaps remained:
- Pre-v6.11.0 sessions (all historical sessions, including the SpaceX-IPO motivating case `2026-05-16-1778951162`) lack the entities.json artifact and cannot benefit from the new Phase 6 path. Rebuilds fall back to LEGACY_DIGITALBRIDGE_FALLBACK → 0 useful entity nodes for non-DigitalBridge memos → Phase 9 edge collapse.
- v6.11.0+ sessions where fact-validator skips entities.json emission have no recovery mechanism. Model instruction-following failures still happen: verified that the pre-v6.11.0 fact-validator prompt asked for a `## Entity Names` table, but the SpaceX session's fact-registry has no such section.
Solution — deterministic 4-tier synthesizer inside existing admin endpoint
User-led architectural conversation explored and rejected:
- LLM re-invocation of fact-validator (~$0.05–$1/session, nondeterministic)
- §II.C "Entity Names" markdown parser (section absent in most real sessions)
- New `/rebuild-entities` endpoint + frontend button (two-click operator UX)
- Replacing v6.11.0 fact-validator emission entirely (premature 4 hours after ship)
Final architecture: transparent pre-step inside the existing `POST /api/admin/sessions/:key/rebuild-kg` endpoint. When entities.json is absent from `report_artifacts`, synthesize from data already structured in the DB before running the existing 10-phase KG build.
Principle: Don't re-derive what's already structured. Read the structured source directly.
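A minimal sketch of how such a pre-step could be shaped, not the code in `adminRouter.js`. The function name `ensureEntitiesArtifact`, the injected helpers (`loadArtifact`, `synthesizeEntities`, `saveArtifact`, `log`), and the `"synthesized"` and `"legacy_fallback"` labels are all hypothetical; only `entities_source: "native"` and the fail-soft behavior come from this description.

```javascript
// Hypothetical pre-step: runs before the existing KG build.
// All helpers are injected so the sketch stays independent of the real DB layer.
async function ensureEntitiesArtifact(sessionKey, deps) {
  const { loadArtifact, synthesizeEntities, saveArtifact, log } = deps;

  // Native artifact already present: leave it untouched.
  const existing = await loadArtifact(sessionKey, "entities.json");
  if (existing) return { entities_source: "native" };

  try {
    // Artifact absent: synthesize from data already structured in the DB.
    const entities = await synthesizeEntities(sessionKey);
    await saveArtifact(sessionKey, "entities.json", entities);
    return { entities_source: "synthesized", count: entities.length };
  } catch (err) {
    // Fail-soft: log the error and let the KG build proceed on the legacy fallback.
    log(`entity synthesis failed for ${sessionKey}: ${err.message}`);
    return { entities_source: "legacy_fallback" };
  }
}
```

Because the step never throws, the caller can run the KG build unconditionally afterwards, which matches the fail-soft property listed under Properties.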
Tier composition
| Tier | Source | Yields |
| --- | --- | --- |
| 1 | Parse `## DEAL_METADATA` table from `orchestrator-state` markdown | target, acquirer, underwriters (comma-split), co_investor, key_person |
| 2 | Static `AGENT_REGULATOR_MAP` keyed by `report_key` | Regulators implied by which research agents ran |
| 3 | `SELECT FROM kg_nodes WHERE node_type='regulator'` | Session-specific regulators no static map predicts (e.g., JFTC) |
| 4 | Mine `fact_node.properties.fact_name` for narrow entity-keyword patterns | Lead Bookrunners, Controlling Shareholder, co-investor lists |
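As an illustration of Tier 1, here is a rough sketch of a `## DEAL_METADATA` parser. It assumes the section is a two-column pipe table of `| key | value |` rows with leading and trailing pipes, which this description does not confirm. The function name `parseDealMetadata` and the `role` and `tier` fields on each entity are hypothetical.

```javascript
// Keys listed for Tier 1 in this description.
const DEAL_KEYS = ["target", "acquirer", "underwriters", "co_investor", "key_person"];

// Assumed input: markdown containing a "## DEAL_METADATA" heading followed by
// rows shaped like "| key | value |". Anything that does not match is skipped.
function parseDealMetadata(markdown) {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => /^##\s+DEAL_METADATA\s*$/.test(l.trim()));
  if (start === -1) return []; // fail-soft: section absent

  const entities = [];
  for (const line of lines.slice(start + 1)) {
    const trimmed = line.trim();
    if (trimmed.startsWith("#")) break; // next heading ends the section
    if (!trimmed.startsWith("|")) continue;

    const cells = trimmed.split("|").slice(1, -1).map((c) => c.trim());
    if (cells.length < 2) continue;

    const key = cells[0].toLowerCase();
    if (!DEAL_KEYS.includes(key)) continue; // header, separator, and unknown rows

    // Only underwriters is comma-split; every other key yields one entity.
    const values = key === "underwriters" ? cells[1].split(",") : [cells[1]];
    for (const value of values) {
      const name = value.trim();
      if (name) entities.push({ canonical_name: name, role: key, tier: 1 });
    }
  }
  return entities;
}
```

Rows with an empty value or an unrecognized key produce nothing, which follows the skip-rather-than-misclassify rule.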
All tiers fail-soft (skip rather than misclassify — PR #130 mitigation). Dedup case-insensitive by canonical_name; higher-tier wins. 50-cap at synthesis + Zod.
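The dedup and cap rules can be sketched as follows. This assumes "higher-tier wins" means Tier 1 outranks Tier 4, so tiers are passed in priority order; the function name `mergeTiers` and the constant `MAX_ENTITIES` are hypothetical. The Zod side of the cap is not shown.

```javascript
const MAX_ENTITIES = 50; // cap stated in this description

// tierResults: array of entity arrays, ordered from highest priority to lowest
// (assumed here to be Tier 1 first).
function mergeTiers(tierResults) {
  const seen = new Map(); // lowercased canonical_name -> entity; keeps insertion order
  for (const entities of tierResults) {
    for (const entity of entities) {
      const name =
        typeof entity?.canonical_name === "string" ? entity.canonical_name.trim() : "";
      if (!name) continue; // fail-soft: skip malformed entries

      const key = name.toLowerCase();
      // First occurrence wins, so an earlier (higher-priority) tier is never overwritten.
      if (!seen.has(key)) seen.set(key, { ...entity, canonical_name: name });
    }
  }
  return [...seen.values()].slice(0, MAX_ENTITIES);
}
```

Keeping the first occurrence rather than overwriting means the cap also drops the lowest-priority entities first.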
Files
Runtime:
- `src/utils/entitySynthesis.js` (NEW, ~280 LoC)
- `src/server/adminRouter.js` (+35 LoC pre-step)
Tests:
- `test/sdk/entity-synthesis.test.js` (NEW, 34 tests across 9 groups)
Docs:
- Root + service CHANGELOG.md (v6.12.0 entries)
Expected impact
| Session class | Behavior change |
| --- | --- |
| Pre-v6.11.0 (incl. SpaceX) | First /rebuild-kg → synthesizes ~15-18 entities, Phase 9 recovers 267 → ~1,200-1,800 edges |
| v6.11.0+ with native entities.json | No change (entities_source: "native") |
| v6.11.0+ where fact-validator skipped emission | Falls into the synthesis path on next /rebuild-kg |
Validation gauntlet (post-deploy)
Properties
- Zero LLM dependency (pure code + SQL + Zod)
- Zero new endpoints (extends existing /rebuild-kg)
- Zero frontend changes (existing "Rebuild KG" button gains capability)
- Zero DB schema changes (uses existing report_artifacts shape)
- Idempotent (ON CONFLICT DO UPDATE)
- Fail-soft (synthesis errors logged, KG rebuild proceeds via LEGACY fallback)
Risk
3/10 — additive code, fail-soft throughout, behind requireAdmin auth.
Rollback
Revert adminRouter.js pre-step block (~35 LoC) → /rebuild-kg returns to v6.11.0 behavior. entitySynthesis.js becomes dead code, harmless. ~5 min.
Related
🤖 Generated with Claude Code