v6.11.0: Dynamic KG entity extraction via fact-validator entities.json sidecar #147
Merged
Closes the systemic gap observed in SpaceX-IPO session 2026-05-16-1778951162
where the prompt enhancer pre-computed comparable trading multiples (live FMP
work — equity-analyst's job) using static web estimates, removing the
orchestrator's incentive to invoke equity-analyst. Result: 0 FMP tool calls,
0 equity-analyst invocations in a 3h 9min, 43-report IPO due-diligence memo.
36 FMP tools + 11 code-execution models (M46–M58) shipped in v7.0.0 sat
unused. financial-analyst was invoked instead but has no FMP domain access,
so it relied on web search for comparables.
Same gap silently fails for every specialist added after promptEnhancer.js
was last touched — the enhancer pre-dates equity-analyst.
This PR composes a live capability catalog (45 subagents + 34 MCP domains +
feature-flag-gated availability) from existing introspection sources and
injects it into the Haiku enhancer system prompt with a behavioral directive
to route specialist deliverables instead of pre-answering them.
ARCHITECTURE — pure composition layer, zero new data
src/config/promptEnhancerCatalog.js (NEW, 280 LoC)
- buildEnhancerCatalog(flags) → markdown for Haiku injection
- buildCatalogJSON(flags) → structured output for future /api/catalog
refactor (single source of truth)
- CATALOG_VERSION constant for schema evolution
- ROUTING_DIRECTIVE constant — exported separately so iterations are
versionable + testable
- Internal _-prefixed helpers (matches codebase convention)
- Defensive degradation: missing AGENT_DISPLAY_META → fallback to
def.description + logWarn('catalog_agent_meta_missing'); missing
MUST BE USED block → empty triggers + warn (only when description
has "Use PROACTIVELY for:" signaling triggers were expected); flag
OFF → empty string + enhancer falls back to pre-PR behavior
Sources reused (no duplication, all read-only):
- LEGAL_SUBAGENTS registry enumeration (45 agents)
- AGENT_DISPLAY_META hand-curated role/expertise/dealContext
(231 lines, 44 of 45 agents covered)
- SUBAGENT_DOMAIN_MAP per-agent domain list (boot-frozen with
feature-flag evaluation already done)
- DOMAIN_GROUPS + getDomainToolCounts() domain → tool count
- DOMAIN_DISPLAY_META domain capability descriptions
- agent.description MUST BE USED trigger extraction +
AGENT_DISPLAY_META fallback
ENHANCER WIRE-IN (5 LoC at promptEnhancer.js:122-144)
Import buildEnhancerCatalog + setPromptEnhancerCatalogChars. Build catalog
per-call when PROMPT_ENHANCER_DYNAMIC_CATALOG=true, inject between
intakePrompt and MANDATORY OUTPUT FORMAT section. Set Prometheus gauge
with catalog length each call. Zero change to approval flow, output
format, state file shape, or downstream consumers.
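The builder's defensive degradation and trigger extraction could look roughly like the sketch below. The registry contents, the `buildCatalogSketch` name, the output line format, and the in-memory `logWarn` are all assumptions made for illustration; only the `catalog_agent_meta_missing` event name and the fallback-to-description behavior come from the notes above.

```javascript
// Hypothetical stand-ins for LEGAL_SUBAGENTS and AGENT_DISPLAY_META.
// The real registries have a richer shape; these are illustration only.
const REGISTRY = {
  'equity-analyst': {
    description:
      'Use PROACTIVELY for: comparable multiples. MUST BE USED when user mentions: IPO valuation, trading comps',
  },
  'intake-research-analyst': { description: 'Collects intake context.' },
};
const DISPLAY_META = {
  'equity-analyst': { role: 'Equity analyst', expertise: 'Live market data and valuation models.' },
};

// Assumed logger: collects warnings in memory so they can be inspected.
const warnings = [];
const logWarn = (event, detail) => warnings.push({ event, detail });

// Pull the comma-separated triggers that follow the MUST BE USED marker.
function extractTriggers(description) {
  const match = /MUST BE USED when user mentions:\s*(.+)/i.exec(description);
  if (!match) return [];
  return match[1].split(',').map((t) => t.trim()).filter(Boolean);
}

// Render one line per registered agent. A missing display-meta entry never
// drops the agent: it falls back to the raw description and logs a warning.
function buildCatalogSketch(registry, displayMeta) {
  const lines = [];
  for (const name of Object.keys(registry).sort()) {
    const def = registry[name];
    const meta = displayMeta[name];
    if (!meta) logWarn('catalog_agent_meta_missing', name);
    const expertise = meta ? meta.expertise : def.description;
    const triggers = extractTriggers(def.description);
    const suffix = triggers.length ? ` [triggers: ${triggers.join('; ')}]` : '';
    lines.push(`- ${name}: ${expertise}${suffix}`);
  }
  return lines.join('\n');
}

const catalog = buildCatalogSketch(REGISTRY, DISPLAY_META);
console.log(catalog);
```

Because the sketch iterates the registry rather than a fixed list, an agent with no curated metadata still appears in the output, and the gap shows up in the warning log.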
OBSERVABILITY
Prometheus gauge claude_prompt_enhancer_catalog_chars added to
sdkMetrics.js. Production validation: gauge should read ~54000 chars when
the dynamic catalog is engaged (45 agents × ~1 KB + 9 KB routing directive).
0 = feature disabled or builder short-circuited.
FEATURE FLAG #43 (default ON)
PROMPT_ENHANCER_DYNAMIC_CATALOG (featureFlags.js + docs/feature-flags.md
bumped 4.3 → 4.4, total flags 41 → 42). Emergency rollback: set false +
restart, enhancer reverts to pre-PR behavior. No data migration.
DYNAMISM CONTRACT
Adding a new subagent (file under legalSubagents/agents/ + entry in
legalSubagents/index.js + AGENT_DISPLAY_META) causes the agent to appear
in the catalog on the next enhancement call with ZERO changes to the
builder or enhancer. The auto-discovery test (test group 4) enforces this
by asserting the rendered agent count exactly matches LEGAL_SUBAGENTS
registry size.
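A minimal sketch of what the count-matching assertion guards against, assuming a hypothetical renderer that emits one `###` heading per agent. The two renderers and the registry contents are invented for illustration; the real test exercises `buildEnhancerCatalog` against `LEGAL_SUBAGENTS`.

```javascript
// Renderer that enumerates the registry: new agents appear automatically.
const renderDynamic = (registry) => Object.keys(registry).map((name) => `### ${name}`);

// Renderer with a baked-in list: the failure mode the contract rules out.
const HARDCODED_AGENTS = ['equity-analyst', 'fact-validator'];
const renderHardcoded = (registry) =>
  HARDCODED_AGENTS.filter((name) => name in registry).map((name) => `### ${name}`);

// The contract: rendered agent count must equal registry size exactly.
const countsMatch = (render, registry) =>
  render(registry).length === Object.keys(registry).length;

// Hypothetical registry; adding an entry simulates scaffolding a new subagent.
const registry = { 'equity-analyst': {}, 'fact-validator': {} };
registry['new-specialist'] = {};

const dynamicOk = countsMatch(renderDynamic, registry);
const hardcodedOk = countsMatch(renderHardcoded, registry);
console.log({ dynamicOk, hardcodedOk });
```

The exact-equality check fails as soon as a renderer stops following the registry, which is why the count is asserted rather than only the presence of known names.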
TESTS (29 tests, 2 snapshots — all passing)
test/sdk/prompt-enhancer-catalog.test.js
Group 1: shape + content invariants (9 tests)
Group 2: trigger extraction via real agent descriptions (4 tests)
Group 3: snapshot stability — header + routing directive (2 tests)
Group 4: auto-discovery contract (3 tests) — proves dynamism
Group 5: defensive degradation (3 tests)
Group 6: idempotence — pure function contract (2 tests)
Group 7: buildCatalogJSON structured output (4 tests)
Group 8: integration with live featureFlags object (2 tests)
Adjacent test/sdk/domain-mcp-servers.test.js still passes 28/28 (no
regression in upstream introspection helpers).
SMOKE TEST (live registry, this branch)
buildEnhancerCatalog produces 53,932 chars. Contains all 45 registered
agents, 34 MCP domains, full routing directive with SpaceX-IPO illustrative
example, FMP_ENABLED flag header line. equity-analyst entry includes its
hand-curated expertise text citing 36 FMP tools + 11 code-execution models.
OUT OF SCOPE (separate work)
- Orchestrator-level dispatch prompt vocabulary tweaks (deferred pending
empirical validation that the enhancer fix alone is sufficient)
- /api/catalog refactor to consume buildCatalogJSON (same pattern, no
customer-visible change)
- Same catalog injection into P0 orchestrator + citation-verifier dispatch
- Skill template updates (subagent-scaffold, api-integration,
feature-compliance-scaffold D11) — covered in PR2
@see plans/floating-cooking-flute.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(PR2)
Companion to bb36077 (PR1, dynamic subagent catalog injection). PR1's
dynamism guarantee holds at runtime — adding a new entry to LEGAL_SUBAGENTS
+ AGENT_DISPLAY_META causes the agent to appear in the next enhancement
catalog with zero code change to the builder. But the guarantee only
materializes if NEW subagents/domains actually GET their catalog metadata
populated when created. PR2 tightens three skill templates so the catalog
inputs are populated by default at scaffold-time + audited pre-merge.
THREE SKILL UPDATES
1. subagent-scaffold (scripts/wire-registries.py + SKILL.md)
Previous behavior: emitted agentDisplayMeta entry with
`label`/`icon`/`color` fields — but the actual agentDisplayMeta.js schema
is `role`/`expertise`/`dealContext` (44 of 45 existing agents follow
this). The scaffold had been emitting the wrong shape since inception.
Now:
- Emits complete AGENT_DISPLAY_META entry with role + expertise +
dealContext, with explicit TODO scaffolding guiding the operator to write
a ≥100-char capability paragraph citing data sources, deliverables,
code-execution models, and feature-flag gating.
- Adds explicit step #8 (separate from #7 agentClassifications.js) so the
requirement is visible in the printed checklist.
- Adds step #9 reminding operator to verify the agent's description
includes a `MUST BE USED when user mentions:` block — required for the
catalog's trigger extraction + the orchestrator's keyword routing.
- SKILL.md frontmatter "7 mandatory wiring files" → "8 mandatory wiring
files"; description explicitly cites the feature flag #43 dependency and
the canonical equity-analyst pattern.
2. api-integration (SKILL.md)
Previous: §6.1 mentioned `domainDisplayMeta.js` as a post-merge frontend
update with no urgency framing — operator could ship a new MCP domain
without an entry and only the frontend would silently render an empty
description card.
Now: §6.1 explicitly identifies DOMAIN_DISPLAY_META as a dual-consumer
surface (frontend /api/catalog AND prompt-enhancer dynamic catalog), cites
the WARNING that D11-catalog dimension will flag pre-merge if missing, and
points to canonical patterns (sec, fred, equities) for the 25-50 word
capability description. Description requirement bumped from "25-30 words"
to "25-50 words" matching observed existing entries.
3. feature-compliance-scaffold (scripts/dimensions/D11-catalog.py NEW +
SKILL.md updated D1-D10 → D1-D11)
New WARNING-severity dimension. Three sub-checks:
D11.1 — AGENT_DISPLAY_META[name].expertise present + ≥100 chars
D11.2 — agent description includes "MUST BE USED when:" block (only
checked when "Use PROACTIVELY for:" present — synthesis/QA agents
legitimately omit both)
D11.3 — DOMAIN_DISPLAY_META[domain] entry present for new domains
Auto-discovered by check.sh dimension glob (ls D*.py) — no manifest edit
required. Tolerates `domains`, `mcp_domains`, or `domain_groups` key in
symbols.json for upstream extract-feature-symbols.py flexibility.
Smoke-tested against synthetic symbols json with mixed pass/fail cases —
correctly detected:
✓ equity-analyst has complete metadata (no warning)
✓ intake-research-analyst lacks AGENT_DISPLAY_META (real gap, flagged)
✓ fake-domain-not-registered missing from DOMAIN_DISPLAY_META
Existing 3 fixture tests still pass (no regression on D1-D10).
END-TO-END DYNAMISM CONTRACT (after PR1 + PR2)
Scaffold-time → subagent-scaffold + api-integration skills emit complete
AGENT_DISPLAY_META + DOMAIN_DISPLAY_META as REQUIRED steps; operator can't
easily skip them (printed checklist + WARNING messaging)
Pre-merge → feature-compliance-scaffold D11-catalog dimension surfaces any
gaps before PR opens
Runtime → buildEnhancerCatalog reads live registry + display-meta + flags;
new entries appear automatically on next enhancement call (PR1's contract,
enforced by Test Group 4 auto-discovery test)
Defensive → if all three upstream layers miss, builder still includes the
agent (fallback description) + emits logWarn('catalog_agent_meta_missing')
— gap surfaces operationally rather than silently dropping the agent
OUT OF SCOPE (separate work)
- Backfill AGENT_DISPLAY_META entry for intake-research-analyst (the 1 of
45 currently missing) — operator decision whether to add
- Migrating the old `label`/`icon`/`color` schema that the scaffold used to
emit — never landed in production (44 of 45 existing entries already use
role/expertise/dealContext correctly)
@see plans/floating-cooking-flute.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
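The three D11 sub-checks can be sketched as a single function. The real check is a Python script (D11-catalog.py); this JavaScript rendering, the `checkCatalogMetadata` name, the argument shape, and the warning strings are assumptions for illustration, and the marker test is simplified to a substring match on "MUST BE USED when".

```javascript
// Returns a list of WARNING strings; an empty list means the agent and its
// new domains carry everything the catalog needs.
function checkCatalogMetadata({ agentName, description, displayMeta, newDomains, domainMeta }) {
  const found = [];

  // D11.1: expertise must exist and be at least 100 characters.
  const expertise = (displayMeta[agentName] && displayMeta[agentName].expertise) || '';
  if (expertise.length < 100) {
    found.push(`D11.1 ${agentName}: expertise missing or under 100 chars`);
  }

  // D11.2: only enforced when the description signals that triggers were
  // expected. Synthesis/QA agents legitimately omit both markers.
  const expectsTriggers = description.includes('Use PROACTIVELY for:');
  if (expectsTriggers && !description.includes('MUST BE USED when')) {
    found.push(`D11.2 ${agentName}: no MUST BE USED block`);
  }

  // D11.3: every newly added domain needs a display-meta entry.
  for (const domain of newDomains) {
    if (!(domain in domainMeta)) found.push(`D11.3 ${domain}: no DOMAIN_DISPLAY_META entry`);
  }
  return found;
}

const complete = checkCatalogMetadata({
  agentName: 'equity-analyst',
  description: 'Use PROACTIVELY for: comps. MUST BE USED when user mentions: IPO',
  displayMeta: { 'equity-analyst': { expertise: 'x'.repeat(120) } },
  newDomains: ['equities'],
  domainMeta: { equities: {} },
});

const gaps = checkCatalogMetadata({
  agentName: 'intake-research-analyst',
  description: 'Use PROACTIVELY for: intake triage.',
  displayMeta: {},
  newDomains: ['fake-domain-not-registered'],
  domainMeta: {},
});
console.log(complete, gaps);
```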
…udit (PR3)
Three independent background Explore agents reviewed commits bb36077 (PR1
runtime catalog injection) + bc5c80b (PR2 skill template loop). Surfaced
one SHOWSTOPPER + two real issues.
1. SHOWSTOPPER — enhancedPrompt was discarded by the orchestrator
agentStreamHandler.js captured the enhancer's output at L240 but never
forwarded it to agentQuery(). The orchestrator at L281 received
ctx.currentPrompt (initialized in streamContext.js:58 as ctx.userQuery, the
RAW user query) — so every [ROUTE TO <agent>: ...] tag Haiku produced under
the dynamic catalog directive was thrown away. PR1 was 100% decorative at
runtime: built a beautiful catalog, injected it into Haiku's system prompt,
captured Haiku's routing-tagged enhancement, persisted it to disk + SSE for
the frontend, and then ran the orchestrator on the original query anyway.
The original SpaceX-IPO failure mode (orchestrator never invoking
equity-analyst) would have repeated identically post-PR1.
FIX: agentStreamHandler.js L240-243 — when runPromptEnhancementPhase
returns a non-null enhanced prompt, assign ctx.currentPrompt =
enhancedPrompt. ctx.userQuery preserved unchanged for downstream consumers
that need the original (analytics, audit, etc.). Pattern is consistent with
L518 where ctx.currentPrompt is already mutated mid-stream for
AUTO_CONTINUATION.
Verification: re-run SpaceX-IPO-class prompt post-deploy. Expect to see in
hook_audit_log:
(a) subagent_start for equity-analyst,
(b) pre_tool_use for mcp__equities__*,
(c) enhanced prompt at reports/<session>/enhanced-prompt.md contains
[ROUTE TO equity-analyst] tags.
2. buildCatalogJSON() violated its own pure-function contract
new Date().toISOString() embedded in the JSON output meant two consecutive
calls with identical flags produced different output. The function header
explicitly claimed "Same flags input → byte-identical output." Test group 6
idempotence assertion would have masked it (two same-millisecond calls
collide).
FIX: promptEnhancerCatalog.js — generated_at is now caller-supplied via
optional `{ generatedAt }` option (defaults to null). Pure-function
contract preserved by default; production callers who need a timestamp pass
it explicitly. Three new tests: generated_at default is null,
caller-supplied timestamp is honored, idempotence guarantee across
JSON.stringify of two consecutive calls.
3. claude_prompt_enhancer_catalog_chars gauge semantic was ambiguous
Help text didn't clarify whether the gauge measures catalog-build success
or enhancement-pipeline success. The gauge fires at
runPromptEnhancementPhase entry BEFORE the Haiku API call — so a downstream
API failure leaves the gauge reading "catalog injected" while the
enhancement itself failed. After deliberation: this is the CORRECT
semantic. Build/inject and end-to-end success are two different things and
should be observable separately. The fix is documentation: gauge help text
now explicitly says "NOT a success signal" and points operators to the
prompt_enhancement_status SSE event + hook_audit_log AgentProgress entry
for end-to-end success rates.
4. D11-catalog.py domain-detection sub-check was dead code
D11.3 (verify DOMAIN_DISPLAY_META[domain] exists for new domains) expected
symbols.domains/mcp_domains/domain_groups keys, but
extract-feature-symbols.py never populated any of these — empty_symbols()
omitted the key entirely and extract_from_diff() had no domain-detection
regex.
FIX: extract-feature-symbols.py — empty_symbols() now includes a "domains"
key. New NEW_DOMAIN_RE regex matches additions to DOMAIN_GROUPS in
domainMcpServers.js, including feature-flag-gated patterns like
`...(featureFlags.FMP_ENABLED ? { 'equities': equitiesTools } : {})`.
Hand-tested against 5 sample diff lines (positive + negative cases). D11.3
now actually fires on PRs that add MCP domains.
Tests: 32/32 prompt-enhancer-catalog (3 new for idempotence) + 3/3
feature-compliance-scaffold fixtures still passing.
Honest limit acknowledged: the SHOWSTOPPER fix has not been live-tested.
Will be verified empirically by the next /deploy + post-deploy IB-class
prompt — operator should look for equity-analyst in hook_audit_log
subagent_start events.
@see plans/floating-cooking-flute.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
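Fixes 1 and 2 from this commit can be sketched in a few lines. The function names `applyEnhancedPrompt` and `buildCatalogJSONSketch`, the `catalog_version` field, and the flag names are hypothetical; only the `ctx.currentPrompt` / `ctx.userQuery` fields and the `{ generatedAt }` option (default null) are taken from the message above.

```javascript
// Fix 1 sketch: forward the enhancer's output to the orchestrator while
// leaving the raw query intact for analytics and audit consumers.
function applyEnhancedPrompt(ctx, enhancedPrompt) {
  if (enhancedPrompt != null) {
    ctx.currentPrompt = enhancedPrompt;
  }
  return ctx;
}

// Fix 2 sketch: no clock reads inside the builder. The timestamp is an
// optional caller-supplied value, so identical inputs give identical output.
function buildCatalogJSONSketch(flags, { generatedAt = null } = {}) {
  return {
    catalog_version: 1, // assumed field, for illustration only
    generated_at: generatedAt,
    flags: { ...flags },
  };
}

const ctx = { userQuery: 'Value the IPO', currentPrompt: 'Value the IPO' };
applyEnhancedPrompt(ctx, '[ROUTE TO equity-analyst: live comps] Value the IPO');

const first = JSON.stringify(buildCatalogJSONSketch({ FMP_ENABLED: true }));
const second = JSON.stringify(buildCatalogJSONSketch({ FMP_ENABLED: true }));
console.log(ctx.currentPrompt, first === second);
```

Comparing serialized output of two consecutive calls, as the new idempotence tests do, catches an embedded timestamp that an object-identity check taken within the same millisecond could miss.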
…coped before/after diff (Option C)
PR3 introduced NEW_DOMAIN_RE to detect MCP domain additions for D11.3
(catalog domain-metadata check). That regex required the value to match
the `<thing>Tools` naming convention — brittle against:
- 'name': SOME_TOOLS_CONST (uppercase constant)
- 'name': [...toolsA, ...toolsB] (array spread)
- 'name': require('./tools').default (CJS import)
- any future variant a contributor might write
Replaced with a block-scoped before/after set diff using a new
get_domain_group_keys_at_ref() helper. The helper:
- reads the DOMAIN_GROUPS object literal from git show <ref>:<file>
- extracts every 'key': pattern inside the {...} block
- returns a Set so caller can compute (after - before) for new domains
Better on every "zero-break" dimension:
- text-only (works on broken branches that don't compile)
- no Node dependency in the Python audit pipeline
- independent of value naming convention (any 'key': anything works)
- tolerates cosmetic reformats (set diff = ∅ when no logical change)
- graceful degradation on missing files / refs (returns ∅, no crash)
Verified against current HEAD: helper extracts all 37 DOMAIN_GROUPS
keys correctly, including feature-flag-gated entries (equities,
code-execution, direct-fetch, exa-search) that depend on
`...(featureFlags.X ? { 'name': tools } : {})` patterns.
ISOLATION
Zero runtime impact. This is operator audit tooling — Python script
invoked locally pre-PR-merge via /feature-compliance-scaffold. Never
imported by the Node server, never bundled in the Docker image, never
touches the orchestrator or prompt-enhancer code paths. Modifies one
file in .claude/skills/feature-compliance-scaffold/scripts/.
Fixtures: 3/3 still pass (no regression on D1-D10).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
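The block-scoped set diff can be illustrated as follows. The real helper is Python and reads each revision via `git show <ref>:<file>`; this JavaScript sketch takes the two source texts directly, and the function names and key regex are assumptions rather than the shipped implementation.

```javascript
// Extract every quoted key inside the DOMAIN_GROUPS {...} block of a source
// text. Returns an empty Set when the block cannot be located.
function extractDomainGroupKeys(source) {
  const start = source.indexOf('DOMAIN_GROUPS');
  if (start === -1) return new Set();
  const open = source.indexOf('{', start);
  if (open === -1) return new Set();

  // Walk to the matching close brace so nested {...} spreads stay in scope.
  let depth = 0;
  let end = -1;
  for (let i = open; i < source.length; i++) {
    if (source[i] === '{') depth++;
    else if (source[i] === '}' && --depth === 0) {
      end = i;
      break;
    }
  }
  if (end === -1) return new Set();

  const keys = new Set();
  const block = source.slice(open, end + 1);
  // Match 'key': or "key": regardless of what the value looks like.
  for (const m of block.matchAll(/['"]([\w-]+)['"]\s*:/g)) keys.add(m[1]);
  return keys;
}

// New domains = keys present after the change but not before it.
function newDomains(beforeSource, afterSource) {
  const before = extractDomainGroupKeys(beforeSource);
  return [...extractDomainGroupKeys(afterSource)].filter((k) => !before.has(k));
}

const beforeSrc = "const DOMAIN_GROUPS = { 'sec': secTools, 'fred': FRED_TOOLS };";
const afterSrc = `const DOMAIN_GROUPS = {
  'fred': FRED_TOOLS,
  'sec': secTools,
  ...(featureFlags.FMP_ENABLED ? { 'equities': equitiesTools } : {}),
};`;
const added = newDomains(beforeSrc, afterSrc);
console.log(added);
```

Reordering or reformatting the object leaves the diff empty, and a missing block degrades to an empty set instead of an exception. Both are properties the commit message calls out.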
…ture flag
The flag was a mischaracterization. Adding a flag implies "this is an
optional capability, can be toggled off." The reality is that the original
prompt-enhancer behavior was BROKEN — it pre-computed specialist
deliverables (live trading multiples, DCF, CFIUS analysis) from static web
estimates because it had zero awareness of the 45-subagent registry. The
dynamic catalog injection is a bug fix for that architectural gap, not an
experiment.
If the catalog ever needs to be reverted, the right path is: revert the
commit + redeploy. Same workflow as any other code-level rollback. Not
something operators should be able to flip via env var.
CHANGES
src/config/featureFlags.js — remove PROMPT_ENHANCER_DYNAMIC_CATALOG
declaration (-12 LoC)
src/server/promptEnhancer.js — drop the conditional; catalog is now always
built and injected. Comment updated to call it "essential, not optional."
(-24/+18 LoC net)
docs/feature-flags.md — delete §43 entry, restore Quick Reference count to
41 (no net adds since v4.3), remove §43 row. Replace with a brief "NOT a
feature flag" pointer in the Dead Code section explaining the decision +
linking to the relevant files. (-60/+32 LoC net)
PRESERVED
Prometheus gauge claude_prompt_enhancer_catalog_chars stays — it's
observability, not a control surface. Operators still need to verify the
catalog is being built correctly (post-deploy, gauge should read ~54000
chars per enhancement call).
All 32 prompt-enhancer-catalog tests continue to pass — the tests exercise
buildEnhancerCatalog() directly with flag fixtures, not via the enhancer's
now-unconditional injection.
Defensive degradation paths in promptEnhancerCatalog.js are unchanged —
missing AGENT_DISPLAY_META still falls back to agent.description with a
catalog_agent_meta_missing warning. The catalog never throws.
CALLER COUNT
Zero existing flag consumers anywhere in the codebase — flag was new in
commit bb36077 and only referenced in promptEnhancer.js +
docs/feature-flags.md. Both updated. No external migration needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rator wire-in
Document the 5-commit lineage (bb36077 + bc5c80b + 38fda51 + 6f20819 +
b654a85) under a single v6.10.0 entry in both root + service CHANGELOGs.
Covers: original SpaceX-IPO failure mode, two-layer root cause (enhancer
ignorance + orchestrator discard), four-layer fix (catalog builder +
enhancer injection + orchestrator wire-in + skill template loop), the
mid-release flag-removal pivot (feature flag → essential infrastructure),
32-test coverage breakdown, and the unverified-in-production honest limit.
Service CHANGELOG entry is the canonical detailed version (~115 LoC); root
CHANGELOG is the concise leadership-readable version (~33 LoC) with pointer
to the service CHANGELOG for full technical detail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…efault OFF)
PR1 of two-PR rollout that replaces the hardcoded entity list in KG Phase 6
(kgPhases6to8.js:73-83, hardcoded to 9 DigitalBridge/SoftBank/ADIA names)
with a dynamic per-session entities.json sidecar emitted by fact-validator.
Closes the systemic gap exposed by SpaceX-IPO session 2026-05-16-1778951162
(632 nodes / 267 edges vs March 31 baseline 1083/2062; edge density
collapsed by 78%, from 1.90 to 0.42 edges per node, due to Phase 9 O(n²)
cardinality starvation from ~0 entity anchors).
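The density figures follow directly from the node and edge counts quoted above. A quick check, with the helper name chosen only for this sketch:

```javascript
// Edge density = edges per node.
const edgeDensity = (nodes, edges) => edges / nodes;

const spacex = edgeDensity(632, 267); // SpaceX-IPO session
const baseline = edgeDensity(1083, 2062); // March 31 baseline
const relativeChange = (spacex - baseline) / baseline;

console.log(spacex.toFixed(2), baseline.toFixed(2), Math.round(relativeChange * 100) + '%');
```

The node count fell by about 42% (1083 to 632), but edges fell far faster, which is what the per-node density captures.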
PR1 ships the PRODUCER ONLY — fact-validator emits the sidecar when
explicitly opted-in via FACT_VALIDATOR_EMIT_ENTITIES_JSON=true. KG Phase 6
consumer wiring lands in PR2 after staging soak validates the producer.
This split (per M1 mitigation in /Users/ej/.claude/plans/floating-cooking-
flute.md) prevents the silent-fact-registry-quality-degradation failure
mode: the flag stays OFF by default so production fact-validator behavior
is byte-identical to before this commit. Operator manually flips flag
true in staging, runs 3-5 fresh sessions, diffs the resulting fact-
registry.md against a frozen pre-flag baseline, and only after sign-off
flips true in prod.
CHANGES
src/config/featureFlags.js — add FACT_VALIDATOR_EMIT_ENTITIES_JSON flag
(default false). Comment explains it's PR1 of the dynamic-entities
sidecar work and points to the plan.
src/schemas/entitiesJson.js (NEW, ~110 LoC) — Zod schema for the
entities.json contract. Defines: schema_version, session_key,
generated_at, source_reports_analyzed, entities array (.max(50) — the
primary Phase 9 cardinality safeguard per M4 mitigation). Each entity
has canonical_name, entity_type (bounded enum: target/acquirer/
co_investor/portfolio_company/regulator/key_person/counterparty/
underwriter/other), role (free-form), variations, match_patterns
(≥1, ≤10 — what Phase 6 will escapeRegex + word-boundary-match against
report content), source_refs (provenance: report_key + mention_count),
confidence. Exports parseEntitiesJson (throws) and safeParseEntitiesJson
(returns null) for the consumer's defensive degradation path.
src/config/legalSubagents/agents/fact-validator.js — additive prompt
extension and outputFiles registration. Adds 'entities.json' to
outputFiles, adds PHASE_5.3 to compaction-recovery checklist, appends
ENTITIES.JSON SIDECAR section (~70 LoC) after the existing §II.C Entity
Names table specification. Critical rules captured in prompt:
- WHEN TO EMIT: only when FACT_VALIDATOR_EMIT_ENTITIES_JSON=true
(when false, behavior is byte-identical to pre-PR1)
- HARD CAP: 50 entities/session (with top-N-by-mention truncation
guidance if >50 candidates)
- SCHEMA: direct serialization of the already-canonicalized §II.C
table — NO re-extraction from source reports (zero new LLM cost)
- match_patterns rules: plain strings only (no regex chars), ≥3 char
minimum (avoids "Switch" false-matching "switchover"), include
canonical_name + 1-2 distinguishing tokens
- MISSING ENTITIES: emit entities: [] (file presence is the signal)
RETURN FORMAT JSON now includes entities_emitted (integer or null when
sidecar skipped).
src/utils/artifactPersistence.js — extend persistSessionArtifacts to
scan for review-outputs/entities.json and persist as a report_artifacts
row with mime='application/json', category='sidecar', source=
'fact_validator'. Filesystem-only persistence would be lost on container
roll (GCE MIG auto-heal); report_artifacts ensures survival across any
container event AND auto-surfaces in existing /api/audit-report endpoint
+ client-audit-export regulator handoff bundles (their SELECT * queries
include the new row automatically). ENOENT skip is the dominant case
during PR1 (flag default false), warned only on non-ENOENT errors.
test/fixtures/entities-spacex.json (NEW) — canonical example with 10
representative entities for the SpaceX-IPO domain (SpaceX, Musk, NASA,
FAA, FCC, SEC, CFIUS, QIA, Space Force, Morgan Stanley) — covers 6 of
9 entity_type enum values + the regulatory-heavy content profile that
exposed the original bug.
test/sdk/entities-json-schema.test.js (NEW, ~230 LoC, 26 tests) — Zod
schema validation across 7 groups: constants, fixture round-trip,
happy-path parsing with defaults, schema violations (8 rejection
cases), hard 50-cap enforcement, safeParse graceful degradation,
empty-entities legitimate case. All passing.
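The contract enforced by the schema can be approximated without dependencies. The real module uses Zod; the hand-rolled validator below is a stand-in that covers only a subset of the fields (the entity cap, `canonical_name`, the `entity_type` enum, and the `match_patterns` bounds), and the function names are hypothetical.

```javascript
const ENTITY_TYPES = [
  'target', 'acquirer', 'co_investor', 'portfolio_company', 'regulator',
  'key_person', 'counterparty', 'underwriter', 'other',
];
const MAX_ENTITIES = 50; // the Phase 9 cardinality safeguard

// Throwing variant, analogous in spirit to parseEntitiesJson.
function parseEntitiesSketch(doc) {
  if (!doc || !Array.isArray(doc.entities)) throw new Error('entities must be an array');
  if (doc.entities.length > MAX_ENTITIES) {
    throw new Error(`entities exceeds cap of ${MAX_ENTITIES}`);
  }
  doc.entities.forEach((entity, i) => {
    if (!entity || typeof entity.canonical_name !== 'string' || entity.canonical_name === '') {
      throw new Error(`entities[${i}].canonical_name must be a non-empty string`);
    }
    if (!ENTITY_TYPES.includes(entity.entity_type)) {
      throw new Error(`entities[${i}].entity_type is not in the enum`);
    }
    const patterns = entity.match_patterns;
    const validPatterns =
      Array.isArray(patterns) &&
      patterns.length >= 1 &&
      patterns.length <= 10 &&
      patterns.every((p) => typeof p === 'string');
    if (!validPatterns) throw new Error(`entities[${i}].match_patterns must hold 1-10 strings`);
  });
  return doc;
}

// Null-returning variant for the consumer's degradation path.
function safeParseEntitiesSketch(doc) {
  try {
    return parseEntitiesSketch(doc);
  } catch {
    return null;
  }
}

const sample = {
  entities: [
    { canonical_name: 'SpaceX', entity_type: 'target', match_patterns: ['SpaceX'] },
    { canonical_name: 'FAA', entity_type: 'regulator', match_patterns: ['FAA', 'Federal Aviation Administration'] },
  ],
};
console.log(safeParseEntitiesSketch(sample) !== null);
```

An empty `entities` array passes, consistent with the rule that file presence is the signal. An oversized array is rejected outright at this boundary.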
ZERO DATABASE SCHEMA CHANGES
No new tables, no new columns, no migrations. entities.json persists via
existing report_artifacts table with a new MIME row value (no schema
change — the mime_type column already accepts arbitrary strings).
Existing /api/audit-report + client-audit-export consumers automatically
include the new artifact via existing SELECT * patterns.
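The ENOENT-tolerant read behind this persistence step might look like the sketch below. The function name, the returned row shape, and the injected `warn` callback are assumptions; the `mime`/`category`/`source` values are the ones listed in CHANGES. The module loader is written to run under both CommonJS and ES modules so the sketch is not tied to one setup.

```javascript
// Load fs.promises under either module system (illustration convenience).
async function loadFsPromises() {
  if (typeof require === 'function') return require('fs').promises;
  return (await import('fs')).promises;
}

// Read the sidecar if it exists. A missing file is the expected case for
// sessions that did not emit one, so ENOENT is skipped silently; any other
// error is warned about. Both paths return null and never throw.
async function readSidecarArtifact(path, warn = console.warn) {
  const fsp = await loadFsPromises();
  try {
    const content = await fsp.readFile(path);
    return {
      mime: 'application/json',
      category: 'sidecar',
      source: 'fact_validator',
      content, // Buffer, ready to store as a binary column value
    };
  } catch (err) {
    if (err.code !== 'ENOENT') warn(`entities.json read failed: ${err.message}`);
    return null;
  }
}

// Hypothetical path; on a machine without this file the result is null.
readSidecarArtifact('/nonexistent-dir-for-sketch/review-outputs/entities.json')
  .then((row) => console.log(row));
```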
VERIFICATION
$ NODE_OPTIONS=--experimental-vm-modules jest \
test/sdk/entities-json-schema.test.js
Tests: 26 passed, 26 total
$ node -e "import('./src/config/featureFlags.js').then(m => \
console.log(m.featureFlags.FACT_VALIDATOR_EMIT_ENTITIES_JSON))"
false ← default OFF, ready to ship safely
ROLLOUT NEXT STEPS
1. Deploy PR1 to staging — fact-validator behavior identical (flag OFF)
2. Operator sets FACT_VALIDATOR_EMIT_ENTITIES_JSON=true in staging env,
restarts container, runs 3-5 fresh sessions via frontend
3. Operator inspects produced entities.json files via:
curl http://STAGING/api/session/<key>/audit-report | jq '.artifacts[]
| select(.mime_type=="application/json")'
4. Operator diffs the new fact-registry.md against a frozen pre-flag
baseline of similar session shape — looks for quality regression
signals (fewer facts, missing conflicts, broken section mapping)
5. On clean staging soak, flip flag true in prod env. Sidecar starts
producing but no consumer until PR2 ships.
PR2 (consumer) waits for above. Per plan, PR2 will add KG Phase 6
loadEntitiesJson + tier-3 fallback (LEGACY_DIGITALBRIDGE_FALLBACK kept
for zero-break on existing sessions) + Prometheus gauge
claude_kg_phase6_entity_count + E2E validation gate (rebuild SpaceX-IPO
session, expect 632→~1000 nodes + 267→~1500 edges).
@see /Users/ej/.claude/plans/floating-cooking-flute.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on (PR1.1)
Post-PR1 (aa1dbdf) three-agent review surfaced one CRITICAL gap:
FACT_VALIDATOR_EMIT_ENTITIES_JSON flag was non-functional. The prompt
instructed the agent to "Check this via the process.env inspection guidance
the orchestrator provides" — but Sonnet runs LLM-side, cannot read
process.env, and no upstream orchestrator code injected the flag value into
the system prompt. Result: silent failure mode where Sonnet might emit
unconditionally OR be confused and skip emission.
ROOT CAUSE
The flag was wired in featureFlags.js correctly but the gating decision was
deferred to runtime (Sonnet reading env) instead of build-time (JavaScript
constructing the prompt). Sonnet only ever sees the static template literal
that was evaluated at module load.
FIX — build-time conditional prompt construction
Standard pattern in this codebase (verified via grep across
src/config/legalSubagents/agents/ — equity-analyst, financial-analyst,
cybersecurity-compliance-analyst, government-contracts-researcher all
import featureFlags and use it for tool wiring + prompt interpolation):
    import { featureFlags } from '../../featureFlags.js';
    const ENTITIES_JSON_SIDECAR_BLOCK = featureFlags.FACT_VALIDATOR_EMIT_ENTITIES_JSON
      ? `## ENTITIES.JSON SIDECAR (MANDATORY — flag enabled) ...`
      : '';
    export const def = {
      ...,
      prompt: `...
    ## Entity Names
    ...table...
    ${ENTITIES_JSON_SIDECAR_BLOCK}
    ## Assumption Status
    ...
    `,
    };
When the flag is FALSE (production default), ENTITIES_JSON_SIDECAR_BLOCK is
the empty string and the entire ENTITIES.JSON SIDECAR section literally
does not exist in the prompt Sonnet sees. The agent physically cannot emit
a file it has no instructions for. Three prompt sections are gated
identically: the section block itself, PHASE_5.3 checklist line, and the
entities_emitted field in RETURN FORMAT JSON.
When TRUE (operator opt-in for staging soak), all three sections
materialize — agent gets unambiguous "MANDATORY — flag enabled" instruction
with no env-var-inspection ambiguity.
VERIFICATION
New test file: test/sdk/fact-validator-entities-flag.test.js (7 tests).
Runs in BOTH flag states:
- flag=false (default): prompt has zero entities.json references (sweep
tests: no "entities.json", no "match_patterns", no "entity_type", no "HARD
CAP 50" — comprehensive content scan)
- flag=true: prompt contains "MANDATORY — flag enabled", contains none of
the broken "process.env" / "check env var" patterns
- prompt length materially differs (~21K off, ~24K on)
- outputFiles still includes 'entities.json' in both states (metadata is
gate-independent; artifactPersistence handles ENOENT gracefully)
Test execution proof:
$ npx jest test/sdk/fact-validator-entities-flag.test.js
Tests: 7 passed, 7 total ← flag default OFF
$ FACT_VALIDATOR_EMIT_ENTITIES_JSON=true npx jest \
test/sdk/fact-validator-entities-flag.test.js
Tests: 7 passed, 7 total ← flag ON
Combined PR1 + hotfix test suite: 65/65 pass across 3 files
(entities-json-schema 26, fact-validator-entities-flag 7,
prompt-enhancer-catalog 32). No regressions.
MONOLITHIC FILE SYNC (concern #3 from review)
Updated legalSubagents.js:6369 (monolithic) outputFiles to include
'entities.json' for consistency. Added explicit comment that the monolithic
file is NOT production (MODULAR_SUBAGENTS=true default) and its prompt is
static — entities.json work happens through the modular file only. Prevents
future-developer confusion when grepping for fact-validator definitions.
REVIEWER CONCERNS
1. ❌→✅ Flag gating: FIXED via build-time conditional construction
2. ✅ outputFiles enforcement: confirmed metadata-only; ENOENT-safe (no
change needed)
3. ✅ Monolithic divergence: synced for housekeeping + documented status
ZERO downstream impact when flag is off (default). End-to-end behavior
identical to pre-aa1dbdfe (pre-PR1) when flag is off — PR1 ships
dead-on-arrival as planned.
@see /Users/ej/.claude/plans/floating-cooking-flute.md (M1 mitigation)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ag — entities.json is essential infrastructure
Same architectural correction applied earlier to the dynamic
prompt-enhancer catalog (commit b654a85). The flag was risk-driven (M1
mitigation against fact-registry.md quality regression) but mischaracterized
the change as optional. Reality:
- KG Phase 6's hardcoded entity list IS broken for any non-DigitalBridge
session. entities.json is the fix, not an experiment.
- Recovery time is comparable: "flip flag + restart" (~5 min) vs "revert +
redeploy" (~10 min), a marginal difference
- The hotfix (PR1.1, f821108) already isolated the prompt change cleanly
(separate constant, appended after the existing Entity Names table —
natural extension, not a new mental model)
- The agent is doing pure serialization of work it already did — zero new
inference, zero new reasoning load
- Bug fixes don't get feature flags; they get reverts if they break
CHANGES
src/config/featureFlags.js — remove FACT_VALIDATOR_EMIT_ENTITIES_JSON
declaration (-12 LoC)
src/config/legalSubagents/agents/fact-validator.js — drop conditional
prompt construction. ENTITIES.JSON SIDECAR section now unconditionally
inlined after the Entity Names table. PHASE_5.3 checklist line and
entities_emitted RETURN field also unconditional. Docstring updated to call
entities.json "essential infrastructure, not optional capability."
(-30/+90 LoC net — simpler code overall)
src/config/legalSubagents.js — update outputFiles NOTE comment to reflect
that modular prompt now unconditionally emits entities.json instructions
(was: "flag-gated"; now: simple statement)
test/sdk/fact-validator-entities-flag.test.js →
fact-validator-entities.test.js
Rename + simplify. Drop the parametrized flag-state tests; replace with
unconditional contract tests:
- outputFiles includes entities.json
- prompt has ENTITIES.JSON SIDECAR section + 5.3 + entities_emitted
- prompt has hard 50-cap + match_patterns rules + 9-enum
- prompt instructs empty-entities still emit (file presence signal)
- regression guard: prompt has NO process.env / "check env var" /
"FACT_VALIDATOR_EMIT_ENTITIES_JSON" references
- agent has Write tool
11 tests, all passing.
PRESERVED
- Zod schema (src/schemas/entitiesJson.js) — unchanged, still validates
consumer input
- Persistence path (src/utils/artifactPersistence.js) — unchanged, still
ENOENT-safe (fact-validator may legitimately produce empty entities array
on edge sessions, file always written)
- All 26 entities-json-schema tests + 32 prompt-enhancer-catalog tests
continue to pass
ROLLOUT SIMPLIFICATION
Previous plan: "deploy PR1 → operator flips flag in staging → 3-5 fresh
sessions diff → flip flag in prod → wait → ship PR2"
New plan: "deploy PR1 → entities.json starts emitting on next session
immediately → ship PR2 (consumer) → KG benefit lands automatically"
The staging soak (M1) is no longer enforced by the code. Operator can still
run a staging session manually before prod deploy to spot-check
fact-registry.md quality, but it's no longer a flag-flip gate. If quality
regression is observed: revert + redeploy (~10 min recovery).
NO downstream impact on PR2 — the consumer (KG Phase 6) still doesn't read
entities.json. PR1 + this flag removal ships an essential producer that's
ready to be wired the moment PR2 lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the entities-sidecar work started in PR1 (aa1dbdf). PR1 shipped the producer (fact-validator emits entities.json to report_artifacts); PR2 ships the consumer (KG Phase 6 reads entities.json instead of using the hardcoded 9-entity DigitalBridge list).

With both PRs deployed:
- Every new session writes entities.json via fact-validator (PR1)
- KG build (initial or rebuild via /api/admin/.../rebuild-kg) reads entities.json from report_artifacts, creates one entity node per canonical_name, and Phase 9 cross-link cardinality recovers automatically (Phase 9 was always cardinality-driven; the bug was Phase 6 starving it of entity anchors)

For non-DigitalBridge sessions (SpaceX-IPO, future IB/PE memos), this restores Phase 9 edge density from the 0.42 edges/node observed in the 2026-05-16 SpaceX session toward the 1.90 baseline. Phase 9 itself is unchanged.

ARCHITECTURE - two-tier fallback (no markdown-parser tier 2)

The original plan included a tier-2 lazy backfill that would parse the fact-registry.md §II.C "Entity Names" table in-memory when entities.json was missing. This was dropped because it carried the same PR #130 certificateParser.mjs failure class (markdown format drift = silent fleet-wide data loss). Backfill of old sessions is now an explicit deferred operator concern, not an automatic path.

Two tiers only:
1. entities.json from report_artifacts (PRIMARY for new sessions post-PR1+PR2). DB-backed, survives container rolls.
2. LEGACY_DIGITALBRIDGE_FALLBACK (preserves pre-PR2 behavior on old sessions). Renamed from the hardcoded entityPatterns array; same 9 DigitalBridge entries.

CARDINALITY SAFEGUARD (M4 mitigation)

PHASE6_ENTITY_CAP=50 + runtime guard in resolvePhase6Entities truncates oversized entities.json arrays. The Zod schema in src/schemas/entitiesJson.js also caps entities.max(50) at the sidecar boundary; the runtime guard is defense-in-depth in case a future schema bump raises the Zod cap without raising the resolver cap. Both layers must stay in sync (test documents this invariant).

OBSERVABILITY

New Prometheus gauge: claude_kg_phase6_entity_count{source="entities_json"|"legacy_hardcoded"}

Surfaces three operator signals per KG build:
- source=entities_json → fact-validator sidecar consumed (post-PR1 sessions)
- source=legacy_hardcoded → fell back to old DigitalBridge list (old session OR malformed entities.json; search Cloud Logging for "entities.json present but failed Zod validation" to disambiguate)
- count > 50 → cardinality guard truncated; investigate fact-validator over-extraction

Recommended alert (operator runbook): claude_kg_phase6_entity_count > 75 sustained 15m.

Truncation events are NOT a separate Gauge series (would persist across rebuilds and violate "current state" semantics); they surface via the resolvePhase6Entities console.warn log instead.

FILES

src/utils/knowledgeGraph/kgHelpers.js
  Add getEntitiesForSession(pool, sessionId): queries report_artifacts WHERE mime_type='application/json' AND file_name='entities.json'; converts BYTEA to UTF-8; safeParse via Zod; returns parsed entities array or null. Dynamic import of the schema module defers ~50ms cost on misses (the common pre-PR1 case). Catches DB errors with logWarn + null return. Caller MUST treat null as "use fallback"; the helper never throws.

src/utils/knowledgeGraph/kgPhases6to8.js
  Replace inline entityPatterns loop with resolvePhase6Entities() resolver. Add LEGACY_DIGITALBRIDGE_FALLBACK constant (renamed + 1 LoC expansion adding match_patterns field for consistent shape with entities.json), PHASE6_ENTITY_CAP=50, escapeRegex helper. New entity-node properties: entity_type, variations, source_refs, confidence_tier, extraction_source ('entities_json' | 'legacy_hardcoded'). Phase 9 reads only entity.label + entity.properties.full_text|context; verified safe (existing reader doesn't touch new fields). Confidence mapping: HIGH→1.0, LOW→0.6, MEDIUM→0.85 default. Exported resolvePhase6Entities + constants for tests only.

src/utils/sdkMetrics.js
  Register claude_kg_phase6_entity_count Gauge with source label + setKgPhase6EntityCount(source, count) setter. Help text guides operators to the >75 alert threshold + cites the Cloud Logging search for truncation event audit.

test/sdk/kg-phase6-entities.test.js (NEW, ~270 LoC, 14 tests):
  - Group 1 (3 tests): tier-1 happy path: SQL query shape, returns parsed entities, preserves match_patterns
  - Group 2 (5 tests): tier-1 graceful failures: missing artifact, DB throws, invalid Zod schema, malformed JSON bytes, null file_data
  - Group 3 (5 tests): resolvePhase6Entities two-tier fallback: tier 1, tier 2 missing, tier 2 malformed, exact 50-cap, defense-in-depth documentation
  - Group 4 (1 test): SpaceX fixture round-trip end-to-end (10+ entities, canonical names verified)

All 14 PR2 tests pass. Combined PR1 + PR2 + adjacent suite: 111/111 passing across 5 test files. No regressions.

EXPECTED IMPACT (validation gate)

Re-run KG build against SpaceX-IPO session 2026-05-16-1778951162 after deploy:
- Phase 6 entity count: 0 → 10+ (fact-validator over SpaceX content surfaces SpaceX, Musk, NASA, FAA, FCC, CFIUS, NRO, Space Force, Morgan Stanley, comparable companies)
- Phase 9 edge count: 267 → ~1,500+ (cardinality recovery from real entity anchors)
- Overall KG node count: 632 → ~900-1,100 (back in March 31 baseline range)
- New gauge reads claude_kg_phase6_entity_count{source="entities_json"} = ~10-15 (well under 50-cap)

NOTE: SpaceX session was completed BEFORE PR1 deployed, so it has no entities.json artifact. Rebuild on that session will fall back to LEGACY tier and produce same 632/267 numbers. Validation requires a NEW IB/PE/IPO session run AFTER both PRs deploy + then rebuild on that new session.

ROLLBACK

Revert this commit (PR2) → Phase 6 reverts to using LEGACY hardcoded list for all sessions; PR1's entities.json artifacts continue to be written but go unread. No data loss. ~10 min recovery.

@see /Users/ej/.claude/plans/floating-cooking-flute.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
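The never-throws contract described for getEntitiesForSession can be sketched roughly as follows. This is not the shipped helper. It assumes a node-postgres-style `pool.query(text, values)` that resolves to `{ rows }`, a `session_id` column, and a top-level `{ entities: [...] }` payload shape, and it substitutes a hand-written `safeParseEntities` for the Zod schema so the example runs without dependencies. Those names are illustrative only.

```javascript
// Stand-in for the Zod safeParse step (assumption: the real schema in
// src/schemas/entitiesJson.js is stricter than this shape check).
function safeParseEntities(payload) {
  const ok =
    payload !== null &&
    typeof payload === 'object' &&
    Array.isArray(payload.entities) &&
    payload.entities.length <= 50 &&
    payload.entities.every((e) => e && typeof e.canonical_name === 'string');
  return ok ? { success: true, data: payload } : { success: false };
}

// Returns the parsed entities array, or null on ANY failure. Callers treat
// null as "use the legacy fallback".
async function getEntitiesForSession(pool, sessionId) {
  try {
    const { rows } = await pool.query(
      `SELECT file_data FROM report_artifacts
        WHERE session_id = $1
          AND mime_type = 'application/json'
          AND file_name = 'entities.json'
        LIMIT 1`,
      [sessionId],
    );
    if (rows.length === 0 || rows[0].file_data == null) return null; // missing artifact
    // BYTEA arrives as a Buffer; decode to UTF-8 before parsing.
    const json = JSON.parse(Buffer.from(rows[0].file_data).toString('utf8'));
    const parsed = safeParseEntities(json);
    if (!parsed.success) {
      console.warn('entities.json present but failed Zod validation');
      return null;
    }
    return parsed.data.entities;
  } catch (err) {
    // DB errors and malformed JSON both land here.
    console.warn(`getEntitiesForSession failed: ${err.message}`);
    return null;
  }
}

// Usage with a fake pool standing in for the database.
const fakePool = {
  async query() {
    const body = JSON.stringify({ entities: [{ canonical_name: 'SpaceX', match_patterns: ['SpaceX'] }] });
    return { rows: [{ file_data: Buffer.from(body, 'utf8') }] };
  },
};
getEntitiesForSession(fakePool, 'demo-session').then((entities) => console.log(entities));
```

Collapsing every failure mode into null keeps the caller's branch trivial: an array means tier 1, anything else means the legacy list.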
Documents the two-PR chain (fact-validator producer + KG Phase 6 consumer) that closes the systemic Phase 6 hardcoded-entity bug. Root cause: 9 hardcoded DigitalBridge names at kgPhases6to8.js:73-83 → ~0 entity nodes for any non-DigitalBridge memo → Phase 9 cardinality collapse from 1.90 to 0.42 edges/node. SpaceX-IPO session was the canary (632/267 vs baseline 1083/2062).

Service CHANGELOG is the canonical detailed version (~155 LoC); root CHANGELOG is the concise leadership-readable version (~33 LoC) with pointer to service for full technical detail.

Both entries cover: problem statement, PR1 producer mechanics (Zod schema + prompt extension + report_artifacts persistence), iteration history (flag added then removed mid-release per essential-infrastructure principle), PR2 consumer mechanics (getEntitiesForSession helper + resolvePhase6Entities two-tier fallback + cardinality guard + Prometheus gauge), architectural decisions (no flags, no markdown-parser tier 2, zero DB schema changes), expected post-deploy impact, rollback path, risk score.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
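The edge-density figures quoted above follow directly from the node/edge pairs, reading each pair as nodes/edges (632 nodes and 267 edges for the SpaceX session, 1083 nodes and 2062 edges for the baseline). A quick arithmetic check, with `edgesPerNode` as an illustrative helper name:

```javascript
// Edge density = edges divided by nodes.
function edgesPerNode(nodes, edges) {
  return edges / nodes;
}

const spacex = edgesPerNode(632, 267);     // SpaceX-IPO canary session
const baseline = edgesPerNode(1083, 2062); // baseline KG

console.log(spacex.toFixed(2), baseline.toFixed(2)); // 0.42 1.90
```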
This was referenced May 17, 2026
Closes #146.
Summary
- PR1 (aa1dbdfe → f8211089 → 187f65ed): fact-validator emits per-session entities.json sidecar, Zod-schema-validated and persisted to report_artifacts (DB-backed, survives container rolls). Replaces zero-cost serialization of the already-canonicalized Entity Names table.
- PR2 (0dbde2d0): KG Phase 6 reads entities.json via new getEntitiesForSession() resolver. Two-tier fallback (entities.json → LEGACY hardcoded) preserves pre-PR2 behavior on old sessions. New Prometheus gauge claude_kg_phase6_entity_count{source}.

Architectural decisions
- No markdown-parser tier 2 (certificateParser.mjs failure class)
- Zero DB schema changes (report_artifacts table with new MIME row value)

Test plan
- claude_kg_phase6_entity_count{source="entities_json"} ≈ 10-15 (gauge fires)
- source="legacy_hardcoded" and SAME 632/267 (regression guard)

Files
Runtime (deploys in container):
- src/schemas/entitiesJson.js (NEW)
- src/config/legalSubagents/agents/fact-validator.js
- src/config/legalSubagents.js (housekeeping for monolithic dead-code path)
- src/utils/artifactPersistence.js
- src/utils/knowledgeGraph/kgHelpers.js
- src/utils/knowledgeGraph/kgPhases6to8.js
- src/utils/sdkMetrics.js

Tests:
- test/sdk/entities-json-schema.test.js (NEW, 26 tests)
- test/sdk/fact-validator-entities.test.js (NEW, 11 tests)
- test/sdk/kg-phase6-entities.test.js (NEW, 14 tests)
- test/fixtures/entities-spacex.json (NEW)

Docs:
- CHANGELOG.md (root, v6.11.0 entry)
- super-legal-mcp-refactored/CHANGELOG.md (canonical detailed entry)

Risk
3/10. Additive code + graceful two-tier fallback. Zero schema migration, zero feature-flag flip required. Phase 9 unchanged. Backward-compat preserved for all pre-PR1 sessions.
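A minimal sketch of the two-tier fallback and cardinality guard behind this risk assessment, under stated assumptions. The fallback entry, the `{ source, entities }` return shape, the `confidenceFor` and `buildMatcher` helpers, and the choice to treat an empty array as a valid tier-1 result are all illustrative; only the names PHASE6_ENTITY_CAP, LEGACY_DIGITALBRIDGE_FALLBACK, escapeRegex, resolvePhase6Entities, and the HIGH/LOW/MEDIUM mapping come from the PR text.

```javascript
const PHASE6_ENTITY_CAP = 50;

// Placeholder entry only; the real constant holds the 9 DigitalBridge entities.
const LEGACY_DIGITALBRIDGE_FALLBACK = [
  { canonical_name: 'DigitalBridge', match_patterns: ['DigitalBridge'] },
];

// Escape regex metacharacters so a match pattern is treated as a literal string.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// HIGH→1.0, LOW→0.6, anything else (including MEDIUM) falls back to 0.85.
function confidenceFor(tier) {
  if (tier === 'HIGH') return 1.0;
  if (tier === 'LOW') return 0.6;
  return 0.85;
}

// Tier 1: sidecar entities (an array). Tier 2: legacy list when the sidecar is absent (null).
function resolvePhase6Entities(sidecarEntities) {
  if (!Array.isArray(sidecarEntities)) {
    return { source: 'legacy_hardcoded', entities: LEGACY_DIGITALBRIDGE_FALLBACK };
  }
  if (sidecarEntities.length > PHASE6_ENTITY_CAP) {
    // Defense-in-depth: the schema should already have rejected this payload.
    console.warn(`Phase 6 truncating ${sidecarEntities.length} entities to ${PHASE6_ENTITY_CAP}`);
  }
  return { source: 'entities_json', entities: sidecarEntities.slice(0, PHASE6_ENTITY_CAP) };
}

// Assumed use of escapeRegex: one case-insensitive alternation per entity.
function buildMatcher(entity) {
  const alternation = entity.match_patterns.map(escapeRegex).join('|');
  return new RegExp(`(?:${alternation})`, 'i');
}

const { source, entities } = resolvePhase6Entities(null);
console.log(source, entities.length); // legacy_hardcoded 1
```

Keeping the cap check inside the resolver means a later schema change that loosens the sidecar limit cannot silently inflate Phase 6 on its own.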
Rollback
Revert PR2 (0dbde2d0) alone → Phase 6 reverts to LEGACY; entities.json continues to be written but goes unread. ~10 min.

🤖 Generated with Claude Code