
v6.11.0: Dynamic KG entity extraction via fact-validator entities.json sidecar #147

Merged
Number531 merged 11 commits into main from docs/excel-workbook-issues-spec on May 16, 2026

Conversation

@Number531

Owner

Closes #146.

Summary

  • PR1 producer (aa1dbdfe, f8211089, 187f65ed): fact-validator emits a per-session entities.json sidecar that is Zod-schema-validated and persisted to report_artifacts (DB-backed, survives container rolls). The sidecar is a zero-cost serialization of the already-canonicalized Entity Names table.
  • PR2 consumer (0dbde2d0): KG Phase 6 reads entities.json via new getEntitiesForSession() resolver. Two-tier fallback (entities.json → LEGACY hardcoded) preserves pre-PR2 behavior on old sessions. New Prometheus gauge claude_kg_phase6_entity_count{source}.
  • Closes the systemic bug exposed by the SpaceX-IPO session (632 nodes / 267 edges vs. baseline 1083 / 2062), where the hardcoded 9-entity DigitalBridge list starved Phase 9 cross-link cardinality for any non-DigitalBridge memo.
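
A minimal sketch of the two-tier fallback described above, assuming the resolver receives the raw artifact text (or null) and a safeParse-style validator. `resolveEntities`, `LEGACY_ENTITIES`, and the return shape are hypothetical names for illustration; only the `source` label values and the cap of 50 come from this PR.

```javascript
// Hypothetical stand-ins: names and shapes below are assumptions, not the PR's code.
const PHASE6_ENTITY_CAP = 50; // runtime cap named in the test plan
const LEGACY_ENTITIES = ['DigitalBridge', 'SoftBank', 'ADIA']; // abbreviated placeholder list

// fileData: raw entities.json text from report_artifacts, or null if absent.
// validate: returns the parsed object when valid, otherwise null (safeParse-style).
function resolveEntities(fileData, validate) {
  if (typeof fileData === 'string') {
    let parsed = null;
    try {
      parsed = validate(JSON.parse(fileData));
    } catch {
      parsed = null; // malformed JSON falls through to tier 2
    }
    if (parsed) {
      return {
        source: 'entities_json',
        entities: parsed.entities.slice(0, PHASE6_ENTITY_CAP),
      };
    }
  }
  // Tier 2: pre-PR2 behavior for old sessions or any tier-1 failure.
  return { source: 'legacy_hardcoded', entities: LEGACY_ENTITIES };
}
```

Every failure path (missing artifact, malformed JSON, validator rejection) lands on the same legacy branch, which is what keeps old sessions byte-compatible in this sketch.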

Architectural decisions

Test plan

  • 111/111 tests passing across 5 test files (kg-phase6-entities + fact-validator-entities + entities-json-schema + prompt-enhancer-catalog + domain-mcp-servers)
  • Schema round-trip test on production-shaped SpaceX fixture (10 entities, 6 entity_type values)
  • Tier-1 happy path + 5 graceful-failure modes (DB error, missing artifact, malformed JSON, Zod rejection, null file_data)
  • Two-tier fallback (tier 1 success, tier 2 hardcoded LEGACY, malformed → LEGACY)
  • Cardinality cap defense-in-depth (Zod max(50) + runtime PHASE6_ENTITY_CAP=50)
  • Post-deploy: fresh IB/PE session shows claude_kg_phase6_entity_count{source="entities_json"} ≈ 10-15 (gauge fires)
  • Post-deploy: rebuild of pre-PR1 SpaceX-IPO session shows source="legacy_hardcoded" and SAME 632/267 (regression guard)
  • Post-deploy: NEW SpaceX-IPO-class session run shows Phase 9 edge recovery

Files

Runtime (deploys in container):

  • src/schemas/entitiesJson.js (NEW)
  • src/config/legalSubagents/agents/fact-validator.js
  • src/config/legalSubagents.js (housekeeping for monolithic dead-code path)
  • src/utils/artifactPersistence.js
  • src/utils/knowledgeGraph/kgHelpers.js
  • src/utils/knowledgeGraph/kgPhases6to8.js
  • src/utils/sdkMetrics.js

Tests:

  • test/sdk/entities-json-schema.test.js (NEW, 26 tests)
  • test/sdk/fact-validator-entities.test.js (NEW, 11 tests)
  • test/sdk/kg-phase6-entities.test.js (NEW, 14 tests)
  • test/fixtures/entities-spacex.json (NEW)

Docs:

  • CHANGELOG.md (root, v6.11.0 entry)
  • super-legal-mcp-refactored/CHANGELOG.md (canonical detailed entry)

Risk

3/10. Additive code + graceful two-tier fallback. Zero schema migration, zero feature-flag flip required. Phase 9 unchanged. Backward-compat preserved for all pre-PR1 sessions.

Rollback

  • Revert PR2 commit (0dbde2d0) alone → Phase 6 reverts to LEGACY; entities.json continues writing but unread. ~10 min.
  • Revert PR1 + PR2 together → full pre-v6.11.0 state. ~15 min.

🤖 Generated with Claude Code

Number531 and others added 11 commits May 16, 2026 17:57
Closes the systemic gap observed in SpaceX-IPO session 2026-05-16-1778951162
where the prompt enhancer pre-computed comparable trading multiples (live FMP
work — equity-analyst's job) using static web estimates, removing the
orchestrator's incentive to invoke equity-analyst. Result: 0 FMP tool calls,
0 equity-analyst invocations in a 3h 9min, 43-report IPO due-diligence memo.
36 FMP tools + 11 code-execution models (M46–M58) shipped in v7.0.0 sat
unused. financial-analyst was invoked instead but has no FMP domain access,
so it relied on web search for comparables.

Same gap silently fails for every specialist added after promptEnhancer.js
was last touched — the enhancer pre-dates equity-analyst.

This PR composes a live capability catalog (45 subagents + 34 MCP domains +
feature-flag-gated availability) from existing introspection sources and
injects it into the Haiku enhancer system prompt with a behavioral directive
to route specialist deliverables instead of pre-answering them.

ARCHITECTURE — pure composition layer, zero new data

  src/config/promptEnhancerCatalog.js (NEW, 280 LoC)
    - buildEnhancerCatalog(flags) → markdown for Haiku injection
    - buildCatalogJSON(flags)     → structured output for future /api/catalog
                                     refactor (single source of truth)
    - CATALOG_VERSION constant for schema evolution
    - ROUTING_DIRECTIVE constant — exported separately so iterations are
      versionable + testable
    - Internal _-prefixed helpers (matches codebase convention)
    - Defensive degradation: missing AGENT_DISPLAY_META → fallback to
      def.description + logWarn('catalog_agent_meta_missing'); missing
      MUST BE USED block → empty triggers + warn (only when description
      has "Use PROACTIVELY for:" signaling triggers were expected); flag
      OFF → empty string + enhancer falls back to pre-PR behavior

  Sources reused (no duplication, all read-only):
    - LEGAL_SUBAGENTS              registry enumeration (45 agents)
    - AGENT_DISPLAY_META           hand-curated role/expertise/dealContext
                                    (231 lines, 44 of 45 agents covered)
    - SUBAGENT_DOMAIN_MAP          per-agent domain list (boot-frozen with
                                    feature-flag evaluation already done)
    - DOMAIN_GROUPS + getDomainToolCounts()  domain → tool count
    - DOMAIN_DISPLAY_META          domain capability descriptions
    - agent.description            MUST BE USED trigger extraction +
                                    AGENT_DISPLAY_META fallback
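
The composition and defensive-degradation behavior above can be sketched roughly as follows. This is illustrative only: `buildCatalogSketch`, the registry shapes, and the rendered line format are assumptions; only the `catalog_agent_meta_missing` warning key and the fallback to the agent description come from the commit message.

```javascript
// Illustrative only: registry shapes and the rendered format are assumptions.
function buildCatalogSketch(agents, displayMeta, logWarn) {
  const lines = [];
  for (const [name, def] of Object.entries(agents)) {
    const meta = displayMeta[name];
    if (!meta) {
      // Defensive degradation: keep the agent, fall back to its description.
      logWarn('catalog_agent_meta_missing', { agent: name });
    }
    const summary = meta ? meta.expertise : def.description;
    lines.push(`- ${name}: ${summary}`);
  }
  return lines.join('\n');
}
```

Because the loop iterates the registry itself, an agent added to the registry shows up in the output without any change to the builder, and a missing metadata entry produces a warning instead of a dropped agent.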

ENHANCER WIRE-IN (5 LoC at promptEnhancer.js:122-144)

  Import buildEnhancerCatalog + setPromptEnhancerCatalogChars. Build catalog
  per-call when PROMPT_ENHANCER_DYNAMIC_CATALOG=true, inject between
  intakePrompt and MANDATORY OUTPUT FORMAT section. Set Prometheus gauge
  with catalog length each call. Zero change to approval flow, output
  format, state file shape, or downstream consumers.

OBSERVABILITY

  Prometheus gauge claude_prompt_enhancer_catalog_chars added to
  sdkMetrics.js. Production validation: gauge should read ~54000 chars when
  the dynamic catalog is engaged (45 agents × ~1 KB + 9 KB routing directive).
  0 = feature disabled or builder short-circuited.

FEATURE FLAG #43 (default ON)

  PROMPT_ENHANCER_DYNAMIC_CATALOG (featureFlags.js + docs/feature-flags.md
  bumped 4.3 → 4.4, total flags 41 → 42). Emergency rollback: set false +
  restart, enhancer reverts to pre-PR behavior. No data migration.

DYNAMISM CONTRACT

  Adding a new subagent (file under legalSubagents/agents/ + entry in
  legalSubagents/index.js + AGENT_DISPLAY_META) causes the agent to appear
  in the catalog on the next enhancement call with ZERO changes to the
  builder or enhancer. The auto-discovery test (test group 4) enforces this
  by asserting the rendered agent count exactly matches LEGAL_SUBAGENTS
  registry size.

TESTS (29 tests, 2 snapshots — all passing)

  test/sdk/prompt-enhancer-catalog.test.js
    Group 1: shape + content invariants (9 tests)
    Group 2: trigger extraction via real agent descriptions (4 tests)
    Group 3: snapshot stability — header + routing directive (2 tests)
    Group 4: auto-discovery contract (3 tests) — proves dynamism
    Group 5: defensive degradation (3 tests)
    Group 6: idempotence — pure function contract (2 tests)
    Group 7: buildCatalogJSON structured output (4 tests)
    Group 8: integration with live featureFlags object (2 tests)

  Adjacent test/sdk/domain-mcp-servers.test.js still passes 28/28 (no
  regression in upstream introspection helpers).

SMOKE TEST (live registry, this branch)

  buildEnhancerCatalog produces 53,932 chars. Contains all 45 registered
  agents, 34 MCP domains, full routing directive with SpaceX-IPO illustrative
  example, FMP_ENABLED flag header line. equity-analyst entry includes its
  hand-curated expertise text citing 36 FMP tools + 11 code-execution models.

OUT OF SCOPE (separate work)
  - Orchestrator-level dispatch prompt vocabulary tweaks (deferred pending
    empirical validation that the enhancer fix alone is sufficient)
  - /api/catalog refactor to consume buildCatalogJSON (same pattern, no
    customer-visible change)
  - Same catalog injection into P0 orchestrator + citation-verifier dispatch
  - Skill template updates (subagent-scaffold, api-integration,
    feature-compliance-scaffold D11) — covered in PR2

@see plans/floating-cooking-flute.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(PR2)

Companion to bb36077 (PR1, dynamic subagent catalog injection). PR1's
dynamism guarantee holds at runtime — adding a new entry to LEGAL_SUBAGENTS
+ AGENT_DISPLAY_META causes the agent to appear in the next enhancement
catalog with zero code change to the builder. But the guarantee only
materializes if NEW subagents/domains actually GET their catalog metadata
populated when created. PR2 tightens three skill templates so the catalog
inputs are populated by default at scaffold-time + audited pre-merge.

THREE SKILL UPDATES

1. subagent-scaffold (scripts/wire-registries.py + SKILL.md)

   Previous behavior: emitted agentDisplayMeta entry with `label`/`icon`/
   `color` fields — but the actual agentDisplayMeta.js schema is
   `role`/`expertise`/`dealContext` (44 of 45 existing agents follow this).
   The scaffold had been emitting the wrong shape since inception.

   Now:
   - Emits complete AGENT_DISPLAY_META entry with role + expertise +
     dealContext, with explicit TODO scaffolding guiding the operator to
     write a ≥100-char capability paragraph citing data sources,
     deliverables, code-execution models, and feature-flag gating.
   - Adds explicit step #8 (separate from #7 agentClassifications.js) so
     the requirement is visible in the printed checklist.
   - Adds step #9 reminding operator to verify the agent's description
     includes a `MUST BE USED when user mentions:` block — required for
     the catalog's trigger extraction + the orchestrator's keyword routing.
   - SKILL.md frontmatter "7 mandatory wiring files" → "8 mandatory wiring
     files"; description explicitly cites the feature flag #43 dependency
     and the canonical equity-analyst pattern.

2. api-integration (SKILL.md)

   Previous: §6.1 mentioned `domainDisplayMeta.js` as a post-merge frontend
   update with no urgency framing — operator could ship a new MCP domain
   without an entry and only the frontend would silently render an empty
   description card.

   Now: §6.1 explicitly identifies DOMAIN_DISPLAY_META as a dual-consumer
   surface (frontend /api/catalog AND prompt-enhancer dynamic catalog),
   cites the WARNING that D11-catalog dimension will flag pre-merge if
   missing, and points to canonical patterns (sec, fred, equities) for the
   25-50 word capability description. Description requirement bumped from
   "25-30 words" to "25-50 words" matching observed existing entries.

3. feature-compliance-scaffold (scripts/dimensions/D11-catalog.py NEW +
   SKILL.md updated D1-D10 → D1-D11)

   New WARNING-severity dimension. Three sub-checks:
     D11.1 — AGENT_DISPLAY_META[name].expertise present + ≥100 chars
     D11.2 — agent description includes "MUST BE USED when:" block
             (only checked when "Use PROACTIVELY for:" present — synthesis/
             QA agents legitimately omit both)
     D11.3 — DOMAIN_DISPLAY_META[domain] entry present for new domains

   Auto-discovered by check.sh dimension glob (ls D*.py) — no manifest
   edit required. Tolerates `domains`, `mcp_domains`, or `domain_groups`
   key in symbols.json for upstream extract-feature-symbols.py
   flexibility.

   Smoke-tested against synthetic symbols json with mixed pass/fail
   cases — correctly detected:
     ✓ equity-analyst has complete metadata (no warning)
     ✓ intake-research-analyst lacks AGENT_DISPLAY_META (real gap, flagged)
     ✓ fake-domain-not-registered missing from DOMAIN_DISPLAY_META

   Existing 3 fixture tests still pass (no regression on D1-D10).

END-TO-END DYNAMISM CONTRACT (after PR1 + PR2)

  Scaffold-time  →  subagent-scaffold + api-integration skills emit
                    complete AGENT_DISPLAY_META + DOMAIN_DISPLAY_META as
                    REQUIRED steps; operator can't easily skip them
                    (printed checklist + WARNING messaging)

  Pre-merge      →  feature-compliance-scaffold D11-catalog dimension
                    surfaces any gaps before PR opens

  Runtime        →  buildEnhancerCatalog reads live registry + display-meta
                    + flags; new entries appear automatically on next
                    enhancement call (PR1's contract, enforced by Test
                    Group 4 auto-discovery test)

  Defensive      →  if all three upstream layers miss, builder still
                    includes the agent (fallback description) + emits
                    logWarn('catalog_agent_meta_missing') — gap surfaces
                    operationally rather than silently dropping the agent

OUT OF SCOPE (separate work)
- Backfill AGENT_DISPLAY_META entry for intake-research-analyst (the 1 of
  45 currently missing) — operator decision whether to add
- Migrating the old `label`/`icon`/`color` schema that the scaffold used
  to emit — never landed in production (44 of 45 existing entries already
  use role/expertise/dealContext correctly)

@see plans/floating-cooking-flute.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…udit (PR3)

Three independent background Explore agents reviewed commits bb36077 (PR1
runtime catalog injection) + bc5c80b (PR2 skill template loop). Surfaced
four findings: one SHOWSTOPPER, two real issues, and one gauge-semantics
question resolved with documentation only.

1. SHOWSTOPPER — enhancedPrompt was discarded by the orchestrator

agentStreamHandler.js captured the enhancer's output at L240 but never
forwarded it to agentQuery(). The orchestrator at L281 received
ctx.currentPrompt (initialized in streamContext.js:58 as ctx.userQuery,
the RAW user query) — so every [ROUTE TO <agent>: ...] tag Haiku produced
under the dynamic catalog directive was thrown away. PR1 was 100%
decorative at runtime: built a beautiful catalog, injected it into Haiku's
system prompt, captured Haiku's routing-tagged enhancement, persisted it
to disk + SSE for the frontend, and then ran the orchestrator on the
original query anyway.

The original SpaceX-IPO failure mode (orchestrator never invoking
equity-analyst) would have repeated identically post-PR1.

FIX: agentStreamHandler.js L240-243 — when runPromptEnhancementPhase
returns a non-null enhanced prompt, assign ctx.currentPrompt =
enhancedPrompt. ctx.userQuery preserved unchanged for downstream consumers
that need the original (analytics, audit, etc.). Pattern is consistent
with L518 where ctx.currentPrompt is already mutated mid-stream for
AUTO_CONTINUATION.
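
A minimal sketch of the fix, assuming a context object with the two fields named above. The helper name `applyEnhancedPrompt` is hypothetical; the actual change is described as an inline assignment in agentStreamHandler.js.

```javascript
// Hypothetical sketch: ctx field names come from the commit message; the
// function name and signature are assumptions.
function applyEnhancedPrompt(ctx, enhancedPrompt) {
  if (enhancedPrompt != null) {
    ctx.currentPrompt = enhancedPrompt; // what the orchestrator now receives
  }
  // ctx.userQuery is intentionally untouched for analytics and audit consumers.
  return ctx;
}
```

When the enhancer returns null, the context is left as-is, so the orchestrator still runs on the raw query exactly as before.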

Verification: re-run SpaceX-IPO-class prompt post-deploy. Expect to see
in hook_audit_log: (a) subagent_start for equity-analyst, (b) pre_tool_use
for mcp__equities__*, (c) enhanced prompt at reports/<session>/
enhanced-prompt.md contains [ROUTE TO equity-analyst] tags.

2. buildCatalogJSON() violated its own pure-function contract

new Date().toISOString() embedded in the JSON output meant two consecutive
calls with identical flags produced different output. The function header
explicitly claimed "Same flags input → byte-identical output." Test group
6 idempotence assertion would have masked it (two same-millisecond calls
collide).

FIX: promptEnhancerCatalog.js — generated_at is now caller-supplied via
optional `{ generatedAt }` option (defaults to null). Pure-function
contract preserved by default; production callers who need a timestamp
pass it explicitly. Three new tests: generated_at default is null,
caller-supplied timestamp is honored, idempotence guarantee across
JSON.stringify of two consecutive calls.
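
The caller-supplied timestamp pattern can be sketched like this. `buildCatalogJSONSketch` and the `flags` field are placeholders; only the `{ generatedAt }` option, its null default, and the `generated_at` output key come from the commit message.

```javascript
// Sketch of the pure-function shape; everything except generatedAt /
// generated_at is a placeholder.
function buildCatalogJSONSketch(flags, { generatedAt = null } = {}) {
  return {
    generated_at: generatedAt, // caller-supplied; null keeps output deterministic
    flags: { ...flags },
  };
}

// Two consecutive calls with identical input serialize identically.
const first = JSON.stringify(buildCatalogJSONSketch({ FMP_ENABLED: true }));
const second = JSON.stringify(buildCatalogJSONSketch({ FMP_ENABLED: true }));
console.log(first === second);
```

Moving the clock read to the caller is what makes the idempotence test meaningful regardless of how far apart the two calls are.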

3. claude_prompt_enhancer_catalog_chars gauge semantic was ambiguous

Help text didn't clarify whether the gauge measures catalog-build success
or enhancement-pipeline success. The gauge fires at runPromptEnhancementPhase
entry BEFORE the Haiku API call — so a downstream API failure leaves the
gauge reading "catalog injected" while the enhancement itself failed.

After deliberation: this is the CORRECT semantic. Build/inject and
end-to-end success are two different things and should be observable
separately. The fix is documentation: gauge help text now explicitly
says "NOT a success signal" and points operators to the
prompt_enhancement_status SSE event + hook_audit_log AgentProgress entry
for end-to-end success rates.

4. D11-catalog.py domain-detection sub-check was dead code

D11.3 (verify DOMAIN_DISPLAY_META[domain] exists for new domains)
expected symbols.domains/mcp_domains/domain_groups keys, but
extract-feature-symbols.py never populated any of these — empty_symbols()
omitted the key entirely and extract_from_diff() had no domain-detection
regex.

FIX: extract-feature-symbols.py — empty_symbols() now includes a
"domains" key. New NEW_DOMAIN_RE regex matches additions to DOMAIN_GROUPS
in domainMcpServers.js, including feature-flag-gated patterns like
`...(featureFlags.FMP_ENABLED ? { 'equities': equitiesTools } : {})`.
Hand-tested against 5 sample diff lines (positive + negative cases).
D11.3 now actually fires on PRs that add MCP domains.

Tests: 32/32 prompt-enhancer-catalog (3 new for idempotence) + 3/3
feature-compliance-scaffold fixtures still passing.

Honest limit acknowledged: the SHOWSTOPPER fix has not been live-tested.
Will be verified empirically by the next /deploy + post-deploy IB-class
prompt — operator should look for equity-analyst in hook_audit_log
subagent_start events.

@see plans/floating-cooking-flute.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…coped before/after diff (Option C)

PR3 introduced NEW_DOMAIN_RE to detect MCP domain additions for D11.3
(catalog domain-metadata check). That regex required the value to match
the `<thing>Tools` naming convention — brittle against:
  - 'name': SOME_TOOLS_CONST            (uppercase constant)
  - 'name': [...toolsA, ...toolsB]      (array spread)
  - 'name': require('./tools').default  (CJS import)
  - any future variant a contributor might write

Replaced with a block-scoped before/after set diff using a new
get_domain_group_keys_at_ref() helper. The helper:
  - reads the DOMAIN_GROUPS object literal from git show <ref>:<file>
  - extracts every 'key': pattern inside the {...} block
  - returns a Set so caller can compute (after - before) for new domains

Better on every "zero-break" dimension:
  - text-only (works on broken branches that don't compile)
  - no Node dependency in the Python audit pipeline
  - independent of value naming convention (any 'key': anything works)
  - tolerates cosmetic reformats (set diff = ∅ when no logical change)
  - graceful degradation on missing files / refs (returns ∅, no crash)

Verified against current HEAD: helper extracts all 37 DOMAIN_GROUPS
keys correctly, including feature-flag-gated entries (equities,
code-execution, direct-fetch, exa-search) that depend on
`...(featureFlags.X ? { 'name': tools } : {})` patterns.
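
The text-only key extraction and set diff can be illustrated as below. The real helper is a Python script that reads file contents via `git show`; this JavaScript sketch only demonstrates the technique on two source strings, and the function names, regex, and naive brace matching (which ignores braces inside strings or comments) are assumptions.

```javascript
// Illustrative sketch of the block-scoped extraction; not the actual helper.
function domainGroupKeys(source) {
  const start = source.indexOf('DOMAIN_GROUPS');
  if (start === -1) return new Set(); // graceful degradation
  const open = source.indexOf('{', start);
  if (open === -1) return new Set();
  let depth = 0;
  let end = -1;
  for (let i = open; i < source.length; i++) {
    if (source[i] === '{') depth++;
    else if (source[i] === '}' && --depth === 0) { end = i; break; }
  }
  if (end === -1) return new Set();
  const keys = new Set();
  // Any quoted key followed by a colon counts, whatever the value looks like.
  for (const m of source.slice(open, end + 1).matchAll(/['"]([\w-]+)['"]\s*:/g)) {
    keys.add(m[1]);
  }
  return keys;
}

// New domains = keys present after the change but not before it.
function newDomains(beforeSrc, afterSrc) {
  const before = domainGroupKeys(beforeSrc);
  return [...domainGroupKeys(afterSrc)].filter((k) => !before.has(k));
}
```

A pure reformat of the object literal yields an empty diff, and a missing file (empty string) yields an empty set instead of an exception.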

ISOLATION

Zero runtime impact. This is operator audit tooling — Python script
invoked locally pre-PR-merge via /feature-compliance-scaffold. Never
imported by the Node server, never bundled in the Docker image, never
touches the orchestrator or prompt-enhancer code paths. Modifies one
file in .claude/skills/feature-compliance-scaffold/scripts/.

Fixtures: 3/3 still pass (no regression on D1-D10).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ture flag

The flag was a mischaracterization. Adding a flag implies "this is an
optional capability, can be toggled off." The reality is that the original
prompt-enhancer behavior was BROKEN — it pre-computed specialist
deliverables (live trading multiples, DCF, CFIUS analysis) from static web
estimates because it had zero awareness of the 45-subagent registry. The
dynamic catalog injection is a bug fix for that architectural gap, not an
experiment.

If the catalog ever needs to be reverted, the right path is: revert the
commit + redeploy. Same workflow as any other code-level rollback. Not
something operators should be able to flip via env var.

CHANGES

  src/config/featureFlags.js — remove PROMPT_ENHANCER_DYNAMIC_CATALOG
  declaration (-12 LoC)

  src/server/promptEnhancer.js — drop the conditional; catalog is now
  always built and injected. Comment updated to call it "essential, not
  optional." (-24/+18 LoC net)

  docs/feature-flags.md — delete §43 entry, restore Quick Reference count
  42 → 41 (no net adds since v4.3), remove §43 row. Replace with a brief
  "NOT a feature flag" pointer in the Dead Code section explaining the
  decision + linking to the relevant files. (-60/+32 LoC net)

PRESERVED

  Prometheus gauge claude_prompt_enhancer_catalog_chars stays — it's
  observability, not a control surface. Operators still need to verify the
  catalog is being built correctly (post-deploy, gauge should read ~54000
  chars per enhancement call).

  All 32 prompt-enhancer-catalog tests continue to pass — the tests
  exercise buildEnhancerCatalog() directly with flag fixtures, not via
  the enhancer's now-unconditional injection.

  Defensive degradation paths in promptEnhancerCatalog.js are unchanged —
  missing AGENT_DISPLAY_META still falls back to agent.description with a
  catalog_agent_meta_missing warning. The catalog never throws.

CALLER COUNT

  Zero existing flag consumers anywhere in the codebase — flag was new
  in commit bb36077 and only referenced in promptEnhancer.js +
  docs/feature-flags.md. Both updated. No external migration needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rator wire-in

Document the 5-commit lineage (bb36077 + bc5c80b + 38fda51 + 6f20819
+ b654a85) under a single v6.10.0 entry in both root + service CHANGELOGs.

Covers: original SpaceX-IPO failure mode, two-layer root cause (enhancer
ignorance + orchestrator discard), four-layer fix (catalog builder +
enhancer injection + orchestrator wire-in + skill template loop), the
mid-release flag-removal pivot (feature flag → essential infrastructure),
32-test coverage breakdown, and the unverified-in-production honest limit.

Service CHANGELOG entry is the canonical detailed version (~115 LoC);
root CHANGELOG is the concise leadership-readable version (~33 LoC) with
pointer to the service CHANGELOG for full technical detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…efault OFF)

PR1 of two-PR rollout that replaces the hardcoded entity list in KG Phase 6
(kgPhases6to8.js:73-83, hardcoded to 9 DigitalBridge/SoftBank/ADIA names)
with a dynamic per-session entities.json sidecar emitted by fact-validator.
Closes the systemic gap exposed by SpaceX-IPO session 2026-05-16-1778951162
(632 nodes / 267 edges vs March 31 baseline 1083/2062 — edge density
collapsed -78% due to Phase 9 O(n²) cardinality starvation from ~0 entity
anchors).

PR1 ships the PRODUCER ONLY — fact-validator emits the sidecar when
explicitly opted-in via FACT_VALIDATOR_EMIT_ENTITIES_JSON=true. KG Phase 6
consumer wiring lands in PR2 after staging soak validates the producer.

This split (per M1 mitigation in
/Users/ej/.claude/plans/floating-cooking-flute.md) prevents the
silent-fact-registry-quality-degradation failure mode: the flag stays OFF
by default so production fact-validator behavior is byte-identical to
before this commit. Operator manually flips flag true in staging, runs 3-5
fresh sessions, diffs the resulting fact-registry.md against a frozen
pre-flag baseline, and only after sign-off flips true in prod.

CHANGES

  src/config/featureFlags.js — add FACT_VALIDATOR_EMIT_ENTITIES_JSON flag
  (default false). Comment explains it's PR1 of the dynamic-entities
  sidecar work and points to the plan.

  src/schemas/entitiesJson.js (NEW, ~110 LoC) — Zod schema for the
  entities.json contract. Defines: schema_version, session_key,
  generated_at, source_reports_analyzed, entities array (.max(50) — the
  primary Phase 9 cardinality safeguard per M4 mitigation). Each entity
  has canonical_name, entity_type (bounded enum: target/acquirer/
  co_investor/portfolio_company/regulator/key_person/counterparty/
  underwriter/other), role (free-form), variations, match_patterns
  (≥1, ≤10 — what Phase 6 will escapeRegex + word-boundary-match against
  report content), source_refs (provenance: report_key + mention_count),
  confidence. Exports parseEntitiesJson (throws) and safeParseEntitiesJson
  (returns null) for the consumer's defensive degradation path.
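
  To make the contract concrete, here is a dependency-free stand-in for a subset of the constraints listed above. The real schema is written with Zod; `safeParseEntitiesSketch` is a hypothetical plain-JavaScript approximation that checks only the entity cap, the entity_type enum, and the match_patterns bounds.

```javascript
// Plain-JS approximation of part of the contract; the real schema uses Zod.
const ENTITY_TYPES = [
  'target', 'acquirer', 'co_investor', 'portfolio_company', 'regulator',
  'key_person', 'counterparty', 'underwriter', 'other',
];
const MAX_ENTITIES = 50;

// Returns the input when it satisfies the checked constraints, otherwise null.
function safeParseEntitiesSketch(data) {
  if (!data || !Array.isArray(data.entities)) return null;
  if (data.entities.length > MAX_ENTITIES) return null;
  for (const e of data.entities) {
    if (!e || typeof e.canonical_name !== 'string') return null;
    if (!ENTITY_TYPES.includes(e.entity_type)) return null;
    const p = e.match_patterns;
    if (!Array.isArray(p) || p.length < 1 || p.length > 10) return null;
  }
  return data;
}
```

  Returning null rather than throwing mirrors the safeParse variant, so a consumer can fall back without a try/catch around validation.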

  src/config/legalSubagents/agents/fact-validator.js — additive prompt
  extension and outputFiles registration. Adds 'entities.json' to
  outputFiles, adds PHASE_5.3 to compaction-recovery checklist, appends
  ENTITIES.JSON SIDECAR section (~70 LoC) after the existing §II.C Entity
  Names table specification. Critical rules captured in prompt:
    - WHEN TO EMIT: only when FACT_VALIDATOR_EMIT_ENTITIES_JSON=true
      (when false, behavior is byte-identical to pre-PR1)
    - HARD CAP: 50 entities/session (with top-N-by-mention truncation
      guidance if >50 candidates)
    - SCHEMA: direct serialization of the already-canonicalized §II.C
      table — NO re-extraction from source reports (zero new LLM cost)
    - match_patterns rules: plain strings only (no regex chars), ≥3 char
      minimum (avoids "Switch" false-matching "switchover"), include
      canonical_name + 1-2 distinguishing tokens
    - MISSING ENTITIES: emit entities: [] (file presence is the signal)
  RETURN FORMAT JSON now includes entities_emitted (integer or null when
  sidecar skipped).
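
  The reason match_patterns must be plain strings is that the consumer escapes them and wraps them in word boundaries. A rough sketch of that matching, assuming case-insensitive counting (the flag choice and the `countMentions` helper are assumptions; `escapeRegex` is named in the schema description but its body here is the common idiom, not the repository's code):

```javascript
// Escape regex metacharacters so a plain string matches literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Count whole-word occurrences of a plain-string pattern (assumed 'gi' flags).
function countMentions(text, pattern) {
  const re = new RegExp(`\\b${escapeRegex(pattern)}\\b`, 'gi');
  return (text.match(re) || []).length;
}

const sample = 'Switch announced a switchover; SWITCH confirmed.';
console.log(countMentions(sample, 'Switch')); // 2: "switchover" is not a whole-word match
```

  The word boundary is what keeps "Switch" from matching inside "switchover", and the escaping keeps a pattern such as "A.B" from matching "AxB".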

  src/utils/artifactPersistence.js — extend persistSessionArtifacts to
  scan for review-outputs/entities.json and persist as a report_artifacts
  row with mime='application/json', category='sidecar', source=
  'fact_validator'. Filesystem-only persistence would be lost on container
  roll (GCE MIG auto-heal); report_artifacts ensures survival across any
  container event AND auto-surfaces in existing /api/audit-report endpoint
  + client-audit-export regulator handoff bundles (their SELECT * queries
  include the new row automatically). ENOENT skip is the dominant case
  during PR1 (flag default false), warned only on non-ENOENT errors.
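
  The ENOENT handling can be sketched as follows, with the file reader injected so the example stays self-contained. `collectSidecar`, the `entities_json_read_failed` warning key, and the returned row shape are hypothetical; the mime, category, and source values are the ones listed above.

```javascript
// Hypothetical sketch: readFile is injected; a real implementation would
// presumably read from the session directory and insert a report_artifacts row.
function collectSidecar(readFile, logWarn) {
  try {
    const fileData = readFile('review-outputs/entities.json');
    return {
      mime: 'application/json',
      category: 'sidecar',
      source: 'fact_validator',
      file_data: fileData,
    };
  } catch (err) {
    // A missing file is the expected case while the producer is off: skip quietly.
    if (err.code !== 'ENOENT') {
      logWarn('entities_json_read_failed', { code: err.code });
    }
    return null;
  }
}
```

  Only unexpected errors (permissions, I/O) produce a warning, which keeps logs quiet for the dominant no-sidecar case.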

  test/fixtures/entities-spacex.json (NEW) — canonical example with 10
  representative entities for the SpaceX-IPO domain (SpaceX, Musk, NASA,
  FAA, FCC, SEC, CFIUS, QIA, Space Force, Morgan Stanley) — covers 6 of
  9 entity_type enum values + the regulatory-heavy content profile that
  exposed the original bug.

  test/sdk/entities-json-schema.test.js (NEW, ~230 LoC, 26 tests) — Zod
  schema validation across 7 groups: constants, fixture round-trip,
  happy-path parsing with defaults, schema violations (8 rejection
  cases), hard 50-cap enforcement, safeParse graceful degradation,
  empty-entities legitimate case. All passing.

ZERO DATABASE SCHEMA CHANGES

No new tables, no new columns, no migrations. entities.json persists via
existing report_artifacts table with a new MIME row value (no schema
change — the mime_type column already accepts arbitrary strings).
Existing /api/audit-report + client-audit-export consumers automatically
include the new artifact via existing SELECT * patterns.

VERIFICATION

  $ NODE_OPTIONS=--experimental-vm-modules jest \
      test/sdk/entities-json-schema.test.js
  Tests:       26 passed, 26 total

  $ node -e "import('./src/config/featureFlags.js').then(m => \
      console.log(m.featureFlags.FACT_VALIDATOR_EMIT_ENTITIES_JSON))"
  false  ← default OFF, ready to ship safely

ROLLOUT NEXT STEPS

1. Deploy PR1 to staging — fact-validator behavior identical (flag OFF)
2. Operator sets FACT_VALIDATOR_EMIT_ENTITIES_JSON=true in staging env,
   restarts container, runs 3-5 fresh sessions via frontend
3. Operator inspects produced entities.json files via:
   curl http://STAGING/api/session/<key>/audit-report | jq '.artifacts[]
     | select(.mime_type=="application/json")'
4. Operator diffs the new fact-registry.md against a frozen pre-flag
   baseline of similar session shape — looks for quality regression
   signals (fewer facts, missing conflicts, broken section mapping)
5. On clean staging soak, flip flag true in prod env. Sidecar starts
   producing but no consumer until PR2 ships.

PR2 (consumer) waits for above. Per plan, PR2 will add KG Phase 6
loadEntitiesJson + tier-3 fallback (LEGACY_DIGITALBRIDGE_FALLBACK kept
for zero-break on existing sessions) + Prometheus gauge
claude_kg_phase6_entity_count + E2E validation gate (rebuild SpaceX-IPO
session, expect 632→~1000 nodes + 267→~1500 edges).

@see /Users/ej/.claude/plans/floating-cooking-flute.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on (PR1.1)

Post-PR1 (aa1dbdf) three-agent review surfaced one CRITICAL gap:
FACT_VALIDATOR_EMIT_ENTITIES_JSON flag was non-functional. The prompt
instructed the agent to "Check this via the process.env inspection
guidance the orchestrator provides" — but Sonnet runs LLM-side, cannot
read process.env, and no upstream orchestrator code injected the flag
value into the system prompt. Result: silent failure mode where Sonnet
might emit unconditionally OR be confused and skip emission.

ROOT CAUSE

The flag was wired in featureFlags.js correctly but the gating decision
was deferred to runtime (Sonnet reading env) instead of build-time
(JavaScript constructing the prompt). Sonnet only ever sees the static
template literal that was evaluated at module load.

FIX — build-time conditional prompt construction

Standard pattern in this codebase (verified via grep across
src/config/legalSubagents/agents/ — equity-analyst, financial-analyst,
cybersecurity-compliance-analyst, government-contracts-researcher all
import featureFlags and use it for tool wiring + prompt interpolation):

  import { featureFlags } from '../../featureFlags.js';

  const ENTITIES_JSON_SIDECAR_BLOCK = featureFlags.FACT_VALIDATOR_EMIT_ENTITIES_JSON
    ? `## ENTITIES.JSON SIDECAR (MANDATORY — flag enabled) ...`
    : '';

  export const def = {
    ...,
    prompt: `...
      ## Entity Names
      ...table...
      ${ENTITIES_JSON_SIDECAR_BLOCK}
      ## Assumption Status ...
    `,
  };

When the flag is FALSE (production default), ENTITIES_JSON_SIDECAR_BLOCK
is the empty string and the entire ENTITIES.JSON SIDECAR section literally
does not exist in the prompt Sonnet sees. The agent physically cannot emit
a file it has no instructions for. Three prompt sections are gated
identically: the section block itself, PHASE_5.3 checklist line, and the
entities_emitted field in RETURN FORMAT JSON.

When TRUE (operator opt-in for staging soak), all three sections
materialize — agent gets unambiguous "MANDATORY — flag enabled" instruction
with no env-var-inspection ambiguity.

VERIFICATION

New test file: test/sdk/fact-validator-entities-flag.test.js (7 tests).
Runs in BOTH flag states:

  - flag=false (default): prompt has zero entities.json references
    (sweep tests: no "entities.json", no "match_patterns", no
    "entity_type", no "HARD CAP 50" — comprehensive content scan)
  - flag=true: prompt contains "MANDATORY — flag enabled", contains
    none of the broken "process.env" / "check env var" patterns
  - prompt length materially differs (~21K off, ~24K on)
  - outputFiles still includes 'entities.json' in both states (metadata
    is gate-independent; artifactPersistence handles ENOENT gracefully)
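
A simplified sketch of what the two-state sweep checks, with the flag passed as a parameter instead of read at module load. The prompt text, `buildPromptSketch`, and `forbiddenHits` are hypothetical; the four swept terms are the ones listed above.

```javascript
// Hypothetical prompt builder: the block exists only when the flag is on.
function buildPromptSketch(flagOn) {
  const sidecarBlock = flagOn
    ? '## ENTITIES.JSON SIDECAR\nWrite entities.json (HARD CAP 50) with entity_type and match_patterns per entity.'
    : '';
  return `## Entity Names\n...table...\n${sidecarBlock}\n## Assumption Status`;
}

// Terms that must be absent from the prompt when the flag is off.
const SWEEP_TERMS = ['entities.json', 'match_patterns', 'entity_type', 'HARD CAP 50'];

function forbiddenHits(prompt) {
  return SWEEP_TERMS.filter((term) => prompt.includes(term));
}
```

With the flag off, the sweep finds nothing because the section was never interpolated; with it on, every term is present and the prompt is measurably longer.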

Test execution proof:
  $ npx jest test/sdk/fact-validator-entities-flag.test.js
  Tests: 7 passed, 7 total                    ← flag default OFF

  $ FACT_VALIDATOR_EMIT_ENTITIES_JSON=true npx jest \
      test/sdk/fact-validator-entities-flag.test.js
  Tests: 7 passed, 7 total                    ← flag ON

Combined PR1 + hotfix test suite: 65/65 pass across 3 files
(entities-json-schema 26, fact-validator-entities-flag 7,
prompt-enhancer-catalog 32). No regressions.

MONOLITHIC FILE SYNC (concern #3 from review)

Updated legalSubagents.js:6369 (monolithic) outputFiles to include
'entities.json' for consistency. Added explicit comment that the
monolithic file is NOT production (MODULAR_SUBAGENTS=true default) and
its prompt is static — entities.json work happens through the modular
file only. Prevents future-developer confusion when grepping for
fact-validator definitions.

REVIEWER CONCERNS

1. ❌→✅ Flag gating: FIXED via build-time conditional construction
2. ✅ outputFiles enforcement: confirmed metadata-only; ENOENT-safe (no
     change needed)
3. ✅ Monolithic divergence: synced for housekeeping + documented status

ZERO downstream impact when the flag is off (default). End-to-end behavior
is identical to pre-aa1dbdfe (pre-PR1), so PR1 ships dormant, as planned.

@see /Users/ej/.claude/plans/floating-cooking-flute.md (M1 mitigation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ag — entities.json is essential infrastructure

Same architectural correction applied earlier to the dynamic
prompt-enhancer catalog (commit b654a85). The flag was risk-driven
(M1 mitigation against fact-registry.md quality regression) but
mischaracterized the change as optional. Reality:

  - KG Phase 6's hardcoded entity list IS broken for any non-DigitalBridge
    session. entities.json is the fix, not an experiment.
  - Recovery time is comparable: "flip flag + restart" (~5 min) vs
    "revert + redeploy" (~10 min), a marginal difference
  - The hotfix (PR1.1, f821108) already isolated the prompt change
    cleanly (separate constant, appended after the existing Entity Names
    table — natural extension, not a new mental model)
  - The agent is doing pure serialization of work it already did —
    zero new inference, zero new reasoning load
  - Bug fixes don't get feature flags; they get reverts if they break

CHANGES

  src/config/featureFlags.js — remove FACT_VALIDATOR_EMIT_ENTITIES_JSON
  declaration (-12 LoC)

  src/config/legalSubagents/agents/fact-validator.js — drop conditional
  prompt construction. ENTITIES.JSON SIDECAR section now unconditionally
  inlined after the Entity Names table. PHASE_5.3 checklist line and
  entities_emitted RETURN field also unconditional. Docstring updated to
  call entities.json "essential infrastructure, not optional capability."
  (-30/+90 LoC net — simpler code overall)

  src/config/legalSubagents.js — update outputFiles NOTE comment to reflect
  that modular prompt now unconditionally emits entities.json instructions
  (was: "flag-gated"; now: simple statement)

  test/sdk/fact-validator-entities-flag.test.js → fact-validator-entities.test.js
  Rename + simplify. Drop the parametrized flag-state tests; replace with
  unconditional contract tests:
    - outputFiles includes entities.json
    - prompt has ENTITIES.JSON SIDECAR section + 5.3 + entities_emitted
    - prompt has hard 50-cap + match_patterns rules + 9-enum
    - prompt instructs empty-entities still emit (file presence signal)
    - regression guard: prompt has NO process.env / "check env var" /
      "FACT_VALIDATOR_EMIT_ENTITIES_JSON" references
    - agent has Write tool
  11 tests, all passing.

PRESERVED

  - Zod schema (src/schemas/entitiesJson.js) — unchanged, still validates
    consumer input
  - Persistence path (src/utils/artifactPersistence.js) — unchanged,
    still ENOENT-safe (fact-validator may legitimately produce empty
    entities array on edge sessions, file always written)
  - All 26 entities-json-schema tests + 32 prompt-enhancer-catalog tests
    continue to pass
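
What "ENOENT-safe" could look like is sketched below. The internals of artifactPersistence.js are not shown in this PR, so `readOptionalArtifact` is a hypothetical helper and its behavior is an assumption: a missing file is treated as "nothing to persist", while any other error still surfaces.

```javascript
// Hypothetical helper (not the real artifactPersistence API): read an
// optional output file, treating a missing file as "nothing to persist".
async function readOptionalArtifact(filePath) {
  const { readFile } = await import('node:fs/promises');
  try {
    return await readFile(filePath); // Buffer with the raw bytes
  } catch (err) {
    if (err.code === 'ENOENT') return null; // file was never written
    throw err; // permission errors and similar should still surface
  }
}
```

A caller would skip persistence when the result is `null` and store the returned bytes otherwise.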

ROLLOUT SIMPLIFICATION

Previous plan: "deploy PR1 → operator flips flag in staging → 3-5 fresh
sessions diff → flip flag in prod → wait → ship PR2"

New plan: "deploy PR1 → entities.json starts emitting on next session
immediately → ship PR2 (consumer) → KG benefit lands automatically"

The staging soak (M1) is no longer enforced by the code. Operator can
still run a staging session manually before prod deploy to spot-check
fact-registry.md quality, but it's no longer a flag-flip gate. If
quality regression is observed: revert + redeploy (~10 min recovery).

NO downstream impact on PR2 — the consumer (KG Phase 6) still doesn't
read entities.json. PR1 + this flag removal ships an essential producer
that's ready to be wired the moment PR2 lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the entities-sidecar work started in PR1 (aa1dbdf). PR1 shipped
the producer (fact-validator emits entities.json to report_artifacts);
PR2 ships the consumer (KG Phase 6 reads entities.json instead of using
the hardcoded 9-entity DigitalBridge list). With both PRs deployed:

  - Every new session writes entities.json via fact-validator (PR1)
  - KG build (initial or rebuild via /api/admin/.../rebuild-kg) reads
    entities.json from report_artifacts, creates one entity node per
    canonical_name, and Phase 9 cross-link cardinality recovers
    automatically (Phase 9 was always cardinality-driven; the bug was
    Phase 6 starving it of entity anchors)

For non-DigitalBridge sessions (SpaceX-IPO, future IB/PE memos), this
restores Phase 9 edge density from the 0.42 edges/node observed in the
2026-05-16 SpaceX session toward the 1.90 baseline. Phase 9 itself is
unchanged.

ARCHITECTURE — two-tier fallback (no markdown-parser tier 2)

The original plan included a tier-2 lazy backfill that would parse the
fact-registry.md §II.C "Entity Names" table in-memory when entities.json
was missing. This was dropped because it carried the same PR #130
certificateParser.mjs failure class (markdown format drift = silent
fleet-wide data loss). Backfill of old sessions is now an explicit
deferred operator concern, not an automatic path.

Two tiers only:

  1. entities.json from report_artifacts (PRIMARY for new sessions
     post-PR1+PR2). DB-backed, survives container rolls.

  2. LEGACY_DIGITALBRIDGE_FALLBACK (preserves pre-PR2 behavior on old
     sessions). Renamed from the hardcoded entityPatterns array; same
     9 DigitalBridge entries.
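
The tier ordering above can be sketched as a small resolver. This is an illustration under assumptions, not the shipped `resolvePhase6Entities`: the loader is injected, the fallback list holds one placeholder entry instead of the real nine, and treating an empty array as a valid tier-1 result is a guess about the intended behavior.

```javascript
// Placeholder stand-in for LEGACY_DIGITALBRIDGE_FALLBACK (the real
// constant has 9 entries; only one illustrative entry is shown here).
const LEGACY_FALLBACK_SKETCH = [
  { canonical_name: 'DigitalBridge', match_patterns: ['DigitalBridge'] },
];

// loadEntities is an injected async function that returns an entities
// array on success or null when entities.json is missing or invalid.
async function resolveEntitiesSketch(loadEntities, sessionId) {
  let entities = null;
  try {
    entities = await loadEntities(sessionId);
  } catch (err) {
    entities = null; // the loader should not throw; guard anyway
  }
  if (Array.isArray(entities)) {
    // Assumption: an empty array still counts as tier 1.
    return { source: 'entities_json', entities };
  }
  return { source: 'legacy_hardcoded', entities: LEGACY_FALLBACK_SKETCH };
}
```

Returning the `source` alongside the entities lets the caller label both the entity nodes and the gauge without re-deriving which tier was used.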

CARDINALITY SAFEGUARD (M4 mitigation)

PHASE6_ENTITY_CAP=50 + runtime guard in resolvePhase6Entities truncates
oversized entities.json arrays. The Zod schema in src/schemas/
entitiesJson.js also caps entities.max(50) at the sidecar boundary —
the runtime guard is defense-in-depth in case a future schema bump raises
the Zod cap without raising the resolver cap. Both layers must stay in
sync (test documents this invariant).
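
A minimal sketch of the runtime guard, assuming it truncates by keeping the first 50 entries and logs a warning. `capEntities` and the warning text are hypothetical; only the cap value and the constant name come from this commit.

```javascript
const PHASE6_ENTITY_CAP = 50;

// Hypothetical guard: keep at most `cap` entities and warn on truncation.
function capEntities(entities, cap = PHASE6_ENTITY_CAP) {
  if (entities.length <= cap) return entities;
  console.warn(`Phase 6: truncating ${entities.length} entities to cap ${cap}`);
  return entities.slice(0, cap);
}
```

With the schema cap in place this branch should never fire; it only matters if the two limits drift apart.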

OBSERVABILITY

New Prometheus gauge:

  claude_kg_phase6_entity_count{source="entities_json"|"legacy_hardcoded"}

Surfaces three operator signals per KG build:
  - source=entities_json → fact-validator sidecar consumed (post-PR1
    sessions)
  - source=legacy_hardcoded → fell back to old DigitalBridge list (old
    session OR malformed entities.json — search Cloud Logging for
    "entities.json present but failed Zod validation" to disambiguate)
  - count > 50 → cardinality guard truncated; investigate fact-validator
    over-extraction

Recommended alert (operator runbook): claude_kg_phase6_entity_count > 75
sustained 15m. Truncation events are NOT a separate Gauge series
(would persist across rebuilds and violate "current state" semantics) —
they surface via the resolvePhase6Entities console.warn log instead.

FILES

  src/utils/knowledgeGraph/kgHelpers.js — add getEntitiesForSession(pool,
  sessionId): queries report_artifacts WHERE mime_type='application/json'
  AND file_name='entities.json'; converts BYTEA to UTF-8; safeParse via
  Zod; returns parsed entities array or null. Dynamic import of the
  schema module defers ~50ms cost on misses (the common pre-PR1 case).
  Catches DB errors with logWarn + null return. Caller MUST treat null
  as "use fallback" — never throws.
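
The lookup described above might look like the sketch below. Several details are assumptions: the `session_id` column name, the top-level `{ entities: [...] }` document shape, and the hand-rolled `looksLikeEntitiesJson` check that stands in for the Zod schema's `safeParse`. The `pool` argument only needs a `query(sql, params)` method that resolves to `{ rows }`.

```javascript
// Hand-rolled shape check standing in for the Zod schema (assumption).
function looksLikeEntitiesJson(doc) {
  return doc !== null && typeof doc === 'object' &&
    Array.isArray(doc.entities) &&
    doc.entities.every((e) => e && typeof e.canonical_name === 'string');
}

// Returns the entities array, or null on any failure. Never throws.
async function getEntitiesForSessionSketch(pool, sessionId) {
  try {
    const { rows } = await pool.query(
      `SELECT file_data FROM report_artifacts
        WHERE session_id = $1
          AND mime_type = 'application/json'
          AND file_name = 'entities.json'
        LIMIT 1`,
      [sessionId],
    );
    if (rows.length === 0 || rows[0].file_data == null) return null;
    const text = Buffer.from(rows[0].file_data).toString('utf8'); // BYTEA to UTF-8
    const doc = JSON.parse(text); // malformed JSON throws and is caught below
    return looksLikeEntitiesJson(doc) ? doc.entities : null;
  } catch (err) {
    console.warn(`entities.json lookup failed: ${err.message}`);
    return null;
  }
}
```

Every failure mode collapses to `null`, so the caller has a single condition to test before falling back.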

  src/utils/knowledgeGraph/kgPhases6to8.js — replace inline entityPatterns
  loop with resolvePhase6Entities() resolver. Add LEGACY_DIGITALBRIDGE_
  FALLBACK constant (renamed + 1 LoC expansion adding match_patterns
  field for consistent shape with entities.json), PHASE6_ENTITY_CAP=50,
  escapeRegex helper. New entity-node properties: entity_type, variations,
  source_refs, confidence_tier, extraction_source ('entities_json' |
  'legacy_hardcoded'). Phase 9 reads only entity.label + entity.properties.
  full_text|context — verified safe (existing reader doesn't touch new
  fields). Confidence mapping: HIGH→1.0, LOW→0.6, MEDIUM→0.85 default.
  Exported resolvePhase6Entities + constants for tests only.
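
The helpers named above can be sketched as follows. The confidence values come from this commit; the function names `confidenceFromTier` and `buildEntityMatcher`, and the choice of case-insensitive matching, are assumptions for illustration only.

```javascript
// Escape regex metacharacters so names such as "U.S. Space Force"
// are matched literally rather than as patterns.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// HIGH -> 1.0, LOW -> 0.6, anything else (including MEDIUM) -> 0.85.
function confidenceFromTier(tier) {
  if (tier === 'HIGH') return 1.0;
  if (tier === 'LOW') return 0.6;
  return 0.85;
}

// Build one matcher from an entity's match_patterns; returns null when
// there is nothing to match, so an empty list never matches everything.
function buildEntityMatcher(entity) {
  const patterns = entity.match_patterns || [];
  if (patterns.length === 0) return null;
  return new RegExp(`(?:${patterns.map(escapeRegex).join('|')})`, 'i');
}

const matcher = buildEntityMatcher({ match_patterns: ['U.S. Space Force'] });
console.log(matcher.test('a contract with the U.S. Space Force')); // true
console.log(matcher.test('UXS Space Force'));                      // false
```

Without the escaping step, the dots in "U.S." would match any character and the second test would incorrectly succeed.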

  src/utils/sdkMetrics.js — register claude_kg_phase6_entity_count
  Gauge with source label + setKgPhase6EntityCount(source, count) setter.
  Help text guides operators to the >75 alert threshold + cites the
  Cloud Logging search for truncation event audit.
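
A sketch of the setter, written against any gauge-like object that exposes `set(labels, value)`. That is the call shape of prom-client's `Gauge.set`, but whether sdkMetrics.js uses prom-client is an assumption, as are the factory name and the rejection of unknown labels.

```javascript
// Hypothetical factory: wraps a gauge-like object exposing set(labels, value).
function makeKgPhase6EntityCountSetter(gauge) {
  const allowedSources = new Set(['entities_json', 'legacy_hardcoded']);
  return function setKgPhase6EntityCount(source, count) {
    // Rejecting unknown labels keeps the series count fixed at two.
    if (!allowedSources.has(source)) {
      throw new Error(`unknown source label: ${source}`);
    }
    gauge.set({ source }, count);
  };
}

// In-memory stand-in for a real gauge, useful in unit tests.
function makeFakeGauge() {
  const series = new Map();
  return {
    set(labels, value) { series.set(labels.source, value); },
    get(source) { return series.get(source); },
  };
}

const gauge = makeFakeGauge();
const setKgPhase6EntityCount = makeKgPhase6EntityCountSetter(gauge);
setKgPhase6EntityCount('entities_json', 12);
console.log(gauge.get('entities_json')); // 12
```

Because a gauge holds the latest value per label set, each rebuild overwrites the previous reading, which matches the "current state" semantics described above.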

  test/sdk/kg-phase6-entities.test.js (NEW, ~270 LoC, 14 tests):
  - Group 1 (3 tests): tier-1 happy path — SQL query shape, returns parsed
    entities, preserves match_patterns
  - Group 2 (5 tests): tier-1 graceful failures — missing artifact, DB
    throws, invalid Zod schema, malformed JSON bytes, null file_data
  - Group 3 (5 tests): resolvePhase6Entities two-tier fallback — tier 1,
    tier 2 missing, tier 2 malformed, exact 50-cap, defense-in-depth
    documentation
  - Group 4 (1 test): SpaceX fixture round-trip end-to-end (10+ entities,
    canonical names verified)

  All 14 PR2 tests pass. Combined PR1 + PR2 + adjacent suite: 111/111
  passing across 5 test files. No regressions.

EXPECTED IMPACT (validation gate)

Re-run KG build against SpaceX-IPO session 2026-05-16-1778951162 after
deploy:
  - Phase 6 entity count: 0 → 10+ (fact-validator over SpaceX content
    surfaces SpaceX, Musk, NASA, FAA, FCC, CFIUS, NRO, Space Force,
    Morgan Stanley, comparable companies)
  - Phase 9 edge count: 267 → ~1,500+ (cardinality recovery from real
    entity anchors)
  - Overall KG node count: 632 → ~900-1,100 (back in March 31 baseline
    range)
  - New gauge reads claude_kg_phase6_entity_count{source="entities_json"}
    = ~10-15 (well under 50-cap)

NOTE: SpaceX session was completed BEFORE PR1 deployed, so it has no
entities.json artifact. Rebuild on that session will fall back to
LEGACY tier and produce same 632/267 numbers. Validation requires a
NEW IB/PE/IPO session run AFTER both PRs deploy + then rebuild on that
new session.

ROLLBACK

Revert this commit (PR2) → Phase 6 reverts to using LEGACY hardcoded
list for all sessions; PR1's entities.json artifacts continue to be
written but go unread. No data loss. ~10 min recovery.

@see /Users/ej/.claude/plans/floating-cooking-flute.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the two-PR chain (fact-validator producer + KG Phase 6 consumer)
that closes the systemic Phase 6 hardcoded-entity bug. Root cause: 9
hardcoded DigitalBridge names at kgPhases6to8.js:73-83 → ~0 entity nodes
for any non-DigitalBridge memo → Phase 9 cardinality collapse from 1.90
to 0.42 edges/node. SpaceX-IPO session was the canary (632/267 vs
baseline 1083/2062).

Service CHANGELOG is the canonical detailed version (~155 LoC); root
CHANGELOG is the concise leadership-readable version (~33 LoC) with
pointer to service for full technical detail.

Both entries cover: problem statement, PR1 producer mechanics (Zod
schema + prompt extension + report_artifacts persistence), iteration
history (flag added then removed mid-release per essential-infrastructure
principle), PR2 consumer mechanics (getEntitiesForSession helper +
resolvePhase6Entities two-tier fallback + cardinality guard + Prometheus
gauge), architectural decisions (no flags, no markdown-parser tier 2,
zero DB schema changes), expected post-deploy impact, rollback path,
risk score.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

v6.11.0: KG Phase 6 hardcoded entity list — fixed via fact-validator entities.json sidecar
