
feat(banker): Banker QA workflow + KG edge waves + IC Pyramid — integrated with main 8.0.2 (wrapped subagents, dormant flag-off)#178

Merged
Number531 merged 195 commits into main from v6.14/banker-qa-phase-1
Jun 3, 2026

Number531 commented May 27, 2026

Summary

Lands the Banker Q&A workflow module (v6.14) plus the 8 banker-centric KG edge waves (v6.16.0–v6.18.3) and the IC pyramidal frontend surface, now current with main 8.0.2 (wrapped-subagents architecture). The banker module ships dormant behind BANKER_QA_OUTPUT=false — the flag-off path is bit-identical to the legacy pipeline, so this is safe to land with no behavior change. A live, non-Cardinal validation gate (below) runs before any production flag-flip.

This PR window also integrates main's permanent wrapped-subagents line (WRAPPED_SUBAGENTS=true, mcp__subagents__run_* runner, WRAPPED_SUBAGENT_MODEL=claude-opus-4-8) and was brought current via two in-branch merges (origin/main 8.0.1, then 8.0.2). main/origin main were never modified.

Feature-flag state on merge

| Flag | On merge | Notes |
| --- | --- | --- |
| BANKER_QA_OUTPUT | false (dormant) | New agents never invoke; pipeline bit-identical to legacy. Flip only after the pre-flip gate. |
| KNOWLEDGE_GRAPH | true | Unchanged; already on in production. |
| KG_SEMANTIC_EDGES, KG_NUMERIC_EXPOSURE, KG_QA_INFORMS_EDGES, KG_PROBABILISTIC_VALUE, KG_PRECEDENT_BENCHMARKS, KG_DEAL_THESIS, KG_SENSITIVITY_EDGES | true | 7 KG edge-waves activate (post-hoc, fire-and-forget, circuit-breaker isolated, additive to the graph DB only). Absent on main → newly active here. |
| KG_CONTRADICTION_EDGES (Wave 4) | false (HELD) | Higher FP risk; its own rollout policy mandates a 7-day soak + manual spot-check, which hasn't started (flags never deployed). Commented in flags.env; enable post-merge after the soak. |

What's in this PR

Banker QA module (dormant, BANKER_QA_OUTPUT=false)

  • 3 agents (banker-intake-analyst, banker-specialist-coverage-validator, banker-qa-writer) + orchestrator phases G0.5 / G2.5 / G3.5 / G6, with a dedicated banker-mode resume gate.
  • 8 KG edge waves + Phase 1c content enrichment + the IC pyramidal frontend renderer.

main → banker integration (8.0.x wrapped subagents)

  • 10 hand-resolved conflicts (verified): OPUS_MODEL=claude-opus-4-8, version 8.0.2, BANKER_QA_OUTPUT=false, gate-precedence header in memo-qa-certifier, hook-chain + enhancedPrompt split + buildAgentToolMappingBanner() in agentStreamHandler.
  • Banker agents wire through the wrapped machinery automatically (registry → run_banker_* tools → Opus-4.8 override → universal banner → dispatch); verified by code-trace + live probe.

Migration collision guard

  • Renumbered banker's 022_kg-nodes-embedding-hnsw → 025_* (avoids the invisible-to-conflict-review collision with main's 022).
  • New scripts/check-migration-collisions.mjs + .github/workflows/migration-lint.yml CI guard.
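A minimal sketch of what a prefix-collision check could look like, assuming migrations are named with a leading numeric prefix. The function name, the `.sql` extension, and every file name except the `022_kg-nodes-embedding-hnsw` stem are illustrative assumptions; the actual scripts/check-migration-collisions.mjs may work differently.

```javascript
// Hypothetical sketch: report migration files that share a numeric prefix.
// Two branches can each add "022_*" without producing a git conflict,
// because the file names differ; only the prefix collides.
function findMigrationCollisions(fileNames) {
  const byPrefix = new Map();
  for (const name of fileNames) {
    const match = /^(\d+)[_-]/.exec(name);
    if (!match) continue; // skip files that do not follow NNN_name
    const prefix = match[1];
    if (!byPrefix.has(prefix)) byPrefix.set(prefix, []);
    byPrefix.get(prefix).push(name);
  }
  // Keep only prefixes claimed by more than one file.
  return [...byPrefix.entries()]
    .filter(([, names]) => names.length > 1)
    .map(([prefix, names]) => ({ prefix, names: [...names].sort() }));
}

// Illustrative input: the second and third entries collide on 022.
const exampleCollisions = findMigrationCollisions([
  "021_example-a.sql",
  "022_example-from-main.sql",
  "022_kg-nodes-embedding-hnsw.sql",
  "025_example-b.sql",
]);
console.log(exampleCollisions);
```

A CI job built on this idea would exit non-zero whenever the returned list is non-empty.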

CI gate fix

  • jest.config.cjs testPathIgnorePatterns excludes the 19 banker/KG node:test suites that jest cannot run (they run via node --test in kg-tests.yml). jest glob 230 → 211. Neutralizes the banker branch's contribution to the pre-existing deploy.yml bare-npm test failure (remaining deploy.yml debt is pre-existing main debt, documented as a follow-up).
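For illustration, the exclusion could be expressed roughly as below. Jest treats each `testPathIgnorePatterns` entry as a regular-expression string matched against the test path. The two patterns here are assumptions (the PR states only that 19 node:test suites are excluded), and `isIgnored` is a hypothetical helper that mimics that matching for a quick local check.

```javascript
// Sketch of a jest.config.cjs fragment. In the real file this object would
// be assigned to module.exports. Patterns are assumed, not copied from the PR.
const jestConfig = {
  testEnvironment: "node",
  testPathIgnorePatterns: [
    "/node_modules/",
    "/test/sdk/banker-.*\\.test\\.js$", // assumed: banker node:test suites
    "/test/sdk/kg-.*\\.test\\.js$", // assumed: KG node:test suites
  ],
};

// Hypothetical helper: true when any ignore pattern matches the path.
function isIgnored(testPath) {
  return jestConfig.testPathIgnorePatterns.some((pattern) =>
    new RegExp(pattern).test(testPath)
  );
}

console.log(isIgnored("/repo/test/sdk/banker-qa-validator.test.js"));
console.log(isIgnored("/repo/test/unit/feature-flags.test.js"));
```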

Banker-QA output validation gate (NEW, inert hardening)

  • src/utils/knowledgeGraph/bankerQaValidator.js — non-breaking parse-back gate: re-parses banker-question-answers.md with the production bankerQaParser and asserts structural integrity (Answer/Because parseable, confidence parses, ≥1 citation, expected Q-block count, no all-null block); bankerQaMetadataSchema (zod) for the sidecar. Inert — nothing in the production path calls it yet.
  • test/sdk/banker-qa-validator.test.js — 14 node:test cases (gold passes, drift caught).
  • scripts/run-bankerqa-isolated.mjs — standalone isolation harness (no server, no full pipeline).
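The structural checks listed above can be sketched as a non-throwing validator over already-parsed Q-blocks. The block shape (`id`, `answer`, `because`, `confidence`, `citations`) and the function name are assumptions for illustration; the real bankerQaParser output and bankerQaValidator.js may differ, and the zod sidecar schema is not reproduced here.

```javascript
// Hypothetical sketch of a parse-back gate. It collects problems instead of
// throwing, so a caller can log them without breaking the pipeline.
function validateBankerQaBlocks(blocks, expectedCount) {
  const errors = [];
  if (blocks.length !== expectedCount) {
    errors.push(`expected ${expectedCount} Q-blocks, found ${blocks.length}`);
  }
  for (const block of blocks) {
    const hasCitations =
      Array.isArray(block.citations) && block.citations.length > 0;
    const allNull =
      block.answer == null &&
      block.because == null &&
      block.confidence == null &&
      !hasCitations;
    if (allNull) {
      errors.push(`${block.id}: all fields null`);
      continue; // one message is enough for an empty block
    }
    if (!block.answer) errors.push(`${block.id}: Answer did not parse`);
    if (!block.because) errors.push(`${block.id}: Because did not parse`);
    if (block.confidence == null) {
      errors.push(`${block.id}: confidence did not parse`);
    }
    if (!hasCitations) errors.push(`${block.id}: needs at least one citation`);
  }
  return { ok: errors.length === 0, errors };
}

// Illustrative data only.
const goldBlock = {
  id: "Q1",
  answer: "Yes",
  because: "Approval is required under the cited order.",
  confidence: "High",
  citations: ["[1]"],
};
console.log(validateBankerQaBlocks([goldBlock], 1));
```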

Feature-flag documentation

Deeper-review corrections (round 3 — 6 findings, all addressed)

| # | Finding | Disposition |
| --- | --- | --- |
| G1 | section-matcher read topic word "tax" as letters a/x/t → wrong §IV.A/.X/.T→section CITES edges (unconditional, every session) | FIXED: strict isLetterCluster(); +2 tests |
| G2 | parseMultiple un-anchored → head single dropped for tail range (wrong value/type, double-emit); active via KG_PRECEDENT_BENCHMARKS | FIXED: head-anchored all 3 regexes; +1 test |
| G3 | Phase 16 fanout-cap bypass (prose+numeric each cap at 12 → 24/source) | HELD: KG_SENSITIVITY_EDGES OFF; fix → #204 |
| G6-num | Phase 11 "silent wrong magnitude" (under-specified repro) | HELD: KG_NUMERIC_EXPOSURE OFF; fix → #204 |
| G6-bank | mixed-case citation class [Filing] silently dropped (dormant) | FIXED: case-insensitive + normalize |
| G4 / G5 | dead Prometheus alerts / DOM-XSS (marked.parse → innerHTML, pre-existing repo-wide) | TRACKED → #204 (before-flip / repo-wide) |
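To make finding G1 concrete, here is one way a strict letter-cluster test could reject a topic word such as "tax". The accepted syntax (single uppercase letters joined by `/`, `,` or `&`) is an assumption chosen for illustration; the PR does not show the real isLetterCluster() rule.

```javascript
// Hypothetical sketch: a token counts as a section-letter cluster only if
// it is single uppercase letters separated by "/", "," or "&".
function isLetterCluster(token) {
  return /^[A-Z](?:\s*[\/,&]\s*[A-Z])*$/.test(token.trim());
}

// Returns the section letters for a cluster, or [] for ordinary words.
function sectionLetters(token) {
  return isLetterCluster(token) ? token.match(/[A-Z]/g) : [];
}

console.log(sectionLetters("A/X/T")); // treated as a cluster
console.log(sectionLetters("tax")); // ordinary word, no letters extracted
```

Under this rule a lenient matcher's failure mode (splitting any short word into letters) cannot occur, because a token without separators is accepted only when it is a single uppercase letter.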

Net flag state on merge: 5 KG waves ON, 3 HELD (KG_CONTRADICTION_EDGES, KG_NUMERIC_EXPOSURE, KG_SENSITIVITY_EDGES); BANKER_QA_OUTPUT=false. Full regression: KG node:test 429/429, wrapped 868/868.

Merge-review corrections (PR #178 reviewer findings — all addressed)

  • Test reproducibility — the Cardinal gold artifact lives under gitignored reports/, so the banker test suites ENOENT'd on a clean checkout ("426/426" was machine-local). Fixed: committed the gold + coverage JSON to tracked test/fixtures/banker-qa/ and repointed both suites. Verified reproducible with reports/ hidden (validator 14/14, parser 29/29).
  • canonical_key guard — the recommendation-dedup test locked the formula against a hand-kept replica that could drift from production. Fixed: extracted deriveRecommendationCanonicalKey() (exported, behavior-identical) and rewired the 19 dedup tests to import it → they now guard the production formula (19/19). ⚠ The v6.18.1 canonical_key change is unconditional and changes recommendation node identity (re-keys historical-session rebuilds) — flagged for explicit sign-off.
  • Inert CI (honest disclosure): all workflows live at super-legal-mcp-refactored/.github/workflows/; GitHub only scans a repo-root .github/workflows/ (absent), so none of the cited checks actually run. This is pre-existing and repo-wide (main's own deploy.yml/integration-tests never ran either) and was not introduced here. The checks below were run manually/locally. Relocation is tracked separately in #203 (deliberately not bundled, because it would activate main's known-failing deploy.yml).
  • Two un-flagged riders now gated behind BANKER_QA_OUTPUT (re-review nice-to-haves, resolved) — flag-off is now byte-identical to main: the documentConverter citation-paragraph-style.lua filter (DOCX+PDF) applies only in banker mode, and the session timeout is BANKER_QA_OUTPUT ? 6h : 4h (non-banker keeps main's 4h). Verified flag-off → 4h + no filter.
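The two gated riders reduce to a pair of flag-dependent selections, sketched below. The function names and the shape of the `flags` object are assumptions; only the 6h/4h values and the filter file name come from the description above.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Session timeout: banker mode gets 6h, everything else keeps main's 4h.
function sessionTimeoutMs(flags) {
  return flags.BANKER_QA_OUTPUT ? 6 * HOUR_MS : 4 * HOUR_MS;
}

// Converter filters: the citation paragraph filter applies only in banker mode.
function converterLuaFilters(flags) {
  return flags.BANKER_QA_OUTPUT ? ["citation-paragraph-style.lua"] : [];
}

console.log(sessionTimeoutMs({ BANKER_QA_OUTPUT: false }) / HOUR_MS);
console.log(converterLuaFilters({ BANKER_QA_OUTPUT: false }));
```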

Validation (run manually/locally — see inert-CI note above)

  • Smoke: migration guard ✓ · featureFlags/OPUS_MODEL=claude-opus-4-8 ✓ · fast jest unit ✓
  • Integration: wrapped-subagent suite 868/868 ✓ · KG+banker node:test 426/426 (incl. validator 14/14, reproducible from committed fixture) ✓
  • Empirical (Tier 3, live Opus 4.8): banker-qa-writer on Opus 4.8 produced parser-clean output first-pass (29/29 Q-blocks, correct 5-level confidence) — the model-drift concern is empirically dismissed. One divergence surfaced as a warning (Opus omits the separate **Question:** field; question text is in the ### Q#: header) — deferred follow-up affecting only KG Phase 1c question_prompt.

Pre-flag-flip gate (before setting BANKER_QA_OUTPUT=true in production — NOT required to merge)

  • One full non-Cardinal banker session under WRAPPED_SUBAGENTS=true + BANKER_QA_OUTPUT=true confirming: dispatch emits mcp__subagents__run_banker_*, no path-doubling, frontend banker-mode render. (The Opus-4.8 output-format concern is already closed by the Tier-3 isolation run.)

Sign-off needed before merge

  • Un-flagged KG changes to existing logic (Phases 2/6/9/10) run on every session with KNOWLEDGE_GRAPH on — not behind the banker/wave flags. Most notably the Phase 10 canonical_key change alters recommendation node identity (historical-session rebuilds re-key). Phases 6/9 are covered by their own node:test suites; Phase 10 is now guarded by the production-imported dedup suite. These are intended v6.16–v6.18.1 improvements — they need explicit reviewer sign-off (not a revert).

Deferred follow-ups (tracked separately, not blockers)

  • Relocate CI workflows to repo root so the checks actually run — #203.
  • Enable KG_CONTRADICTION_EDGES (Wave 4) after the 7-day soak + manual CONTRADICTS spot-check on the first post-merge production sessions.
  • Consolidate the 8 KG_* sub-flags into KNOWLEDGE_GRAPH once all soak — #202.
  • deploy.yml bare-npm test debt (live-test hang + 9 zero-test suites, all pre-existing on main): folded into #203.
  • banker-qa-writer **Question:**-field divergence (mandate in prompt or add a parser header fallback).
  • Optional: wire the validation gate into orchestrator G6 (Tier-3 makes it optional insurance, not a needed fix).

See docs/pending-updates/Banker-Merge-Risk.md for the full merge-risk SSOT and docs/feature-flags.md for the flag reference.

🤖 Generated with Claude Code

Number531 and others added 30 commits May 21, 2026 16:07
Adds the canonical architecture, phase gating spec, and modular precedent for the Banker Q&A Output feature, behind BANKER_QA_OUTPUT=false default flag.

Symmetric architecture with three new sibling agents (banker-intake-analyst, banker-specialist-coverage-validator, banker-qa-writer) bookending the question-driven pipeline. Single-condition dispatcher: flag is the master switch. All five load-bearing component families (promptEnhancer.js, memo-executive-summary-writer.js, 25 specialists, 6 synthesis prompts, 12 existing QA dimensions) remain byte-untouched.

Locks in 10 invariants (I1-I10) verifiable as binary diff/grep/SQL checks, three gating mechanisms (M1 orchestrator system-prompt injection, M2 artifact-existence gating, M3 orchestrator-controlled dispatch), and a 9-gate implementation/validation/rollout sequence (G0-G8).

Adds Dim 13 with rubric inheritance from Dim 3 (provably identical per-answer quality bar). Defense in depth via three coverage gates: banker-specialist-coverage-validator post-Wave-1, pre-qa-validate Q-coverage gate, Dim 13 scoring.

Phase 2 (visualization) deferred per data-first principle.

Establishes modular precedent (§ 17) for future workflow modes (regulatory filing, litigation prep, tax memo, compliance audit, cross-border M&A) at ~6-7 days each with zero load-bearing modifications.

Estimate: ~835 LoC + ~1,040 prompt lines across ~27 files, 11-day Phase 1 timeline, zero DB migrations, zero compliance impact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er-intake-analyst

Adds W1 implementer guidance to § 15.2.B mapping Cardinal v2.0's substantive intake-stage content (10-stage resolution protocol, utility M&A sector scaffold, acquirer failure-mode context, prohibited-assumption rules, client archetype matrix) into banker-intake-analyst's prompt, banker-deal-context.json schema, and banker-prohibited-assumptions.json sidecar — without adopting Cardinal's architectural assumptions that would violate I3/I4.

Architecture stays as locked: three sibling agents, single-condition flag dispatch, byte-untouched load-bearing components, Dim 13 with rubric inheritance, M1/M2/M3 gating mechanisms.

Cardinal's specialist-system-prompt injection, per-dimension-penalty application to Dims 0-11, non-canonical phase nomenclature ("Phase 8.5/10/12"), 22-specialist count (vs. actual 25), hard-halt on non-utility sectors, and 5,000-8,000-word Executive Memo Wrapper output are explicitly marked DO NOT ADOPT — with rationale.

Cardinal Executive Memo Wrapper deferred to Phase 3 (post-pilot decision; promote to v6.16 only if G5 pilot banker requests a narrative wrapper alongside the Q&A grid).

Net effect: banker-intake-analyst captures ~80% of Cardinal's intake-stage value while preserving all 10 invariants and the 11-day Phase 1 timeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add BANKER_QA_OUTPUT=false (default) to featureFlags.js with full v6.14
contract comment, and to flags.env. Flag controls existence (whether three
new sibling agents are dispatched and their downstream KG/Dim 13/artifact
infrastructure produces rows), not behavior of any load-bearing component.

Per spec § 15.1: "the flag controls existence, not behavior."
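A minimal sketch of a default-false flag reader, assuming flags arrive as environment-style strings. `readBooleanFlag` and `loadFeatureFlags` are hypothetical names; the real featureFlags.js may parse values differently.

```javascript
// Hypothetical sketch: only the literal string "true" enables a flag, so a
// missing, empty, or malformed value leaves the module dormant.
function readBooleanFlag(env, name, defaultValue) {
  const raw = env[name];
  if (raw === undefined || raw === "") return defaultValue;
  return raw.trim().toLowerCase() === "true";
}

function loadFeatureFlags(env) {
  return Object.freeze({
    BANKER_QA_OUTPUT: readBooleanFlag(env, "BANKER_QA_OUTPUT", false),
  });
}

console.log(loadFeatureFlags({}));
console.log(loadFeatureFlags({ BANKER_QA_OUTPUT: "true" }));
```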

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.1 + § 16.1 G1
Gate: G1.1 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new subagent definition files following the established 8-file
subagent-scaffold pattern. Each file is a minimal `def` export wiring the
agent's description, execution metadata, model (sonnet-4-6), tools, and
its capability prompt (capabilities themselves land in G1.3 via
_promptConstants.js).

Symmetric architecture bookends the question-driven pipeline:

  banker-intake-analyst (FRONT)
    Inputs:  raw banker prompt (15-20 numbered Qs + deal context)
    Outputs: banker-questions-presented.md, banker-deal-context.json,
             banker-prohibited-assumptions.json, banker-intake-state.json
    Phase:   G0.5 (before P1) when BANKER_QA_OUTPUT=true

  banker-specialist-coverage-validator (MID, Wave 1.5)
    Inputs:  research-plan.md, banker-questions-presented.md,
             specialist-reports/*.md
    Outputs: specialist-coverage-report.md, specialist-coverage-state.json
    Phase:   G3.5 (after V4, before G1.x) — enforces I9

  banker-qa-writer (BACK)
    Inputs:  banker-questions-presented.md, specialist-coverage-state.json,
             executive-summary.md (read only), consolidated-footnotes.md,
             section-reports/section-IV-*.md
    Outputs: banker-question-answers.md, banker-qa-state.json,
             banker-qa-metadata.json
    Phase:   G6 (after G5, before A1)

All three are pure additive sibling agents — they introduce zero edits to
the 25 existing specialists, 6 synthesis prompts, 12 existing QA dims, or
memo-executive-summary-writer. Invariants I1, I2, I3, I4, I7 preserved.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.B/C/D
Gate: G1.2 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new exports in _promptConstants.js — the load-bearing capability
prompts consumed by the three sibling agent definitions added in G1.2:

  BANKER_INTAKE_ANALYST_CAPABILITY (~280 prompt lines)
    Documents the 10-stage internal resolution protocol (entity/intent
    parsing -> sector classification -> deal-stage classification ->
    primary-source fact retrieval -> archetype resolution -> specialist
    priority hinting -> sector scaffold selection -> acquirer
    failure-mode retrieval -> prohibited-assumption assembly ->
    composition), three output artifacts with explicit schemas
    (banker-questions-presented.md verbatim preservation rule,
    banker-deal-context.json schema, banker-prohibited-assumptions.json
    rule schema), and the question-hygiene gate.

    Cardinal Framing Layer v2.0 content adopted as blueprint per spec
    § 15.2.B W1 implementer note. Explicit "Do NOT adopt" list honored:
    no specialist-system-prompt injection (preserves I3/I4), no
    per-Dim-0-11 penalties (preserves I3), graceful degradation on
    non-utility sectors (no hard-halt), no Cardinal 5,000-8,000-word
    executive memo wrapper (deferred to Phase 3).

  BANKER_SPECIALIST_COVERAGE_VALIDATOR_CAPABILITY (~220 prompt lines)
    Per-question PASS / REMEDIATE / ACCEPT_UNCERTAIN decision matrix
    with evidence-bearing rules and remediation-task emission format.
    Max 2 remediation cycles; ACCEPT_UNCERTAIN rationale propagates
    downstream to banker-qa-writer.

  BANKER_QA_WRITER_CAPABILITY (~300 prompt lines)
    Pure consolidator contract — reads banker-questions-presented.md
    (NOT questions-presented.md, which remains the exec summary
    writer's exclusive input per I2), specialist-coverage-state.json,
    executive-summary.md (read only — never modified per I1),
    consolidated-footnotes.md, and section-IV reports; emits one
    ### Q#: block per banker question with Answer / Because /
    Confidence / Supporting analysis / Citations + a machine-readable
    banker-qa-metadata.json sidecar consumed by KG Phase 1b and the
    /api/db/sessions/:key/questions endpoint.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.B/C/D
Gate: G1.3 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the three sibling agents into the platform's standard discoverability
and observability layers. None of these edits are flag-gated — registry
shape stays stable across flag flips, classifications return null/no-op
when their target agents never invoke (M3 gating at dispatch time
prevents invocation under flag-off operation).

  legalSubagents/index.js
    - Import three new def exports (banker-intake-analyst,
      banker-specialist-coverage-validator, banker-qa-writer)
    - Append three [name, def] tuples to LEGAL_SUBAGENTS registry

  utils/hookSSEBridge.js
    classifyAgent() additions:
    - banker-intake-analyst        -> { phase: 'intake',      stage: 'banker_intake',       wave: null }
    - banker-specialist-coverage-validator -> { phase: 'validation', stage: 'specialist_coverage', wave: 1.5 }
    - banker-qa-writer             -> { phase: 'generation',  stage: 'banker_qa_output',    wave: null }

    classifyDocument() additions:
    - banker-questions-presented.md  -> 'banker-intake'
    - specialist-coverage-report.md  -> 'specialist-coverage'
    - banker-question-answers.md     -> 'banker-qa'

  catalogDisplay/agentClassifications.js
    - New 'intake' phase with banker-intake-analyst membership
    - banker-specialist-coverage-validator added to 'validation' membership
    - banker-qa-writer added to 'generation' membership
    - AGENT_OUTPUT_MAP entries for all three agents

  catalogDisplay/agentDisplayMeta.js
    - Three new entries with role / expertise / dealContext following
      the established IB/PE/M&A-banker-friendly description pattern

Per spec § 15.2.B/C/D (file enumerations) — these match the 8-file
subagent-scaffold pattern entries for each new sibling agent.
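The classification additions listed above amount to two lookup tables. The sketch below reproduces the listed values and returns null for anything else; the table names and the standalone function bodies are assumptions, since the real classifyAgent()/classifyDocument() in hookSSEBridge.js also handle the pre-existing agents and documents.

```javascript
// Values copied from the commit message; structure is illustrative.
const BANKER_AGENT_CLASSES = new Map([
  ["banker-intake-analyst", { phase: "intake", stage: "banker_intake", wave: null }],
  ["banker-specialist-coverage-validator", { phase: "validation", stage: "specialist_coverage", wave: 1.5 }],
  ["banker-qa-writer", { phase: "generation", stage: "banker_qa_output", wave: null }],
]);

const BANKER_DOCUMENT_CLASSES = new Map([
  ["banker-questions-presented.md", "banker-intake"],
  ["specialist-coverage-report.md", "specialist-coverage"],
  ["banker-question-answers.md", "banker-qa"],
]);

// Unknown names fall through to null in this sketch.
function classifyBankerAgent(agentName) {
  return BANKER_AGENT_CLASSES.get(agentName) ?? null;
}

function classifyBankerDocument(fileName) {
  return BANKER_DOCUMENT_CLASSES.get(fileName) ?? null;
}

console.log(classifyBankerAgent("banker-specialist-coverage-validator"));
console.log(classifyBankerDocument("banker-question-answers.md"));
```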

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.B/C/D
Gate: G1.4 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four additive entries each for the three new sibling agents — sufficient
for the existing hook-to-DB bridge to classify, persist, and index banker
artifacts without any other code changes downstream.

  VALID_REPORT_TYPES Set
    + 'banker_intake'       (banker-questions-presented.md)
    + 'specialist_coverage' (specialist-coverage-report.md)
    + 'banker_qa'           (banker-question-answers.md)

  REPORT_TYPE_MATCHERS (path-based first-match-wins)
    + 'banker-questions-presented' -> 'banker_intake'
    + 'specialist-coverage-report' -> 'specialist_coverage'
    + 'banker-question-answers'    -> 'banker_qa'

  AGENT_TYPE_MATCHERS (state-key-based first-match-wins)
    Listed FIRST so they take precedence over the broader patterns:
    + 'banker-intake-analyst'                -> 'banker-intake-analyst'
    + 'banker-specialist-coverage-validator' -> 'banker-specialist-coverage-validator'
    + 'banker-qa-writer'                     -> 'banker-qa-writer'

  STATE_FILE_MAP
    + 'banker-intake-analyst'                -> banker-intake-state.json
    + 'banker-specialist-coverage-validator' -> specialist-coverage-state.json
    + 'banker-qa-writer'                     -> banker-qa-state.json

  STATE_FILE_DIR_MAP
    + All three banker agents write state files to session root
      (consistent with their .md outputs)

Additive enum values — under BANKER_QA_OUTPUT=false the agents never run,
so no rows ever match these new enums (intrinsic dormancy). Preserves
invariant I5.
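Why the banker entries are listed first can be shown with a small first-match-wins matcher. The substring-matching helper and the broader "intake" pattern are assumptions for illustration; the actual matcher implementation and legacy patterns are not shown in this commit message.

```javascript
// Hypothetical sketch: return the type of the first matcher whose needle
// occurs in the key.
function matchFirst(matchers, key) {
  for (const [needle, type] of matchers) {
    if (key.includes(needle)) return type;
  }
  return null;
}

const BANKER_MATCHERS = [
  ["banker-intake-analyst", "banker-intake-analyst"],
  ["banker-specialist-coverage-validator", "banker-specialist-coverage-validator"],
  ["banker-qa-writer", "banker-qa-writer"],
];

// Assumed broader legacy pattern, for illustration only.
const BROADER_MATCHERS = [["intake", "intake-research-analyst"]];

// Banker entries first, so the specific names win over the broad one.
const AGENT_TYPE_MATCHERS = [...BANKER_MATCHERS, ...BROADER_MATCHERS];

console.log(matchFirst(AGENT_TYPE_MATCHERS, "banker-intake-analyst"));
```

With the order reversed, the same key would resolve to the broader pattern, which is the misclassification the ordering avoids.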

Per spec § 15.2.H "Persistence + routing wiring (4 entries in 1 file)" —
expanded slightly because the spec's "4 entries" count was for a single
agent; we touched the same four maps for each of the three agents.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.H
Gate: G1.5 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two surgical edits to agentStreamHandler.js implementing the single-
condition intake routing prescribed by spec § 15.2.A — no signature
detection, no input-shape heuristic, the flag IS the master switch:

  1. Intake dispatcher (line 239-263 area)
     When BANKER_QA_OUTPUT=true:
       - SKIP runPromptEnhancementPhase() — promptEnhancer.js never
         invokes (preserves invariant I7 byte-identical enhancer)
       - Strip 'intake-research-analyst' from mainAgents passed to the
         orchestrator (prevents legacy intake double-dispatch with the
         banker-intake-analyst that the orchestrator dispatches via G0.5)
     When BANKER_QA_OUTPUT=false:
       - Existing promptEnhancer.js path runs unchanged

  2. Orchestrator system-prompt injection (line ~301)
     Mirroring the existing CITATION_WEBSEARCH_VERIFICATION pattern:
       BANKER_QA_OUTPUT=${featureFlags.BANKER_QA_OUTPUT}
     This is mechanism M1 — the orchestrator's task framing for
     downstream subagents conditionally includes/omits banker-specific
     instructions based on this in-prompt signal. Subagent prompts
     themselves remain byte-untouched.

Rationale for skipping promptEnhancer.js entirely under flag-on rather
than letting both intake paths run: banker-intake-analyst handles a
fundamentally different input shape (15-20 explicit numbered questions
+ deal context) than promptEnhancer.js's short-query enrichment. Running
both would double-cost and could surface contradictory intake artifacts.
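The single-condition routing described above can be sketched as a pure planning function. `planIntake` and `bankerFlagLine` are hypothetical names, and the returned shape is an assumption; the real edits live inline in agentStreamHandler.js.

```javascript
// Hypothetical sketch: the flag alone decides which intake path runs.
function planIntake(flags, mainAgents) {
  if (flags.BANKER_QA_OUTPUT) {
    return {
      runPromptEnhancement: false, // promptEnhancer.js is never invoked
      // Drop the legacy intake agent so it cannot double-dispatch.
      mainAgents: mainAgents.filter((name) => name !== "intake-research-analyst"),
    };
  }
  return { runPromptEnhancement: true, mainAgents: [...mainAgents] };
}

// M1: the line injected into the orchestrator system prompt.
function bankerFlagLine(flags) {
  return `BANKER_QA_OUTPUT=${flags.BANKER_QA_OUTPUT}`;
}

// "example-specialist" is a placeholder agent name.
const exampleAgents = ["intake-research-analyst", "example-specialist"];
console.log(planIntake({ BANKER_QA_OUTPUT: true }, exampleAgents));
console.log(bankerFlagLine({ BANKER_QA_OUTPUT: false }));
```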

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.A
       (gating mechanism table rows 1 + M1 row)
Gate: G1.6 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ode)

Two coordinated edits to the orchestrator master prompt:

  1. MANDATORY PHASE SEQUENCE table (line 98 area)
     Four new gated rows inserted at functionally-correct positions:
       - G0.5 banker-intake-analyst — BEFORE P1 (session-init)
       - G2.5 orchestrator Q→specialist routing into research-plan.md —
              AFTER P1, BEFORE P2 specialist dispatch
       - G3.5 banker-specialist-coverage-validator — AFTER V4 (Wave 1
              complete), BEFORE G1.x section-generation (enforces I9)
       - G6   banker-qa-writer — AFTER G5 (or G4 if G5 skipped),
              BEFORE A1 final-synthesis

     Banker-mode gating note added below the table: phases fire ONLY
     when system prompt contains BANKER_QA_OUTPUT=true. Under flag-off
     the phase sequence is bit-identical to the legacy pipeline
     (preserves I5/I8).

  2. NEW "BANKER Q&A MODE PROTOCOL" section (~95 lines inserted before
     PHASE EXECUTION PROTOCOL ANTI-LOOP PROTECTION)

     Concrete operational protocol for each new phase:
       - G0.5: input contract, output files, failure mode, recovery
       - G2.5: research-plan.md amendment recipe, mapping algorithm,
               failure mode (unmapped Q → halt for operator)
       - G3.5: PASS / REMEDIATE / ACCEPT_UNCERTAIN decision matrix,
               max-2-cycle remediation loop, escalation threshold
               (recommended ≥30% remaining REMEDIATE = operator review)
       - G6:   input contract, output contract, side effects on A2
               (Dim 13 scoring) and KG Phase 1b

     Banker-mode invariants enforced explicitly:
       - I1: G3 exec summary writer is byte-untouched and does NOT
             receive banker-questions-presented.md
       - I9: G3.5 must complete PASS or ACCEPT_UNCERTAIN before any
             memo-section-writer SubagentStart
       - I3/I4: zero specialist-prompt modifications; banker-specific
             framing reaches specialists ONLY via M1 task framing
             during P2 dispatch, never as edits to specialist prompts
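As a rough illustration of the G3.5 loop, the sketch below re-evaluates coverage for at most two remediation cycles and flags operator review when 30% or more of the questions still need remediation. The `evaluate`/`remediate` callbacks and the result shape are assumptions, and what should happen when some questions remain REMEDIATE below the 30% threshold is not specified in the text, so this sketch simply reports `mayProceed: false` in that case.

```javascript
const MAX_REMEDIATION_CYCLES = 2;
const ESCALATION_THRESHOLD = 0.3; // recommended operator-review threshold

// evaluate() returns [{ id, status }] with status PASS, REMEDIATE, or
// ACCEPT_UNCERTAIN; remediate(list) re-dispatches work for the failing items.
function runCoverageGate(evaluate, remediate) {
  let decisions = evaluate();
  let cycles = 0;
  while (
    cycles < MAX_REMEDIATION_CYCLES &&
    decisions.some((d) => d.status === "REMEDIATE")
  ) {
    remediate(decisions.filter((d) => d.status === "REMEDIATE"));
    cycles += 1;
    decisions = evaluate();
  }
  const remaining = decisions.filter((d) => d.status === "REMEDIATE").length;
  const ratio = decisions.length === 0 ? 0 : remaining / decisions.length;
  return {
    cycles,
    decisions,
    // I9: section writers may start only when nothing is left to remediate.
    mayProceed: remaining === 0,
    escalate: ratio >= ESCALATION_THRESHOLD,
  };
}
```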

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.A + § 15.2.B/C/D
Gate: G1.7 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er + final-synthesis)

Two existing-agent prompts gain conditional behavior via mechanism M2
(artifact-existence gating) — the SAFEST gating mechanism for static
exported prompts because subagent prompts cannot read featureFlags at
runtime. The conditional branches activate ONLY when banker-mode
artifacts physically exist in the session directory; absent files cause
the conditional to short-circuit silently.

  memo-section-writer.js (preserves invariant I4 — CREAC structure)
    New "BANKER Q&A CROSS-REFERENCE SURFACING" section at the bottom
    of the prompt instructs the section writer to:
      1. Glob session root for banker-questions-presented.md
      2. If absent: produce section exactly as today (unchanged)
      3. If present: also read research-plan.md SPECIALIST ASSIGNMENTS
         to find the Q-routing block; for each banker question whose
         routing names this section's specialists, append a one-line
         "Addresses banker questions: Q1, Q3, Q7" reference under the
         section header AND at the close of Subsection B
      4. Include banker_questions_addressed array in RETURN FORMAT JSON
         (omitted when the conditional did not execute)

    CREAC subsections A-F, 4,000-6,000-word target, risk assessment
    tables, citation discipline — ALL unchanged.

  memo-final-synthesis.js (preserves memo assembly contract)
    New "BANKER Q&A COVERAGE VERIFICATION" section instructs the
    final-synthesis writer to:
      1. Glob session root for banker-questions-presented.md
      2. If absent: proceed as today
      3. If present: verify each banker question has a matching
         ### Q#: block in banker-question-answers.md AND at least one
         section materially addresses it (per the Q-cross-ref note
         emitted by memo-section-writer's banker branch above)
      4. Append [BANKER COVERAGE NOTE] structured warnings to the
         Detailed Section Directory for any uncovered question

    No new memo sections introduced; no word-count target changes;
    no assembly procedure changes. Conditional adds a verification
    pass + (when gaps exist) directory-level coverage notes.

Both prompts use file-existence as the gate — when BANKER_QA_OUTPUT=false
at the server level, banker-intake-analyst never runs,
banker-questions-presented.md never exists, and the conditionals never
fire. Preserves invariants I1, I2, I3, I4 by construction.
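Mechanism M2 is expressed in prompt text, but its logic can be sketched in code: the banker branch runs only when the intake artifact appears in the session file listing. The function names and the payload fields other than `banker_questions_addressed` are assumptions for illustration.

```javascript
// Hypothetical sketch of artifact-existence gating. sessionFiles stands in
// for the result of globbing the session root.
function bankerModeActive(sessionFiles) {
  return sessionFiles.includes("banker-questions-presented.md");
}

// Builds a section writer's return payload. The banker field is omitted
// entirely when the artifact is absent, so flag-off output is unchanged.
function sectionReturnPayload(sessionFiles, basePayload, questionsAddressed) {
  if (!bankerModeActive(sessionFiles)) return { ...basePayload };
  return { ...basePayload, banker_questions_addressed: questionsAddressed };
}

const legacyPayload = sectionReturnPayload(
  ["research-plan.md"],
  { section: "IV.A" },
  ["Q1", "Q3"]
);
console.log(legacyPayload);
```

Because the gate keys on a file that only banker-intake-analyst writes, no flag value needs to reach the static prompt.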

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.A
       (gating table rows for memo-section-writer + memo-final-synthesis)
Gate: G1.8 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knowledge graph extraction grows a new Phase 1b that materializes banker
questions as first-class graph entities. This is the load-bearing data
foundation that makes Phase 2 (visualization) possible — and it ships in
Phase 1 even though no UI consumes it yet (per spec § 15.1 principle 3:
data integrity before visualization).

  kgPhases1to5.js
    + phase1b_questionNodes(pool, sessionId, evolutionLog, resolver):
      - Reads banker_intake report content; parses ## Q# blocks via
        regex (## Qn header anchored, body slice up to 500 chars)
      - Loads banker-qa-writer's metadata sidecar (reports.metadata of
        type banker_qa) for per-Q citation_ids + source_section_ids
      - Pulls research-plan.md and parses '- Q# → specialist1, specialist2'
        routing entries (case-insensitive multi-agent comma split)
      - For each parsed Q#, creates one node_type='question' node with
        canonical_key='question:Q#' and full provenance row
      - Emits three edge types (per spec § 15.2.E):
          question → agent           (edge_type='assigned_to')
          question → section         (edge_type='addressed_in')
          question → banker_qa node  (edge_type='consolidated_in')
      - Silently no-ops when banker_intake report absent (flag-off
        operation) — caller-level guard belt-and-suspenders with this
        function-internal guard

    + 'banker_qa' added to Phase 1 report allowlist:
        WHERE report_type IN ('section', 'specialist', 'banker_qa')
      Additive enum — zero behavior change when no banker_qa rows exist.

  knowledgeGraphExtractor.js
    + featureFlags import added
    + Phase 1b invocation wrapped in OTel span + try/catch + circuit
      breaker accounting
    + M3 gating: explicit `if (featureFlags.BANKER_QA_OUTPUT)` guard at
      the orchestration site — keeps the phase function flag-agnostic
      and concentrates the gating decision in one auditable location

  kgPhase10DealIntel.js
    + 'banker_qa' added to Phase 10 deal-intelligence enrichment corpus
      allowlist (line 676 area):
        WHERE report_type IN ('specialist','qa','review','synthesis','banker_qa')
      Lets the deal-intel enrichment absorb the banker companion
      artifact when present; additive (no behavior change without rows).

Per spec § 15.2.E: "Embedding chunks per question | Auto-covered.
chunkByHeaders() splits by `## ` headers; banker-qa doc with `## Q#:`
headers produces 15–20 per-question embeddings natively" — no edit
needed for embeddings.
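The parsing steps of Phase 1b can be sketched as below: split the intake report on `## Q#` headers, cap each body at 500 characters, parse the `- Q# → specialist1, specialist2` routing lines, and derive `assigned_to` edges. The function names, the returned shapes, and the specialist names in the example are assumptions; database writes, provenance rows, and the other two edge types are omitted.

```javascript
// Hypothetical sketch of the question-node parsing described above.
function parseQuestionBlocks(markdown) {
  const headers = [...markdown.matchAll(/^## (Q\d+)\b.*$/gm)];
  return headers.map((header, i) => {
    const start = header.index + header[0].length;
    const end = i + 1 < headers.length ? headers[i + 1].index : markdown.length;
    return {
      id: header[1],
      canonicalKey: `question:${header[1]}`,
      body: markdown.slice(start, end).trim().slice(0, 500), // 500-char cap
    };
  });
}

// Accepts "→" or "->" and a comma-separated agent list, case-insensitively.
function parseRouting(researchPlan) {
  const routing = new Map();
  const pattern = /^\s*-\s*(Q\d+)\s*(?:→|->)\s*(.+)$/gim;
  for (const match of researchPlan.matchAll(pattern)) {
    const agents = match[2].split(",").map((s) => s.trim()).filter(Boolean);
    routing.set(match[1].toUpperCase(), agents);
  }
  return routing;
}

function assignedToEdges(questions, routing) {
  return questions.flatMap((q) =>
    (routing.get(q.id) ?? []).map((agent) => ({
      from: q.canonicalKey,
      to: agent,
      edge_type: "assigned_to",
    }))
  );
}

// Illustrative inputs; specialist names are placeholders.
const intakeDoc = "## Q1: Is approval required?\nContext A\n\n## Q2: What is the exposure?\nContext B\n";
const planDoc = "- Q1 → regulatory-analyst, antitrust-analyst\n- q2 -> tax-analyst\n";
console.log(assignedToEdges(parseQuestionBlocks(intakeDoc), parseRouting(planDoc)));
```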

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.E
Gate: G1.9 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tation scope

Five coordinated edits that extend the existing verification stack to
cover the banker companion artifact without modifying any Dim 0-11
behavior. All gates use mechanism M2 (artifact-existence) so the legacy
flag-off path is bit-identical to today.

  citation-validator.js
    + 'banker-question-answers.md' added to optionalInputs (read only
      when present; under flag-off the file never exists and the agent
      reads its standard inputs unchanged)

  citationSynthesis.js (countFootnotesAcrossSectionFiles)
    + SQL WHERE clause extended to OR report_type='banker_qa' — keeps
      the structural-truncation baseline aware of banker-doc citations
      so consolidated-footnotes doesn't trip false-positive truncation
      alarms when it legitimately grows to absorb banker citations

  scripts/pre-qa-validate.py
    + New check_banker_q_coverage(memo_path) function:
      - Reads banker-questions-presented.md (canonical Q list) +
        banker-question-answers.md (### Q#: blocks) sibling to memo
      - M2 gate: returns skipped=True with reason='no_banker_artifacts'
        when either file is absent → caller treats as PASS
      - When present: hard-fails on (a) missing ### Q#: block for any
        submitted question, (b) any block missing Answer/Because/
        Citations fields
    + 'banker_q_coverage' added to BLOCKING_CHECKS set
    + Check 9 wired into run_validation() AFTER existing Check 8;
      results.checks gets a 'Banker Q-Coverage (v6.14)' entry only when
      banker artifacts exist (M2: silent skip otherwise)

  memo-qa-diagnostic.js (preserves invariant I3 — Dims 0–11 unchanged)
    + DIMENSION 13 added after DIMENSION 11, BEFORE the RED FLAGS
      section:
      - Activation contract: fires ONLY when banker-question-answers.md
        exists (M2 file-existence gating in the prompt itself)
      - Inheritance by reference: "Apply Dimension 3's per-answer
        rubric (definitive verdict, mandatory because-clause, ≥1
        citation, section cross-reference) to EACH ### Q#: block"
        (preserves I10 — exactly ONE occurrence of this literal phrase)
      - Banker-specific checks: coverage % (100%), answer specificity
        % (≥80% non-Uncertain unless rationale), citation density
        (≥1 per answer), section-ref accuracy (resolves to actual
        section headers), prohibited-assumption compliance (per-rule
        penalty kept INSIDE Dim 13 only — never modifies Dims 0–11)
      - Hard threshold: Dim 13 < 85% blocks certification
    + Phase 2 dimension checklist (line 73 area) gains a 2.12 entry
      for Dim 13 (marked conditional)

  memo-qa-certifier.js
    + New Step 5b "Banker Q&A Hard-Fail Gate" inserted after Step 5:
      - Inert under flag-off (M2 — silent skip when banker-question-
        answers.md absent)
      - Under flag-on: force REJECT regardless of overall score if
        Dim 13 < 85%. A 92% overall with Dim 13 at 80% is still REJECT
        because the banker-facing artifact has not met its quality bar
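
The Q-coverage gate described above for scripts/pre-qa-validate.py can be sketched roughly as follows. This is a minimal illustration, not the shipped function: the heading shapes (`## Q<n>` for submitted questions, `### Q<n>:` for answer blocks), the `Label:` field format, and every return key other than `skipped` and `reason` are assumptions.

```python
import re
from pathlib import Path

REQUIRED_FIELDS = ("Answer", "Because", "Citations")


def check_banker_q_coverage(memo_path):
    """Sketch of the M2-gated banker Q-coverage check (file formats assumed)."""
    session_dir = Path(memo_path).parent
    questions_file = session_dir / "banker-questions-presented.md"
    answers_file = session_dir / "banker-question-answers.md"

    # M2 gate: if either artifact is absent, skip; the caller treats this as PASS.
    if not (questions_file.exists() and answers_file.exists()):
        return {"skipped": True, "reason": "no_banker_artifacts", "passed": True}

    # Assumed shapes: "## Q3" in the question list, "### Q3:" in the answers doc.
    q_text = questions_file.read_text(encoding="utf-8")
    a_text = answers_file.read_text(encoding="utf-8")
    submitted = re.findall(r"^## (Q\d+)\b", q_text, flags=re.M)
    parts = re.split(r"^### (Q\d+):", a_text, flags=re.M)
    blocks = dict(zip(parts[1::2], parts[2::2]))

    # (a) every submitted question needs an answer block
    missing_blocks = [q for q in submitted if q not in blocks]
    # (b) every answer block needs all three fields (assumed "Label:" form)
    missing_fields = {}
    for qid, body in blocks.items():
        absent = [f for f in REQUIRED_FIELDS if not re.search(rf"\b{f}\b\s*:", body)]
        if absent:
            missing_fields[qid] = absent

    return {
        "skipped": False,
        "passed": not missing_blocks and not missing_fields,
        "missing_blocks": missing_blocks,
        "missing_fields": missing_fields,
    }
```

Returning a structured result rather than raising keeps the skip case and the hard-fail case distinguishable to the caller, which matches the "silent skip otherwise" behavior described for Check 9.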

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.F
       (verification layer — 6 components incl. Dim 13 inheritance)
Gate: G1.10 of 11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend API + frontend label additions that complete the data contract
Phase 2 visualization will eventually consume. Built in Phase 1 even
though no UI renders the data yet (per spec § 15.1 principle 3: data
foundation before visualization).

  dbFrontendRouter.js
    + GET /api/db/sessions/:sessionKey/questions
      - Returns banker question list with per-Q summary metadata:
        question_id, question_text, category, assigned_specialists[],
        confidence, answered (bool), citation_count, edge counts
        (assigned_to / addressed_in / consolidated_in), created_at
      - Pulls KG question nodes (created by Phase 1b) + reports.metadata
        (banker_qa report's metadata column carries banker-qa-metadata.json)
      - Sessions with no banker_qa data return { questions: [], count: 0 }
        — endpoint inert under flag-off operation, no conditional logic

    + GET /api/db/sessions/:sessionKey/questions/:qid
      - Returns full per-question detail: question_text, answer_text,
        because, confidence, assigned_specialists, source_section_ids,
        citation_ids, remediation_cycles, KG provenance edges
        (assigned_to / addressed_in / consolidated_in target nodes)
      - 404 when the question_id isn't found in kg_nodes

    Both endpoints follow the existing router patterns (createDbFrontendRouter
    factory + getPool() check + parameterized SQL + try/catch error response).

  test/react-frontend/app.js
    + Three new entries in categoryLabels (rendered only when banker-mode
      artifacts exist in the session):
        'banker-intake'       -> 'Banker Questions Presented'
        'specialist-coverage' -> 'Specialist Coverage Report'
        'banker-qa'           -> 'Banker Q&A'
    + categoryOrder updated to place banker-qa (deliverable) first when
      present, followed by banker-intake and specialist-coverage; legacy
      categories preserved in their existing order
    + No force-graph / flow-graph changes (deferred to Phase 2 per spec
      § 15.3 — visualization scope explicitly excluded from Phase 1)
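
The categoryOrder rule above (banker categories first when present, legacy categories in their existing order) is sketched below in Python for illustration only; the real change lives in the JavaScript frontend. The function name and the legacy category names used in the usage note are hypothetical.

```python
BANKER_ORDER = ["banker-qa", "banker-intake", "specialist-coverage"]


def order_categories(present, legacy_order):
    """Banker categories first (only those present), then legacy ones unchanged."""
    banker = [c for c in BANKER_ORDER if c in present]
    legacy = [c for c in legacy_order if c in present and c not in BANKER_ORDER]
    return banker + legacy
```

With no banker artifacts in the session, the result reduces to the legacy order, so a flag-off session renders as before. For example, `order_categories({"sections"}, ["executive-summary", "sections"])` yields only the legacy entry.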

Per spec § 15.2.G: "These endpoints enable operator query, audit export,
and downstream tooling — and are the contract Phase 2 frontend code will
consume." Per spec § 15.2.I: "single label entry — not visualization."

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.2.G + § 15.2.I
Gate: G1.11 of 11 — COMPLETES Phase 1 build

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 11 G1 sub-steps shipped across 11 prior commits (b28ed75..cb884b7).
Phase 1 of the Banker Q&A architecture is build-complete and ready for
Gate G2 zero-impact-when-off verification.

  G1.1 — BANKER_QA_OUTPUT feature flag declared (default false)
  G1.2 — three sibling subagent definition files
  G1.3 — three capability prompt constants (~800 prompt lines total)
  G1.4 — agent registry + classification (5 files)
  G1.5 — hookDBBridge persistence wiring (4 maps × 3 agents)
  G1.6 — intake dispatcher + M1 system-prompt flag injection
  G1.7 — orchestrator phases G0.5 / G2.5 / G3.5 / G6 + protocol section
  G1.8 — M2 artifact-existence prompt branches (section-writer +
         final-synthesis)
  G1.9 — KG Phase 1b question nodes + edges + Phase 1/10 allowlists
  G1.10 — verification layer (Dim 13 + Q-coverage gate + citation scope
          + certifier hard-fail)
  G1.11 — banker API endpoints + Reports modal categoryLabels

Symmetric three-agent architecture realized:
  banker-intake-analyst (FRONT, G0.5)
  banker-specialist-coverage-validator (MID Wave 1.5, G3.5)
  banker-qa-writer (BACK, G6)

Implementation footprint (matches spec § 15.5):
  ~830 LoC + ~860 prompt lines
  25 files touched (3 new + 22 modified) — within spec's ~27 envelope
  0 DB migrations
  0 changes to compliance machinery
  0 modifications to 25 specialist agents
  0 modifications to 6 synthesis prompts
  0 modifications to Dims 0-11 of memo-qa-diagnostic
  0 modifications to memo-executive-summary-writer.js (I1, byte-identical)
  0 modifications to promptEnhancer.js (I7, byte-identical)

All 10 invariants verified by deep audit:
  I1  ✓ memo-executive-summary-writer.js byte-identical (diff = 0)
  I2  ✓ zero banker references in the exec summary writer
  I3  ✓ Dims 0-11 unchanged; only Dim 13 added (M2-gated)
  I4  ✓ CREAC structure rules unchanged in section-writer
  I5  ✓ flag-off path produces zero banker_* rows (M3 dispatch gating)
  I6  ✓ compliance auto-attaches (schema-agnostic table targets)
  I7  ✓ promptEnhancer.js byte-identical (diff = 0)
  I8  ✓ flag-off path produces zero banker-agent SubagentStart events
  I9  ✓ G3.5 strictly precedes memo-section-writer per orchestrator phase ordering
  I10 ✓ Dim 13 contains exactly one "Apply Dimension 3's per-answer rubric"
       directive AND zero duplicated rubric copies

Gating discipline (35 load-bearing files):
  Zero ad-hoc `if (BANKER_QA_OUTPUT)` checks in load-bearing files.
  All flag awareness confined to:
    - featureFlags.js (declaration)
    - flags.env (operational default)
    - agentStreamHandler.js (intake dispatcher + M1 injection)
    - knowledgeGraphExtractor.js (Phase 1b M3 dispatch)
  Subagent prompts gate via M2 (file-existence) only — never read flag.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15 + § 16.1
Gate: G1 COMPLETE — ready for G2 zero-impact-when-off regression

Next gate (G2) requires live infrastructure:
  - Replay March 31 gold-standard session with BANKER_QA_OUTPUT=false
  - Verify executive-summary.md SHA matches baseline
  - Verify kg_nodes / kg_edges / report_embeddings counts within ±2%
  - Verify zero rows in reports table with banker_intake / banker_qa /
    specialist_coverage report types
  - Verify zero SubagentStart events for the three new agents
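
The ±2% count comparison in the list above amounts to a relative-tolerance check. A minimal sketch, assuming the tolerance is applied relative to the baseline value (the function name and the numbers in the usage note are illustrative, not taken from baselines.json):

```python
def within_tolerance(actual, baseline, tolerance=0.02):
    """True when actual lies within +/- tolerance (a fraction) of baseline."""
    if baseline == 0:
        # A zero baseline admits no relative slack; require an exact match.
        return actual == 0
    return abs(actual - baseline) <= tolerance * abs(baseline)
```

With a hypothetical baseline of 1000 kg_nodes, a replay that produces 1015 nodes passes and one that produces 1040 fails.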

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/g2-regression.sh — operator-runnable verification script implementing
every check from spec § 16.2 Gate G2. Composed of four sections:

  A. Static invariants (I1, I2, I3, I4, I7, I10) — repo-only checks via
     git diff and grep; no DB or replay needed. Validates byte-identical
     load-bearing files (memo-executive-summary-writer.js, promptEnhancer.js),
     additive-only modifications (memo-section-writer.js zero deletions;
     memo-qa-diagnostic.js ≤1 deletion for the cosmetic tree-glyph swap),
     and Dim 13 inheritance-by-reference discipline (exactly one
     "Apply Dimension 3's per-answer rubric" directive; zero duplicated
     copies of Dim 3's 5-row scoring table).

  B. Gating discipline — greps src/ and prompts/ for any code-level
     featureFlags.BANKER_QA_OUTPUT reads outside the 3-file allow-list
     (featureFlags.js declaration; agentStreamHandler.js intake dispatcher
     + M1 injection; knowledgeGraphExtractor.js Phase 1b M3 guard).
     Confirms zero process.env.BANKER_QA_OUTPUT reads in any subagent
     prompt file.

  C. Module-load smoke — when node_modules is present, imports the
     feature flag module, the subagent registry, all three new banker
     agent files, and hookDBBridgeConfig.js exports. Runs 17 in-process
     assertions covering flag default, registry membership, agent prompt
     lengths, and DB config maps (VALID_REPORT_TYPES, AGENT_TYPE_MATCHERS,
     STATE_FILE_MAP).

  D. Live regression (requires DATABASE_URL + baseline session):
     - I5: zero banker_* report rows on the baseline (flag-off) session
     - I6: access_log rows still present on baseline (compliance machinery
           unaffected)
     - I8: zero banker-agent SubagentStart events on baseline
     - Gold-standard SHA byte-match for executive-summary.md vs
       test/sdk/baselines.json entry
     - kg_nodes / kg_edges / report_embeddings counts within ±2% of
       baseline
     - I9 (when --banker-session=KEY supplied): banker-specialist-
       coverage-validator SubagentStop strictly precedes memo-section-
       writer SubagentStart on a banker-mode session
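
The I9 ordering check can be illustrated over an in-memory event list. This sketch assumes events are available as `(timestamp, agent, event_name)` tuples and takes one conservative reading of "strictly precedes": every validator stop must come before every section-writer start. The actual script uses a SQL CTE from the spec, which is not reproduced here.

```python
VALIDATOR = "banker-specialist-coverage-validator"
SECTION_WRITER = "memo-section-writer"


def i9_holds(events):
    """Return True/False for the ordering invariant, or None when undecidable."""
    validator_stops = [
        ts for ts, agent, name in events
        if agent == VALIDATOR and name == "SubagentStop"
    ]
    writer_starts = [
        ts for ts, agent, name in events
        if agent == SECTION_WRITER and name == "SubagentStart"
    ]
    if not validator_stops or not writer_starts:
        return None  # one side never fired; treat as a skip, not a pass
    # Conservative reading: the last validator stop precedes the first writer start.
    return max(validator_stops) < min(writer_starts)
```

Returning `None` for the undecidable case keeps a session where the validator never ran from being reported as a pass.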

Modes:
  --static-only             Skip section D (no DB required)
  --baseline=KEY            Override default baseline session key
  --banker-session=KEY      Enable I9 check against a banker-mode session

Exit codes:
  0 — all G2 checks pass (proceed to G3 staging smoke)
  1 — one or more G2 checks failed (HARD FAIL: do not proceed)
  2 — script error

Local execution today (worktree, --static-only with node_modules symlinked):
  10/10 PASS, 0 failures, 1 skip (Section D — needs staging DB).

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.2 (Gate G2)
Gate: G2.1 of 2 (G2.2 = runbook with operator instructions)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g2-zero-impact-verification.md captures the G2 gate's
purpose, the three-layer structure (static / gating / live), the static-
layer results executed today (10/10 PASS), the operator checklist for
running the live layer on staging, and the failure-handling protocol
mapped per-invariant.

Static layer results recorded:
  I1   PASS — memo-executive-summary-writer.js byte-identical to main
  I2   PASS — zero banker references in writer
  I3   PASS — Dims 0-11 untouched (1 cosmetic tree-glyph deletion)
  I4   PASS — memo-section-writer.js purely additive (0 deletions)
  I7   PASS — promptEnhancer.js byte-identical to main
  I10a PASS — exactly 1 "Apply Dimension 3's per-answer rubric" directive
  I10b PASS — zero Dim 3 rubric duplicates in Dim 13 block
  Gating-A PASS — only 3 allow-listed files read featureFlags.BANKER_QA_OUTPUT
  Gating-B PASS — zero process.env reads in subagent prompt files
  Module-load PASS — 17/17 in-process assertions
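
The Gating-A result above rests on a scan for flag reads outside the three-file allow-list. The script does this with grep; the sketch below shows the same idea over an in-memory mapping of path to file text, matching on the literal string only. The function name and the paths in the usage note are hypothetical.

```python
ALLOW_LIST = {"featureFlags.js", "agentStreamHandler.js", "knowledgeGraphExtractor.js"}
FLAG_READ = "featureFlags.BANKER_QA_OUTPUT"


def gating_violations(sources):
    """sources maps file path -> file text; returns offending paths, sorted."""
    return sorted(
        path
        for path, text in sources.items()
        if FLAG_READ in text and path.rsplit("/", 1)[-1] not in ALLOW_LIST
    )
```

An empty result corresponds to a PASS. Any returned path is a file that reads the flag directly instead of gating through M1, M2, or M3.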

Operator next-steps documented for the live layer (I5, I6, I8, I9 +
gold-standard SHA + KG/embedding count comparisons) — requires staging
DB + replay capability and is bound to the existing baselines.json
schema convention.

Failure-handling protocol per spec § 16.2 HARD FAIL ACTION: if any check
fails, do not proceed; locate and remove the behavioral fork before any
further work on Banker Q&A.

Next gate (G3) preconditions enumerated:
  - G2 static PASS (this runbook)
  - G2 live PASS (operator-executed on staging)
  - Staging deploy with BANKER_QA_OUTPUT=false in flags.env
  - Three synthetic banker prompts drafted (PE / merger / distressed)
  - BANKER_QA_OUTPUT=true set in staging shell only (uncommitted)

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.2 (Gate G2)
Gate: G2.2 of 2 — completes G2 static-layer artifacts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on on staging

G2.1 (regression script) + G2.2 (runbook) shipped. The G2 static layer
runs entirely in this repo with zero infrastructure and proves the most
load-bearing properties before any staging spend.

Static results (executed 2026-05-21):
  10/10 PASS · 0 fail · 1 skip (live layer — needs staging DB)

  Static invariants:
    I1, I2, I3, I4, I7, I10a, I10b — all pass

  Gating discipline:
    - Code-level featureFlags.BANKER_QA_OUTPUT reads exist only in
      featureFlags.js (declaration), agentStreamHandler.js (intake
      dispatcher + M1 system-prompt injection), and
      knowledgeGraphExtractor.js (Phase 1b M3 guard).
    - Zero process.env.BANKER_QA_OUTPUT reads in subagent prompts.
    - All gating routes through M1 (system prompt), M2 (artifact-
      existence), or M3 (orchestrator dispatch) as the spec requires.

  Module-load smoke (with node_modules symlinked):
    17/17 in-process assertions pass — feature flag boolean,
    subagent registry membership, agent file imports + prompt lengths,
    hookDBBridgeConfig.js maps (VALID_REPORT_TYPES, AGENT_TYPE_MATCHERS,
    STATE_FILE_MAP).

Static layer artifacts:
  scripts/g2-regression.sh                          (orchestrator)
  docs/runbooks/g2-zero-impact-verification.md     (operator runbook)

Live layer is bound to staging Postgres + the gold-standard session
replay capability. The runbook documents the exact queries (I5/I6/I8/I9)
and the SHA + ±2% count comparisons against test/sdk/baselines.json.

Next: operator runs `bash scripts/g2-regression.sh` on staging once
the v6.14 branch is deployed there with BANKER_QA_OUTPUT=false; per
spec § 16.2, G3 cannot proceed until live G2 passes.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.2
Gate: G2 static layer COMPLETE — awaiting operator-run live layer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The final G2 audit identified five gaps against spec § 16.2 "Gold-standard
regression" + "Smoke tests" checklists. All five are closed in this
commit:

  F1 (medium) — flags.env BANKER_QA_OUTPUT default check
    Static section now verifies the committed flags.env contains the
    literal 'BANKER_QA_OUTPUT=false'. Catches the foot-gun where an
    operator flips the default in the committed file and pushes — which
    would quietly enable banker mode on every deploy.

  F2 (CRITICAL) — final-memorandum.md word count ±2%
    Live section D now reads reports/<session>/final-memorandum.md and
    compares wc -w against baselines.json sessions[<key>].final_memorandum_words
    using the same ±2% tolerance pattern as the KG count checks. This was
    explicitly required by spec § 16.2 but absent from the original script.

  F3 (CRITICAL) — QA Dim 0-11 scores ±1 point
    Live section D now parses reports/<session>/qa-outputs/diagnostic-
    assessment.md for each Dim 0-11 score (permissive regex matching
    common diagnostic formats) and compares against baselines.json
    sessions[<key>].qa_dim_scores.dim_N. Skips gracefully when the
    baseline entry is absent. Required by spec § 16.2 (verified-against-
    baseline list).

  F4 (medium) — zero banker-* files in flag-off session dir
    Filesystem-level invariant complementing the SQL I5 check. find -name
    'banker-*' must return zero matches for any flag-off session. If the
    SQL I5 check passes but a banker-* file exists on disk, the filesystem
    check catches the desync between the filesystem write and the DB INSERT.
    Required by spec § 16.2 ("No new files in session dir matching
    banker-*").

  F5 (low) — branch sanity check
    Static section refuses to run when HEAD = main OR diff stat against
    main = 0. Prevents the foot-gun of running G2 on a checkout that
    has no v6.14 changes to verify, which would trivially pass every
    invariant.
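
The F4 filesystem invariant is implemented with `find` in the script. An equivalent check, sketched here with `pathlib` and assuming a recursive search of the session directory is intended:

```python
from pathlib import Path


def banker_files_in_session(session_dir):
    """Return every banker-* file found anywhere under the session directory."""
    return sorted(p for p in Path(session_dir).rglob("banker-*") if p.is_file())
```

For a flag-off session the returned list should be empty. A non-empty list alongside a passing I5 query points at a file written without its matching DB row.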

Runbook updates:
  - Result table extended from 10 to 12 PASS checks
  - Section D.2 documents F2 + F3 + F4 + the expected baselines.json
    schema (executive_summary_sha256, final_memorandum_words, kg_nodes,
    kg_edges, report_embeddings, qa_dim_scores.dim_0..11)
  - Adjustment note for the QA Dim parsing regex if the local
    diagnostic-assessment.md format differs
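
Because the Dim-score regex may need local adjustment, a small sketch of the F3 parse-and-compare step may help. It assumes each score appears as a percentage on the same line as its `Dim N` or `Dimension N` label, which may not match the real diagnostic-assessment.md layout; the pattern and both function names are illustrative, not copied from the script.

```python
import re

# Assumed line shapes: "Dimension 3: Title 92%" or "| Dim 10 | 88% |".
DIM_SCORE = re.compile(
    r"Dim(?:ension)?\s*(\d{1,2})\b[^\n\d]*?(\d{1,3}(?:\.\d+)?)\s*%", re.I
)


def parse_dim_scores(text):
    """Extract Dim 0-11 scores keyed as dim_N; the first occurrence wins."""
    scores = {}
    for dim, score in DIM_SCORE.findall(text):
        if 0 <= int(dim) <= 11:
            scores.setdefault(f"dim_{int(dim)}", float(score))
    return scores


def dims_within_one_point(actual, baseline, max_delta=1.0):
    """Every baseline dim must be present in actual and within max_delta."""
    return all(
        key in actual and abs(actual[key] - value) <= max_delta
        for key, value in baseline.items()
    )
```

An empty baseline makes `dims_within_one_point` return `True`, which mirrors the graceful skip when the baseline entry is absent.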

Static re-run (post-remediation):
  12/12 PASS, 0 fail, 1 skip (live layer unchanged)

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.2
Gate: G2.3 — closes spec-adherence gaps F1-F5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test/banker-qa/prompt-1-pe-buyout.md — the first of three synthetic banker
prompts required by spec § 16.3 G3 staging smoke test.

Deal context:
  - Target:   Stratosphere Analytics, Inc. (NASDAQ: STRA) — B2B SaaS,
              predictive supply-chain analytics, 1,240 employees,
              ~$420M ARR (28% YoY), 78% gross margin, 41% customer
              concentration across top 3 customers, EV ~$4.1B
  - Acquirer: Argonaut Capital Partners VIII, L.P. (PE)
  - Structure: all-cash take-private LBO; 32% premium to 60-day VWAP;
              stapled financing from Goldman / JPM
  - Q3 2026 expected announcement; 5–7 year hold; exit via secondary
              or IPO
  - Multi-jurisdiction footprint: Delaware HQ + Boston/Toronto/Bengaluru
              engineering hubs

What this prompt EXERCISES:

  1. banker-intake-analyst verbatim-Q preservation discipline:
     15 numbered questions covering antitrust, CFIUS, IP, GDPR, §280G
     golden parachutes, SEC Rule 13e-3, open-source license obligations,
     SOC 2, ASC 805 earnouts, WARN Act, Cal. Labor Code § 2802, etc.
     The agent MUST preserve all 15 verbatim — no rephrasing, no
     merging, no two-part-question splits (per spec invariant on
     banker-questions-presented.md verbatim rule).

  2. Sector-scaffold graceful degradation:
     B2B SaaS / enterprise software has NO Cardinal-blueprint sector
     scaffold authored in v6.14 (utility M&A is the only fully-authored
     scaffold). The agent should set
     `banker-deal-context.json.sector.scaffold_loaded = false` and
     proceed with sector-generic framing — NOT hard-halt. This validates
     the spec § 15.2.B graceful-degradation contract.

  3. Default client archetype + clarification flag:
     The prompt provides no explicit client perspective (PE seller,
     LP holder, target shareholder, regulator, etc.). The agent should
     default to "Institutional Holder" AND set
     `client_archetype.default_applied = true` AND
     `client_archetype.clarification_required = true` per spec § 15.2.B
     Cardinal client-archetype matrix.

  4. Null acquirer failure modes:
     Argonaut Capital Partners has no documented failed-merger history.
     `acquirer_failure_modes_loaded` should be `null` (not an empty
     array, not a populated array with fabricated entries).
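
The deal-context expectations in items 2 to 4 above can be spot-checked mechanically once banker-deal-context.json is parsed. A hedged sketch: the field paths follow the names quoted above, while the function name, the failure messages, and the treatment of a missing key as null are assumptions.

```python
def check_prompt1_deal_context(ctx):
    """Return a list of failed expectations; an empty list means all hold."""
    failures = []

    # Item 2: no sector scaffold exists for B2B SaaS, so the flag must be false.
    if ctx.get("sector", {}).get("scaffold_loaded") is not False:
        failures.append("sector.scaffold_loaded should be false")

    # Item 3: default archetype applied, with clarification requested.
    archetype = ctx.get("client_archetype", {})
    if archetype.get("default_applied") is not True:
        failures.append("client_archetype.default_applied should be true")
    if archetype.get("clarification_required") is not True:
        failures.append("client_archetype.clarification_required should be true")

    # Item 4: null, not [] and not a populated array (a missing key counts as null here).
    if ctx.get("acquirer_failure_modes_loaded") is not None:
        failures.append("acquirer_failure_modes_loaded should be null")

    return failures
```

Using identity checks (`is not False`, `is not None`) distinguishes an empty array from null, which is the distinction item 4 calls out.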

Verification: operator runs scripts/g3-verification.sh <session_key>
--expected-questions=15 after the run completes. All 21 per-run checks
plus 3 smoke tests should pass.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3 Gate G3
Gate: G3.1 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uestions)

test/banker-qa/prompt-2-strategic-merger.md — regulated electric utility
merger exercise. This is the highest-coverage prompt of the three because
v6.14 ships substantive utility-M&A sector scaffold content adopted from
Cardinal Framing Layer v2.0 (spec § 15.2.B W1 implementer note). The
banker-intake-analyst MUST load + apply that scaffold here.

Deal context:
  - Target:    Pacific Crest Utilities, Inc. (NYSE: PCU) — investor-
               owned regulated electric utility; 2.4M retail customers
               in Oregon + Washington; 4.2 GW portfolio (52% gas CC,
               28% utility-scale solar+storage, 15% federal hydro,
               5% retiring coal); 1.1 GW Columbia Falls nuclear
               (NRC license expiring 2038)
  - Acquirer:  NextEra Energy, Inc. (NYSE: NEE)
  - Structure: all-stock strategic merger; fixed 1.18 NEE per PCU;
               24% premium; EV ~$18.4B; announced 2026-04-22; target
               close Q3 2027
  - Approvals: FERC § 203, OR PUC, WA UTC, NRC license transfer
               (10 CFR 50.80), Hart-Scott-Rodino
  - Hyperscaler contract: 15-year, 1.8 GW data-center load with
               Helios Cloud Services (top-3 hyperscaler) announced
               2025-11-04; 600 MW sited behind-the-meter at Columbia
               Falls nuclear
  - Acquirer history: NEE's prior failed acquisitions of Hawaiian
               Electric (2016 withdrawn) and Oncor (2017 blocked on
               FOCD grounds) — Cardinal blueprint specifically calls
               these out as load-bearing acquirer-failure-mode context
  - Client:    institutional holder representing 6.4% of PCU
               (perspective stated; archetype default should NOT fire)

What this prompt EXERCISES:

  1. Utility M&A sector scaffold load:
     The spec § 15.2.B Cardinal blueprint specifies the FERC § 203
     four-factor framework, the state PUC matrix (named-commissioner
     political map + rate-case calendar + statutory standard + prior
     conditions + commitment expectations), NRC license transfer
     (10 CFR 50.33(f), 50.42, FOCD), hold-harmless + ring-fencing
     standards (5-year FERC standard), and hyperscaler concentration
     analysis when the pipeline exceeds 10 GW. The agent should set
     `banker-deal-context.json.sector.scaffold_loaded = true` and
     populate the deal-context with utility-specific framing fields.

  2. Acquirer failure-mode context population:
     Per Cardinal blueprint § 5, when the named acquirer has documented
     failed-merger history (NEE: Hawaiian Electric 2016, Oncor 2017),
     extract structural failure-mode patterns into
     `banker-deal-context.json.acquirer_failure_modes_loaded`. This
     field is non-null on this prompt — if it's null, the
     Cardinal-blueprint adoption is incomplete.

  3. Multi-jurisdiction extraction:
     `jurisdictions` array should include US-federal (FERC, NRC) plus
     Oregon + Washington (state PUCs). 18 questions span all four
     jurisdictions plus tax (IRA), antitrust (HSR), and SEC disclosure.

  4. Hyperscaler load contestability context:
     The spec § 15.2.B Cardinal blueprint calls for hyperscaler
     concentration analysis when the pipeline exceeds 10 GW. The
     Helios 1.8 GW contract sits below that threshold, but the
     behind-the-meter nuclear arrangement (600 MW) is
     precedent-setting. Several Qs test this surface area.

  5. Client archetype = stated (NOT defaulted):
     The prompt explicitly states institutional holder perspective.
     `client_archetype.default_applied` should be `false` and
     `client_archetype.archetype` = "Institutional Holder".

Verification: scripts/g3-verification.sh <session_key>
--expected-questions=18. All 21 per-run checks + 3 smoke tests should
pass. The operator should additionally spot-check banker-deal-context.json
for `sector.scaffold_loaded = true` and the failure-mode field
populated — these are the spec-blueprint-critical fields for prompt #2.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3 Gate G3
       + § 15.2.B Cardinal Framing Layer adoption
Gate: G3.2 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (12 Qs)

test/banker-qa/prompt-3-distressed-acquisition.md — Chapter 11 § 363 sale
diligence exercise. Tests the deal-stage classification path (post-petition,
pre-close) and validates graceful sector-scaffold degradation in a second
domain (industrial manufacturing) distinct from prompt #1 (B2B SaaS).

Deal context:
  - Target:    Meridian Industrial Holdings, Inc. (Ch. 11 debtor,
               Case No. 26-10473, Bankr. D. Del., filed 2026-02-14)
               14 specialty-metals fabrication plants across
               PA/OH/IN/MI/Ontario; aerospace/defense/energy supplier
               (incl. F-35 forgings); $1.1B FY25 revenue pre-petition
  - Acquirer:  Cyclone Distressed Partners IV, L.P. (distressed-debt
               fund; holds $190M of debtor's $620M first-lien loan
               at avg 68¢ acquisition price; intends to credit-bid
               under § 363(k))
  - Structure: 363 stalking-horse bid — $480M cash + $115M assumed
               secured debt + $42M assumed cure costs; bid procedures
               2026-06-03; auction 2026-07-15
  - Key surface area: DCSA facility clearances (3 plants), CGP
               (Brampton Ontario), F-35 supply contracts (§ 365
               assumability), Steelworkers CBAs (§ 1113), CERCLA/RCRA
               environmental at Lima OH + Marion IN, In re Fisker
               credit-bid capping risk

What this prompt EXERCISES:

  1. Deal-stage classification on bankruptcy-adjacent transactions:
     The deal is post-Chapter-11-filing but pre-sale-closing. The
     `banker-deal-context.json.deal_stage` field should classify as
     `pre_close` OR `failed_abandoned` — either is acceptable per
     spec § 15.2.B enum schema; the agent's judgment call.

  2. Graceful sector-scaffold degradation (second domain):
     Industrial manufacturing has no Cardinal-blueprint sector
     scaffold authored in v6.14. The agent should set
     `banker-deal-context.json.sector.scaffold_loaded = false` and
     proceed with sector-generic framing (mirrors prompt #1's
     SaaS-domain behavior). Validates the spec § 15.2.B
     graceful-degradation contract works across distinct domains.

  3. Distressed-purchaser client archetype:
     Prompt explicitly identifies Cyclone as a distressed-debt
     purchaser. The archetype should reflect the
     "Credit-Fixed Income Holder" or "Strategic Counterparty"
     classification from the Cardinal matrix.

  4. Null acquirer failure modes:
     Cyclone has no documented failed-deal history.
     `acquirer_failure_modes_loaded` should be `null`.

  5. Bankruptcy-law nuance Q reasoning:
     12 questions cover § 363(k) credit-bid mechanics (In re Fisker
     precedent), § 1113 CBA modification, § 363(b) tax basis,
     § 365 executory contract assumption, DCSA / CGP / DoD prime
     considerations, environmental compliance, WARN/mini-WARN
     successor liability. Given the higher domain complexity, the
     `Uncertain` verdict rate may legitimately exceed 20% (Smoke 3's
     default threshold). The runbook documents that the operator
     should accept a 20–30% Uncertain rate on this prompt as a soft
     pass rather than a hard fail.

  6. Smallest Q count of the three (12):
     Tests that the agent handles lower-volume prompt structures
     without falling back to inferred questions. The
     banker-questions-presented.md output should have exactly
     12 ## Q# blocks — no more, no less.
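
The exact-count requirement on question blocks can be checked with a line-anchored regex. A minimal sketch, assuming blocks start with a heading of the form `## Q<n>` (or `### Q<n>` in the answers document); the function name is hypothetical.

```python
import re


def count_question_blocks(markdown, heading="##"):
    """Count lines that start with '<heading> Q<number>'."""
    pattern = rf"^{re.escape(heading)} Q\d+\b"
    return len(re.findall(pattern, markdown, flags=re.M))
```

Anchoring on the start of the line and requiring a space after the heading marker keeps `### Q3:` answer blocks from being counted as `## Q#` question blocks.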

Verification: scripts/g3-verification.sh <session_key>
--expected-questions=12. All 21 per-run checks + 3 smoke tests should
pass. Smoke 3 (Uncertain rate) is the most likely "soft warning" item
on this prompt — operator judgment per runbook § 3 Step 5.
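
The soft-warning treatment of Smoke 3 on this prompt can be expressed as a three-way classification. This is a sketch under assumptions: the 20% hard threshold and the 20–30% soft band come from the text above, while the function name, the return labels, and the `allow_soft` switch are illustrative.

```python
def classify_uncertain_rate(confidences, hard_limit=0.20, soft_limit=0.30, allow_soft=False):
    """Classify the share of 'Uncertain' verdicts as pass, soft_pass, fail, or skip."""
    total = len(confidences)
    if total == 0:
        return "skip"
    rate = sum(1 for c in confidences if c == "Uncertain") / total
    if rate < hard_limit:
        return "pass"
    # The soft band applies only where the runbook permits it (prompt #3).
    if allow_soft and rate <= soft_limit:
        return "soft_pass"
    return "fail"
```

With 12 questions, 3 Uncertain verdicts give a 25% rate: a fail under the default threshold, but a soft pass when the operator opts into the band for this prompt.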

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3 Gate G3
       + § 15.2.B graceful-degradation contract
Gate: G3.3 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/g3-verification.sh — operator-runnable per-session verification
encoding every § 16.3 per-run checklist item and smoke test as concrete
SQL / jq / grep / curl assertions.

Usage:
  bash scripts/g3-verification.sh <session_key> --expected-questions=<N>

Required env:
  DATABASE_URL       (staging Postgres URL)
  STAGING_BASE_URL   (defaults to http://localhost:8080)
  REPORTS_ROOT       (defaults to ./reports)

Coverage map (spec § 16.3 line → script section):

  Section A — Hook lifecycle:
    Check 1   banker-intake-analyst SubagentStart count == 1
    Check 4   distinct specialist SubagentStop count ≥ 3
    Check 5   banker-specialist-coverage-validator fires ≥ 1×
    Check 9   I9 — coverage-validator SubagentStop strictly before
              memo-section-writer SubagentStart (verbatim spec CTE)
    Check 10  banker-qa-writer SubagentStart count == 1

  Section B — Intake artifacts:
    Check 2   banker-questions-presented.md has N ## Q# blocks
    Check 3   banker-deal-context.json has target/acquirer/structure
              + non-empty jurisdictions array

  Section C — Coverage validator artifacts:
    Check 6   specialist-coverage-report.md + specialist-coverage-state.json
              both exist on disk
    Check 7   per_question array length == N AND every status ∈
              {PASS, REMEDIATE, ACCEPT_UNCERTAIN}
    Check 8   remediation_cycles ≤ 2 AND zero unresolved REMEDIATE

  Section D — Output artifacts:
    Check 11  banker-question-answers.md has N ### Q#: blocks
    Check 12  every Q has Answer + Because + Citations field
    Check 13  ACCEPT_UNCERTAIN Qs render with rationale in answers doc
    Check 14  banker-qa-metadata.json parses + .questions length == N

  Section E — KG + embeddings:
    Check 15  KG node_type='question' count == N
    Check 16  KG edges (assigned_to + addressed_in + consolidated_in)
              count ≥ 2N
    Check 17  banker_qa report_embeddings count ≥ N

  Section F — Downstream verification:
    Check 18  citation-validator status ∈ {PASS, PASS_WITH_EXCEPTIONS}
    Check 19  pre-qa-validate.py banker_q_coverage passed
    Check 20  Dim 13 score ≥ 85% (parsed from
              qa-outputs/diagnostic-assessment.md)
    Check 21  memo-qa-certifier decision ∈ {CERTIFY, CERTIFY_WITH_LIMITATIONS}

  Section G — Smoke tests (verbatim spec § 16.3):
    Smoke 1   combined-SQL: question_nodes == N, question_edges ≥ 2N,
              banker_reports == 1, banker_embeddings ≥ N
    Smoke 2   curl /api/db/sessions/<key>/questions → .questions length == N
    Smoke 3   jq confidence distribution; Uncertain count < 20% of total
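
Checks 7 and 8 in the coverage map operate on specialist-coverage-state.json. A sketch of the assertions follows, assuming a top-level `remediation_cycles` counter and a `per_question` array whose entries carry a `status` field, and reading "unresolved REMEDIATE" as any question whose final status is still REMEDIATE. The real JSON layout may differ, and the script itself uses jq.

```python
VALID_STATUSES = {"PASS", "REMEDIATE", "ACCEPT_UNCERTAIN"}


def check_coverage_state(state, expected_questions, max_cycles=2):
    """Return a list of failed assertions; an empty list means Checks 7 and 8 pass."""
    per_question = state.get("per_question", [])
    failures = []

    # Check 7: one entry per question, each with a recognised status.
    if len(per_question) != expected_questions:
        failures.append("per_question length mismatch")
    if any(q.get("status") not in VALID_STATUSES for q in per_question):
        failures.append("unknown status value")

    # Check 8: bounded remediation and nothing left in REMEDIATE.
    if state.get("remediation_cycles", 0) > max_cycles:
        failures.append("remediation_cycles exceeds limit")
    if any(q.get("status") == "REMEDIATE" for q in per_question):
        failures.append("unresolved REMEDIATE")

    return failures
```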

Failure handling: the script emits the list of failed checks, each with
its spec section pointer. On exit code 1 the operator triages per the
docs/runbooks/g3-staging-smoke.md § 5 triage matrix, fixes, and re-runs.
Bash strict mode (set -uo pipefail); colored output for visual scanning;
skipped checks report the missing prerequisite rather than masking failures.

Local syntax check (bash -n): PASS. Usage banner verified.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3
Gate: G3.4 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g3-staging-smoke.md — end-to-end operator workflow for the
G3 gate, mapping every spec § 16.3 line item to a concrete step in the
operator's execution sequence.

Runbook sections:

  1. Purpose — spec context + role of G3 in the rollout chain
  2. Synthetic prompt artifacts — table mapping each prompt file to
     the deal it exercises + Q count + the specific spec invariants
     each prompt tests
  3. Operator workflow (6 steps):
     - Pre-flight: G2 PASS confirmed, BANKER_QA_OUTPUT=false in
       committed flags.env, branch deployed, /health green
     - Enable banker mode in staging shell only (with explicit
       foot-gun warning: do NOT commit flag flip; flip is per-shell
       per-run ephemeral)
     - Run prompt #1 (PE buyout, 15 Qs) + verify with
       scripts/g3-verification.sh
     - Run prompt #2 (strategic merger, 18 Qs) — highest-coverage:
       operator spot-checks utility sector scaffold load + acquirer
       failure-mode population per Cardinal blueprint
     - Run prompt #3 (distressed acquisition, 12 Qs): given the
       bankruptcy-law nuance, an Uncertain rate of 20–30% is
       acceptable as a soft warning
     - Cleanup: unset BANKER_QA_OUTPUT
  4. Pass criteria — all 3 invocations exit 0 + G3 PER-RUN PASS
  5. Failure-handling protocol — 13-row triage matrix mapping each
     potentially-failed check to the specific prompt/code site
     to inspect:
       Check 1  → orchestrator G0.5 dispatch + agentStreamHandler intake
       Check 2  → banker-intake-analyst verbatim-Q rule
       Check 3  → banker-intake-analyst deal-context extraction
       Check 5-8 → coverage validator prompt + orchestrator G3.5
       Check 9  → orchestrator I9 enforcement
       Check 10-14 → banker-qa-writer output schema
       Check 15-17 → KG Phase 1b + featureFlags import
       Check 18 → citation-validator optionalInputs
       Check 19 → pre-qa-validate.py banker_q_coverage gate
       Check 20 → Dim 13 prompt scoring
       Check 21 → certifier Step 5b hard-fail
       Smoke 1-3 → root causes from above
  6. Recovery + re-run discipline — fixes happen in worktree (NOT
     in-place on staging); every fix is a commit traceable to PR review
  7. Roll-up decision — record session_keys + key metrics + advance to G4
  8. Execution log (append-only template) — 3-row table operator
     populates post-staging-run with date / session_key / Dim 13 / certifier
     verdict / notes per prompt

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3
Gate: G3.5 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g3-spec-mapping.md — gap-check document proving every spec
§ 16.3 line item maps to a concrete worktree artifact. Used to confirm
G3 implementation is complete before operator staging execution.

Mapping table coverage:

  Section A. Setup checklist                   5/5  items mapped
  Section B. Per-run verification             21/21 items mapped
  Section C. Smoke tests                       3/3  items mapped
  Section D. Pass criteria + failure handling  2/2  items mapped
  ──────────────────────────────────────────────────────────────
  Total                                       31/31 — ZERO gaps

Each row identifies:
  - The verbatim spec § 16.3 line
  - The artifact in the worktree that implements it (file path +
    section + encoding detail)
  - PASS / DELIVERED / DOCUMENTED status

Section F enumerates the three categories of G3 work that cannot run
from the worktree alone (require staging server + Postgres):
  1. Submitting the prompts to the running server
  2. Running the live pipeline end-to-end
  3. Validating live SQL/file outcomes

The worktree provides every artifact needed to execute these three
categories on staging and produce a binary pass/fail outcome. No
further worktree-side artifacts are blocking G3.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3
Gate: G3.6 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G3.1 through G3.6 shipped across the prior 6 commits. The worktree now
contains every artifact spec § 16.3 requires for the staging smoke test:

  Synthetic banker prompts (3):
    test/banker-qa/prompt-1-pe-buyout.md            (15 Qs, B2B SaaS LBO)
    test/banker-qa/prompt-2-strategic-merger.md     (18 Qs, utility merger)
    test/banker-qa/prompt-3-distressed-acquisition.md (12 Qs, Ch.11 363 sale)

  Verification script:
    scripts/g3-verification.sh — 21 per-run checks + 3 smoke tests as
    runnable SQL / jq / grep / curl assertions; operator runs once per
    prompt with --expected-questions=15/18/12.

  Runbooks:
    docs/runbooks/g3-staging-smoke.md   — 8-section operator workflow
    docs/runbooks/g3-spec-mapping.md    — 31/31 spec items mapped table

Coverage verification:
  Setup checklist:                5/5  mapped
  Per-run verification (21):     21/21 mapped
  Smoke tests:                    3/3  mapped (verbatim spec queries)
  Pass criteria + failure-mode:   2/2  mapped
  ─────────────────────────────────────────────
  TOTAL                          31/31 — zero gaps

Spec-blueprint validation included:
  - Prompt #2 specifically exercises the utility-M&A sector scaffold +
    acquirer-failure-mode context adopted from Cardinal Framing Layer
    v2.0 (spec § 15.2.B W1 implementer note). If sector.scaffold_loaded
    fails to set true OR acquirer_failure_modes_loaded is null on this
    run, the Cardinal-blueprint adoption is incomplete and needs
    iteration before G3 can PASS.
  - Prompts #1 + #3 specifically exercise graceful sector-scaffold
    degradation in two distinct domains (B2B SaaS + industrial
    manufacturing). If the agent hard-halts instead of degrading,
    the spec § 15.2.B graceful-degradation contract is violated.

What G3 cannot verify from the worktree alone:
  1. Submitting prompts to a running staging server
  2. Running the live pipeline end-to-end
  3. Validating live SQL/file outcomes

These three categories are operator-driven and require staging infra.
The runbook documents the exact 6-step workflow + the 13-row failure
triage matrix.

When all three prompts PASS via scripts/g3-verification.sh:
  - Record session_keys + Dim 13 scores + certifier verdicts in
    docs/runbooks/g3-staging-smoke.md § 8 execution log
  - Mark G3 complete in GitHub Issue #177
  - Advance to G4 (pre-pilot operational readiness)

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.3 Gate G3
Gate: G3 worktree COMPLETE — awaiting operator-run staging execution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g5-pilot-pre-flight.md — first of seven G5 worktree
artifacts. Covers all four pre-pilot spec § 16.5 checklist items + the
six hard preconditions gating G5 execution.

Mapped spec items:
  1. Pilot client identified, contract terms confirm permission
     - 6-criterion rubric reference (see G5.2 selection runbook)
     - MSA/sideletter review prompts (data-use clause, QA framework
       clause, NDA provisions)
     - Single point of accountability — named banker
     - Authority to certify check (MD escalation path)
  2. Pilot client's deal context loaded (15–20 banker Qs)
     - Question count bound enforced (15 ≤ N ≤ 20)
     - Deal context paragraph minimum (target / acquirer / structure /
       premium / EV / jurisdictions / announcement timing)
     - Question hygiene pre-screen (DO NOT silently edit; surface to
       banker for refinement)
     - Confidentiality posture confirmation
  3. Banker briefed on what to expect
     - Briefing document delivery (G5.3 artifact)
     - Receipt confirmation requirement
     - Synthetic-sample share for shape preview
  4. Banker briefed on feedback structure
     - Review template delivery (G5.4 artifact)
     - Readiness confirmation
     - Session scheduling
     - Recording posture agreement

Hard preconditions enumerated (6):
  - G2 PASS on staging
  - G3 PASS on staging (3 synthetic runs)
  - G4 PASS on staging (pending — cross-gate dependency)
  - flags.env still ships BANKER_QA_OUTPUT=false in deployed branch
  - Rollback playbook tested at least once on staging
  - client-provisioner --update-flag --dry-run succeeds

Output deliverable: G5 PRE-FLIGHT REPORT with pilot client identifier,
named banker, deal context summary, briefing confirmation timestamps,
review session schedule. This report is the input artifact for the
during-pilot phase.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.5
       pre-pilot checklist
Gate: G5.1 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g5-pilot-client-selection.md — six-criterion binary
selection rubric for identifying the first M&A/IB pilot client.

Why this matters per spec § 16.5 risk model:
The first M&A/IB client to see banker mode is a load-bearing choice.
A poor first pilot produces a false-negative REGRESSION_VS_TODAY verdict
driven by client-fit rather than product quality; a risky pilot produces
reputational damage. Each criterion scores 0/1; candidate must score 6/6
to be the pilot, ≥5/6 to be the alternate, ties broken by Criterion 6
(engagement timing).

Six binary criteria:
  1. Workflow fit — primarily M&A/IB advisory (not pure legal advisory),
     with ≥60% Q-driven session volume in last 90 days
  2. Relationship + risk tolerance — opted into beta features OR low-risk
     MSA OR previously communicated iteration tolerance
  3. Authority depth — named banker has certify-rights AND daily contact
     with the consuming deal team (no MD escalation required)
  4. Engagement readiness — active deal with 15-20 structured Qs in
     flight OR scheduled within 2 weeks
  5. Confidentiality posture compatible with post-pilot review —
     post-announce OR pre-announce-NDA-cleared with Aperture
  6. Engagement timing within W3 pilot window

Worked hypothetical example included: Acme Capital (6/6 → PILOT) vs.
Brunswick & Wells (5/6 → ALTERNATE because risk-tolerance gap).
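
The scoring rule above can be sketched as a small function. This is a minimal illustration, not code from the runbook: the function name, the NOT_ELIGIBLE label, and the score arrays are assumptions, and the Criterion 6 tie-break between equally scored candidates is left out.

```javascript
// Hypothetical sketch of the 6-criterion binary rubric (names are illustrative).
// scores: array of six 0/1 values, in criterion order 1..6.
function classifyCandidate(scores) {
  if (scores.length !== 6 || scores.some((s) => s !== 0 && s !== 1)) {
    throw new Error('expected six binary (0/1) criterion scores');
  }
  const total = scores.reduce((sum, s) => sum + s, 0);
  if (total === 6) return 'PILOT';      // must score 6/6 to be the pilot
  if (total === 5) return 'ALTERNATE';  // 5/6 qualifies as the alternate
  return 'NOT_ELIGIBLE';                // assumed label for anything lower
}

// The worked example from the commit message, with assumed score sheets:
const acmeCapital = classifyCandidate([1, 1, 1, 1, 1, 1]);
const brunswickWells = classifyCandidate([1, 0, 1, 1, 1, 1]); // criterion 2 (risk tolerance) missed
```

Keeping each criterion binary means the outcome is fully determined by the score sheet, which is what lets the selection memo be audited after the fact.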

Deliverable: signed PILOT CLIENT SELECTION MEMO addressed to engineering
+ GTM containing the score sheets, named banker, confirmed engagement,
confidentiality posture, and any sideletter PR reference. GTM lead +
engineering lead sign-off triggers pre-pilot checklist item 1 completion.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 15.6 W3 +
       § 16.5 pre-pilot checklist item 1
Gate: G5.2 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g5-banker-briefing.md — the document the pilot banker
receives ≥48 hours before their session is submitted. Banker-facing
prose (not engineer-facing); explains what they will receive, how to
read it, what feedback they will be asked for.

Document sections:

  1. What you'll receive (deliverable inventory)
     - 2 existing files (executive-summary.md + final-memorandum.md) —
       unchanged in v6.14
     - 2 new files (banker-questions-presented.md +
       banker-question-answers.md) — companion artifacts

  2. How the new artifacts relate to existing deliverables
     - Same underlying research, citations, reasoning
     - Same quality bar (Dim 13 enforces 85% threshold)
     - Cross-references are bidirectional (Q → section refs → footnotes)
     - banker-questions-presented.md is the immutable verbatim record

  3. Recommended reading order (~15-25 min thorough review)
     - Step 1: banker-questions-presented.md to confirm verbatim
     - Step 2: exec summary + banker doc side-by-side for consistency
     - Step 3: drill into Section IV citations for highest-value Qs
     - Step 4: spot-check 2-3 citation IDs in consolidated footnotes

  4. Feedback you'll be asked for (advance notice of 7 questions)
     - D1 verbatim Q preservation
     - D2 deal context accuracy
     - D3 answer depth
     - D4 citation appropriateness
     - D5 confidence calibration
     - D6 uncertain rationale
     - D7 overall verdict (SHIP-WORTHY / NEEDS_ITERATION / REGRESSION_VS_TODAY)

  5. What we will NOT ask
     - Won't grade Section IV (unchanged from existing memos)
     - Won't ask the banker to redesign the deliverable format
     - Won't record without explicit consent

  6. Logistics
     - ≥60 min session, within 5 business days of receipt
     - Video or in-person
     - Banker + Super-Legal product engineer (note-taker)

  7. After the session
     - 24-hour structured summary for sign-off
     - SHIP-WORTHY → advance to G6
     - NEEDS_ITERATION → engineering iterates; optional follow-up
     - REGRESSION_VS_TODAY → hard halt + RCA + remediate

Tone discipline: banker-facing, terse, no engineering jargon. The
document is what the banker actually reads — it must respect their time
and explain the relationship between the new artifacts and what they
already know.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.5
       pre-pilot checklist items 3 + 4
Gate: G5.3 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g5-banker-review-template.md — minute-by-minute interview
script the operator follows during the ≥60-min pilot banker review
session. Every one of the 7 spec-§-16.5 banker-review checklist items
maps to a discussion dimension with operator script + JSON capture fields.

Session structure (~70 minutes total):
  0:00-0:05  Opening — intros, recording consent, deliverable receipt
  0:05-0:13  D1 — verbatim Q preservation (8 min)
  0:13-0:20  D2 — deal context accuracy (7 min)
  0:20-0:32  D3 — answer depth (12 min)
  0:32-0:42  D4 — citation appropriateness (10 min)
  0:42-0:50  D5 — confidence calibration (8 min)
  0:50-0:56  D6 — uncertain rationale (6 min)
  0:56-1:05  D7 — overall verdict (9 min)
  1:05-1:10  Wrap — structured-note timeline + thank-you

For each dimension:
  - Verbatim operator script (read aloud or share onscreen)
  - Specific sub-questions to ask
  - Expected JSON capture fields (matches G5.5 schema)
  - Acceptance signal for SHIP-WORTHY verdict on that dimension

Quality discipline reminders for the operator:
  - Verbatim banker quotes are load-bearing — capture more not less
  - Do NOT interpret the verdict; only the banker assigns categories
  - Respect the 60-min budget; overrun signals regression-level
    deliverable
  - Operator opinions are out of scope; product opinions go to
    post-session engineering debrief, not the banker session

Post-session operator actions enumerated:
  1. Save structured feedback to banker-feedback-<key>.json (G5.5 schema)
  2. Generate written summary for banker sign-off
  3. Send summary within 24 hours
  4. On sign-off (or 5 business days, whichever first), commit to
     docs/pilot-feedback/<session_key>/
  5. Initiate next-step action per verdict (G5.6 decision matrix)

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.5
       banker review session checklist (7 items)
Gate: G5.4 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/runbooks/g5-banker-feedback-capture.md — defines the immutable
artifact that ties the banker review session to engineering iteration:
banker-feedback-<session_key>.json. Schema is intentionally verbose so
the file alone — without the operator notes — drives the engineering
iteration backlog.

Section A — JSON schema (v6.14-banker-feedback-v1)
  Top-level fields:
    - session_key, pilot_client, review_session, deliverable_receipt
    - d1_verbatim through d6_uncertain — one block per dimension
    - d7_overall — verdict enum + iteration_items[] OR regression_reasons[]
    - wrap — final concerns + sign-off timestamps + next_step_filed URL

  Per-dimension structure captures:
    - The banker's verdict (one of 2-3 enum values per dimension)
    - Specific issues / spot checks / flags as arrays with q_id +
      banker_quote (verbatim) + assessment category
    - Overall dimension verdict
    - Banker quote summary

Section B — Written banker-sign-off summary template
  Markdown template the operator generates from the JSON within 24 hours
  of session end. Banker either signs off ("approved") or annotates
  edits. Sign-off is the trigger for engineering next-step action.

Section C — Archival location
  docs/pilot-feedback/<session_key>/
    banker-feedback.json (the JSON)
    sign-off-summary.md (the signed-off markdown)
    notes/ (verbatim transcript or operator structured notes)

  In-repo (not external datastore) so artifacts are git-versioned,
  PR-reviewable, and survive backend changes. Engineering + GTM +
  compliance all reference this directory.

Section D — Schema validation
  jq -e query that must return true before archival. Verifies every
  required field is populated AND the verdict matches the enum.
  Operator runs before commit; failing validation means go back and
  fill the missing fields before sign-off.
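
The runbook's check is a jq -e query; the same idea can be sketched in JavaScript. This is an illustration under assumptions: the `verdict` key inside `d7_overall` and the exact enum strings are guesses, and only the top-level fields named in Section A are checked.

```javascript
// Hypothetical sketch: required-field and verdict-enum check before archival.
// The real gate is a jq -e query in the runbook; key names here are assumed.
const REQUIRED_TOP_LEVEL = [
  'session_key', 'pilot_client', 'review_session', 'deliverable_receipt',
  'd1_verbatim', 'd6_uncertain', 'd7_overall', 'wrap',
];
const VERDICTS = ['SHIP-WORTHY', 'NEEDS_ITERATION', 'REGRESSION_VS_TODAY'];

function validateFeedback(doc) {
  const errors = [];
  for (const key of REQUIRED_TOP_LEVEL) {
    const value = doc[key];
    if (value === undefined || value === null || value === '') {
      errors.push(`missing ${key}`);
    }
  }
  // Assumed location of the verdict enum inside the d7_overall block.
  const verdict = doc.d7_overall ? doc.d7_overall.verdict : undefined;
  if (!VERDICTS.includes(verdict)) {
    errors.push(`invalid verdict: ${verdict}`);
  }
  return errors; // an empty array means the file may be archived
}

const complete = {
  session_key: 'example-key', pilot_client: {}, review_session: {},
  deliverable_receipt: {}, d1_verbatim: {}, d6_uncertain: {},
  d7_overall: { verdict: 'NEEDS_ITERATION' }, wrap: {},
};
const completeErrors = validateFeedback(complete);
```

Returning a list of errors rather than a boolean tells the operator exactly which fields to go back and fill before sign-off.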

Section E — Privacy + retention
  Confidential by default; same access controls as the rest of the
  repo. Verbatim transcripts stored under existing session-diagnostics
  encryption posture. Retention indefinite for engineering archival;
  banker can request redaction at any time per Aperture's GDPR Article
  17 handling, with 5-business-day SLA.

Spec: docs/pending-updates/Banker-Structuring-Output.md § 16.5
       banker review checklist + § 15.6 W4 iteration loop
Gate: G5.5 of 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Number531 and others added 8 commits June 1, 2026 12:59
Renumber 022_kg-nodes-embedding-hnsw → 025 to clear a number collision with
main's 022_artifact-source-width (8.0.x wrapped-subagents line) and #197's
reserved 023/024. Two differently-named NNN_ migrations produce NO git conflict,
so the collision is invisible to conflict review — node-pg-migrate silently
skips one on fresh/production deploys. Content is idempotent
(CREATE INDEX IF NOT EXISTS) so the renumber is data-safe.

Add scripts/check-migration-collisions.mjs + .github/workflows/migration-lint.yml:
a CI guard that fails when two migrations share a numeric prefix, running against
the PR merge-result so a feature branch colliding with main is caught before merge.
This is the SECOND occurrence of the class on this branch (prior: 011→022 rename),
so the systemic guard converts an invisible production-migration-skip into a loud
red check for all future cross-branch merges.

See docs/pending-updates/Banker-Merge-Risk.md §3.
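
A minimal sketch of the collision check described above, assuming migrations are named with an `NNN_` numeric prefix. The function name, the `.sql` extensions, and the `021_example` file are illustrative; the real logic lives in scripts/check-migration-collisions.mjs and may differ.

```javascript
// Hypothetical sketch: detect migrations that share a numeric prefix.
// Takes plain file names so it stays pure; a CI wrapper would list the
// migrations directory of the PR merge-result and exit non-zero on a hit.
function findMigrationCollisions(fileNames) {
  const byPrefix = new Map();
  for (const name of fileNames) {
    const match = /^(\d+)_/.exec(name);
    if (!match) continue; // skip files that are not NNN_-style migrations
    const prefix = match[1];
    if (!byPrefix.has(prefix)) byPrefix.set(prefix, []);
    byPrefix.get(prefix).push(name);
  }
  return [...byPrefix.entries()]
    .filter(([, names]) => names.length > 1)
    .map(([prefix, names]) => ({ prefix, names }));
}

// Before the renumber: two differently named files claim 022 (no git conflict).
const beforeRenumber = [
  '021_example.sql', // assumed neighbour, for illustration only
  '022_artifact-source-width.sql',
  '022_kg-nodes-embedding-hnsw.sql',
];
// After the renumber to 025 the prefixes are unique again.
const afterRenumber = [
  '021_example.sql',
  '022_artifact-source-width.sql',
  '025_kg-nodes-embedding-hnsw.sql',
];

const collisions = findMigrationCollisions(beforeRenumber);
const clean = findMigrationCollisions(afterRenumber);
```

Because the two colliding files have different names, git reports nothing; only a check keyed on the prefix itself surfaces the problem.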

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ssessment

Single source of truth for the banker→main integration: divergence (201/176),
the 10 conflicts with per-file resolution, 6 auto-merged files (semantically
verified), the CRITICAL 022 migration collision (now resolved via the renumber
in 44b32c9f), test/CI risks, the wrapped-subagents semantic interaction
(banker agents auto-wrap + run on Opus 4.8), and the ordered merge procedure.
Reconciled with the PR-team recommendation; every claim re-verified against the
repo (audit trail in §12). Referenced by the 025 migration header comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se-1

Integrates main's permanent wrapped-subagents architecture (WRAPPED_SUBAGENTS=true,
mcp__subagents__run_* runner, Opus 4.8 sonnet-tier override) into the flag-gated
banker Q&A module. Banker lands DORMANT (BANKER_QA_OUTPUT=false); the live
validation gate (non-Cardinal session on Opus 4.8) runs before any flag flip.

Conflicts resolved (10):
- featureFlags.js — union; OPUS_MODEL=claude-opus-4-8; all wrapped helpers +
  banker KG/wrapped flags present; no dup keys.
- flags.env — union; deduped 3 keys; BANKER_QA_OUTPUT=false.
- package.json — 8.0.1 + jest.config.cjs structure (main).
- baselines.json — banker's nested schema (valid JSON).
- agentStreamHandler.js — kept main's finalHooksConfig hook-chain split +
  banker's conditional enhancedPrompt; systemPrompt keeps buildAgentToolMappingBanner()
  AND BANKER_QA_OUTPUT injection.
- memo-qa-certifier.js — both gates + explicit gate-precedence header (Dim-13 first).
- CHANGELOG.md, 2x skill docs, failure-patterns.md — keep-both.

Verified post-merge:
- 0 conflict markers tree-wide; all changed JS/MJS syntax-OK.
- Migration guard passes (23 numbers, no collision; banker's 022 renumbered to 025).
- Banker agents wire through wrapped machinery (registry -> run_banker_* tools ->
  Opus 4.8 override -> universal banner -> dispatch); orchestrator resume/remediation
  banker-phase-aware (dedicated resume gate intact).
- Auto-merged semantic audit clean (22 both-touched files; no dangling symbols).

CI fix (this PR's only CI change):
- jest.config.cjs: testPathIgnorePatterns excludes the 19 banker node:test suites
  (jest cannot run node:test files; they run via node --test in kg-tests.yml).
  jest glob 230 -> 211; node --test on the 19 = 412 pass / 0 fail.
  Pre-existing deploy.yml bare-`npm test` debt (live-test hang + 9 zero-test suites,
  all on main) documented as out-of-scope follow-up in Banker-Merge-Risk.md sec 7.

See docs/pending-updates/Banker-Merge-Risk.md for the full merge-risk SSOT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure, side-effect-free guardrail that re-parses banker-question-answers.md with the production bankerQaParser exports and asserts structural integrity: every Q-block has parseable Answer/Because, confidence parses (legacy + 5-level), >=1 citation, expected Q-block count, no all-null block. Separates HARD parseability (errors) from SOFT spec-compliance (warnings, e.g. legacy confidence vocab) so the Sonnet gold fixture still passes. Adds bankerQaMetadataSchema (zod, 5-level confidence enum) + parse/safeParse mirroring src/schemas/entitiesJson.js. Inert — nothing in the production path calls it yet.
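
The HARD/SOFT split can be sketched as follows. This is an illustration only: the parsed-block shape (`id`, `answer`, `because`, `confidence`, `citations`) and both confidence vocabularies are placeholders, not the real bankerQaParser output or the spec's 5-level enum.

```javascript
// Hypothetical sketch of a parse-back gate that separates HARD parseability
// failures (errors) from SOFT spec-compliance issues (warnings).
function validateParsedBlocks(blocks, { expectedCount, currentLevels, legacyLevels }) {
  const errors = [];
  const warnings = [];
  if (blocks.length !== expectedCount) {
    errors.push(`expected ${expectedCount} Q-blocks, found ${blocks.length}`);
  }
  for (const b of blocks) {
    const allNull = b.answer == null && b.because == null &&
      b.confidence == null && b.citations.length === 0;
    if (allNull) {
      errors.push(`${b.id}: all-null block`);
      continue;
    }
    if (b.answer == null) errors.push(`${b.id}: Answer not parseable`);
    if (b.because == null) errors.push(`${b.id}: Because not parseable`);
    if (legacyLevels.includes(b.confidence)) {
      warnings.push(`${b.id}: legacy confidence vocabulary`); // SOFT only
    } else if (!currentLevels.includes(b.confidence)) {
      errors.push(`${b.id}: confidence not parseable`);
    }
    if (b.citations.length < 1) errors.push(`${b.id}: no citations`);
  }
  return { ok: errors.length === 0, errors, warnings };
}

// Placeholder vocabularies, for illustration only.
const options = {
  expectedCount: 2,
  currentLevels: ['L1', 'L2', 'L3', 'L4', 'L5'],
  legacyLevels: ['High', 'Medium', 'Low'],
};
const goldLike = validateParsedBlocks([
  { id: 'Q1', answer: 'a', because: 'b', confidence: 'L3', citations: ['[1]'] },
  { id: 'Q2', answer: 'a', because: 'b', confidence: 'High', citations: ['[2]'] },
], options);
```

Treating legacy vocabulary as a warning is what lets an older gold fixture keep passing while renamed fields or empty blocks still fail loudly.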

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
14 node:test cases — gold fixture passes (29 blocks / 203 citations / 29 confidence rows; zero false positives), synthetic drift caught (**Response:** rename, all-null block, missing block, zero citations; zero false negatives), and banker-qa-metadata.json zod accept/reject. Registered in kg-tests.yml node --test list + path trigger, and excluded from jest via jest.config.cjs testPathIgnorePatterns (it is a node:test file).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/run-bankerqa-isolated.mjs invokes ONLY banker-qa-writer via runWrappedAgent (no Express server, no full pipeline) against the Cardinal session inputs, validates output with the gate, does ONE bounded re-prompt then hard-fails. --dry validates the existing gold fixture with no API call. Mirrors the production buildAgentToolset -> runWrappedAgent dispatch, so the model resolves to Opus 4.8 through the real resolveModelId override. Reusable for future model bumps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Explicit CHANGELOG entries for (1) the CI gate fix (jest testPathIgnorePatterns excluding 19 node:test suites; jest glob 230->211) and (2) the parse-back validation gate + isolation harness. Records the empirical Tier-3 result: banker-qa-writer on Opus 4.8 produced parser-clean output first-pass (29/29, correct 5-level confidence) — the drift concern is dismissed; the **Question:**-field divergence is a deferred follow-up. Banker-Merge-Risk.md §8 gets a VALIDATION STATUS note narrowing the pre-flag-flip live gate to dispatch/path/frontend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	super-legal-mcp-refactored/CHANGELOG.md
@Number531 Number531 changed the title from "feat(kg+frontend): v6.14.0→v6.18.3 — Banker QA workflow + 8 KG edge waves + IC Pyramid surface" to "feat(banker): Banker QA workflow + KG edge waves + IC Pyramid — integrated with main 8.0.2 (wrapped subagents, dormant flag-off)" Jun 2, 2026
Number531 and others added 2 commits June 2, 2026 19:45
…ure-flags.md

feature-flags.md was missing the 9 flags added across v6.14-v6.18 (this PR window). Adds full entries #53 (BANKER_QA_OUTPUT, dormant-on-merge + pre-flip gate) and #54-#61 (the 8 KG_* edge-wave flags), index-table rows, and a Flag Dependency Tree update making explicit that the graph is NOT a single switch: KNOWLEDGE_GRAPH master (under HOOK_DB_PERSISTENCE -> EMBEDDING_PERSISTENCE) + 8 independently-revertible sub-flags. Sourced from flags.env rollback comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 8 KG_* edge-wave flags are absent on main, so they activate in production for the first time on this merge — meaning Wave 4's own rollout policy (higher FP risk; 'leave commented out for the first 7 days after deploy, enable only after manual spot-check') had not been satisfied (the soak never started). Comment KG_CONTRADICTION_EDGES out in flags.env per that policy; the other 7 KG waves ship ON (deterministic/additive/isolated). feature-flags.md #57 + CHANGELOG updated. Enable Wave 4 after a 7-day soak + manual CONTRADICTS spot-check on the first post-merge production sessions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number531 and others added 3 commits June 2, 2026 23:50
… checkout

reports/ is gitignored, so banker-qa-parser + banker-qa-validator read the gold artifact at module load and ENOENT on a fresh clone (13 cases lost; '426/426' was machine-local). Commit the gold banker-question-answers.md + specialist-coverage-state.json to tracked test/fixtures/banker-qa/ and repoint both suites. Verified: with reports/ hidden, validator 14/14 + parser 29/29 pass from the fixture. (PR #178 review finding #2.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st guards production

kg-phase10-recommendation-dedup.test.js locked the recommendation canonical_key formula against a hand-kept REPLICA, which could silently drift from kgPhase10DealIntel.js. The v6.18.1 canonical_key change is unconditional (runs whenever KNOWLEDGE_GRAPH is on) and changes recommendation node identity/dedup. Extract the inline derivation into an exported deriveRecommendationCanonicalKey() (pure, behavior-identical) and rewire the 19 dedup tests to import it — they now guard the real formula (19/19, full KG list 426/426, no regression). (PR #178 review finding #3.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ical_key, inert CI)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nd BANKER_QA_OUTPUT

Re-review nice-to-haves: the two un-flagged riders are now flag-gated so flag-off behavior is byte-identical to main. (1) citation-paragraph-style.lua (DOCX+PDF) applied only when BANKER_QA_OUTPUT=true — the [N]-leading lines it restyles exist only in banker-qa artifacts, so non-banker renders are unchanged. (2) session timeout is BANKER_QA_OUTPUT ? 6h : 4h — non-banker keeps main's 4h default. Verified: flag-off resolves to 4h + no filter; validator 14/14.
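
The two gates can be sketched as pure functions of the flag. This is a hedged illustration: the function names and the string comparison against 'true' are assumptions about how the flag is read, not the repository's code.

```javascript
// Hypothetical sketch: both riders resolve from BANKER_QA_OUTPUT so the
// flag-off path keeps main's behavior.
const HOUR_MS = 60 * 60 * 1000;

function isBankerMode(env) {
  return env.BANKER_QA_OUTPUT === 'true'; // assumed parsing of the flag
}

// Session timeout: 6h in banker mode, main's 4h default otherwise.
function resolveSessionTimeoutMs(env) {
  return isBankerMode(env) ? 6 * HOUR_MS : 4 * HOUR_MS;
}

// Render filters: the citation-paragraph Lua filter only applies in banker mode.
function resolveRenderFilters(env) {
  return isBankerMode(env) ? ['citation-paragraph-style.lua'] : [];
}

const flagOff = { BANKER_QA_OUTPUT: 'false' };
const flagOn = { BANKER_QA_OUTPUT: 'true' };
const offTimeout = resolveSessionTimeoutMs(flagOff);
const onTimeout = resolveSessionTimeoutMs(flagOn);
```

Deriving both values from a single predicate keeps the flag-off path easy to verify: one check proves that neither rider is active.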

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number531 and others added 4 commits June 3, 2026 01:18
…lusters

findSectionForRef matched §IV.A/.X/.T to section-iv-tax-* because the next-token gate /^[a-z]{1,6}$/ treated 'tax' as letters a/x/t. Unconditional (every session) → wrong citation->section CITES edges. Add strict isLetterCluster() (strictly-ascending ⇒ sorted+distinct, range a-l); real clusters (a,bc,cdgh,cdef,gh) preserved, topic words rejected. +2 tests. (PR #178 G1.)
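
A minimal sketch of the stricter gate, assuming the rule is exactly as stated: letters in the range a-l, strictly ascending. The 1-6 length bound is carried over from the old gate as an assumption; the production isLetterCluster() may differ in detail.

```javascript
// Sketch of a strict letter-cluster test (illustrative, not the production code).
// Strictly ascending implies the letters are both sorted and distinct.
function isLetterCluster(token) {
  if (!/^[a-l]{1,6}$/.test(token)) return false; // only a-l, length bound assumed
  for (let i = 1; i < token.length; i++) {
    if (token.charCodeAt(i) <= token.charCodeAt(i - 1)) return false;
  }
  return true;
}

const realClusters = ['a', 'bc', 'cdgh', 'cdef', 'gh'];
const topicWords = ['tax', 'data', 'debt'];

const clustersAccepted = realClusters.every(isLetterCluster);
const wordsRejected = topicWords.every((w) => !isLetterCluster(w));
```

The old gate accepted any one to six lowercase letters, so a slug token such as "tax" passed. Requiring strict ascent within a-l rejects it both because 't' and 'x' fall outside the range and because the letters are out of order.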

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SINGLE/RANGE/WORD multiple regexes lacked ^ anchor, so a head single (15× … 12-14× rate base) was dropped for the tail range — wrong value/type + double-emit. Anchored all three; callers always pass head-anchored spans. +1 test. (PR #178 G2.)
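
The effect of the missing anchor can be reproduced with a simplified parser. This is a sketch under assumptions: the regexes, the midpoint rule for ranges, and the sample span are illustrative stand-ins, not the production parseMultiple patterns.

```javascript
// Hypothetical sketch: the same parser built with and without a ^ anchor.
const NUM = String.raw`(\d+(?:\.\d+)?)`;

function makeParser(anchor) {
  const RANGE = new RegExp(`${anchor}${NUM}\\s*-\\s*${NUM}\\s*×`);
  const SINGLE = new RegExp(`${anchor}${NUM}\\s*×`);
  return function parseMultiple(span) {
    const range = RANGE.exec(span);
    if (range) {
      // Assumed convention: a range is reported as its midpoint.
      return { kind: 'range', value: (Number(range[1]) + Number(range[2])) / 2 };
    }
    const single = SINGLE.exec(span);
    if (single) return { kind: 'single', value: Number(single[1]) };
    return null;
  };
}

const unanchored = makeParser('');
const anchored = makeParser('^');

// Illustrative span: a head single followed by a tail range.
const span = '15× EV/EBITDA vs 12-14× rate base';
const buggy = unanchored(span); // range regex skips ahead to the tail "12-14×"
const fixed = anchored(span);   // only the head of the span can match
```

With the anchor in place the range pattern cannot reach into the tail, so the head single is the value reported. This relies on callers always passing head-anchored spans, as the commit states.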

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CITATION_LINE_REGEX class group was upper-only ([A-Z][A-Z ]*), silently dropping a whole citation line on a mixed-case tag like [Filing]. Now [A-Za-z][A-Za-z ]*, normalized to upper-case on capture. (PR #178 G6-banker; dormant behind BANKER_QA_OUTPUT.)
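
The difference between the two character classes can be shown on a simplified citation line. This is a sketch only: the `[N] [TAG] text` line shape and the helper name are assumptions, and the real CITATION_LINE_REGEX has more structure than this.

```javascript
// Hypothetical sketch: upper-only vs mixed-case tag class on a citation line.
const UPPER_ONLY = /^\[(\d+)\]\s+\[([A-Z][A-Z ]*)\]\s+(.*)$/;
const MIXED_CASE = /^\[(\d+)\]\s+\[([A-Za-z][A-Za-z ]*)\]\s+(.*)$/;

function parseCitationLine(line, pattern) {
  const m = pattern.exec(line);
  if (!m) return null; // the whole line is dropped when the tag does not match
  return {
    n: Number(m[1]),
    tag: m[2].toUpperCase(), // normalize on capture so downstream sees one form
    text: m[3],
  };
}

const line = '[3] [Filing] Annual report, fiscal year summary';
const before = parseCitationLine(line, UPPER_ONLY);
const after = parseCitationLine(line, MIXED_CASE);
```

Normalizing to upper case at capture time keeps downstream comparisons unchanged while letting the parser accept whatever casing the writer agent emits.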

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, G6-numeric)

G3 (Phase 16 fanout-cap bypass) + G6-numeric (Phase 11 wrong magnitude) ride these two waves and are invasive/under-specified to fix safely now. Held OFF at merge per the Wave 4 policy; fixes tracked in #204. feature-flags.md #55/#61 + CHANGELOG updated. Net: 5 KG waves ON, 3 HELD.
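
The G3 bypass is easiest to see with numbers: two passes that each apply a cap of 12 can emit up to 24 edges per source. The sketch below contrasts that with one shared budget. It is an illustration only, not the fix tracked in #204; the function names and the prose-first priority are assumptions.

```javascript
// Hypothetical sketch of the fanout-cap bypass and one possible shared budget.
const CAP = 12;

// Buggy shape: each pass enforces the cap on its own output.
function capIndependently(proseEdges, numericEdges) {
  return [...proseEdges.slice(0, CAP), ...numericEdges.slice(0, CAP)];
}

// One possible alternative: the numeric pass only gets what prose left over.
function capWithSharedBudget(proseEdges, numericEdges) {
  const fromProse = proseEdges.slice(0, CAP);
  const remaining = CAP - fromProse.length;
  return [...fromProse, ...numericEdges.slice(0, remaining)];
}

const prose = Array.from({ length: 20 }, (_, i) => `prose-${i}`);
const numeric = Array.from({ length: 20 }, (_, i) => `numeric-${i}`);

const independentCount = capIndependently(prose, numeric).length;
const sharedCount = capWithSharedBudget(prose, numeric).length;
```

Which pass gets priority under a shared budget is a design decision this sketch does not settle, which is part of why holding the flag is the safer choice until the fix is specified.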

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Number531

Owner Author

Round-3 disposition — all 6 deeper-review findings addressed (HEAD 4267eef8)

Thanks for the deeper sweep. Each finding was reproduced against the actual code (not taken on the report) before action. Summary:

| # | Finding | Disposition |
|---|---------|-------------|
| G1 | section-matcher read topic word "tax" as letters a/x/t → wrong §IV.A/.X/.T→section CITES edges (unconditional, every session) | FIXED: reproduced (§IV.A/.X/.T → NID-TAX), added strict isLetterCluster() (strictly-ascending ⇒ sorted+distinct, range a–l); topic words (tax/data/debt) rejected, real clusters (a, bc, cdgh, cdef, gh) preserved. +2 regression tests (section-ref-matcher 27→29). |
| G2 | parseMultiple un-anchored → head single dropped for tail range (wrong value 13 vs 15, wrong type rate_base vs ev_ebitda, double-emit); active via KG_PRECEDENT_BENCHMARKS | FIXED: reproduced, head-anchored all three regexes (^); callers confirmed to always pass head-anchored spans. +1 regression test (multiple-extractor 23→24). |
| G3 | Phase 16 fanout-cap bypass: prose + numeric passes each apply the 12-cap independently → up to 24 SENSITIVE_TO/source | HELD: KG_SENSITIVITY_EDGES commented OFF in flags.env (same policy as KG_CONTRADICTION_EDGES); cross-pass-budget fix tracked in #204. |
| G6-numeric | Phase 11 "silent wrong magnitude" (under-specified repro) | HELD: KG_NUMERIC_EXPOSURE commented OFF; fix tracked in #204. Not fixing what isn't cleanly reproducible. |
| G6-banker | bankerQaParser CITATION_LINE_REGEX class group upper-only → mixed-case [Filing] silently dropped the whole line | FIXED: now [A-Za-z][A-Za-z ]*, normalized to upper-case on capture (dormant behind BANKER_QA_OUTPUT). |
| G4 / G5 | dead Prometheus alerts (alerts-banker-qa.yml) / DOM-XSS (marked.parse() → innerHTML, pre-existing repo-wide) | TRACKED in #204: G4 before flag-flip (it's the flip's safety gate); G5 as a repo-wide DOMPurify task. |

Decision rule

  • Unconditional or cleanly-reproducible → fixed in code with a regression test (G1, G2, G6-banker).
  • Invasive or under-specified → held the flag (G3, G6-numeric) — exactly the "fix OR hold" option you offered, consistent with the PR's own Wave-4 policy. Deliberately did not rush numeric-logic fixes I couldn't verify.

Net flag state on merge

5 KG waves ON · 3 HELD: KG_CONTRADICTION_EDGES (Wave 4), KG_NUMERIC_EXPOSURE (Wave 2.2), KG_SENSITIVITY_EDGES (Wave 8); BANKER_QA_OUTPUT=false (dormant).

Regression (run locally — CI inert per #203)

  • KG node:test list: 429/429 · wrapped-subagent suite: 868/868 · migration guard OK · OPUS_MODEL=claude-opus-4-8.

Still requires explicit human sign-off (unchanged, not mine to clear)

  1. The intended KG node-identity change for historical-session rebuilds (Phase 10 canonical_key, now guarded by the production-imported dedup suite).
  2. No automated CI gate until the workflows are relocated (#203: CI workflows never run; relocate .github/workflows/ to repo root; pre-existing, repo-wide). Tests pass locally and were independently reproduced.

Follow-ups: #202 (flag consolidation), #203 (CI relocation), #204 (G3/G6-numeric/G4/G5).

…y + README section

- CHANGELOG [Unreleased]: top-level merge entry framing the Banker Q&A workflow's
  purpose (banker companion deliverable vs. full memo), application (G0.5/G2.5/G3.5/G6
  gated phases + 3 agents), provenance (question nodes + INFORMS edges + Evidence Trail),
  flag state on merge, and verified merge-readiness.
- README: new "Banker Q&A Workflow — Intake Questions, Output Answers & Provenance"
  section (agents/phases table, intake + output artifacts, provenance/KG, endpoints,
  8 edge-wave flag table) + BANKER_QA_OUTPUT and KG_* rows in the env-vars table.

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Number531 Number531 merged commit 8cbea16 into main Jun 3, 2026
Number531 added a commit that referenced this pull request Jun 3, 2026
Adds the missing CHANGELOG entry for the remediation work (erasure, banker-QA
metrics + alert retarget, audit-export, diagnostics, system-design 48-agent
consistency, schema-evolve ensure*Schema + PB1 fixes, banker-preflip skill).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number531 added a commit that referenced this pull request Jun 3, 2026
…dit-export, diagnostics, doc/skill consistency

fix(post-178): remediate banker/KG follow-up gaps — erasure, audit-export, diagnostics, dead-metric traps, doc sync
Number531 added a commit that referenced this pull request Jun 8, 2026
…_banker_* tools generate

The banker Q&A workflow (PR #178, BANKER_QA_OUTPUT) could never dispatch: banker-intake-analyst,
banker-specialist-coverage-validator, and banker-qa-writer are registered in legalSubagents/index.js
but were absent from WRAPPED_SUBAGENT_ALLOWLIST. The wrapped-subagents MCP server generates one
mcp__subagents__run_<name> tool per allowlisted agent, so with no entry the orchestrator had no
mcp__subagents__run_banker_* tool to call — zero banker agents ran in the 2026-06-08 NextEra/Dominion
canary (reports/2026-06-08-1780888014) despite BANKER_QA_OUTPUT=true.

- flags.env: append the 3 banker agents to WRAPPED_SUBAGENT_ALLOWLIST (42 -> 45). Inert while
  BANKER_QA_OUTPUT=false (banker agents never invoked when the gate is off) -> zero non-banker impact.
- Wiring prerequisite only: banker mode still needs a forced server-side runBankerIntakePhase()
  (deterministic pre-orchestration gate, mirrors P0/promptEnhancer) + a real question input. Follow-ons
  tracked in docs/pending-updates/06-08-2026-canary.md.
- Unblocks the banker-preflip-validation Tier-3 assertion (dispatch emits mcp__subagents__run_banker_*).
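
Why a missing allowlist entry means no tool can be sketched as below. This is an illustration under assumptions: the comma-separated allowlist format, the hyphen-to-underscore mapping (inferred from the run_banker_* tool names), and the non-banker agent name are not confirmed by the source.

```javascript
// Hypothetical sketch: one mcp__subagents__run_<name> tool per allowlisted agent.
function toolNamesFromAllowlist(allowlistValue) {
  return allowlistValue
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean)
    .map((name) => `mcp__subagents__run_${name.replace(/-/g, '_')}`);
}

const BANKER_AGENTS = [
  'banker-intake-analyst',
  'banker-specialist-coverage-validator',
  'banker-qa-writer',
];

// Before the fix: banker agents are registered but not allowlisted.
const beforeFix = toolNamesFromAllowlist('example-specialist'); // assumed non-banker agent
// After the fix: the three banker agents are appended to the allowlist.
const afterFix = toolNamesFromAllowlist(
  ['example-specialist', ...BANKER_AGENTS].join(','),
);

const bankerToolsBefore = beforeFix.filter((t) => t.startsWith('mcp__subagents__run_banker_'));
const bankerToolsAfter = afterFix.filter((t) => t.startsWith('mcp__subagents__run_banker_'));
```

Registration and allowlisting are separate steps here, so an agent can exist in the registry yet expose no callable tool, which matches the symptom seen in the canary.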

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>