Skip to content

fix(post-178): remediate banker/KG follow-up gaps — erasure, audit-export, diagnostics, dead-metric traps, doc sync#205

Merged
Number531 merged 11 commits into
mainfrom
fix/post-178-remediation
Jun 3, 2026
Merged

fix(post-178): remediate banker/KG follow-up gaps — erasure, audit-export, diagnostics, dead-metric traps, doc sync#205
Number531 merged 11 commits into
mainfrom
fix/post-178-remediation

Conversation

@Number531

Copy link
Copy Markdown
Owner

Summary

Post-merge remediation of the gaps surfaced auditing system-design.md + .claude/skills/ against main after PR #178 (banker Q&A workflow + 8 KG edge waves). Single omnibus branch per the agreed plan (docs/pending-updates/post-178-remediation-plan.md). Three logical commits.

Decisions locked with the operator: wire the missing metrics (not retarget), export the KG breaker to /metrics, one branch.

🔴 High-severity defects (code/compliance/runtime) — PR-1 commit

  • GDPR Art. 17 erasure (transcriptArtifacts.js) now deletes the 9 banker artifacts at the session root + review-outputs/ (was leaking verbatim deal facts after erasure). Explicit allowlist BANKER_ARTIFACT_FILES, drift-guarded against the banker agents' outputFiles; never rm the session dir. +3 tests.
  • Art. 13 regulator bundle (client-audit-export): range-query.py now exports kg_nodes/kg_edges/kg_provenance/kg_evolution; collect-transcripts.sh now bundles the banker artifacts as banker-qa-artifacts.tar.gz (auto-hashed/uploaded by bundle.sh). SKILL.md edge/node-type table reconciled.
  • session-diagnostics render-report.py: navigates the nested baselines.json (primary snapshot) so the comparison table populates (was all ); added Pattern 10 (KG phase breaker); reconciled the auto-vs-manual pattern claim; 04-kg-counts.sql per-node-type breakdown + held-flag note.

🟠 Dead-metric traps — PR-2 commit

  • Wired the 3 missing metrics so all 5 banker-qa alerts back to real series: claude_kg_phase_duration_ms{phase,status} (new runKgPhase() wrapper times each banker KG wave phase), claude_qa_dimension_score{dimension} (emitted from the qa-diagnostic state-file write hook), and exported claude_kg_circuit_breaker_state{phase} (the breaker was in-memory only).
  • Retargeted alerts-banker-qa.yml to those real metrics (0 dead metric names remain) and replaced all 17 dead claude_circuit_breaker_state{breaker="KG-Phase*"} refs across 5 skills with the real metric.
  • Hardened g4-readiness.sh Check B to assert each alert's metric exists in the registry (+ live /metrics if BASE_URL set) — closes the false-green on the pre-flip gate.

📘 Doc sync + ➕ new skill — PR-3/PR-4 commit

  • system-design.md: agent count 45→48 + banker rows, §15 flag registry (9 flags), new §16.7 Banker Q&A Workflow, questions endpoints + migration 025.
  • schema-evolve (next-migration 015→026 + collision-guard), subagent-scaffold (2nd archetype + missing wiring + path fixes), deploy/client-provisioner (flags + manual migration 025 + Gotcha maxThinkingTokens breaks all Agent SDK hooks regardless of value #7), schema-doc-validator (reads alerts-banker-qa.yml).
  • Stale-ref sweep (client-offboarding 11→16 edge types; Pattern-10 migration 022→025); README regenerated (34 skills) after adding xlsx-workbook-template-creator + the new banker-preflip-validation skill (3-tier GO/NO-GO gate for the flag flip).

Verification

  • Tier 1 (smoke): node --check all changed JS green; py_compile/bash -n green; alerts-banker-qa.yml YAML valid; README --check in sync.
  • Tier 2 (integration): 118/118 affected node:test suites pass incl. the new erasure suite; metrics module loads + emits; audit-export collector smoke-tested (banker tarball produced, memo excluded); renderer smoke-tested (baseline table populates, Pattern 10 fires).
  • runKgPhase is behaviour-equivalent to the prior try/catch blocks; legacy KG phases 1–10 untouched.
  • Note: kg-phase6-entities.test.js is a pre-existing jest-only file (fails only under node --test), unrelated to this branch.

Not in scope (tracked)

🤖 Generated with Claude Code

Number531 and others added 11 commits June 3, 2026 02:57
…gnostics)

WS1.1 — GDPR Art.17 erasure now removes the 9 banker artifacts (session root +
review-outputs/) in addition to wrapped-subagent transcripts. Explicit allowlist
(BANKER_ARTIFACT_FILES, drift-guarded against the banker agents' outputFiles);
never rm the session dir. +test (3 cases). art-17-flow.md updated.

WS1.2 — Art.13 regulator bundle now includes KG tables (kg_nodes/edges/provenance/
evolution) in range-query.py and the banker artifacts as banker-qa-artifacts.tar.gz
in collect-transcripts.sh (auto-hashed/uploaded by bundle.sh). SKILL.md edge/node
type table reconciled to the full v6.18.x set + held-flag note.

WS1.3 — session-diagnostics render-report.py now navigates the nested baselines.json
('primary' snapshot) so the comparison table populates (was all "—"). Added Pattern 10
(KG phase breaker via kg_build_last_error). failure-patterns.md claim reconciled
(auto vs manual). 04-kg-counts.sql gains a per-node-type breakdown + SIMILAR_TO +
held-flag completeness note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WS2.1 — wire the 3 previously-missing metrics so all 5 banker-qa alerts back to
real, emitting series:
- claude_kg_phase_duration_ms{phase,status} — new runKgPhase() wrapper in
  knowledgeGraphExtractor.js times each banker KG wave phase (1b/1c/4c/4d/11-16).
- claude_qa_dimension_score{dimension} — emitted from sdkHooks.js PreToolUse on
  qa-diagnostic-state.json writes (dimensions_scored[d].score).
- Retargeted alerts-banker-qa.yml: subagent-failure → claude_gate_check_results_total
  {status=~"failed|error"}; Dim13 → claude_qa_dimension_score; phase1b latency →
  claude_kg_phase_duration_ms_bucket. 0 dead metric names remain.
- Hardened scripts/g4-readiness.sh Check B: asserts each alert metric exists in the
  registry (+ live /metrics if BASE_URL set) — closes the false-green.
- infrastructure-health/references/prometheus-alerts.md: documents the banker-qa group.

WS2.2 — export the KG circuit breaker as claude_kg_circuit_breaker_state{phase}
(runKgPhase sets it; resets to closed on success). Replaced all 17 dead
claude_circuit_breaker_state{breaker="KG-Phase*"} refs across 5 skills with the
real claude_kg_circuit_breaker_state{phase="KG-Phase*"}.

runKgPhase is behaviour-equivalent to the prior try/catch+recordFailure blocks
(legacy phases 1-10 untouched). All changed modules load + emit; node --check green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, README)

WS3.1 — system-design.md: agent count 45→48 + 3 banker rows; §15 flag registry
gains BANKER_QA_OUTPUT + 8 KG flags (5 ON / 3 held) and the KNOWLEDGE_GRAPH row
fixed (10→multi-phase); new §16.7 Banker Q&A Workflow (3 agents, G0.5/G2.5/G3.5/G6,
9 artifacts, dormancy contract, bankerQaValidator); §14.9 questions endpoints + migration 025.

WS3.2 — schema-evolve: next-migration 015→026 (derive-from-highest), collision-guard
referenced (CI now detects), ensure*Schema line numbers refreshed + ensureKnowledgeGraphSchema,
KG-type-needs-no-migration note.

WS3.3 — subagent-scaffold: 2nd archetype (flag-gated/ship-dormant/no-MCP); missing
wiring (hookDBBridgeConfig maps, classifyDocument, intake phase, extended agent-def
fields, withheld-allowlist case); fixed wrong paths (domainMcpServers.js, p0GateHook.js);
scaffold/verify/wire scripts updated.

WS3.4 — deploy + client-provisioner: 9 flags + manual migration 025 (partial b-tree,
not HNSW) + Gotcha #7 rewrite + KG_QA_INFORMS dormant-not-useless reconciliation +
breaker-metric name fix. schema-doc-validator: extract-truth.py now reads alerts-banker-qa.yml.

WS3.5 — stale-ref sweep: client-offboarding 11→16 edge types; failure-patterns Pattern 10
migration 022→025; README regenerated (34 skills, no drift) after adding xlsx-workbook-
template-creator + banker-preflip-validation to CATEGORY_MAP; feature-compliance 10→11 dims.

PR-4 — new banker-preflip-validation skill (3-tier GO/NO-GO gate for the
BANKER_QA_OUTPUT=true flip: isolation harness + g4-readiness metric-existence + live
non-Cardinal session).

Verification: all changed JS node --check green; py_compile/bash -n green; 118/118
affected node:test suites pass; README --check in sync. (kg-phase6-entities.test.js is a
pre-existing jest-only file, unrelated.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hygiene, stray symlink

Addresses findings from the parallel review of PR #205:

- **Remove accidentally-committed `node_modules` symlink** (was added by `git add -A`
  in the PR-1 commit; a machine-specific abs-path symlink into another worktree,
  broken on any other checkout). `git rm --cached`.
- **BankerCertifierReject** (was `Dim13ScoreLow`): the alert targeted
  `claude_qa_dimension_score{dimension=~"dim_13.*"}` but NO stable `dim_13` key
  exists (qa-diagnostic keys are dim_0..dim_12, varying suffixes, no banker fixture
  to confirm) → it would have been a silent no-data alert. Retargeted to the
  provably-emitting certifier gate result with an honest in-file note; renamed across
  g4-readiness.sh, schema-doc-validator, banker-preflip-validation, prometheus-alerts.md.
- **Gate-status matchers**: `status=~"failed|error"` missed the real `PARTIAL`
  status (and `error` is never emitted). Switched alerts 1-4 to `status!~"VERIFIED|passed"`
  (robust: catches failed/PARTIAL/error regardless of SDK-vs-wrapped vocabulary).
- **Swept 6 more dead `claude_circuit_breaker_state{breaker="KG-Phase*"}` refs** in
  docs/runbooks/wave-{4,5,6,7}-*.md + system-design.md → `claude_kg_circuit_breaker_state{phase=...}`.
- **system-design.md** corrections: migration 025 is a partial b-tree (NOT HNSW;
  3072-dim > pgvector cap; manual-only) not "the HNSW index"; banker-qa-metadata.json
  consumed by KG Phase 1b (not 1c) per code.
- Swept stale `Dim13ScoreLow` from g4-spec-mapping.md + Banker-Structuring-Output.md.
- Fixed stale sdkMetrics.js phase-label comment ('phase4c' → 'KG-Phase4c').

Verification: 89/89 affected node:test suites pass; alerts YAML valid; all 5 alert
metrics exist in registry; README --check in sync (34 skills); 0 dead breaker refs
and 0 Dim13ScoreLow refs repo-wide.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ctions

Pre-existing bug surfaced in the PR #205 review (folded in per decision): the
schema-evolve skill pointed operators at three functions that don't exist
(ensureSessionsSchema, ensureWave3Schema, ensureCodeExecutionsSchema) plus two
non-existent tables (citations, agent_progress) and stale postgres.js line numbers.
`/schema-evolve --table access_log` told operators to edit ensureWave3Schema()
"around L965" — impossible.

Verified against src/db/postgres.js: ensureHookSchema (L1199) runs the DDL constants
for nearly the whole app schema (sessions/reports/users/code_executions/source_writes/
access_log/pii_mappings/citation_*/etc.); ensureSchema (L26) only makes runs/tool_calls/
evidence. All three bogus names collapse to ensureHookSchema.

- patterns.md: replaced the function→table map with the accurate 9-function map +
  a note that the session/wave3/code-exec helpers never existed and there is no
  citations/agent_progress table.
- generate-migration.py: detect_ensure_function() now returns only real functions
  (folded the 2 bogus branches into ensureHookSchema; added artifact/xlsx/chat/
  client_errors branches; default points to a grep recipe). PB2/PB3 ensure_patch
  strings retargeted to ensureHookSchema with grep anchors instead of brittle/wrong
  line numbers (the line numbers are how this rotted).
- SKILL.md: example output ensureSessionsSchema() → ensureHookSchema() (the example
  patches `reports`, which lives in HOOK_SCHEMA_DDL).

Verification: 0 bogus function names; every cited ensure* fn exists in postgres.js;
detect_ensure_function dry-run returns correct real functions for access_log/
code_executions/sessions/embeddings/kg_*/report_artifacts/xlsx/chat/client_errors;
py_compile OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing migration)

The PB1 prebuilt-fix sample emitted a migration that would FAIL on apply and cause
dual-path drift — the exact failure class this skill exists to prevent:
- citation_source_links referenced a NON-EXISTENT `citations(id)` table (FK error
  on apply) and used a fabricated shape (SERIAL id, citation_id/source_write_id/
  similarity_score) that does not match production.
- code_execution_inputs likewise used a fabricated shape (SERIAL id, input_hash/
  input_source) vs production.

PB1's own description says it backfills DDL that "exists in postgres.js", so it MUST
mirror production. Rewrote both CREATE TABLE bodies + indexes to match src/db/
postgres.js (CODE_EXECUTIONS_DDL) exactly: UUID PKs, report_id→reports(id) +
citation_marker + source_hash/confidence/matched_via/citation_text + UNIQUE; and
execution_id→code_executions(id) + source_type/source_name/source_url/content_hash/
fetched_at/http_status/fetch_mode/source_metadata. down_sql unchanged (already correct).

Verification: py_compile OK; 0 `REFERENCES citations(...)` refs; sample CREATE TABLE
bodies match production column-for-column. (PB2/PB3 are illustrative add-column
examples, not production mirrors — correct as-is. The generic add-table scaffold's
SERIAL placeholder is a marked TODO, intentionally generic.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dels, feature-compliance)

Final-investigation sweep found the same bogus-function bug class in two skills
outside the schema-evolve scope:
- code-execution-models/SKILL.md: code_executions/code_execution_inputs cited as
  "ensureCodeExecutionsSchema block" → corrected to CODE_EXECUTIONS_DDL run by
  ensureHookSchema().
- feature-compliance-scaffold/SKILL.md: D7 example cited "ensureSessionsSchema()" →
  ensureHookSchema() (reports table lives in HOOK_SCHEMA_DDL).

Invariant now holds repo-wide: 0 references to ensureSessionsSchema/ensureWave3Schema/
ensureCodeExecutionsSchema; every ensure*Schema name referenced across all skills
resolves to a real `export async function` in src/db/postgres.js.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…line-ref

Final adversarial sweep found two doc-only inconsistencies (the doc-drift class this
PR exists to eliminate), both in files this PR edited:
- prometheus-alerts.md: BankerQAWriterFailure row still showed the old
  status=~"failed|error" matcher while the actual alert (and the BankerCertifierReject
  row two lines down) use status!~"VERIFIED|passed". Aligned it.
- g4-spec-mapping.md: BankerCertifierReject cited "alerts-banker-qa.yml lines 132-160"
  (stale after the retarget grew the file). Replaced the brittle line range with a
  name anchor (`alert: BankerCertifierReject`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ody consistency)

The earlier 45→48 header bump left a header/body mismatch: §4.1 said "48 Subagents"
but the table only listed 46 rows (numbered 0-45), missing two agents that ARE in the
registry — government-affairs-analyst and macro-economic-analyst (the pre-existing
roster-sync gap WS3.1 flagged). The local-main copy the reviewer was viewing still
shows the pre-merge "45"; this commit makes the branch table fully self-consistent.

- Inserted the 2 missing research agents (rows 21-22, sonnet; gov-affairs →
  federal-register/congress/regulations-gov/govinfo/sam-gov; macro-economic →
  fred/bls/ecb + code-execution) and renumbered the remainder (old 21-45 → 23-47).
- Table now enumerates rows 0-47 = 48 agents, set-identical to the 48 LEGAL_SUBAGENTS
  tuples in src/config/legalSubagents/index.js (verified: 0 missing, 0 extra).
- Fixed the stale "Phase 2 — Parallel Research (19 Agents)" subheader → (23 Agents).
- Updated banker-rows references "rows 43-45" → "rows 45-47" (§4.1 note + § Banker Q&A).

(Line 2897 "Zod schemas for all 42 subagent outputs" is the historical issue-#10
deliverable count — point-in-time, left as-is like the dated CHANGELOG snapshots.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the missing CHANGELOG entry for the remediation work (erasure, banker-QA
metrics + alert retarget, audit-export, diagnostics, system-design 48-agent
consistency, schema-evolve ensure*Schema + PB1 fixes, banker-preflip skill).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e) + root changelog pointer

- service CHANGELOG: added Reason / Benefits / Expected-outcome framing above the
  per-file breakdown.
- root CHANGELOG: brief [Unreleased] entry summarizing the remediation (reason/benefit/
  outcome/scope) pointing to the service CHANGELOG for full detail — matches the root
  changelog's "see service CHANGELOG for details" convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Number531 Number531 merged commit 770d703 into main Jun 3, 2026
Number531 added a commit that referenced this pull request Jun 3, 2026
…about as-shipped

Audit of PR #206 found the committed plan (marked EXECUTED) self-contradicted and
overstated fidelity to what PR #205 actually shipped:
- §5 "Out of scope" still listed "wire claude_qa_dimension_score" + "export kgBreaker"
  as out-of-scope, contradicting the locked decisions (§2 D1=wire, D2=export). Removed.
- Added §8 "Execution divergences" recording what shipped vs. planned: the
  Dim13ScoreLow→BankerCertifierReject retarget (no dim_13 key), 9 artifacts not 8,
  and the review-time scope expansion (schema-evolve/PB1, cross-skill sweep, §4.1
  48-agent table, CHANGELOG) not in the original plan.
- Sequencing note: D3 chose one omnibus branch, not 4 PRs.

Docs-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number531 added a commit that referenced this pull request Jun 4, 2026
…root CHANGELOG

The root summary-pointer CHANGELOG.md tracks major entries (e.g. PR #205) but
was missing the G3 fix recorded in the service changelog (commit 842bf97).
Adds a matching [Unreleased] summary above the PR #205 entry, pointing to the
service CHANGELOG for the full per-file detail. Keeps the two changelogs
consistent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant