fix(post-178): remediate banker/KG follow-up gaps — erasure, audit-export, diagnostics, dead-metric traps, doc sync#205
Merged
Merged
Conversation
…gnostics)
WS1.1 — GDPR Art.17 erasure now removes the 9 banker artifacts (session root +
review-outputs/) in addition to wrapped-subagent transcripts. Explicit allowlist
(BANKER_ARTIFACT_FILES, drift-guarded against the banker agents' outputFiles);
never rm the session dir. +test (3 cases). art-17-flow.md updated.
WS1.2 — Art.13 regulator bundle now includes KG tables (kg_nodes/edges/provenance/
evolution) in range-query.py and the banker artifacts as banker-qa-artifacts.tar.gz
in collect-transcripts.sh (auto-hashed/uploaded by bundle.sh). SKILL.md edge/node
type table reconciled to the full v6.18.x set + held-flag note.
WS1.3 — session-diagnostics render-report.py now navigates the nested baselines.json
('primary' snapshot) so the comparison table populates (was all "—"). Added Pattern 10
(KG phase breaker via kg_build_last_error). failure-patterns.md claim reconciled
(auto vs manual). 04-kg-counts.sql gains a per-node-type breakdown + SIMILAR_TO +
held-flag completeness note.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WS2.1 — wire the 3 previously-missing metrics so all 5 banker-qa alerts back to
real, emitting series:
- claude_kg_phase_duration_ms{phase,status} — new runKgPhase() wrapper in
knowledgeGraphExtractor.js times each banker KG wave phase (1b/1c/4c/4d/11-16).
- claude_qa_dimension_score{dimension} — emitted from sdkHooks.js PreToolUse on
qa-diagnostic-state.json writes (dimensions_scored[d].score).
- Retargeted alerts-banker-qa.yml: subagent-failure → claude_gate_check_results_total
{status=~"failed|error"}; Dim13 → claude_qa_dimension_score; phase1b latency →
claude_kg_phase_duration_ms_bucket. 0 dead metric names remain.
- Hardened scripts/g4-readiness.sh Check B: asserts each alert metric exists in the
registry (+ live /metrics if BASE_URL set) — closes the false-green.
- infrastructure-health/references/prometheus-alerts.md: documents the banker-qa group.
WS2.2 — export the KG circuit breaker as claude_kg_circuit_breaker_state{phase}
(runKgPhase sets it; resets to closed on success). Replaced all 17 dead
claude_circuit_breaker_state{breaker="KG-Phase*"} refs across 5 skills with the
real claude_kg_circuit_breaker_state{phase="KG-Phase*"}.
runKgPhase is behaviour-equivalent to the prior try/catch+recordFailure blocks
(legacy phases 1-10 untouched). All changed modules load + emit; node --check green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, README) WS3.1 — system-design.md: agent count 45→48 + 3 banker rows; §15 flag registry gains BANKER_QA_OUTPUT + 8 KG flags (5 ON / 3 held) and the KNOWLEDGE_GRAPH row fixed (10→multi-phase); new §16.7 Banker Q&A Workflow (3 agents, G0.5/G2.5/G3.5/G6, 9 artifacts, dormancy contract, bankerQaValidator); §14.9 questions endpoints + migration 025. WS3.2 — schema-evolve: next-migration 015→026 (derive-from-highest), collision-guard referenced (CI now detects), ensure*Schema line numbers refreshed + ensureKnowledgeGraphSchema, KG-type-needs-no-migration note. WS3.3 — subagent-scaffold: 2nd archetype (flag-gated/ship-dormant/no-MCP); missing wiring (hookDBBridgeConfig maps, classifyDocument, intake phase, extended agent-def fields, withheld-allowlist case); fixed wrong paths (domainMcpServers.js, p0GateHook.js); scaffold/verify/wire scripts updated. WS3.4 — deploy + client-provisioner: 9 flags + manual migration 025 (partial b-tree, not HNSW) + Gotcha #7 rewrite + KG_QA_INFORMS dormant-not-useless reconciliation + breaker-metric name fix. schema-doc-validator: extract-truth.py now reads alerts-banker-qa.yml. WS3.5 — stale-ref sweep: client-offboarding 11→16 edge types; failure-patterns Pattern 10 migration 022→025; README regenerated (34 skills, no drift) after adding xlsx-workbook- template-creator + banker-preflip-validation to CATEGORY_MAP; feature-compliance 10→11 dims. PR-4 — new banker-preflip-validation skill (3-tier GO/NO-GO gate for the BANKER_QA_OUTPUT=true flip: isolation harness + g4-readiness metric-existence + live non-Cardinal session). Verification: all changed JS node --check green; py_compile/bash -n green; 118/118 affected node:test suites pass; README --check in sync. (kg-phase6-entities.test.js is a pre-existing jest-only file, unrelated.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hygiene, stray symlink Addresses findings from the parallel review of PR #205: - **Remove accidentally-committed `node_modules` symlink** (was added by `git add -A` in the PR-1 commit; a machine-specific abs-path symlink into another worktree, broken on any other checkout). `git rm --cached`. - **BankerCertifierReject** (was `Dim13ScoreLow`): the alert targeted `claude_qa_dimension_score{dimension=~"dim_13.*"}` but NO stable `dim_13` key exists (qa-diagnostic keys are dim_0..dim_12, varying suffixes, no banker fixture to confirm) → it would have been a silent no-data alert. Retargeted to the provably-emitting certifier gate result with an honest in-file note; renamed across g4-readiness.sh, schema-doc-validator, banker-preflip-validation, prometheus-alerts.md. - **Gate-status matchers**: `status=~"failed|error"` missed the real `PARTIAL` status (and `error` is never emitted). Switched alerts 1-4 to `status!~"VERIFIED|passed"` (robust: catches failed/PARTIAL/error regardless of SDK-vs-wrapped vocabulary). - **Swept 6 more dead `claude_circuit_breaker_state{breaker="KG-Phase*"}` refs** in docs/runbooks/wave-{4,5,6,7}-*.md + system-design.md → `claude_kg_circuit_breaker_state{phase=...}`. - **system-design.md** corrections: migration 025 is a partial b-tree (NOT HNSW; 3072-dim > pgvector cap; manual-only) not "the HNSW index"; banker-qa-metadata.json consumed by KG Phase 1b (not 1c) per code. - Swept stale `Dim13ScoreLow` from g4-spec-mapping.md + Banker-Structuring-Output.md. - Fixed stale sdkMetrics.js phase-label comment ('phase4c' → 'KG-Phase4c'). Verification: 89/89 affected node:test suites pass; alerts YAML valid; all 5 alert metrics exist in registry; README --check in sync (34 skills); 0 dead breaker refs and 0 Dim13ScoreLow refs repo-wide. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ctions Pre-existing bug surfaced in the PR #205 review (folded in per decision): the schema-evolve skill pointed operators at three functions that don't exist (ensureSessionsSchema, ensureWave3Schema, ensureCodeExecutionsSchema) plus two non-existent tables (citations, agent_progress) and stale postgres.js line numbers. `/schema-evolve --table access_log` told operators to edit ensureWave3Schema() "around L965" — impossible. Verified against src/db/postgres.js: ensureHookSchema (L1199) runs the DDL constants for nearly the whole app schema (sessions/reports/users/code_executions/source_writes/ access_log/pii_mappings/citation_*/etc.); ensureSchema (L26) only makes runs/tool_calls/ evidence. All three bogus names collapse to ensureHookSchema. - patterns.md: replaced the function→table map with the accurate 9-function map + a note that the session/wave3/code-exec helpers never existed and there is no citations/agent_progress table. - generate-migration.py: detect_ensure_function() now returns only real functions (folded the 2 bogus branches into ensureHookSchema; added artifact/xlsx/chat/ client_errors branches; default points to a grep recipe). PB2/PB3 ensure_patch strings retargeted to ensureHookSchema with grep anchors instead of brittle/wrong line numbers (the line numbers are how this rotted). - SKILL.md: example output ensureSessionsSchema() → ensureHookSchema() (the example patches `reports`, which lives in HOOK_SCHEMA_DDL). Verification: 0 bogus function names; every cited ensure* fn exists in postgres.js; detect_ensure_function dry-run returns correct real functions for access_log/ code_executions/sessions/embeddings/kg_*/report_artifacts/xlsx/chat/client_errors; py_compile OK. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing migration) The PB1 prebuilt-fix sample emitted a migration that would FAIL on apply and cause dual-path drift — the exact failure class this skill exists to prevent: - citation_source_links referenced a NON-EXISTENT `citations(id)` table (FK error on apply) and used a fabricated shape (SERIAL id, citation_id/source_write_id/ similarity_score) that does not match production. - code_execution_inputs likewise used a fabricated shape (SERIAL id, input_hash/ input_source) vs production. PB1's own description says it backfills DDL that "exists in postgres.js", so it MUST mirror production. Rewrote both CREATE TABLE bodies + indexes to match src/db/ postgres.js (CODE_EXECUTIONS_DDL) exactly: UUID PKs, report_id→reports(id) + citation_marker + source_hash/confidence/matched_via/citation_text + UNIQUE; and execution_id→code_executions(id) + source_type/source_name/source_url/content_hash/ fetched_at/http_status/fetch_mode/source_metadata. down_sql unchanged (already correct). Verification: py_compile OK; 0 `REFERENCES citations(...)` refs; sample CREATE TABLE bodies match production column-for-column. (PB2/PB3 are illustrative add-column examples, not production mirrors — correct as-is. The generic add-table scaffold's SERIAL placeholder is a marked TODO, intentionally generic.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dels, feature-compliance) Final-investigation sweep found the same bogus-function bug class in two skills outside the schema-evolve scope: - code-execution-models/SKILL.md: code_executions/code_execution_inputs cited as "ensureCodeExecutionsSchema block" → corrected to CODE_EXECUTIONS_DDL run by ensureHookSchema(). - feature-compliance-scaffold/SKILL.md: D7 example cited "ensureSessionsSchema()" → ensureHookSchema() (reports table lives in HOOK_SCHEMA_DDL). Invariant now holds repo-wide: 0 references to ensureSessionsSchema/ensureWave3Schema/ ensureCodeExecutionsSchema; every ensure*Schema name referenced across all skills resolves to a real `export async function` in src/db/postgres.js. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…line-ref Final adversarial sweep found two doc-only inconsistencies (the doc-drift class this PR exists to eliminate), both in files this PR edited: - prometheus-alerts.md: BankerQAWriterFailure row still showed the old status=~"failed|error" matcher while the actual alert (and the BankerCertifierReject row two lines down) use status!~"VERIFIED|passed". Aligned it. - g4-spec-mapping.md: BankerCertifierReject cited "alerts-banker-qa.yml lines 132-160" (stale after the retarget grew the file). Replaced the brittle line range with a name anchor (`alert: BankerCertifierReject`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ody consistency) The earlier 45→48 header bump left a header/body mismatch: §4.1 said "48 Subagents" but the table only listed 46 rows (numbered 0-45), missing two agents that ARE in the registry — government-affairs-analyst and macro-economic-analyst (the pre-existing roster-sync gap WS3.1 flagged). The local-main copy the reviewer was viewing still shows the pre-merge "45"; this commit makes the branch table fully self-consistent. - Inserted the 2 missing research agents (rows 21-22, sonnet; gov-affairs → federal-register/congress/regulations-gov/govinfo/sam-gov; macro-economic → fred/bls/ecb + code-execution) and renumbered the remainder (old 21-45 → 23-47). - Table now enumerates rows 0-47 = 48 agents, set-identical to the 48 LEGAL_SUBAGENTS tuples in src/config/legalSubagents/index.js (verified: 0 missing, 0 extra). - Fixed the stale "Phase 2 — Parallel Research (19 Agents)" subheader → (23 Agents). - Updated banker-rows references "rows 43-45" → "rows 45-47" (§4.1 note + § Banker Q&A). (Line 2897 "Zod schemas for all 42 subagent outputs" is the historical issue-#10 deliverable count — point-in-time, left as-is like the dated CHANGELOG snapshots.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the missing CHANGELOG entry for the remediation work (erasure, banker-QA metrics + alert retarget, audit-export, diagnostics, system-design 48-agent consistency, schema-evolve ensure*Schema + PB1 fixes, banker-preflip skill). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e) + root changelog pointer - service CHANGELOG: added Reason / Benefits / Expected-outcome framing above the per-file breakdown. - root CHANGELOG: brief [Unreleased] entry summarizing the remediation (reason/benefit/ outcome/scope) pointing to the service CHANGELOG for full detail — matches the root changelog's "see service CHANGELOG for details" convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number531
added a commit
that referenced
this pull request
Jun 3, 2026
…about as-shipped Audit of PR #206 found the committed plan (marked EXECUTED) self-contradicted and overstated fidelity to what PR #205 actually shipped: - §5 "Out of scope" still listed "wire claude_qa_dimension_score" + "export kgBreaker" as out-of-scope, contradicting the locked decisions (§2 D1=wire, D2=export). Removed. - Added §8 "Execution divergences" recording what shipped vs. planned: the Dim13ScoreLow→BankerCertifierReject retarget (no dim_13 key), 9 artifacts not 8, and the review-time scope expansion (schema-evolve/PB1, cross-skill sweep, §4.1 48-agent table, CHANGELOG) not in the original plan. - Sequencing note: D3 chose one omnibus branch, not 4 PRs. Docs-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number531
added a commit
that referenced
this pull request
Jun 4, 2026
…root CHANGELOG The root summary-pointer CHANGELOG.md tracks major entries (e.g. PR #205) but was missing the G3 fix recorded in the service changelog (commit 842bf97). Adds a matching [Unreleased] summary above the PR #205 entry, pointing to the service CHANGELOG for the full per-file detail. Keeps the two changelogs consistent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Post-merge remediation of the gaps surfaced auditing
system-design.md+.claude/skills/againstmainafter PR #178 (banker Q&A workflow + 8 KG edge waves). Single omnibus branch per the agreed plan (docs/pending-updates/post-178-remediation-plan.md). Three logical commits.Decisions locked with the operator: wire the missing metrics (not retarget), export the KG breaker to
/metrics, one branch.🔴 High-severity defects (code/compliance/runtime) — PR-1 commit
transcriptArtifacts.js) now deletes the 9 banker artifacts at the session root +review-outputs/(was leaking verbatim deal facts after erasure). Explicit allowlistBANKER_ARTIFACT_FILES, drift-guarded against the banker agents'outputFiles; neverrmthe session dir. +3 tests.client-audit-export):range-query.pynow exportskg_nodes/kg_edges/kg_provenance/kg_evolution;collect-transcripts.shnow bundles the banker artifacts asbanker-qa-artifacts.tar.gz(auto-hashed/uploaded bybundle.sh). SKILL.md edge/node-type table reconciled.render-report.py: navigates the nestedbaselines.json(primarysnapshot) so the comparison table populates (was all—); added Pattern 10 (KG phase breaker); reconciled the auto-vs-manual pattern claim;04-kg-counts.sqlper-node-type breakdown + held-flag note.🟠 Dead-metric traps — PR-2 commit
claude_kg_phase_duration_ms{phase,status}(newrunKgPhase()wrapper times each banker KG wave phase),claude_qa_dimension_score{dimension}(emitted from the qa-diagnostic state-file write hook), and exportedclaude_kg_circuit_breaker_state{phase}(the breaker was in-memory only).alerts-banker-qa.ymlto those real metrics (0 dead metric names remain) and replaced all 17 deadclaude_circuit_breaker_state{breaker="KG-Phase*"}refs across 5 skills with the real metric.g4-readiness.shCheck B to assert each alert's metric exists in the registry (+ live/metricsifBASE_URLset) — closes the false-green on the pre-flip gate.📘 Doc sync + ➕ new skill — PR-3/PR-4 commit
alerts-banker-qa.yml).xlsx-workbook-template-creator+ the newbanker-preflip-validationskill (3-tier GO/NO-GO gate for the flag flip).Verification
node --checkall changed JS green;py_compile/bash -ngreen;alerts-banker-qa.ymlYAML valid; README--checkin sync.node:testsuites pass incl. the new erasure suite; metrics module loads + emits; audit-export collector smoke-tested (banker tarball produced, memo excluded); renderer smoke-tested (baseline table populates, Pattern 10 fires).runKgPhaseis behaviour-equivalent to the prior try/catch blocks; legacy KG phases 1–10 untouched.kg-phase6-entities.test.jsis a pre-existing jest-only file (fails only undernode --test), unrelated to this branch.Not in scope (tracked)
🤖 Generated with Claude Code