Context
Closes a deferred architectural decision surfaced during the SpaceX IPO E2E canary on 2026-05-27 (session 2026-05-27-1779903178 in production DB).
Branch context: feature/wrapped-subagents-migration
Related work: Task #121 (QA bypass prompt fix — three coordinated edits to certifier.js, orchestrator.md, remediation-agent.md, already applied + tested)
Related session: Production DB session 2026-05-27-1779903178 — cycle 1 bypass + cycle 2 successful remediation
Trigger artifact: reports/2026-05-27-1779903178/qa-outputs/delivery-decision-v2.md (88 → 97 score progression after manual remediation override)
What was bypassed
During SpaceX IPO canary cycle 1, the orchestrator dispatched memo-qa-certifier directly after memo-qa-diagnostic returned outcome: CONDITIONAL with 10 unresolved issues (3 HIGH including DIM9-RL2-001 — missing draft contract provision for HIGH-severity EU 2 GHz MSS regulatory finding). The orchestrator NEVER emitted mcp__subagents__run_memo_remediation_writer despite:
- The agent being in the wrapped subagent allowlist (flags.env:118)
- The diagnostic having written remediation-plan.md + remediation-dispatch.md
- The diagnostic's outcome explicitly stating "TIER 2 STANDARD" remediation required
The certifier accepted 88/100 + zero CRITICALs and issued CERTIFY_WITH_LIMITATIONS — shipping a memo with substantive legal gaps disclosed as 'limitations.'
Empirical validation: when remediation was MANUALLY invoked in cycle 2, the same pipeline (same agents, same infrastructure) successfully remediated the issues and achieved 97/100 CERTIFIED status. The bypass was specifically the orchestrator's decision NOT to invoke remediation, not a missing capability.
Task #121 (already applied) — three-layer prompt fix
Three coordinated prompt edits address the bypass at the prompt level:
- src/config/legalSubagents/agents/memo-qa-certifier.js: added LOOP_FOR_REMEDIATION decision row, tightened CERTIFY_WITH_LIMITATIONS to require no SUBSTANTIVE unresolved HIGHs, added SUBSTANTIVE-vs-EDITORIAL classification table
- prompts/memorandum-orchestrator.md: added MANDATORY REMEDIATION ROUTING decision table
- prompts/memorandum-synthesis/remediation-agent.md: narrowed DIRECT REMEDIATION PATH eligibility
Status: shipped to branch feature/wrapped-subagents-migration. Empirical validation pending in next canary.
The case for ADDITIONAL hook engine enforcement
Prompts are persuasive instructions, not enforcement. They can be:
- Ignored under adaptive thinking pressure (cost/latency optimization bias)
- Eroded by context compaction in long sessions
- Interpreted differently by different models (Sonnet 4.6 vs 4.7 vs GPT-5.4)
- Subject to drift over future prompt edits without regression tests
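Defense-in-depth pattern already established in this codebase:
- orchestratorCodeExecBlock hook rule
- largeReadCap rule enforces 200KB Read cap structurally, not via prompt
- perAgentToolAllowlist (observe mode) is the planned per-agent enforcement layer
A blockCertifierWithoutRemediation hook rule would make the SpaceX-pattern bypass structurally impossible at the runtime layer, rather than merely discouraged at the prompt level.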
The case AGAINST building NOW
Architectural risk investigation (4-agent Explore on 2026-05-27) surfaced concrete concerns:
🔴 Show-stopper risks (require resolution before building)
- State files NOT written atomically: qa-diagnostic-state.json uses direct fs.writeFileSync(). atomicWriteText exists (src/wrappedSubagents/transcriptUtils.js:45-66) but isn't used for QA state files. A hook reading mid-write could see partial JSON.
- Write ordering race window: the state file is written DURING agent execution, NOT atomically with the tool_result return. The hook may evaluate before the state file is fully on disk.
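For reference, the first show-stopper is usually closed with a write-to-temp-then-rename pattern. The sketch below is a minimal illustration, not the project's atomicWriteText (its signature is not shown here); writeJsonAtomic and the temp-file naming are hypothetical. It relies on rename being atomic within a single filesystem on POSIX systems, and it does nothing about the second risk, where the hook evaluates before the write happens at all.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: write JSON to a temp file in the same directory,
// then rename it over the target. Because rename replaces the target in
// one step on the same filesystem, a concurrent reader sees either the
// old file or the complete new one, never a partial write.
function writeJsonAtomic(targetPath, value) {
  const dir = path.dirname(targetPath);
  const tmpPath = path.join(dir, `.${path.basename(targetPath)}.${process.pid}.tmp`);
  fs.writeFileSync(tmpPath, JSON.stringify(value, null, 2), 'utf8');
  fs.renameSync(tmpPath, targetPath);
}
```

Keeping the temp file in the target's directory matters: a rename across filesystems is not atomic and can fail outright.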
🟡 High risks (mitigatable)
- Filesystem reads in hook are a NEW pattern: only largeReadCap does selective fs.stat(). A rule reading 2-3 paths is unprecedented in the engine.
- Provider portability — Anthropic Messages API specific. OpenAI Responses API has no equivalent hook surface. Becomes dead code if Phase 0.5 GPT-5.4 migration proceeds.
- Recovery requires restart — no admin endpoint, no SIGHUP, no hot reload. False-positive bug = 30-60s downtime via flags.env + redeploy.
🟢 Addressable (standard mitigation)
- Handler throws silently swallowed → wrap in try/catch + Prometheus counter
- No Prometheus metrics yet → follow existing project pattern
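The two addressable items can be handled together by a small wrapper, sketched below under assumptions. The handler signature, the { decision, reason } result shape, and the in-memory createCounter stand-in are all hypothetical; a real implementation would use whatever metrics client the project already follows.

```javascript
// Minimal in-memory stand-in for a Prometheus counter (hypothetical);
// a real deployment would use the project's existing metrics client.
function createCounter() {
  const counts = new Map();
  return {
    inc(label) { counts.set(label, (counts.get(label) || 0) + 1); },
    get(label) { return counts.get(label) || 0; },
  };
}

// Wrap a rule handler so that a throw is counted and converted into an
// explicit allow instead of being silently swallowed by the engine.
function withFailOpen(ruleName, handler, errorCounter) {
  return function guardedHandler(ctx, toolCall) {
    try {
      return handler(ctx, toolCall);
    } catch (err) {
      errorCounter.inc(ruleName);
      return { decision: 'allow', reason: `rule_error: ${err.message}` };
    }
  };
}
```

Failing open keeps a buggy rule from blocking the pipeline, while the counter makes the failure visible rather than silent.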
Required design (if proceeding)
Use in-memory ctx state, NOT filesystem reads. This eliminates both show-stopper risks (non-atomic state writes and the write-ordering race):

    // In runner.js, after the diagnostic tool returns:
    if (toolName === 'mcp__subagents__run_memo_qa_diagnostic') {
      const result = JSON.parse(toolResult.content);
      ctx.sessionState.lastDiagnosticOutcome = result.outcome;
      ctx.sessionState.lastDiagnosticScore = result.score;
      // Optional chaining: a result without qa_cycle leaves the cycle undefined
      // instead of throwing.
      ctx.sessionState.lastDiagnosticCycle = result.qa_cycle?.current_cycle;
    }
    // The hook rule reads ctx.sessionState: no filesystem access, no race.
4 safety gates required:
- Allow certifier when current_cycle >= max_cycles (respect REJECT_ESCALATE → human-review-required.md exit path)
- Allow when outcome === 'CONDITIONAL_FINAL' or 'CERTIFIED'
- Per-session deny ceiling (3 denies) → emergency passthrough to prevent cost-runaway loops
- Default-allow on any internal rule failure (never block on uncertainty)
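The four gates above could be combined roughly as follows. This is a sketch under stated assumptions, not the engine's actual rule interface: the certifier tool name mcp__subagents__run_memo_qa_certifier is inferred from the naming of the other tools, the maxCycles and denyCount fields are hypothetical additions to ctx.sessionState, and the { decision, reason } result shape is illustrative.

```javascript
const CERTIFIER_TOOL = 'mcp__subagents__run_memo_qa_certifier'; // assumed name
const ALLOWED_OUTCOMES = new Set(['CONDITIONAL_FINAL', 'CERTIFIED']);
const MAX_DENIES_PER_SESSION = 3;

function allow(reason) {
  return { decision: 'allow', reason };
}

// `state` stands in for ctx.sessionState. maxCycles and denyCount are
// hypothetical fields; the other three are set in runner.js.
function blockCertifierWithoutRemediation(toolName, state) {
  try {
    if (toolName !== CERTIFIER_TOOL) return allow('not_certifier');

    const outcome = state.lastDiagnosticOutcome;
    const cycle = state.lastDiagnosticCycle;
    const maxCycles = state.maxCycles;

    // Missing or malformed state counts as uncertainty, so allow.
    if (typeof outcome !== 'string' || !Number.isFinite(cycle) || !Number.isFinite(maxCycles)) {
      return allow('state_unknown');
    }
    // Gate 2: terminal diagnostic outcomes may proceed to certification.
    if (ALLOWED_OUTCOMES.has(outcome)) return allow('outcome_terminal');
    // Gate 1: cycles exhausted, so respect the escalation exit path.
    if (cycle >= maxCycles) return allow('cycles_exhausted');
    // Gate 3: per-session deny ceiling, then emergency passthrough.
    const denies = state.denyCount || 0;
    if (denies >= MAX_DENIES_PER_SESSION) return allow('deny_ceiling_reached');

    state.denyCount = denies + 1;
    return { decision: 'deny', reason: 'remediation_required' };
  } catch (err) {
    // Gate 4: default-allow on any internal rule failure.
    return allow('rule_error');
  }
}
```

With lastDiagnosticOutcome set to 'CONDITIONAL' at cycle 1 of 3, the first three certifier dispatches are denied and the fourth passes through with reason 'deny_ceiling_reached'.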
Re-entry triggers (REOPEN/PROMOTE this issue if)
| Trigger | Action |
| --- | --- |
| Task #121 bypass observed in 5-10 production canaries | ESCALATE: empirically justified |
| Before OpenAI Phase 0.5 canary | REVISIT: different model = different prompt interpretation |
| Before production deployment to paying clients | STRONGLY CONSIDER: defense-in-depth for legal-grade work |
| Any regression where unremediated HIGH ships | IMMEDIATE: proves prompt fix insufficient |
Estimated work (when triggered)
- ~200 LOC rule + ~80 LOC tests + 1-2 weeks observe canary + flip to enforce
- 3-5 days engineering effort total (excluding the canary observation window)
- Must include: in-memory ctx state pattern, 4 safety gates, Prometheus counter, observe → enforce mode progression
- Must coordinate with: memo-qa-diagnostic agent updates to write outcome into ctx (~30 LOC)
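The observe → enforce progression named above might look like the sketch below. The mode values, the applyMode helper, and the log callback are hypothetical; how the mode would be read from flags.env is not shown.

```javascript
// In observe mode a deny is logged as a would-deny and downgraded to an
// allow, so canary runs surface false positives without blocking anything.
// In enforce mode the rule's decision is returned unchanged.
function applyMode(mode, ruleResult, log) {
  if (ruleResult.decision === 'deny' && mode !== 'enforce') {
    log({ event: 'would_deny', reason: ruleResult.reason });
    return { decision: 'allow', reason: `observe_only: ${ruleResult.reason}` };
  }
  return ruleResult;
}
```

Treating any value other than 'enforce' as observe means a typo in the mode setting degrades to the non-blocking behavior.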
Status
DEFERRED PENDING EMPIRICAL EVIDENCE
Task #121 prompt fixes are the first line of defense. This hook rule is the architectural backup if prompts prove bypassable. Building speculatively before empirical evidence would be premature optimization with non-trivial complexity costs.