[Architecture] Defense-in-depth hook rule for QA bypass — deferred pending Task #121 empirical validation #179

Description

@Number531

Context

Closes a deferred architectural decision surfaced during the SpaceX IPO E2E canary on 2026-05-27 (session 2026-05-27-1779903178 in production DB).

Branch context: feature/wrapped-subagents-migration
Related work: Task #121 (QA bypass prompt fix — three coordinated edits to certifier.js, orchestrator.md, remediation-agent.md, already applied + tested)
Related session: Production DB session 2026-05-27-1779903178 — cycle 1 bypass + cycle 2 successful remediation
Trigger artifact: reports/2026-05-27-1779903178/qa-outputs/delivery-decision-v2.md (88 → 97 score progression after manual remediation override)


What was bypassed

During SpaceX IPO canary cycle 1, the orchestrator dispatched memo-qa-certifier directly after memo-qa-diagnostic returned outcome: CONDITIONAL with 10 unresolved issues (3 HIGH including DIM9-RL2-001 — missing draft contract provision for HIGH-severity EU 2 GHz MSS regulatory finding). The orchestrator NEVER emitted mcp__subagents__run_memo_remediation_writer despite:

  • The agent being in the wrapped subagent allowlist (flags.env:118)
  • The diagnostic having written remediation-plan.md + remediation-dispatch.md
  • The diagnostic's outcome explicitly stating "TIER 2 STANDARD" remediation required

The certifier accepted 88/100 + zero CRITICALs and issued CERTIFY_WITH_LIMITATIONS — shipping a memo with substantive legal gaps disclosed as 'limitations.'

Empirical validation: when remediation was MANUALLY invoked in cycle 2, the same pipeline (same agents, same infrastructure) successfully remediated the issues and achieved 97/100 CERTIFIED status. The bypass was specifically the orchestrator's decision NOT to invoke remediation, not a missing capability.


Task #121 (already applied) — three-layer prompt fix

Three coordinated prompt edits address the bypass at the prompt level:

  1. src/config/legalSubagents/agents/memo-qa-certifier.js — added LOOP_FOR_REMEDIATION decision row, tightened CERTIFY_WITH_LIMITATIONS to require no SUBSTANTIVE unresolved HIGHs, added SUBSTANTIVE-vs-EDITORIAL classification table
  2. prompts/memorandum-orchestrator.md — added MANDATORY REMEDIATION ROUTING decision table
  3. prompts/memorandum-synthesis/remediation-agent.md — narrowed DIRECT REMEDIATION PATH eligibility

Status: shipped to branch feature/wrapped-subagents-migration. Empirical validation pending in next canary.


The case for ADDITIONAL hook engine enforcement

Prompts are persuasive instructions, not enforcement. They can be:

  • Ignored under adaptive thinking pressure (cost/latency optimization bias)
  • Eroded by context compaction in long sessions
  • Interpreted differently by different models (Sonnet 4.6 vs 4.7 vs GPT-5.4)
  • Subject to drift over future prompt edits without regression tests

Defense-in-depth pattern already established in this codebase:

A blockCertifierWithoutRemediation hook rule would make the SpaceX-pattern bypass structurally impossible at the runtime layer, not just prompt-level.


The case AGAINST building NOW

Architectural risk investigation (4-agent Explore on 2026-05-27) surfaced concrete concerns:

🔴 Show-stopper risks (require resolution before building)

  1. State files NOT written atomically: qa-diagnostic-state.json uses direct fs.writeFileSync(). atomicWriteText exists (src/wrappedSubagents/transcriptUtils.js:45-66) but isn't used for QA state files. A hook reading mid-write could see partial JSON.

  2. Write ordering race window — state file written DURING agent execution, NOT atomically with tool_result return. Hook may evaluate before state file is fully on disk.
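
For reference, the usual mitigation for risk 1 alone is a write-to-temp-then-rename pattern. The sketch below is a minimal illustration, not the project's atomicWriteText: the helper name atomicWriteJson and the temp-file naming are assumptions made for this example. It does nothing for risk 2, since the hook can still run before the writer starts.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (not the existing atomicWriteText): writes JSON to a
// sibling temp file, flushes it, then renames it over the target path.
function atomicWriteJson(targetPath, value) {
  const dir = path.dirname(targetPath);
  const tmpPath = path.join(dir, `.${path.basename(targetPath)}.${process.pid}.tmp`);
  const fd = fs.openSync(tmpPath, 'w');
  try {
    fs.writeSync(fd, JSON.stringify(value, null, 2));
    fs.fsyncSync(fd); // flush contents before the rename makes them visible
  } finally {
    fs.closeSync(fd);
  }
  // rename() replaces the target in one step on POSIX filesystems when both
  // paths are on the same filesystem, so readers see old or new, never partial.
  fs.renameSync(tmpPath, targetPath);
}
```

A reader that opens qa-diagnostic-state.json while this runs gets either the previous complete file or the new complete file.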

🟡 High risks (mitigatable)

  1. Filesystem reads in hook are a NEW pattern — only largeReadCap does selective fs.stat(). A rule reading 2-3 paths is unprecedented in the engine.
  2. Provider portability — Anthropic Messages API specific. OpenAI Responses API has no equivalent hook surface. Becomes dead code if Phase 0.5 GPT-5.4 migration proceeds.
  3. Recovery requires restart — no admin endpoint, no SIGHUP, no hot reload. False-positive bug = 30-60s downtime via flags.env + redeploy.

🟢 Addressable (standard mitigation)

  1. Handler throws silently swallowed → wrap in try/catch + Prometheus counter
  2. No Prometheus metrics yet → follow existing project pattern
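
A minimal sketch of the try/catch + counter mitigation, assuming a synchronous handler and a counter object that exposes inc() (the shape prom-client's Counter uses). The wrapper name withFailOpen, the verdict shape, and the stand-in counter are all hypothetical.

```javascript
// Hypothetical wrapper: a throwing handler is counted and converted into an
// allow verdict instead of being silently swallowed by the engine.
function withFailOpen(handler, errorCounter) {
  return function guarded(...args) {
    try {
      return handler(...args);
    } catch (err) {
      errorCounter.inc(); // surfaces the failure in metrics
      return { decision: 'allow', reason: 'rule-error' };
    }
  };
}

// Stand-in with the same inc() shape as a Prometheus counter, so the sketch
// runs without extra dependencies.
const ruleErrors = { value: 0, inc() { this.value += 1; } };
```

An async handler would need the same guard around an awaited call; this sketch covers only the synchronous case.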

Required design (if proceeding)

Use in-memory ctx state, NOT filesystem reads. This eliminates show-stopper risks 1 + 2:

// In runner.js, after the diagnostic tool returns:
if (toolName === 'mcp__subagents__run_memo_qa_diagnostic') {
  try {
    const result = JSON.parse(toolResult.content);
    ctx.sessionState.lastDiagnosticOutcome = result.outcome;
    ctx.sessionState.lastDiagnosticScore = result.score;
    // qa_cycle may be missing on a malformed result; optional chaining avoids a throw
    ctx.sessionState.lastDiagnosticCycle = result.qa_cycle?.current_cycle;
  } catch (err) {
    // Unparseable result: leave state unset so the hook rule default-allows
  }
}

// Hook rule reads ctx.sessionState: no filesystem, no race

4 safety gates required:

  1. Allow certifier when current_cycle >= max_cycles (respect REJECT_ESCALATE → human-review-required.md exit path)
  2. Allow when outcome === 'CONDITIONAL_FINAL' or 'CERTIFIED'
  3. Per-session deny ceiling (3 denies) → emergency passthrough to prevent cost-runaway loops
  4. Default-allow on any internal rule failure (never block on uncertainty)
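
The four gates could be combined roughly as follows. This is a sketch under stated assumptions, not the engine's rule API: the certifier tool name, the verdict shape, and the sessionState fields maxCycles, remediationRanSinceDiagnostic, and certifierDenyCount are hypothetical. Only the three lastDiagnostic* fields come from the design above.

```javascript
// Hypothetical names throughout, except the lastDiagnostic* fields.
const CERTIFIER_TOOL = 'mcp__subagents__run_memo_qa_certifier'; // assumed tool name
const TERMINAL_OUTCOMES = new Set(['CONDITIONAL_FINAL', 'CERTIFIED']);
const MAX_DENIES_PER_SESSION = 3;

const allow = (reason) => ({ decision: 'allow', reason });

function blockCertifierWithoutRemediation(toolName, sessionState) {
  try {
    if (toolName !== CERTIFIER_TOOL) return allow('not-certifier');
    const s = sessionState;
    // No diagnostic recorded yet: nothing to enforce against.
    if (s.lastDiagnosticOutcome === undefined) return allow('no-diagnostic-state');
    // Remediation already ran after the latest diagnostic (assumed flag).
    if (s.remediationRanSinceDiagnostic) return allow('remediation-ran');
    // Missing cycle data counts as uncertainty, so allow.
    if (!Number.isInteger(s.lastDiagnosticCycle) || !Number.isInteger(s.maxCycles)) {
      return allow('cycle-unknown');
    }
    // Gate 1: cycle budget exhausted, let the escalation exit path proceed.
    if (s.lastDiagnosticCycle >= s.maxCycles) return allow('max-cycles-reached');
    // Gate 2: the diagnostic outcome is already terminal.
    if (TERMINAL_OUTCOMES.has(s.lastDiagnosticOutcome)) return allow('terminal-outcome');
    // Gate 3: per-session deny ceiling, then emergency passthrough.
    const denies = s.certifierDenyCount ?? 0;
    if (denies >= MAX_DENIES_PER_SESSION) return allow('deny-ceiling-passthrough');
    s.certifierDenyCount = denies + 1;
    return { decision: 'deny', reason: 'remediation-required' };
  } catch (err) {
    // Gate 4: any internal failure defaults to allow.
    return allow('rule-error');
  }
}
```

Ordering matters here: every allow path is checked before the deny counter is touched, so the ceiling only counts real denies.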

Re-entry triggers (REOPEN/PROMOTE this issue if)

| Trigger | Action |
|---|---|
| Task #121 bypass observed in 5-10 production canaries | ESCALATE (empirically justified) |
| Before OpenAI Phase 0.5 canary | REVISIT (different model = different prompt interpretation) |
| Before production deployment to paying clients | STRONGLY CONSIDER (defense-in-depth for legal-grade work) |
| Any regression where unremediated HIGH ships | IMMEDIATE (proves prompt fix insufficient) |

Estimated work (when triggered)

  • ~200 LOC rule + ~80 LOC tests, followed by 1-2 weeks in observe mode on canary before flipping to enforce
  • 3-5 days engineering total (not counting the observe window)
  • Must include: in-memory ctx state pattern, 4 safety gates, Prometheus counter, observe → enforce mode progression
  • Must coordinate with: memo-qa-diagnostic agent updates to write outcome into ctx (~30 LOC)
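
One way the observe → enforce progression might look, as a sketch only. The mode values 'observe' and 'enforce', the function name applyMode, and the verdict shape are assumptions for illustration; how the mode is read from flags.env is not shown.

```javascript
// Hypothetical mode switch: in observe mode a would-be deny is logged and
// converted to allow; only enforce mode lets the deny through.
function applyMode(mode, verdict, log = console.warn) {
  if (verdict.decision !== 'deny') return verdict;
  if (mode === 'enforce') return verdict;
  if (mode === 'observe') {
    log(`[observe] would deny: ${verdict.reason}`);
    return { decision: 'allow', reason: `observe:${verdict.reason}` };
  }
  // Unknown or disabled mode: fail open rather than block.
  return { decision: 'allow', reason: 'rule-disabled' };
}
```

Running in observe mode first gives a count of would-be denies per canary, which is the evidence the re-entry triggers ask for, without the risk of a false positive blocking a session.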

Status

DEFERRED PENDING EMPIRICAL EVIDENCE

Task #121 prompt fixes are the first line of defense. This hook rule is the architectural backup if prompts prove bypassable. Building speculatively before empirical evidence would be premature optimization with non-trivial complexity costs.

Metadata

Assignees

No one assigned

Labels

P2 (Medium priority, post-launch) · infrastructure (Backend/infrastructure changes)
