[Architecture] Defense-in-depth hook rule for QA bypass: deferred pending Task #121 empirical validation #179

## Context

Closes a deferred architectural decision surfaced during the SpaceX IPO E2E canary on 2026-05-27 (session `2026-05-27-1779903178` in production DB).

**Branch context**: `feature/wrapped-subagents-migration`
**Related work**: Task #121 (QA bypass prompt fix — three coordinated edits to certifier.js, orchestrator.md, remediation-agent.md, already applied + tested)
**Related session**: Production DB session `2026-05-27-1779903178` — cycle 1 bypass + cycle 2 successful remediation
**Trigger artifact**: `reports/2026-05-27-1779903178/qa-outputs/delivery-decision-v2.md` (88 → 97 score progression after manual remediation override)

---

## What was bypassed

During SpaceX IPO canary cycle 1, the orchestrator dispatched `memo-qa-certifier` directly after `memo-qa-diagnostic` returned `outcome: CONDITIONAL` with 10 unresolved issues (3 HIGH including `DIM9-RL2-001` — missing draft contract provision for HIGH-severity EU 2 GHz MSS regulatory finding). The orchestrator NEVER emitted `mcp__subagents__run_memo_remediation_writer` despite:

- The agent being in the wrapped subagent allowlist (`flags.env:118`)
- The diagnostic having written `remediation-plan.md` + `remediation-dispatch.md`
- The diagnostic's outcome explicitly stating "TIER 2 STANDARD" remediation required

The certifier accepted 88/100 + zero CRITICALs and issued `CERTIFY_WITH_LIMITATIONS` — shipping a memo with substantive legal gaps disclosed as 'limitations.'

Empirical validation: when remediation was MANUALLY invoked in cycle 2, the same pipeline (same agents, same infrastructure) successfully remediated the issues and achieved 97/100 CERTIFIED status. The bypass was **specifically the orchestrator's decision NOT to invoke remediation**, not a missing capability.
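The bypass pattern described above can be expressed as a check over a session's ordered tool calls, which may be useful as a regression assertion in future canaries. This is a minimal sketch, not existing code: the `calls` record shape and the certifier tool name `mcp__subagents__run_memo_qa_certifier` are assumptions (only the diagnostic and remediation tool names appear in this issue).

```javascript
// Diagnostic and remediation tool names are taken from this issue.
const DIAGNOSTIC = 'mcp__subagents__run_memo_qa_diagnostic';
const REMEDIATION = 'mcp__subagents__run_memo_remediation_writer';
// ASSUMPTION: certifier tool name inferred by analogy, not confirmed.
const CERTIFIER = 'mcp__subagents__run_memo_qa_certifier';

/**
 * Returns true if the certifier was dispatched after a CONDITIONAL diagnostic
 * with no remediation call in between.
 * `calls` is an ordered array of { toolName, outcome? } records (assumed shape).
 */
function hasCertifierBypass(calls) {
  let pendingRemediation = false;
  for (const call of calls) {
    if (call.toolName === DIAGNOSTIC) {
      // Only a CONDITIONAL outcome leaves remediation outstanding.
      pendingRemediation = call.outcome === 'CONDITIONAL';
    } else if (call.toolName === REMEDIATION) {
      pendingRemediation = false;
    } else if (call.toolName === CERTIFIER && pendingRemediation) {
      return true;
    }
  }
  return false;
}
```

Applied to the two cycles described here, the cycle 1 sequence (diagnostic, then certifier) would be flagged, while the cycle 2 sequence (diagnostic, remediation, certifier) would not.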

---

## Task #121 (already applied) — three-layer prompt fix

Three coordinated prompt edits address the bypass at the prompt level:

1. `src/config/legalSubagents/agents/memo-qa-certifier.js` — added `LOOP_FOR_REMEDIATION` decision row, tightened `CERTIFY_WITH_LIMITATIONS` to require no SUBSTANTIVE unresolved HIGHs, added SUBSTANTIVE-vs-EDITORIAL classification table
2. `prompts/memorandum-orchestrator.md` — added MANDATORY REMEDIATION ROUTING decision table
3. `prompts/memorandum-synthesis/remediation-agent.md` — narrowed DIRECT REMEDIATION PATH eligibility

**Status**: shipped to branch `feature/wrapped-subagents-migration`. Empirical validation pending in next canary.

---

## The case for ADDITIONAL hook engine enforcement

Prompts are persuasive instructions, not enforcement. They can be:
- Ignored under adaptive thinking pressure (cost/latency optimization bias)
- Eroded by context compaction in long sessions
- Interpreted differently by different models (Sonnet 4.6 vs 4.7 vs GPT-5.4)
- Subject to drift over future prompt edits without regression tests

Defense-in-depth pattern already established in this codebase:
- Task #105 replaced brittle Layer C prompt with `orchestratorCodeExecBlock` hook rule
- `largeReadCap` rule enforces 200KB Read cap structurally, not via prompt
- `perAgentToolAllowlist` (observe mode) is the planned per-agent enforcement layer

A `blockCertifierWithoutRemediation` hook rule would make the SpaceX-pattern bypass **structurally impossible** at the runtime layer, not just prompt-level.

---

## The case AGAINST building NOW

Architectural risk investigation (4-agent Explore on 2026-05-27) surfaced concrete concerns:

### 🔴 Show-stopper risks (require resolution before building)

1. **State files NOT written atomically**: `qa-diagnostic-state.json` is written with a direct `fs.writeFileSync()`. `atomicWriteText` exists (`src/wrappedSubagents/transcriptUtils.js:45-66`) but isn't used for QA state files, so a hook reading mid-write could see partial JSON.

2. **Write ordering race window**: the state file is written DURING agent execution, NOT atomically with the tool_result return, so the hook may evaluate before the state file is fully on disk.
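For reference, the usual way to close the partial-JSON window in risk 1 is write-to-temp-then-rename. The sketch below is illustrative only: `writeJsonAtomic` is a hypothetical helper, not the repo's `atomicWriteText` (whose signature is not shown in this issue), and it relies on `rename` being atomic within a single filesystem on POSIX systems.

```javascript
const fs = require('node:fs');
const path = require('node:path');

/**
 * Hypothetical helper: writes JSON so that readers see either the old file
 * or the complete new file, never a partially written one.
 */
function writeJsonAtomic(filePath, value) {
  // The temp file lives in the same directory so the rename stays on one filesystem.
  const tmpPath = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmpPath, JSON.stringify(value, null, 2));
  // Atomically replaces any existing file at filePath.
  fs.renameSync(tmpPath, filePath);
}
```

Note that this would only address risk 1. It does nothing for risk 2, because the hook could still evaluate before the write has started at all.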

### 🟡 High risks (mitigatable)

3. **Filesystem reads in a hook are a NEW pattern**: only `largeReadCap` does a selective `fs.stat()`. A rule reading 2-3 paths is unprecedented in the engine.
4. **Provider portability**: the rule depends on a hook surface specific to the Anthropic Messages API; the OpenAI Responses API has no equivalent. It becomes dead code if the Phase 0.5 GPT-5.4 migration proceeds.
5. **Recovery requires restart**: no admin endpoint, no SIGHUP, no hot reload. A false-positive bug means 30-60s of downtime via a flags.env edit + redeploy.

### 🟢 Addressable (standard mitigation)

6. Handler throws silently swallowed → wrap in try/catch + Prometheus counter
7. No Prometheus metrics yet → follow existing project pattern
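Item 6 could look roughly like the wrapper below. It is a sketch under assumptions: the `(ctx, toolCall) => { action, reason }` handler signature is hypothetical, and the plain `ruleErrors` object merely stands in for a Prometheus counter so the example stays self-contained.

```javascript
// Stand-in for a Prometheus counter (ASSUMPTION); real code would presumably
// follow the project's existing metrics pattern instead.
const ruleErrors = {
  count: 0,
  inc() {
    this.count += 1;
  },
};

/**
 * Wraps a rule handler so a throw is counted and converted into "allow"
 * instead of being silently swallowed.
 * `handler` is assumed to have the shape (ctx, toolCall) => { action, reason }.
 */
function withDefaultAllow(handler, errorCounter = ruleErrors) {
  return function guarded(ctx, toolCall) {
    try {
      return handler(ctx, toolCall);
    } catch (err) {
      errorCounter.inc();
      return { action: 'allow', reason: `rule error: ${err.message}` };
    }
  };
}
```

With this shape, a handler failure becomes visible through the counter while the certifier dispatch still proceeds.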

---

## Required design (if proceeding)

**Use in-memory ctx state, NOT filesystem reads** — eliminates risks 1 + 2:

```javascript
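// In runner.js, after the diagnostic tool call returns:
if (toolName === 'mcp__subagents__run_memo_qa_diagnostic') {
  try {
    // Record the diagnostic verdict in memory so the hook rule never touches disk.
    const result = JSON.parse(toolResult.content);
    ctx.sessionState.lastDiagnosticOutcome = result.outcome;
    ctx.sessionState.lastDiagnosticScore = result.score;
    ctx.sessionState.lastDiagnosticCycle = result.qa_cycle?.current_cycle;
  } catch (err) {
    // Unparseable result: clear any stale outcome so the rule default-allows (gate 4).
    ctx.sessionState.lastDiagnosticOutcome = undefined;
  }
}

// The hook rule reads ctx.sessionState: no filesystem access, no race.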
```

**4 safety gates required**:
1. Allow certifier when `current_cycle >= max_cycles` (respect REJECT_ESCALATE → human-review-required.md exit path)
2. Allow when `outcome === 'CONDITIONAL_FINAL'` or `'CERTIFIED'`
3. Per-session deny ceiling (3 denies) → emergency passthrough to prevent cost-runaway loops
4. Default-allow on any internal rule failure (never block on uncertainty)
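The four gates can be sketched as a single decision function over the in-memory session state. This is an illustration under assumptions, not the engine's rule API: the `{ action, reason }` return shape and the `maxCycles`, `remediationSinceDiagnostic`, and `denyCount` fields are hypothetical names introduced here.

```javascript
const MAX_DENIES_PER_SESSION = 3; // gate 3 ceiling

/**
 * Decides whether a certifier dispatch may proceed.
 * `state` mirrors ctx.sessionState; maxCycles, remediationSinceDiagnostic and
 * denyCount are ASSUMED fields that do not exist in the codebase today.
 */
function evaluateCertifierDispatch(state) {
  // Gate 4: no usable state means uncertainty, so allow.
  if (!state || typeof state.lastDiagnosticOutcome !== 'string') {
    return { action: 'allow', reason: 'no diagnostic state' };
  }
  // Gate 2: terminal outcomes pass. Any outcome other than CONDITIONAL is
  // also allowed, since the rule only targets the known bypass pattern.
  if (state.lastDiagnosticOutcome !== 'CONDITIONAL') {
    return { action: 'allow', reason: `outcome ${state.lastDiagnosticOutcome}` };
  }
  // Gate 1: cycles exhausted, so respect the REJECT_ESCALATE exit path.
  if (
    Number.isFinite(state.lastDiagnosticCycle) &&
    Number.isFinite(state.maxCycles) &&
    state.lastDiagnosticCycle >= state.maxCycles
  ) {
    return { action: 'allow', reason: 'max cycles reached' };
  }
  // Remediation already ran since the last diagnostic: nothing to block.
  if (state.remediationSinceDiagnostic) {
    return { action: 'allow', reason: 'remediation already dispatched' };
  }
  // Gate 3: emergency passthrough once the per-session deny ceiling is hit.
  const denies = state.denyCount ?? 0;
  if (denies >= MAX_DENIES_PER_SESSION) {
    return { action: 'allow', reason: 'deny ceiling reached' };
  }
  state.denyCount = denies + 1;
  return { action: 'deny', reason: 'CONDITIONAL diagnostic without remediation' };
}
```

The function denies only in the one case it fully recognizes and allows everything else, which keeps a false positive from blocking delivery indefinitely.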

---

## Re-entry triggers (REOPEN/PROMOTE this issue if)

| Trigger | Action |
|---|---|
| Task #121 bypass observed in 5-10 production canaries | ESCALATE — empirically justified |
| Before OpenAI Phase 0.5 canary | REVISIT — different model = different prompt interpretation |
| Before production deployment to paying clients | STRONGLY CONSIDER — defense-in-depth for legal-grade work |
| Any regression where unremediated HIGH ships | IMMEDIATE — proves prompt fix insufficient |

---

## Estimated work (when triggered)

- ~200 LOC rule + ~80 LOC tests, then a 1-2 week observe canary before flipping to enforce
- 3-5 days of engineering effort in total; the observe canary is elapsed time on top of that
- Must include: in-memory ctx state pattern, 4 safety gates, Prometheus counter, observe → enforce mode progression
- Must coordinate with: `memo-qa-diagnostic` agent updates to write outcome into ctx (~30 LOC)
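The observe → enforce progression could be implemented as a thin layer over the rule's decision, so the same rule code runs in both modes. A minimal sketch, assuming a hypothetical `{ action, reason }` decision shape and a `mode` string read from configuration:

```javascript
/**
 * Applies the rollout mode to a rule decision.
 * In any mode other than 'enforce', a deny is logged and downgraded to allow,
 * so a misconfigured mode value fails open rather than blocking.
 */
function applyMode(decision, mode, log = console.warn) {
  if (decision.action === 'deny' && mode !== 'enforce') {
    log(`[observe] would deny: ${decision.reason}`);
    return { ...decision, action: 'allow', wouldDeny: true };
  }
  return decision;
}
```

During the observe canary, the `wouldDeny` flag (or the log line) gives a count of how often the rule would have fired, which is the evidence needed before flipping to enforce.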

---

## Status

**DEFERRED PENDING EMPIRICAL EVIDENCE**

Task #121 prompt fixes are the first line of defense. This hook rule is the architectural backup if prompts prove bypassable. Building speculatively before empirical evidence would be premature optimization with non-trivial complexity costs.
