tangle-network · drewstone · Mar 5, 2026 · Mar 5, 2026 · Mar 5, 2026
@@ -23,3 +23,12 @@ _Specify any issues that can be closed from these changes (e.g. `Closes #233`)._
 ### Screen Recording
 
 _If possible provide screenshots and/or a screen recording of proposed change._
+
+### Harness Validation (Required for Launch-Flow Impact)
+
+_If this PR affects any launch flow, attach harness evidence and release-gate output._
+
+- [ ] I ran `yarn test:wallet-flows` (or targeted flow subset) and reviewed `suite/report.json`.
+- [ ] I checked `suite/release-matrix.md` and confirmed class distribution (`happy-path-pass` / `blocker-or-partial-pass` / `failed`).
+- [ ] I ran `yarn test:wallet-flows:gate` and included output summary.
+- [ ] For critical flows (`FLOW-001,002,005,010,011,013,014,018,019`), I confirmed `happy-path-pass` (or documented explicit exception + owner sign-off).
@@ -31,3 +31,8 @@
 - Seed scripts (Substrate dev):
   - `yarn script:setupServices` (create blueprints)
   - `yarn script:setupStaking` (LST/vault/operator staking fixtures)
+
+## Harness runbook
+- Operating spec: `docs/harness-engineering-spec.md`
+- Execution checklist: `docs/harness-engineering-checklist.md`
+- Wallet flow suite usage: `docs/wallet-flow-suite.md`
@@ -77,6 +77,7 @@ yarn generate:release        # Review version bumps and changelog
 - Report status with concrete evidence (commands run, pass/fail, remaining gaps), not vague progress language.
 - For release-readiness tasks, drive to production-grade confidence: strict validation, explicit failure reasons, and concrete remediation steps.
 - Avoid “do you want me to…” phrasing when the expected next step is obvious from context.
+- For launch-flow-impacting changes, follow `docs/harness-engineering-spec.md` and complete `docs/harness-engineering-checklist.md` before requesting merge.
 
 ### Wallet Flow Reliability (agent-browser-driver)
 - Treat wallet E2E as environment-first: do not trust flow results until local chain + indexer + dApp are confirmed on the same network.

@@ -0,0 +1,51 @@
+# Harness Engineering Checklist
+
+Use this checklist for launch-flow-impacting work.
+
+## Design
+
+- [ ] Flow scope is mapped to `flow_id` values in `docs/launch-readiness-board.csv`.
+- [ ] Acceptance criteria distinguish happy-path vs explicit blocker path.
+- [ ] Critical-flow impact is identified up front.
+- [ ] Flow owner and release/harness owner are assigned in the PR.
+
+## Implementation
+
+- [ ] Criteria updates are route-resilient (canonical-route recheck where needed).
+- [ ] Tx checks include objective signals (tx history delta and/or explicit blocker copy).
+- [ ] New env toggles are documented in `docs/wallet-flow-suite.md`.
+
+## Verification
+
+- [ ] Run suite (full or targeted): `yarn test:wallet-flows`.
+- [ ] Inspect `suite/report.json` for `verified` and `agentSuccess`.
+- [ ] Inspect `suite/release-matrix.md` classification counts.
+- [ ] Run gate: `yarn test:wallet-flows:gate`.
+- [ ] If release strictness is required, run: `yarn test:wallet-flows:gate:strict`.
+- [ ] Confirm matrix artifacts exist (`json`, `csv`, `md`) under suite output.
+
+## Critical Flows
+
+- [ ] `FLOW-001` happy-path-pass
+- [ ] `FLOW-002` happy-path-pass
+- [ ] `FLOW-005` happy-path-pass
+- [ ] `FLOW-010` happy-path-pass
+- [ ] `FLOW-011` happy-path-pass
+- [ ] `FLOW-013` happy-path-pass
+- [ ] `FLOW-014` happy-path-pass
+- [ ] `FLOW-018` happy-path-pass
+- [ ] `FLOW-019` happy-path-pass
+
+## PR Hygiene
+
+- [ ] PR description includes matrix summary (happy/blocker/failed).
+- [ ] Any blocker-or-partial critical flow has explicit exception, owner, and ETA.
+- [ ] Evidence links are included (artifact directory or CI artifact URLs).
+- [ ] If launch-impacting, PR approval includes a release-captain signoff.
+
+## Post-Merge
+
+- [ ] If semantics changed, update `CLAUDE.md` runbook section.
+- [ ] If recurring flake found, add flow id to flaky rerun set in spec.
+- [ ] File a follow-up for fixture/indexer reliability if blocker-pass rate is rising.
+- [ ] Update weekly trend snapshot with this run's matrix totals.
@@ -0,0 +1,167 @@
+# Harness Engineering Operating Spec
+
+Last updated: 2026-03-05
+
+## Why This Exists
+
+This repo has strong momentum but still leaks reliability through:
+- flow verification that can pass without happy-path completion
+- drift between “what docs say” and “what release gates enforce”
+- scattered operational knowledge across AGENTS/CLAUDE/docs/PR threads
+- weak mechanical governance on release evidence quality
+
+This spec defines the senior-level operating model to convert harness work into predictable release outcomes.
+
+## Scope
+
+In scope:
+- launch-critical dApp flows validated by the wallet flow suite
+- release-go/no-go evidence used by maintainers
+- repository process changes that make agent execution more reliable
+
+Out of scope:
+- full native restaking UX (deprioritized)
+- replacing manual signoff for flows that require external non-local actors
+
+## Source Principles
+
+Based on OpenAI Harness Engineering guidance:
+- optimize for stable maps, not giant prompts
+- enforce output quality mechanically (not by intention)
+- classify evidence quality explicitly (not pass/fail only)
+- continuously prune stale knowledge and keep docs compact
+
+Reference:
+- https://openai.com/index/harness-engineering/
+
+## Current Gaps In This Repo
+
+1. `verified` and `agentSuccess` can diverge, but were historically treated as equivalent in go/no-go conversations.
+2. Launch evidence was captured, but not classified into quality tiers for release decisions.
+3. Critical flows did not have hard happy-path enforcement by default.
+4. PR reviews lacked a required harness evidence checklist.
+5. No single script existed to fail release gate when matrix quality degraded.
+6. Harness process details were spread across files without one operating contract.
+
+## Target Operating Model
+
+### 1) Two-Layer Pass Semantics
+
+- `verified=true`:
+  - criteria passed (tx delta OR explicit blocker state)
+- `agentSuccess=true`:
+  - agent completed intended narrative without terminal tool/runtime failure
+
+Both are reported. Never collapse them into one metric.
+
+### 2) Matrix-Based Evidence
+
+Every run produces matrix artifacts:
+- `suite/release-matrix.json`
+- `suite/release-matrix.csv`
+- `suite/release-matrix.md`
+
+Each flow is classified as:
+- `happy-path-pass`
+- `blocker-or-partial-pass`
+- `failed`
+
+### 3) Critical-Flow Strictness
+
+Critical flows require happy-path completion (`agentSuccess=true`) even when `verified=true`.
+
+Default critical set:
+- `FLOW-001`, `FLOW-002`, `FLOW-005`, `FLOW-010`, `FLOW-011`, `FLOW-013`, `FLOW-014`, `FLOW-018`, `FLOW-019`
+
+### 4) Mechanical Gate Script
+
+Release gate is enforced by:
+- `yarn test:wallet-flows:gate`
+
+Script behavior:
+- fails on `failed` rows above threshold
+- fails on missing critical flow rows
+- fails when critical flows are not `happy-path-pass` (unless explicitly overridden)
+
+### 5) PR Governance
+
+PR template requires harness evidence for launch-flow-impacting changes:
+- report artifact review
+- release matrix review
+- gate script output
+- critical flow exceptions explicitly documented
+
+### 6) Ownership And Escalation
+
+- Flow owner: feature owner who changed launch-flow behavior.
+- Harness owner: engineer running/triaging suite output for release cut.
+- Escalation owner: release captain when critical-flow gate fails.
+
+Escalation rules:
+- critical flow failing: block merge to release branch until fixed or exception signed off
+- blocker-or-partial trend worsening for 2 consecutive release cycles: open remediation issue with owner and ETA
+- missing evidence in PR: do not approve launch-impacting changes
+
+### 7) CI Policy
+
+- Pre-merge (required):
+  - lint/type/build
+  - harness gate output for launch-flow-impacting PRs
+- Nightly (required):
+  - full wallet flow suite
+  - matrix trend snapshot committed/attached as artifact
+- Weekly hygiene (required):
+  - rerun known flaky flows
+  - open targeted cleanup PRs for recurring failure patterns
+
+### 8) Definition Of Done (Launch-Flow Changes)
+
+All must be true:
+- code merged with tests/checks passing
+- release matrix generated and attached
+- `failed=0`
+- all critical flows are `happy-path-pass`
+- any non-critical blocker-or-partial rows have owner + ETA + issue link
+- docs updated if semantics/criteria changed
+
+## Required Commands
+
+Run suite:
+- `yarn test:wallet-flows`
+
+Run gate:
+- `yarn test:wallet-flows:gate`
+
+Strict blocker cap:
+- `yarn test:wallet-flows:gate:strict`
+
+## SLOs (Release Quality)
+
+Release candidate targets:
+- `failed = 0`
+- critical flows: all `happy-path-pass`
+- blocker/partial flows: explicitly justified, tracked, and owner-assigned
+
+Escalation:
+- any critical regression blocks merge to release branch
+- any blocker growth trend over 2 consecutive release cycles requires remediation plan
+
+## Change Management
+
+When flow criteria are modified:
+1. rerun impacted flow IDs
+2. rerun known flaky set (`FLOW-007`, `FLOW-013`, `FLOW-016`)
+3. update docs if semantics changed
+4. include before/after matrix summary in PR
+
+## 30/60 Day Rollout
+
+Within 30 days:
+1. enforce PR harness validation section for launch-impacting PRs
+2. require gate output in release candidate PR descriptions
+3. publish weekly matrix trend summary
+
+Within 60 days:
+1. add nightly suite + gate CI job
+2. add auto-generated matrix trend dashboard doc
+3. codify recurring cleanup cadence as a standing maintenance task
@@ -27,6 +27,8 @@ Optional wallet env vars:
 - `AGENT_WALLET_USER_DATA_DIR=/abs/path/to/.agent-wallet-profile`
 - `AGENT_STRICT_WALLET_PREFLIGHT=false` to allow non-blocking preflight (default is strict/fail-closed)
 - `AGENT_WALLET_ALLOW_HEADLESS=true` to force wallet runs in headless mode (default is headful for extension stability)
+- `AGENT_REQUIRE_AGENT_SUCCESS=true` to require agent narrative success for all flows
+- `AGENT_REQUIRE_AGENT_SUCCESS_FLOWS=FLOW-001,FLOW-002,...` to enforce agent-success gate for specific flows (defaults to critical tx flows)
 
 Notes:
 
@@ -57,9 +59,11 @@ Notes:
 
 - Default pass requires:
   - `verified=true` (all declared criteria pass)
-- Optional dual-gate mode (`AGENT_REQUIRE_AGENT_SUCCESS=true`) also requires:
-  - `agentSuccess=true`
-- Flow dependencies are expanded automatically (for example `FLOW-012` includes `FLOW-010`, `FLOW-016` includes `FLOW-013`).
+- Critical-flow dual gate is enabled by default for:
+  - `FLOW-001`, `FLOW-002`, `FLOW-005`, `FLOW-010`, `FLOW-011`, `FLOW-013`, `FLOW-014`, `FLOW-018`, `FLOW-019`
+  - these flows require both `verified=true` and `agentSuccess=true` unless overridden via `AGENT_REQUIRE_AGENT_SUCCESS_FLOWS`
+- Global strict mode (`AGENT_REQUIRE_AGENT_SUCCESS=true`) requires `agentSuccess=true` for every flow.
+- Flow dependencies are expanded automatically when defined in case metadata.
 - `tx-outcome` flows pass when either:
   - a new terminal transaction status (`finalized` or `failed`) is observed in current-run `tx-history`, or
   - an explicit non-actionable blocker state is visible (permissions, missing wallet dependency, empty inventory, etc.).
@@ -94,5 +98,10 @@ Notes:
 ## Artifacts and Exit Criteria
 
 - Artifacts are written to `agent-results/wallet-flows/` by default.
+- Runner also writes release matrix artifacts under `agent-results/.../suite/`:
+  - `release-matrix.json`
+  - `release-matrix.csv`
+  - `release-matrix.md`
+  - classification: `happy-path-pass`, `blocker-or-partial-pass`, `failed`
 - Runner exits non-zero when any case fails or is skipped.
 - Use generated report artifacts plus tx hashes/request ids as launch sign-off evidence.
@@ -43,7 +43,9 @@
     "script:setupStaking": "bun scripts/setupStaking.ts",
     "test:wallet-flows": "node ./scripts/agent-browser/run-wallet-flow-suite.mjs",
     "test:wallet-flows:list": "node ./scripts/agent-browser/run-wallet-flow-suite.mjs --list",
-    "test:wallet-flows:docker": "bash ./scripts/agent-browser/run-wallet-flow-suite-docker.sh"
+    "test:wallet-flows:docker": "bash ./scripts/agent-browser/run-wallet-flow-suite-docker.sh",
+    "test:wallet-flows:gate": "node ./scripts/agent-browser/check-release-gate.mjs",
+    "test:wallet-flows:gate:strict": "node ./scripts/agent-browser/check-release-gate.mjs --max-blocker-partial 0"
   },
   "resolutions": {
     "@polkadot/api": "^13.2.1",