Merged
9 changes: 9 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -23,3 +23,12 @@ _Specify any issues that can be closed from these changes (e.g. `Closes #233`)._
### Screen Recording

_If possible provide screenshots and/or a screen recording of proposed change._

### Harness Validation (Required for Launch-Flow Impact)

_If this PR affects any launch flow, attach harness evidence and release-gate output._

- [ ] I ran `yarn test:wallet-flows` (or targeted flow subset) and reviewed `suite/report.json`.
- [ ] I checked `suite/release-matrix.md` and confirmed class distribution (`happy-path-pass` / `blocker-or-partial-pass` / `failed`).
- [ ] I ran `yarn test:wallet-flows:gate` and included output summary.
- [ ] For critical flows (`FLOW-001`, `FLOW-002`, `FLOW-005`, `FLOW-010`, `FLOW-011`, `FLOW-013`, `FLOW-014`, `FLOW-018`, `FLOW-019`), I confirmed `happy-path-pass` (or documented an explicit exception with owner sign-off).
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -31,3 +31,8 @@
- Seed scripts (Substrate dev):
  - `yarn script:setupServices` (create blueprints)
  - `yarn script:setupStaking` (LST/vault/operator staking fixtures)

## Harness runbook
- Operating spec: `docs/harness-engineering-spec.md`
- Execution checklist: `docs/harness-engineering-checklist.md`
- Wallet flow suite usage: `docs/wallet-flow-suite.md`
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -77,6 +77,7 @@ yarn generate:release # Review version bumps and changelog
- Report status with concrete evidence (commands run, pass/fail, remaining gaps), not vague progress language.
- For release-readiness tasks, drive to production-grade confidence: strict validation, explicit failure reasons, and concrete remediation steps.
- Avoid “do you want me to…” phrasing when the expected next step is obvious from context.
- For launch-flow-impacting changes, follow `docs/harness-engineering-spec.md` and complete `docs/harness-engineering-checklist.md` before requesting merge.

### Wallet Flow Reliability (agent-browser-driver)
- Treat wallet E2E as environment-first: do not trust flow results until local chain + indexer + dApp are confirmed on the same network.
51 changes: 51 additions & 0 deletions docs/harness-engineering-checklist.md
@@ -0,0 +1,51 @@
# Harness Engineering Checklist

Use this checklist for launch-flow-impacting work.

## Design

- [ ] Flow scope is mapped to `flow_id` values in `docs/launch-readiness-board.csv`.
- [ ] Acceptance criteria distinguish happy-path vs explicit blocker path.
- [ ] Critical-flow impact is identified up front.
- [ ] Flow owner and release/harness owner are assigned in the PR.

## Implementation

- [ ] Criteria updates are route-resilient (canonical-route recheck where needed).
- [ ] Tx checks include objective signals (tx history delta and/or explicit blocker copy).
- [ ] New env toggles are documented in `docs/wallet-flow-suite.md`.

## Verification

- [ ] Run suite (full or targeted): `yarn test:wallet-flows`.
- [ ] Inspect `suite/report.json` for `verified` and `agentSuccess`.
- [ ] Inspect `suite/release-matrix.md` classification counts.
- [ ] Run gate: `yarn test:wallet-flows:gate`.
- [ ] If release strictness is required, run: `yarn test:wallet-flows:gate:strict`.
- [ ] Confirm matrix artifacts exist (`json`, `csv`, `md`) under suite output.
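
A minimal sketch of how the classification counts could be tallied for the PR matrix summary. The row shape (`flowId`, `classification`) and both helper names are assumptions for illustration; the actual `release-matrix.json` schema is not shown in this PR.

```javascript
// Hypothetical row shape: { flowId: string, classification: string }.
// The real release-matrix.json schema may differ.
const MATRIX_CLASSES = ["happy-path-pass", "blocker-or-partial-pass", "failed"];

// Count rows per classification; reject classes the spec does not define.
function summarizeMatrix(rows) {
  const counts = Object.fromEntries(MATRIX_CLASSES.map((c) => [c, 0]));
  for (const row of rows) {
    if (!MATRIX_CLASSES.includes(row.classification)) {
      throw new Error(`unknown classification for ${row.flowId}: ${row.classification}`);
    }
    counts[row.classification] += 1;
  }
  return counts;
}

// One-line summary suitable for pasting into a PR description.
function formatSummary(counts) {
  return `happy=${counts["happy-path-pass"]} blocker=${counts["blocker-or-partial-pass"]} failed=${counts.failed}`;
}

const sampleMatrixRows = [
  { flowId: "FLOW-001", classification: "happy-path-pass" },
  { flowId: "FLOW-002", classification: "happy-path-pass" },
  { flowId: "FLOW-007", classification: "blocker-or-partial-pass" },
];
console.log(formatSummary(summarizeMatrix(sampleMatrixRows)));
// prints "happy=2 blocker=1 failed=0"
```

Throwing on an unknown class keeps a schema drift from silently producing an incomplete summary.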

## Critical Flows

- [ ] `FLOW-001` happy-path-pass
- [ ] `FLOW-002` happy-path-pass
- [ ] `FLOW-005` happy-path-pass
- [ ] `FLOW-010` happy-path-pass
- [ ] `FLOW-011` happy-path-pass
- [ ] `FLOW-013` happy-path-pass
- [ ] `FLOW-014` happy-path-pass
- [ ] `FLOW-018` happy-path-pass
- [ ] `FLOW-019` happy-path-pass

## PR Hygiene

- [ ] PR description includes matrix summary (happy/blocker/failed).
- [ ] Any blocker-or-partial critical flow has explicit exception, owner, and ETA.
- [ ] Evidence links are included (artifact directory or CI artifact URLs).
- [ ] If launch-impacting, PR approval includes a release-captain signoff.

## Post-Merge

- [ ] If semantics changed, update `CLAUDE.md` runbook section.
- [ ] If recurring flake found, add flow id to flaky rerun set in spec.
- [ ] File a follow-up for fixture/indexer reliability if blocker-pass rate is rising.
- [ ] Update weekly trend snapshot with this run's matrix totals.
167 changes: 167 additions & 0 deletions docs/harness-engineering-spec.md
@@ -0,0 +1,167 @@
# Harness Engineering Operating Spec

Last updated: 2026-03-05

## Why This Exists

This repo has strong momentum but still leaks reliability through:
- flow verification that can pass without happy-path completion
- drift between “what docs say” and “what release gates enforce”
- scattered operational knowledge across AGENTS/CLAUDE/docs/PR threads
- weak mechanical governance on release evidence quality

This spec defines the operating model that turns harness work into predictable release outcomes.

## Scope

In scope:
- launch-critical dApp flows validated by the wallet flow suite
- release go/no-go evidence used by maintainers
- repository process changes that make agent execution more reliable

Out of scope:
- full native restaking UX (deprioritized)
- replacing manual signoff for flows that require external non-local actors

## Source Principles

Based on OpenAI Harness Engineering guidance:
- optimize for stable maps, not giant prompts
- enforce output quality mechanically (not by intention)
- classify evidence quality explicitly (not pass/fail only)
- continuously prune stale knowledge and keep docs compact

Reference:
- https://openai.com/index/harness-engineering/

## Current Gaps In This Repo

1. `verified` and `agentSuccess` can diverge, but were historically treated as equivalent in go/no-go conversations.
2. Launch evidence was captured, but not classified into quality tiers for release decisions.
3. Critical flows did not have hard happy-path enforcement by default.
4. PR reviews lacked a required harness evidence checklist.
5. No single script existed to fail release gate when matrix quality degraded.
6. Harness process details were spread across files without one operating contract.

## Target Operating Model

### 1) Two-Layer Pass Semantics

- `verified=true`:
  - criteria passed (tx delta OR explicit blocker state)
- `agentSuccess=true`:
  - agent completed intended narrative without terminal tool/runtime failure

Both are reported. Never collapse them into one metric.

### 2) Matrix-Based Evidence

Every run produces matrix artifacts:
- `suite/release-matrix.json`
- `suite/release-matrix.csv`
- `suite/release-matrix.md`

Each flow is classified as:
- `happy-path-pass`
- `blocker-or-partial-pass`
- `failed`
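
A minimal sketch of how the two pass signals could map onto the three matrix classes. The mapping is an assumption inferred from this spec (happy path needs both signals, `verified` alone is a blocker or partial pass, anything unverified fails); `classifyFlow` is a hypothetical helper, not the runner's actual code.

```javascript
// Hypothetical helper. Assumed mapping:
//   verified && agentSuccess  -> happy-path-pass
//   verified && !agentSuccess -> blocker-or-partial-pass
//   !verified                 -> failed (regardless of agentSuccess)
function classifyFlow({ verified, agentSuccess }) {
  if (!verified) return "failed";
  return agentSuccess ? "happy-path-pass" : "blocker-or-partial-pass";
}

const classificationExamples = [
  { flowId: "FLOW-001", verified: true, agentSuccess: true },
  { flowId: "FLOW-007", verified: true, agentSuccess: false },
  { flowId: "FLOW-016", verified: false, agentSuccess: false },
];
for (const row of classificationExamples) {
  console.log(row.flowId, classifyFlow(row));
}
```

Keeping both inputs explicit in the function signature mirrors the rule above: the two signals are reported separately and only combined at classification time.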

### 3) Critical-Flow Strictness

Critical flows require happy-path completion: `agentSuccess=true` is required in addition to `verified=true`, so a blocker-state pass is not sufficient.

Default critical set:
- `FLOW-001`, `FLOW-002`, `FLOW-005`, `FLOW-010`, `FLOW-011`, `FLOW-013`, `FLOW-014`, `FLOW-018`, `FLOW-019`

### 4) Mechanical Gate Script

Release gate is enforced by:
- `yarn test:wallet-flows:gate`

Script behavior:
- fails when the number of `failed` rows exceeds the configured threshold
- fails on missing critical flow rows
- fails when critical flows are not `happy-path-pass` (unless explicitly overridden)
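
The three failure conditions above can be sketched as a pure function. This is an illustration only: the row shape, the option names (`maxFailed`, `criticalExceptions`), and `evaluateGate` itself are assumptions, and the real `check-release-gate.mjs` may be structured differently.

```javascript
// Default critical set as listed in this spec.
const CRITICAL_FLOWS = [
  "FLOW-001", "FLOW-002", "FLOW-005", "FLOW-010", "FLOW-011",
  "FLOW-013", "FLOW-014", "FLOW-018", "FLOW-019",
];

// rows: [{ flowId, classification }] (hypothetical shape).
// Returns { ok, reasons } so callers can print every violation at once.
function evaluateGate(rows, options = {}) {
  const {
    maxFailed = 0,               // assumed name for the failed-row threshold
    criticalFlows = CRITICAL_FLOWS,
    criticalExceptions = [],     // assumed name for explicit overrides
  } = options;
  const reasons = [];
  const byId = new Map(rows.map((r) => [r.flowId, r]));

  const failedCount = rows.filter((r) => r.classification === "failed").length;
  if (failedCount > maxFailed) {
    reasons.push(`failed rows (${failedCount}) exceed threshold (${maxFailed})`);
  }
  for (const id of criticalFlows) {
    const row = byId.get(id);
    if (!row) {
      reasons.push(`missing critical flow row: ${id}`);
    } else if (row.classification !== "happy-path-pass" && !criticalExceptions.includes(id)) {
      reasons.push(`critical flow ${id} is ${row.classification}`);
    }
  }
  return { ok: reasons.length === 0, reasons };
}

const allHappy = CRITICAL_FLOWS.map((flowId) => ({ flowId, classification: "happy-path-pass" }));
console.log(evaluateGate(allHappy).ok); // prints true
```

A wrapper script would typically exit non-zero when `ok` is false and print `reasons`, so the gate output pasted into a PR explains the failure.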

### 5) PR Governance

PR template requires harness evidence for launch-flow-impacting changes:
- report artifact review
- release matrix review
- gate script output
- critical flow exceptions explicitly documented

### 6) Ownership And Escalation

- Flow owner: feature owner who changed launch-flow behavior.
- Harness owner: engineer running/triaging suite output for release cut.
- Escalation owner: release captain when critical-flow gate fails.

Escalation rules:
- critical flow failing: block merge to release branch until fixed or exception signed off
- blocker-or-partial trend worsening for 2 consecutive release cycles: open remediation issue with owner and ETA
- missing evidence in PR: do not approve launch-impacting changes

### 7) CI Policy

- Pre-merge (required):
  - lint/type/build
  - harness gate output for launch-flow-impacting PRs
- Nightly (required):
  - full wallet flow suite
  - matrix trend snapshot committed/attached as artifact
- Weekly hygiene (required):
  - rerun known flaky flows
  - open targeted cleanup PRs for recurring failure patterns

### 8) Definition Of Done (Launch-Flow Changes)

All must be true:
- code merged with tests/checks passing
- release matrix generated and attached
- `failed=0`
- all critical flows are `happy-path-pass`
- any non-critical blocker-or-partial rows have owner + ETA + issue link
- docs updated if semantics/criteria changed
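
As an illustration of the tracking requirement for non-critical rows, the sketch below lists `blocker-or-partial-pass` rows that lack an owner, ETA, or issue link. The field names (`owner`, `eta`, `issueUrl`) and the helper are hypothetical; the matrix artifacts may not carry these fields at all, in which case the data would come from the PR description.

```javascript
// Hypothetical row shape: { flowId, classification, owner?, eta?, issueUrl? }.
// Returns the flow IDs of non-critical blocker-or-partial rows that are not fully tracked.
function findUntrackedBlockers(rows, criticalFlows) {
  const critical = new Set(criticalFlows);
  return rows
    .filter((r) => r.classification === "blocker-or-partial-pass" && !critical.has(r.flowId))
    .filter((r) => !(r.owner && r.eta && r.issueUrl))
    .map((r) => r.flowId);
}

const trackedExample = [
  { flowId: "FLOW-007", classification: "blocker-or-partial-pass", owner: "alice", eta: "2026-03-20", issueUrl: "https://example.invalid/issues/1" },
  { flowId: "FLOW-016", classification: "blocker-or-partial-pass", owner: "bob" },
  { flowId: "FLOW-003", classification: "happy-path-pass" },
];
console.log(findUntrackedBlockers(trackedExample, ["FLOW-001"]));
// prints [ 'FLOW-016' ]
```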

## Required Commands

Run suite:
- `yarn test:wallet-flows`

Run gate:
- `yarn test:wallet-flows:gate`

Run gate with strict blocker cap (passes `--max-blocker-partial 0`):
- `yarn test:wallet-flows:gate:strict`

## SLOs (Release Quality)

Release candidate targets:
- `failed = 0`
- critical flows: all `happy-path-pass`
- blocker/partial flows: explicitly justified, tracked, and owner-assigned

Escalation:
- any critical regression blocks merge to release branch
- any blocker growth trend over 2 consecutive release cycles requires remediation plan
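
A minimal sketch of the blocker growth check, assuming "growth over 2 consecutive release cycles" means the `blocker-or-partial-pass` count rose in each of the last two cycle-to-cycle comparisons (three data points). That interpretation and the helper name are assumptions, not something this spec defines precisely.

```javascript
// counts: blocker-or-partial-pass totals per release cycle, oldest first.
// Returns true when the count strictly increased across each of the last `cycles` transitions.
function blockerTrendWorsening(counts, cycles = 2) {
  if (counts.length < cycles + 1) return false; // not enough history to judge
  const recent = counts.slice(-(cycles + 1));
  return recent.every((value, i) => i === 0 || value > recent[i - 1]);
}

console.log(blockerTrendWorsening([3, 4, 5])); // prints true
console.log(blockerTrendWorsening([3, 4, 4])); // prints false
console.log(blockerTrendWorsening([4, 5]));    // prints false
```

Returning `false` on short history avoids opening remediation issues before two full comparisons exist.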

## Change Management

When flow criteria are modified:
1. rerun impacted flow IDs
2. rerun known flaky set (`FLOW-007`, `FLOW-013`, `FLOW-016`)
3. update docs if semantics changed
4. include before/after matrix summary in PR

## 30/60 Day Rollout

Within 30 days:
1. enforce PR harness validation section for launch-impacting PRs
2. require gate output in release candidate PR descriptions
3. publish weekly matrix trend summary

Within 60 days:
1. add nightly suite + gate CI job
2. add auto-generated matrix trend dashboard doc
3. codify recurring cleanup cadence as a standing maintenance task
15 changes: 12 additions & 3 deletions docs/wallet-flow-suite.md
@@ -27,6 +27,8 @@ Optional wallet env vars:
- `AGENT_WALLET_USER_DATA_DIR=/abs/path/to/.agent-wallet-profile`
- `AGENT_STRICT_WALLET_PREFLIGHT=false` to allow non-blocking preflight (default is strict/fail-closed)
- `AGENT_WALLET_ALLOW_HEADLESS=true` to force wallet runs in headless mode (default is headful for extension stability)
- `AGENT_REQUIRE_AGENT_SUCCESS=true` to require agent narrative success for all flows
- `AGENT_REQUIRE_AGENT_SUCCESS_FLOWS=FLOW-001,FLOW-002,...` to enforce agent-success gate for specific flows (defaults to critical tx flows)
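
A minimal sketch of how the two agent-success variables could resolve into the set of flows that must report `agentSuccess=true`. The precedence (global flag first, then the explicit list, then the default critical set) and the treatment of an empty list as "no flows" are assumptions; `resolveAgentSuccessFlows` is a hypothetical helper, not the runner's code.

```javascript
// Default critical set as documented for this suite.
const DEFAULT_AGENT_SUCCESS_FLOWS = [
  "FLOW-001", "FLOW-002", "FLOW-005", "FLOW-010", "FLOW-011",
  "FLOW-013", "FLOW-014", "FLOW-018", "FLOW-019",
];

// env: an object such as process.env; allFlowIds: every flow in the current run.
function resolveAgentSuccessFlows(env, allFlowIds) {
  // Global strict mode: every flow needs agentSuccess=true.
  if (env.AGENT_REQUIRE_AGENT_SUCCESS === "true") return new Set(allFlowIds);
  const raw = env.AGENT_REQUIRE_AGENT_SUCCESS_FLOWS;
  // Unset: fall back to the default critical set.
  if (raw === undefined) return new Set(DEFAULT_AGENT_SUCCESS_FLOWS);
  // Explicit comma-separated list (assumption: an empty string yields an empty set).
  return new Set(raw.split(",").map((id) => id.trim()).filter(Boolean));
}

const resolved = resolveAgentSuccessFlows(
  { AGENT_REQUIRE_AGENT_SUCCESS_FLOWS: "FLOW-001, FLOW-010" },
  ["FLOW-001", "FLOW-007", "FLOW-010"]
);
console.log([...resolved]); // prints [ 'FLOW-001', 'FLOW-010' ]
```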

Notes:

@@ -57,9 +59,11 @@ Notes:

- Default pass requires:
  - `verified=true` (all declared criteria pass)
-- Optional dual-gate mode (`AGENT_REQUIRE_AGENT_SUCCESS=true`) also requires:
-  - `agentSuccess=true`
-- Flow dependencies are expanded automatically (for example `FLOW-012` includes `FLOW-010`, `FLOW-016` includes `FLOW-013`).
+- Critical-flow dual gate is enabled by default for:
+  - `FLOW-001`, `FLOW-002`, `FLOW-005`, `FLOW-010`, `FLOW-011`, `FLOW-013`, `FLOW-014`, `FLOW-018`, `FLOW-019`
+  - these flows require both `verified=true` and `agentSuccess=true` unless overridden via `AGENT_REQUIRE_AGENT_SUCCESS_FLOWS`
+- Global strict mode (`AGENT_REQUIRE_AGENT_SUCCESS=true`) requires `agentSuccess=true` for every flow.
+- Flow dependencies are expanded automatically when defined in case metadata.
- `tx-outcome` flows pass when either:
  - a new terminal transaction status (`finalized` or `failed`) is observed in current-run `tx-history`, or
  - an explicit non-actionable blocker state is visible (permissions, missing wallet dependency, empty inventory, etc.).
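
The `tx-outcome` rule can be sketched as follows. The entry shape (`id`, `status`), the before/after snapshot approach, and `txOutcomePasses` are assumptions for illustration; how the runner actually reads `tx-history` or detects blocker copy is not shown here.

```javascript
const TERMINAL_STATUSES = new Set(["finalized", "failed"]);

// historyBefore / historyAfter: tx-history snapshots taken at the start and end of the run
// (hypothetical shape: [{ id, status }]). blockerVisible: whether explicit blocker copy was seen.
function txOutcomePasses({ historyBefore, historyAfter, blockerVisible }) {
  const seen = new Set(historyBefore.map((e) => `${e.id}:${e.status}`));
  // "New" covers both a brand-new transaction and an existing one reaching a terminal status.
  const newTerminal = historyAfter.some(
    (e) => TERMINAL_STATUSES.has(e.status) && !seen.has(`${e.id}:${e.status}`)
  );
  return newTerminal || Boolean(blockerVisible);
}

console.log(txOutcomePasses({
  historyBefore: [{ id: "tx-1", status: "pending" }],
  historyAfter: [{ id: "tx-1", status: "finalized" }],
  blockerVisible: false,
})); // prints true
```

Comparing against a start-of-run snapshot is what keeps a stale terminal entry from an earlier run from satisfying the criterion.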
@@ -94,5 +98,10 @@ Notes:
## Artifacts and Exit Criteria

- Artifacts are written to `agent-results/wallet-flows/` by default.
- Runner also writes release matrix artifacts under `agent-results/.../suite/`:
  - `release-matrix.json`
  - `release-matrix.csv`
  - `release-matrix.md`
  - classification: `happy-path-pass`, `blocker-or-partial-pass`, `failed`
- Runner exits non-zero when any case fails or is skipped.
- Use generated report artifacts plus tx hashes/request ids as launch sign-off evidence.
4 changes: 3 additions & 1 deletion package.json
@@ -43,7 +43,9 @@
"script:setupStaking": "bun scripts/setupStaking.ts",
"test:wallet-flows": "node ./scripts/agent-browser/run-wallet-flow-suite.mjs",
"test:wallet-flows:list": "node ./scripts/agent-browser/run-wallet-flow-suite.mjs --list",
-"test:wallet-flows:docker": "bash ./scripts/agent-browser/run-wallet-flow-suite-docker.sh"
+"test:wallet-flows:docker": "bash ./scripts/agent-browser/run-wallet-flow-suite-docker.sh",
+"test:wallet-flows:gate": "node ./scripts/agent-browser/check-release-gate.mjs",
+"test:wallet-flows:gate:strict": "node ./scripts/agent-browser/check-release-gate.mjs --max-blocker-partial 0"
},
"resolutions": {
"@polkadot/api": "^13.2.1",