feat: Workbench control plane + requirements-traceability gate by aria-inboxia · Pull Request #2 · veej/ai-assisted-coding

aria-inboxia · 2026-06-17T16:47:05Z

Problem

This repo's real asset is its AI-assisted-coding harness — a documented, spec-driven, test-first workflow (challenge-reqs → .feature test plan → tdd-implement → ship). But that workflow was driven entirely by hand, and its central proof-of-correctness artifact (tests/acceptance/test-plan.yaml) was an unverified honor system: status: covered was prose an agent typed about its own work, with nothing checking that requirements were actually covered by passing tests.

Solution

Two complementary additions (Foyer itself is untouched and stays at /):

1. Requirements Traceability Gate (tools/trace/) — reconciles three sources that previously drifted freely: spec R.N bullets, @R.N scenario tags, and the Playwright JSON results. pnpm trace:gen generates test-plan.yaml + TRACEABILITY.md; pnpm trace:check fails CI on an uncovered requirement, an orphan tag, a stale artifact, or a "covered" scenario that did not pass. It also bootstraps the previously-empty acceptance suite with 5 real scenarios for existing Foyer behaviour.

2. Workbench control plane (workbench/) — a localhost-only Node service + cockpit (a second Vite entry at /workbench/) that drives the four-phase workflow from an epic, running claude -p headless per phase and streaming it live over SSE. Per-phase least-privilege tool allowlists, acceptEdits (never bypass), budget caps + timeouts, per-epic git-worktree isolation, an audit journal, a daily cost cap, and independent gate-verified implementation (the agent's "done" is re-checked by the traceability gate, not trusted).

Hardened against a multi-agent adversarial review: prompt-injection data-fencing, env-secret scoping for the child, Origin guard, crash-safe SSE, graceful shutdown, crash recovery, a feature-qualified results join, and WCAG-AA pill contrast. The cockpit also tolerates transient SSE reconnects without surfacing a false error — terminal status comes only from the run's result/phase_status events, backed by an epic-polling backstop that runs only while the stream is genuinely silent — and the backend now persists the agent's full narration (capped) so the challenge questions render in full, with a clearly-labelled answer box beneath the gate that feeds the next phase.

Test plan

pnpm prettier-check && pnpm lint && pnpm knip && pnpm build — all green
pnpm test:dev — 46 unit/integration tests (auth, trace incl. edge cases, backend env/args/event-mapping, HTTP API incl. crash-recovery, SSE reducer incl. transient-drop handling)
pnpm test:acceptance — 5 acceptance scenarios pass; CI runs trace:check --structural (static job) and --results (after the acceptance run)

Evidence

CI is wired to enforce the gate on every PR. Verified end-to-end: a fresh epic run through the Challenge phase reaches awaiting_approval (resultSubtype: success, no error) with deliverable.report populated — the structured challenge report (codebase context + questions grouped by audience) that the cockpit now renders, confirming both the SSE-resilience and the answer-surfacing fixes. The stream-json contract and HTTP layer are covered by the 46-test suite. The cockpit is a backend-driven surface, so it's demonstrated by running it (start the backend + Vite, open /workbench/) rather than a static video — see workbench/README.md.

Adds a requirements-traceability gate and a localhost control plane that drives this repo's own four-phase AI workflow from an epic. - tools/trace: reconciles spec R.N bullets, @R.N scenario tags, and Playwright results into test-plan.yaml + TRACEABILITY.md; CI gate fails on uncovered / orphan / unproven / stale. Bootstraps the dormant acceptance suite with real scenarios for existing Foyer behavior. - workbench/server: zero-dep Node http + SSE backend that runs claude -p headless per phase (challenge -> test plan -> implement -> ship) with per-phase least-privilege tool allowlists, acceptEdits (never bypass), budget caps + timeouts, per-epic git-worktree isolation, audit journal, daily cost cap, and independent gate-verified implementation. - workbench (cockpit): second Vite entry at /workbench/ — epic intake, phase stepper with STOP gates, live SSE run view, traceability panel. Foyer stays at / and is untouched. - Hardened per a multi-agent adversarial review: prompt-injection fencing, env-secret scoping, Origin guard, crash-safe SSE, graceful shutdown, crash recovery, feature-qualified results join, WCAG pill contrast. - 46 unit/integration tests; CI wires the trace gate (structural + results).

/simplify cleanups (behaviour-preserving): - sse.js: extract pruneDead() shared by publish() and pingAll() - store.js: extract atomicWrite() shared by writeEpic() and addSpend() - EpicDetailView: use .find() instead of a manual loop to locate the running phase

A transient EventSource disconnect auto-reconnects and is no longer surfaced as a 'stream interrupted' error; terminal status now comes only from the result/phase_status events, backed by an epic-polling backstop in the detail view. The runner captures the full agent narration into deliverable.report so the cockpit shows the actual challenge questions (not just the closing summary), and the gate textarea is relabelled as the place to answer them.

…agnostic Two issues found in adversarial review of the prior commit: - The epic-poll backstop was gated on stream.status === "running", so a silent SSE drop *before* the first stream event (status still "idle") never started the poll — the exact case the backstop exists for. Gate on the stream not having settled instead (poll while idle or running). - finalize() attached deliverable.report on every run with narration, so on an error/timeout phase the partial narration masked the stderr diagnostic in deliverable.error. Skip report when status is "error".

aria-inboxia added 4 commits June 17, 2026 18:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Workbench control plane + requirements-traceability gate#2

feat: Workbench control plane + requirements-traceability gate#2
aria-inboxia wants to merge 4 commits into
veej:mainfrom
aria-inboxia:aria-inboxia/foyer-best-addition

aria-inboxia commented Jun 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aria-inboxia commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Test plan

Evidence

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aria-inboxia commented Jun 17, 2026 •

edited

Loading