feat: Workbench control plane + requirements-traceability gate #2
Open
aria-inboxia wants to merge 4 commits into
Adds a requirements-traceability gate and a localhost control plane that drives this repo's own four-phase AI workflow from an epic.

- `tools/trace`: reconciles spec `R.N` bullets, `@R.N` scenario tags, and Playwright results into `test-plan.yaml` + `TRACEABILITY.md`; CI gate fails on uncovered / orphan / unproven / stale. Bootstraps the dormant acceptance suite with real scenarios for existing Foyer behavior.
- `workbench/server`: zero-dep Node `http` + SSE backend that runs `claude -p` headless per phase (challenge -> test plan -> implement -> ship) with per-phase least-privilege tool allowlists, `acceptEdits` (never bypass), budget caps + timeouts, per-epic git-worktree isolation, audit journal, daily cost cap, and independent gate-verified implementation.
- `workbench` (cockpit): second Vite entry at `/workbench/` with epic intake, phase stepper with STOP gates, live SSE run view, traceability panel. Foyer stays at `/` and is untouched.
- Hardened per a multi-agent adversarial review: prompt-injection fencing, env-secret scoping, Origin guard, crash-safe SSE, graceful shutdown, crash recovery, feature-qualified results join, WCAG pill contrast.
- 46 unit/integration tests; CI wires the trace gate (structural + results).
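The reconciliation behind the `tools/trace` gate can be sketched as a pure function over three inputs. This is a minimal illustration, not the PR's implementation: the `Scenario`/`ScenarioResult` shapes, the `reconcile`/`gatePasses` names, and the `feature::title` join key are assumptions, and the stale-artifact check is left out because it depends on how the generated files are compared.

```typescript
// Hypothetical data shapes: the real tools/trace model is not shown in the PR.
interface Scenario {
  feature: string; // .feature file the scenario lives in
  title: string;
  tags: string[]; // e.g. ["@R.1", "@smoke"]
}

interface ScenarioResult {
  feature: string;
  title: string;
  status: "passed" | "failed" | "skipped";
}

interface TraceReport {
  uncovered: string[]; // requirements with no tagged scenario
  orphans: string[]; // @R.N tags that name no spec requirement
  unproven: string[]; // covered requirements with a tagged scenario that did not pass
}

const REQ_TAG = /^@R\.\d+$/;

// Join key qualified by feature, so same-titled scenarios in different files stay distinct.
const joinKey = (feature: string, title: string): string => `${feature}::${title}`;

function reconcile(
  requirements: string[],
  scenarios: Scenario[],
  results: ScenarioResult[],
): TraceReport {
  const specIds = new Set(requirements);
  const statusOf = new Map(results.map((r) => [joinKey(r.feature, r.title), r.status] as const));
  const coverage = new Map<string, Scenario[]>(); // requirement id -> tagged scenarios
  const orphans = new Set<string>();

  for (const scenario of scenarios) {
    for (const tag of scenario.tags) {
      if (!REQ_TAG.test(tag)) continue; // not a requirement tag
      const id = tag.slice(1); // "@R.1" -> "R.1"
      if (!specIds.has(id)) {
        orphans.add(tag);
        continue;
      }
      coverage.set(id, [...(coverage.get(id) ?? []), scenario]);
    }
  }

  const uncovered = requirements.filter((id) => !coverage.has(id));
  // A missing result counts as "not passed", so an unrun scenario cannot prove coverage.
  const unproven = requirements.filter((id) =>
    (coverage.get(id) ?? []).some((s) => statusOf.get(joinKey(s.feature, s.title)) !== "passed"),
  );
  return { uncovered, orphans: Array.from(orphans), unproven };
}

// The gate passes only when every list is empty.
function gatePasses(report: TraceReport): boolean {
  return report.uncovered.length === 0 && report.orphans.length === 0 && report.unproven.length === 0;
}
```

Qualifying the join key by feature mirrors the "feature-qualified results join" fix: keyed by title alone, two same-named scenarios in different `.feature` files would share one result.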
/simplify cleanups (behaviour-preserving):

- `sse.js`: extract `pruneDead()` shared by `publish()` and `pingAll()`
- `store.js`: extract `atomicWrite()` shared by `writeEpic()` and `addSpend()`
- `EpicDetailView`: use `.find()` instead of a manual loop to locate the running phase
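The `pruneDead()` extraction in `sse.js` follows a common pattern: both broadcast paths first forget clients whose sockets have closed, then write. A minimal sketch under assumed names (`SseClient`, `SseHub`, `add`, and `size` are hypothetical; only `pruneDead`, `publish`, and `pingAll` come from the commit), using the standard SSE framing of `event:`/`data:` lines ended by a blank line and `:` comment lines for keep-alive pings:

```typescript
// Hypothetical client shape; sse.js internals are not shown in the PR.
interface SseClient {
  destroyed: boolean; // true once the underlying socket has closed
  write(chunk: string): void;
}

class SseHub {
  private readonly clients = new Set<SseClient>();

  add(client: SseClient): void {
    this.clients.add(client);
  }

  get size(): number {
    return this.clients.size;
  }

  // Shared by publish() and pingAll(): forget clients whose socket has closed.
  // Deleting from a Set while iterating it is safe in JavaScript.
  private pruneDead(): void {
    for (const client of this.clients) {
      if (client.destroyed) this.clients.delete(client);
    }
  }

  // One SSE frame: an `event:` line, a `data:` line, and a blank-line terminator.
  publish(event: string, data: unknown): void {
    this.pruneDead();
    const frame = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
    for (const client of this.clients) client.write(frame);
  }

  // Lines starting with ":" are SSE comments: they keep idle connections open
  // and are ignored by EventSource.
  pingAll(): void {
    this.pruneDead();
    for (const client of this.clients) client.write(": ping\n\n");
  }
}
```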
A transient EventSource disconnect auto-reconnects and is no longer surfaced as a 'stream interrupted' error; terminal status now comes only from the result/phase_status events, backed by an epic-polling backstop in the detail view. The runner captures the full agent narration into deliverable.report so the cockpit shows the actual challenge questions (not just the closing summary), and the gate textarea is relabelled as the place to answer them.
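The rule "terminal status comes only from the result/phase_status events" maps naturally onto a reducer in which a transport error is informational only. The sketch below assumes hypothetical `StreamState`/`StreamEvent` shapes; it relies only on the standard `EventSource` behaviour of retrying after a dropped connection, so the view shows a reconnecting flag instead of failing the run.

```typescript
// Hypothetical state and event shapes for the cockpit's run view.
type StreamStatus = "idle" | "running" | "done" | "error";

interface StreamState {
  status: StreamStatus;
  reconnecting: boolean; // informational only; never terminal
}

type StreamEvent =
  | { kind: "open" }
  | { kind: "transport_error" } // EventSource onerror; the browser retries on its own
  | { kind: "phase_status"; status: "running" | "done" | "error" }
  | { kind: "result"; isError: boolean };

function reduce(state: StreamState, ev: StreamEvent): StreamState {
  switch (ev.kind) {
    case "open":
      return { ...state, reconnecting: false };
    case "transport_error":
      // A dropped connection is not a failed run: flag it and keep the status.
      return { ...state, reconnecting: true };
    case "phase_status":
      return { status: ev.status, reconnecting: false };
    case "result":
      // Only run-level events may move the stream into a terminal status.
      return { status: ev.isError ? "error" : "done", reconnecting: false };
  }
}
```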
…agnostic

Two issues found in adversarial review of the prior commit:

- The epic-poll backstop was gated on `stream.status === "running"`, so a silent SSE drop *before* the first stream event (status still `"idle"`) never started the poll, which is the exact case the backstop exists for. Gate on the stream not having settled instead (poll while idle or running).
- `finalize()` attached `deliverable.report` on every run with narration, so on an error/timeout phase the partial narration masked the stderr diagnostic in `deliverable.error`. Skip `report` when status is `"error"`.
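Both fixes reduce to small predicates. This sketch reuses the commit's vocabulary (`finalize`, `deliverable.report`, `deliverable.error`, the idle/running stream states), but the function signatures, the `shouldPollEpic` name, and the `REPORT_CAP` value are illustrative assumptions:

```typescript
type StreamStatus = "idle" | "running" | "done" | "error";

// Backstop: poll the epic whenever the stream has not settled, including
// "idle", so a silent drop before the first event is still caught.
function shouldPollEpic(status: StreamStatus): boolean {
  return status === "idle" || status === "running";
}

type PhaseStatus = "awaiting_approval" | "error";

interface Deliverable {
  report?: string; // agent narration, capped
  error?: string; // stderr diagnostic
}

const REPORT_CAP = 64 * 1024; // hypothetical cap, in characters

function finalize(status: PhaseStatus, narration: string, stderr: string): Deliverable {
  if (status === "error") {
    // Partial narration would mask the diagnostic, so only the error is kept.
    return { error: stderr };
  }
  return narration ? { report: narration.slice(0, REPORT_CAP) } : {};
}
```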
Problem
This repo's real asset is its AI-assisted-coding harness: a documented, spec-driven, test-first workflow (`challenge-reqs` → `.feature` test plan → `tdd-implement` → `ship`). But that workflow was driven entirely by hand, and its central proof-of-correctness artifact (`tests/acceptance/test-plan.yaml`) was an unverified honor system: `status: covered` was prose an agent typed about its own work, with nothing checking that requirements were actually covered by passing tests.

Solution
Two complementary additions (Foyer itself is untouched and stays at `/`):

1. Requirements Traceability Gate (`tools/trace/`) reconciles three sources that previously drifted freely: spec `R.N` bullets, `@R.N` scenario tags, and the Playwright JSON results. `pnpm trace:gen` generates `test-plan.yaml` + `TRACEABILITY.md`; `pnpm trace:check` fails CI on an uncovered requirement, an orphan tag, a stale artifact, or a "covered" scenario that did not pass. It also bootstraps the previously empty acceptance suite with 5 real scenarios for existing Foyer behaviour.
2. Workbench control plane (`workbench/`): a localhost-only Node service + cockpit (a second Vite entry at `/workbench/`) that drives the four-phase workflow from an epic, running `claude -p` headless per phase and streaming it live over SSE. It enforces per-phase least-privilege tool allowlists, `acceptEdits` (never bypass), budget caps + timeouts, per-epic git-worktree isolation, an audit journal, a daily cost cap, and independent gate-verified implementation (the agent's "done" is re-checked by the traceability gate, not trusted).

Hardened against a multi-agent adversarial review: prompt-injection data-fencing, env-secret scoping for the child, Origin guard, crash-safe SSE, graceful shutdown, crash recovery, a feature-qualified results join, and WCAG-AA pill contrast. The cockpit also tolerates transient SSE reconnects without surfacing a false error: terminal status comes only from the run's `result`/`phase_status` events, backed by an epic-polling backstop that runs until the stream settles (while it is still idle or running). The backend now persists the agent's full narration (capped), so the challenge questions render in full, with a clearly labelled answer box beneath the gate whose contents feed the next phase.

Test plan
- `pnpm prettier-check && pnpm lint && pnpm knip && pnpm build`: all green
- `pnpm test:dev`: 46 unit/integration tests (auth, trace incl. edge cases, backend env/args/event-mapping, HTTP API incl. crash-recovery, SSE reducer incl. transient-drop handling)
- `pnpm test:acceptance`: 5 acceptance scenarios pass; CI runs `trace:check --structural` (static job) and `--results` (after the acceptance run)

Evidence
CI is wired to enforce the gate on every PR. Verified end-to-end: a fresh epic run through the Challenge phase reaches `awaiting_approval` (`resultSubtype: success`, no error) with `deliverable.report` populated. That field holds the structured challenge report (codebase context + questions grouped by audience) that the cockpit now renders, confirming both the SSE-resilience and the answer-surfacing fixes. The `stream-json` contract and HTTP layer are covered by the 46-test suite. The cockpit is a backend-driven surface, so it's demonstrated by running it (start the backend + Vite, open `/workbench/`) rather than by a static video; see `workbench/README.md`.
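The end-to-end check above hinges on turning the agent's `stream-json` output into a phase status. A minimal sketch, assuming the final line of a headless run is a JSON object with `type: "result"`, a `subtype` string, and an `is_error` flag; the `outcomeFromStream` name and the `PhaseOutcome` shape are hypothetical, and a missing or truncated result line is treated as an error:

```typescript
// Hypothetical shapes; only "awaiting_approval" and resultSubtype come from the PR text.
type PhaseOutcome =
  | { phase: "awaiting_approval"; resultSubtype: string }
  | { phase: "error"; resultSubtype: string | null };

interface ResultLine {
  type: "result";
  subtype?: unknown;
  is_error?: unknown;
}

function isResultLine(value: unknown): value is ResultLine {
  return typeof value === "object" && value !== null && (value as { type?: unknown }).type === "result";
}

// Scan newline-delimited JSON and derive the phase status from the last result line.
function outcomeFromStream(ndjson: string): PhaseOutcome {
  let last: ResultLine | null = null;
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip a truncated or non-JSON line instead of crashing
    }
    if (isResultLine(parsed)) last = parsed;
  }
  if (last === null) return { phase: "error", resultSubtype: null }; // no result line at all
  const subtype = typeof last.subtype === "string" ? last.subtype : null;
  if (subtype === "success" && last.is_error !== true) {
    return { phase: "awaiting_approval", resultSubtype: subtype };
  }
  return { phase: "error", resultSubtype: subtype };
}
```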