Build:
- root workspace config
- contract package
- runtime-core package
- canonical schema catalog
- run bootstrap script
Acceptance criteria:
- repo contains documented module boundaries
- schemas parse cleanly
- example run request can create a deterministic run folder
- run folder includes run-state.json, task.md, policy.md, evidence.md, events.jsonl, and actions.jsonl
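One way to meet the determinism criterion is to derive the run id from a canonical serialization of the request and to keep wall-clock timestamps out of the initial files. The sketch below assumes a runs/<run_id>/ layout and uses hypothetical helper names (run_id_for, create_run_folder). Only the six file names come from the criteria above; the file contents are placeholders.

```python
import hashlib
import json
from pathlib import Path

# File names come from the acceptance criteria; everything else is hypothetical.
RUN_FILES = ("run-state.json", "task.md", "policy.md",
             "evidence.md", "events.jsonl", "actions.jsonl")


def run_id_for(request: dict) -> str:
    """Hash the canonical JSON form of the request so key order does not matter."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def create_run_folder(root: Path, request: dict) -> Path:
    """Create runs/<run_id>/ whose contents depend only on the request."""
    run_id = run_id_for(request)
    run_dir = root / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    # No timestamps here: they would make two runs of the same request differ.
    state = {"run_id": run_id, "phase": "setup", "request": request}
    (run_dir / "run-state.json").write_text(
        json.dumps(state, sort_keys=True, indent=2) + "\n", encoding="utf-8")
    (run_dir / "task.md").write_text(
        f"# Task\n\n{request.get('task', '')}\n", encoding="utf-8")
    (run_dir / "policy.md").write_text("# Policy\n", encoding="utf-8")
    (run_dir / "evidence.md").write_text("# Evidence\n", encoding="utf-8")
    # Append-only logs start empty.
    for log_name in ("events.jsonl", "actions.jsonl"):
        (run_dir / log_name).touch()
    return run_dir
```

Because the id is a content hash, the same request always maps to the same folder. If repeated runs of one request must stay separate, a caller-supplied nonce could be folded into the hashed payload; that is an assumption, not part of the plan.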
Build:
- run creation and phase transitions
- checkpoint creation
- policy evaluation
- approval queue
- rollback gate and secret redaction utilities
Acceptance criteria:
- mutating actions are denied without approval or snapshot
- every phase change emits an event
- approval requests persist to disk
- rollback metadata is created before risky transitions
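The policy gate, event emission, approval persistence, and rollback ordering can be sketched together. This sketch reads "denied without approval or snapshot" as requiring both an approval and a snapshot before a mutating action runs. RunController, RISKY_PHASES, approvals.jsonl, and the rollback-<seq>.json naming are all hypothetical; secret redaction is not covered here.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical set of phases treated as risky (rollback metadata required first).
RISKY_PHASES = frozenset({"repair", "finalization"})


@dataclass(frozen=True)
class Action:
    name: str
    mutating: bool
    approved: bool = False
    snapshot_id: Optional[str] = None


def evaluate(action: Action) -> Tuple[bool, str]:
    """Fail closed: a mutating action needs both an approval and a snapshot."""
    if not action.mutating:
        return True, "read-only"
    if not action.approved:
        return False, "missing approval"
    if action.snapshot_id is None:
        return False, "missing snapshot"
    return True, "approved with snapshot"


class RunController:
    def __init__(self, run_dir: Path, phase: str = "setup"):
        self.run_dir = run_dir
        self.phase = phase
        self.seq = 0

    def _append(self, file_name: str, record: dict) -> None:
        # One JSON object per line, keys sorted for stable output.
        with open(self.run_dir / file_name, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")

    def transition(self, new_phase: str) -> None:
        if new_phase in RISKY_PHASES:
            # Rollback metadata is written before the phase actually changes.
            meta = {"from": self.phase, "to": new_phase, "seq": self.seq}
            (self.run_dir / f"rollback-{self.seq}.json").write_text(
                json.dumps(meta, sort_keys=True), encoding="utf-8")
        self.seq += 1
        # Every phase change emits exactly one event.
        self._append("events.jsonl", {"seq": self.seq, "type": "phase_changed",
                                      "from": self.phase, "to": new_phase})
        self.phase = new_phase

    def request_approval(self, action: Action) -> None:
        # Pending approvals go to disk so they survive a process restart.
        self._append("approvals.jsonl", {"action": action.name, "status": "pending"})
```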
Build:
- web lane adapter contract
- passive observation skeleton
- DOM, screenshot, accessibility, timing capture interfaces
- interaction log stubs
Acceptance criteria:
- baseline observation bundle persists to screenshots/, models/, and events.jsonl
- web lane capabilities are discoverable from code and docs
- unsupported desktop/mobile requests fail closed
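Capability discovery and fail-closed lane handling can live in one small registry that both code and docs read from. In the sketch below, Lane, CAPABILITIES, UnsupportedLaneError, and the observation file naming are hypothetical; the capability names mirror the capture interfaces listed above, and the directory names come from the acceptance criteria.

```python
from enum import Enum
from pathlib import Path


class Lane(str, Enum):
    WEB = "web"
    DESKTOP = "desktop"
    MOBILE = "mobile"


# Hypothetical registry: only the web lane has capabilities at this stage.
CAPABILITIES = {
    Lane.WEB: frozenset({"dom", "screenshot", "accessibility", "timing", "interaction_log"}),
}


class UnsupportedLaneError(Exception):
    """Raised instead of silently degrading when a lane is unknown or unimplemented."""


def capabilities_for(lane_name: str) -> frozenset:
    try:
        lane = Lane(lane_name)
    except ValueError:
        raise UnsupportedLaneError(f"unknown lane: {lane_name!r}") from None
    caps = CAPABILITIES.get(lane)
    if caps is None:
        # Known but unimplemented lanes (desktop, mobile) fail closed.
        raise UnsupportedLaneError(f"lane not implemented yet: {lane.value}")
    return caps


def observation_paths(run_dir: Path, step: int) -> dict:
    """Hypothetical naming scheme for one baseline observation step."""
    return {
        "screenshot": run_dir / "screenshots" / f"step-{step:04d}.png",
        "dom": run_dir / "models" / f"dom-{step:04d}.json",
        "accessibility": run_dir / "models" / f"a11y-{step:04d}.json",
        "events": run_dir / "events.jsonl",
    }
```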
Build:
- graph store
- node and edge merge rules
- invalidation rules
- hypothesis lifecycle
- confidence propagation helpers
Acceptance criteria:
- graph file persists nodes and edges with confidence and evidence refs
- hypotheses can be proposed, validated, downgraded, and rejected
- contradicting evidence lowers confidence deterministically
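The hypothesis lifecycle is a small state machine, and a deterministic downward update can be as simple as a multiplicative penalty. The sketch assumes a transition table, a 0.5 downgrade threshold, and a rule that scales confidence by (1 - weight). All three are hypothetical choices, not the plan's confidence model.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical lifecycle; "rejected" is terminal.
ALLOWED_TRANSITIONS = {
    "proposed": {"validated", "rejected"},
    "validated": {"downgraded", "rejected"},
    "downgraded": {"validated", "rejected"},
    "rejected": set(),
}
DOWNGRADE_THRESHOLD = 0.5  # hypothetical


@dataclass
class Hypothesis:
    hypothesis_id: str
    confidence: float
    status: str = "proposed"
    evidence_refs: List[str] = field(default_factory=list)

    def move_to(self, new_status: str) -> None:
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"{self.status} -> {new_status} is not allowed")
        self.status = new_status


def apply_contradiction(h: Hypothesis, evidence_ref: str, weight: float) -> float:
    """Scale confidence by (1 - weight); the same inputs always give the same result."""
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must be in (0, 1]")
    h.confidence = round(h.confidence * (1.0 - weight), 6)
    h.evidence_refs.append(evidence_ref)
    # A validated hypothesis that drops below the threshold is downgraded.
    if h.status == "validated" and h.confidence < DOWNGRADE_THRESHOLD:
        h.move_to("downgraded")
    return h.confidence
```

A multiplicative penalty never raises confidence and keeps it within [0, 1] when it starts there, which makes the "downward" guarantee easy to test.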
Build:
- replay step store
- divergence detection
- weighted fidelity scoring
- hard gate evaluation
- perceptual mismatch placeholders
Acceptance criteria:
- replay steps include timestamps, state hashes, agent ids, evidence refs, and confidence
- verification produces a machine-readable report
- a high fidelity score cannot pass verification while any critical journey is broken
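Replay steps, divergence detection, and the interaction between weighted scoring and hard gates can be sketched as follows. The ReplayStep fields follow the acceptance criteria. The Check type, the 0.9 threshold, and the report keys are hypothetical. The key point is that hard gates are evaluated separately from the weighted average, so no score can offset a broken critical journey.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ReplayStep:
    timestamp: float
    state_hash: str
    agent_id: str
    evidence_refs: Tuple[str, ...]
    confidence: float


def first_divergence(recorded: Sequence[ReplayStep],
                     replayed: Sequence[ReplayStep]) -> Optional[int]:
    """Index of the first step whose state hash differs, or None if the runs match."""
    for i, (a, b) in enumerate(zip(recorded, replayed)):
        if a.state_hash != b.state_hash:
            return i
    if len(recorded) != len(replayed):
        # One run ended early: the divergence is where the shorter run stops.
        return min(len(recorded), len(replayed))
    return None


@dataclass(frozen=True)
class Check:
    name: str
    score: float  # 0.0 .. 1.0
    weight: float
    critical: bool = False
    passed: bool = True


def verify(checks: List[Check], threshold: float = 0.9) -> dict:
    total_weight = sum(c.weight for c in checks)
    if total_weight <= 0:
        raise ValueError("at least one check needs a positive weight")
    fidelity = sum(c.score * c.weight for c in checks) / total_weight
    # Hard gates are evaluated independently of the weighted score.
    broken = sorted(c.name for c in checks if c.critical and not c.passed)
    return {
        "fidelity": round(fidelity, 4),
        "threshold": threshold,
        "hard_gate_failures": broken,
        "passed": fidelity >= threshold and not broken,
    }
```

Because verify returns a plain dict of JSON-compatible values, json.dumps(report) yields the machine-readable report directly.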
Build:
- orchestrator phase scheduler
- agent dispatch wrappers
- dashboard read models
- live run status streaming contract
Acceptance criteria:
- orchestrator can walk a run through setup, observation, modeling, verification, and finalization states
- dashboard backend can serve run overview, approvals, replay windows, and evidence index
- failures emit structured events and are surfaced by severity
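A minimal phase scheduler walks the five named states in order, emits a structured event at each step, and stops at the first failure. The handler map, event shape, and severity scale below are hypothetical. In practice the emit callback would append to events.jsonl or feed the status stream.

```python
from typing import Callable, Dict, List, Tuple

PHASES = ("setup", "observation", "modeling", "verification", "finalization")
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}  # hypothetical scale


def run_phases(handlers: Dict[str, Callable[[], None]],
               emit: Callable[[dict], None]) -> Tuple[bool, str]:
    """Walk the phases in order; stop at the first failure and report it as an event."""
    for phase in PHASES:
        emit({"type": "phase_started", "phase": phase, "severity": "info"})
        handler = handlers.get(phase)
        try:
            if handler is not None:
                handler()
        except Exception as exc:  # surface any agent failure as a structured event
            emit({"type": "phase_failed", "phase": phase, "severity": "error",
                  "error": f"{type(exc).__name__}: {exc}"})
            return False, phase
        emit({"type": "phase_completed", "phase": phase, "severity": "info"})
    return True, PHASES[-1]


def failures_by_severity(events: List[dict]) -> List[dict]:
    """Warnings and worse, most severe first (stable within a severity level)."""
    flagged = [e for e in events
               if SEVERITY_RANK[e["severity"]] >= SEVERITY_RANK["warning"]]
    return sorted(flagged, key=lambda e: -SEVERITY_RANK[e["severity"]])
```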
Build:
- defect clustering
- repair plans
- repair validation
- final evidence export
Acceptance criteria:
- verification mismatches can be clustered into repair units
- repair results capture confidence delta and regression risk
- final evidence pack can be assembled deterministically
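The clustering, repair-result, and export criteria can be sketched with plain data structures. The sketch groups mismatches by a hypothetical (component, kind) key and records a confidence delta alongside a regression-risk label. It then makes the evidence pack deterministic by hashing files in sorted relative-path order. The grouping key, the risk scale, and the manifest format are assumptions, not the plan's clustering method.

```python
import hashlib
import json
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


def cluster_mismatches(mismatches: List[dict]) -> List[dict]:
    """Group mismatches sharing (component, kind) into one repair unit."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for m in mismatches:
        groups[(m["component"], m["kind"])].append(m["id"])
    return [{"unit": f"{component}:{kind}", "mismatch_ids": sorted(ids)}
            for (component, kind), ids in sorted(groups.items())]


@dataclass(frozen=True)
class RepairResult:
    unit: str
    confidence_before: float
    confidence_after: float
    regression_risk: str  # hypothetical scale: "low" | "medium" | "high"

    @property
    def confidence_delta(self) -> float:
        return round(self.confidence_after - self.confidence_before, 6)


def build_evidence_manifest(run_dir: Path) -> dict:
    """Hash every file in sorted relative-path order so the pack digest is reproducible."""
    files = sorted((p for p in run_dir.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(run_dir).as_posix())
    entries = [{"path": p.relative_to(run_dir).as_posix(),
                "sha256": hashlib.sha256(p.read_bytes()).hexdigest()} for p in files]
    body = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return {"files": entries,
            "pack_sha256": hashlib.sha256(body.encode("utf-8")).hexdigest()}
```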
- desktop lane implementation
- richer verification and live diff overlays
- stronger state modeling and confidence calibration
- broader dashboard interactivity
- mobile lane implementation
- advanced repair automation
- richer artifact fusion
- robust archive and audit exports