Implementation Roadmap

Milestone 1: Monorepo and Runtime Contracts

Build:

root workspace config
contract package
runtime-core package
canonical schema catalog
run bootstrap script

Acceptance criteria:

repo contains documented module boundaries
schemas parse cleanly
example run request can create a deterministic run folder
run folder includes run-state.json, task.md, policy.md, evidence.md, events.jsonl, actions.jsonl
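
The bootstrap criterion above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names (`run_id_for`, `bootstrap_run`), the `run-<hash>` folder naming, and the fields written to run-state.json are all assumptions. Only the six file names come from the roadmap.

```python
import hashlib
import json
from pathlib import Path

# File names taken from the acceptance criteria; everything else is hypothetical.
RUN_FILES = ["run-state.json", "task.md", "policy.md",
             "evidence.md", "events.jsonl", "actions.jsonl"]

def run_id_for(request: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal requests hash equally.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def bootstrap_run(root: Path, request: dict) -> Path:
    # The folder name depends only on the request content, never on time or randomness.
    run_dir = root / f"run-{run_id_for(request)}"
    run_dir.mkdir(parents=True, exist_ok=True)
    for name in RUN_FILES:
        path = run_dir / name
        if not path.exists():
            path.write_text("", encoding="utf-8")
    # Assumed initial state shape; "setup" mirrors the first orchestrator state.
    state = {"run_id": run_dir.name, "phase": "setup", "request": request}
    (run_dir / "run-state.json").write_text(
        json.dumps(state, sort_keys=True, indent=2) + "\n", encoding="utf-8")
    return run_dir
```

Deriving the folder name from a hash of the canonicalized request is one way to make the folder deterministic; a caller-supplied run id would work equally well.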

Milestone 2: Run Engine and Safety Kernel

Build:

run creation and phase transitions
checkpoint creation
policy evaluation
approval queue
rollback gate and secret redaction utilities

Acceptance criteria:

mutating actions are denied without approval or snapshot
every phase change emits an event
approval requests persist to disk
rollback metadata is created before risky transitions
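
A minimal sketch of the deny-by-default gate described above. The `Action` fields and the `evaluate` function are hypothetical names, and the sketch assumes the strict reading of the first criterion: a mutating action needs both an approval and a snapshot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Action:
    # Hypothetical action record; the real contract package may differ.
    name: str
    mutating: bool
    approved: bool = False
    snapshot_id: Optional[str] = None

def evaluate(action: Action) -> Tuple[bool, str]:
    # Read-only actions pass through untouched.
    if not action.mutating:
        return True, "read-only action"
    # Assumption: both approval and a rollback snapshot are required.
    if not action.approved:
        return False, "denied: approval missing"
    if action.snapshot_id is None:
        return False, "denied: snapshot missing"
    return True, "allowed: approved with snapshot " + action.snapshot_id
```

Returning a reason string alongside the decision makes it straightforward to append each decision to actions.jsonl for later audit.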

Milestone 3: MVP Web Observation Runtime

Build:

web lane adapter contract
passive observation skeleton
DOM, screenshot, accessibility, timing capture interfaces
interaction log stubs

Acceptance criteria:

baseline observation bundle persists to screenshots/, models/, and events.jsonl
web lane capabilities are discoverable from code and docs
unsupported desktop/mobile requests fail closed
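
"Fail closed" can be illustrated with a small capability registry. The registry shape, the capability names, and `UnsupportedLaneError` are assumptions made for this sketch; the point is only that an unknown lane raises instead of falling back to a default.

```python
# Hypothetical capability registry: only the web lane exists in the MVP.
SUPPORTED_LANES = {
    "web": frozenset({"dom", "screenshot", "accessibility", "timing"}),
}

class UnsupportedLaneError(Exception):
    """Raised when a run requests a lane that is not implemented."""

def resolve_lane(lane: str) -> frozenset:
    # Fail closed: anything not explicitly registered is rejected.
    try:
        return SUPPORTED_LANES[lane]
    except KeyError:
        raise UnsupportedLaneError(f"lane not supported: {lane}") from None
```

Keeping the registry as data also makes capabilities discoverable from code, since the same mapping can be rendered into the docs.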

Milestone 4: Cognitive Graph and Hypothesis Engine

Build:

graph store
node and edge merge rules
invalidation rules
hypothesis lifecycle
confidence propagation helpers

Acceptance criteria:

graph file persists nodes and edges with confidence and evidence refs
hypotheses can be proposed, validated, downgraded, and rejected
contradicting evidence lowers confidence deterministically
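
One possible shape for a deterministic downward update is sketched below. The multiplicative rule and the `apply_contradiction` name are assumptions; the roadmap does not specify a formula. The rule guarantees that the result never exceeds the input and that the same inputs always produce the same output.

```python
def apply_contradiction(confidence: float, strength: float) -> float:
    """Lower a confidence value in [0, 1] by a contradiction of given strength.

    Assumed rule: new = old * (1 - strength), with strength clamped to [0, 1].
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    strength = min(max(strength, 0.0), 1.0)
    # Rounding keeps persisted values stable across repeated serialization.
    return round(confidence * (1.0 - strength), 6)
```

For example, `apply_contradiction(0.8, 0.5)` returns `0.4`, and a contradiction of strength `0.0` leaves the value unchanged.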

Milestone 5: Replay and Verification

Build:

replay step store
divergence detection
weighted fidelity scoring
hard gate evaluation
perceptual mismatch placeholders

Acceptance criteria:

replay steps include timestamps, state hashes, agent ids, evidence refs, and confidence
verification produces a machine-readable report
a high fidelity score cannot pass verification while any critical journey is broken
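
The interaction between weighted scoring and hard gates might look like the sketch below. The dimension names, the 0.9 threshold, and the report fields are illustrative assumptions; the roadmap only requires that the report be machine-readable and that a broken critical journey blocks a pass.

```python
from typing import Dict

def verify(scores: Dict[str, float], weights: Dict[str, float],
           critical_journeys: Dict[str, bool], threshold: float = 0.9) -> dict:
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean over the scored dimensions.
    score = sum(scores[name] * weights[name] for name in weights) / total_weight
    # Hard gate: any broken critical journey fails the run regardless of score.
    broken = sorted(name for name, ok in critical_journeys.items() if not ok)
    return {
        "score": round(score, 4),
        "threshold": threshold,
        "broken_critical_journeys": broken,
        "passed": score >= threshold and not broken,
    }
```

The returned dictionary contains only JSON-compatible values, so it can be written out directly with `json.dumps`.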

Milestone 6: Orchestrator and Dashboard Backend

Build:

orchestrator phase scheduler
agent dispatch wrappers
dashboard read models
live run status streaming contract

Acceptance criteria:

orchestrator can walk a run through setup, observation, modeling, verification, and finalization states
dashboard backend can serve run overview, approvals, replay windows, and evidence index
failures emit structured events and are surfaced by severity
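
A minimal sketch of walking a run through the states named above, emitting one event per transition. The phase names come from the acceptance criteria; the `advance` function and the event fields are hypothetical.

```python
from typing import Dict, List

# Phase order taken from the acceptance criteria.
PHASES = ["setup", "observation", "modeling", "verification", "finalization"]

def advance(state: Dict[str, str], events: List[dict]) -> Dict[str, str]:
    """Move a run to its next phase and record the transition as an event."""
    index = PHASES.index(state["phase"])  # raises ValueError on an unknown phase
    if index == len(PHASES) - 1:
        raise ValueError("run is already in its final phase")
    next_phase = PHASES[index + 1]
    # Assumed event shape; in practice this would be appended to events.jsonl.
    events.append({"type": "phase_changed", "from": state["phase"], "to": next_phase})
    # Return a new state object rather than mutating the input.
    return {**state, "phase": next_phase}
```

Walking a run from setup to finalization with this function produces exactly four events, one per transition.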

Milestone 7: Repair Engine and MVP Closure

Build:

defect clustering
repair plans
repair validation
final evidence export

Acceptance criteria:

verification mismatches can be clustered into repair units
repair results capture confidence delta and regression risk
final evidence pack can be assembled deterministically
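
Clustering mismatches into repair units could start as simple grouping by a shared key. The mismatch fields (`id`, `component`, `kind`) and the grouping key are assumptions for illustration; sorting both the groups and their members keeps the output independent of input order, which helps with deterministic assembly.

```python
from collections import defaultdict
from typing import Dict, List

def cluster_mismatches(mismatches: List[Dict[str, str]]) -> List[dict]:
    """Group verification mismatches into repair units by (component, kind)."""
    groups = defaultdict(list)
    for mismatch in mismatches:
        groups[(mismatch["component"], mismatch["kind"])].append(mismatch["id"])
    # Sorted keys and sorted ids make the result independent of input order.
    return [
        {"component": component, "kind": kind, "mismatch_ids": sorted(ids)}
        for (component, kind), ids in sorted(groups.items())
    ]
```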

V2

desktop lane implementation
richer verification and live diff overlays
stronger state modeling and confidence calibration
broader dashboard interactivity

V3

mobile lane implementation
advanced repair automation
richer artifact fusion
robust archive and audit exports