Implementation Roadmap

Milestone 1: Monorepo and Runtime Contracts

Build:

  • root workspace config
  • contract package
  • runtime-core package
  • canonical schema catalog
  • run bootstrap script

Acceptance criteria:

  • repo contains documented module boundaries
  • schemas parse cleanly
  • example run request can create a deterministic run folder
  • run folder includes run-state.json, task.md, policy.md, evidence.md, events.jsonl, actions.jsonl
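
The run bootstrap criterion above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes "deterministic" means that the same run request always maps to the same folder name, and the names `run_id_for`, `bootstrap_run`, and the `run-<hash>` folder pattern are hypothetical. Only the six file names come from the roadmap.

```python
import hashlib
import json
from pathlib import Path

# File names taken from the Milestone 1 acceptance criteria.
RUN_FILES = [
    "run-state.json", "task.md", "policy.md",
    "evidence.md", "events.jsonl", "actions.jsonl",
]


def run_id_for(request: dict) -> str:
    """Derive a stable id from the request (assumption: id = content hash)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def bootstrap_run(root: Path, request: dict) -> Path:
    """Create the run folder and its files; existing files are left untouched."""
    run_id = run_id_for(request)
    run_dir = root / f"run-{run_id}"  # hypothetical naming scheme
    run_dir.mkdir(parents=True, exist_ok=True)
    initial_state = {"run_id": run_id, "phase": "setup", "request": request}
    for name in RUN_FILES:
        path = run_dir / name
        if path.exists():
            continue
        if name == "run-state.json":
            text = json.dumps(initial_state, sort_keys=True, indent=2) + "\n"
        else:
            text = ""
        path.write_text(text, encoding="utf-8")
    return run_dir
```

Hashing a canonical JSON form (sorted keys, fixed separators) keeps the folder name independent of key order in the request, which is one simple way to make the bootstrap repeatable.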

Milestone 2: Run Engine and Safety Kernel

Build:

  • run creation and phase transitions
  • checkpoint creation
  • policy evaluation
  • approval queue
  • rollback gate
  • secret redaction utilities

Acceptance criteria:

  • mutating actions are denied unless both an approval and a snapshot exist
  • every phase change emits an event
  • approval requests persist to disk
  • rollback metadata is created before risky transitions
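
The first criterion above could be enforced by a small policy check along these lines. This is a sketch under assumptions: it reads the criterion as requiring both an approval and a snapshot before a mutating action may run, and the `Action`, `Decision`, and `evaluate` names are hypothetical rather than part of the contract package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    name: str
    mutating: bool
    approval_id: Optional[str] = None  # hypothetical reference to a persisted approval
    snapshot_id: Optional[str] = None  # hypothetical reference to a checkpoint


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def evaluate(action: Action) -> Decision:
    """Deny mutating actions unless an approval and a snapshot are both present."""
    if not action.mutating:
        return Decision(True, "read-only action")
    requirements = (("approval", action.approval_id), ("snapshot", action.snapshot_id))
    missing = [label for label, value in requirements if not value]
    if missing:
        return Decision(False, "missing " + " and ".join(missing))
    return Decision(True, "approval and snapshot present")
```

Returning a reason string alongside the verdict makes it straightforward to record each denial in events.jsonl.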

Milestone 3: MVP Web Observation Runtime

Build:

  • web lane adapter contract
  • passive observation skeleton
  • DOM, screenshot, accessibility, timing capture interfaces
  • interaction log stubs

Acceptance criteria:

  • baseline observation bundle persists to screenshots/, models/, and events.jsonl
  • web lane capabilities are discoverable from code and docs
  • unsupported desktop/mobile requests fail closed
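
"Fail closed" in the last criterion can be illustrated with a lane registry that rejects anything it does not explicitly know. The registry contents mirror the capture interfaces listed for this milestone; `SUPPORTED_LANES`, `UnsupportedLaneError`, and `resolve_lane` are hypothetical names used only for this sketch.

```python
from typing import Dict, Tuple

# Only the web lane is registered in the MVP; capability names follow the
# capture interfaces listed in the milestone.
SUPPORTED_LANES: Dict[str, Tuple[str, ...]] = {
    "web": ("dom", "screenshot", "accessibility", "timing"),
}


class UnsupportedLaneError(Exception):
    """Raised when a run requests a lane that is not registered."""


def resolve_lane(lane: str) -> Tuple[str, ...]:
    """Return the lane's capabilities, or raise instead of guessing a fallback."""
    try:
        return SUPPORTED_LANES[lane]
    except KeyError:
        raise UnsupportedLaneError(f"lane {lane!r} is not supported") from None
```

Because the lookup is an allowlist, a desktop or mobile request is rejected by default, with no silent fallback to the web lane. The same registry can back the "discoverable from code" criterion.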

Milestone 4: Cognitive Graph and Hypothesis Engine

Build:

  • graph store
  • node and edge merge rules
  • invalidation rules
  • hypothesis lifecycle
  • confidence propagation helpers

Acceptance criteria:

  • graph file persists nodes and edges with confidence and evidence refs
  • hypotheses can be proposed, validated, downgraded, and rejected
  • contradicting evidence lowers confidence deterministically
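
One possible shape for the hypothesis lifecycle and the deterministic downgrade is shown below. The halving factor, the rejection threshold, and the `Hypothesis` and `apply_contradiction` names are all illustrative assumptions; the roadmap does not specify the update rule.

```python
from dataclasses import dataclass, field
from typing import List

CONTRADICTION_FACTOR = 0.5  # assumed multiplier applied per contradiction
REJECT_BELOW = 0.2          # assumed threshold below which a hypothesis is rejected


@dataclass
class Hypothesis:
    statement: str
    confidence: float
    status: str = "proposed"  # proposed, validated, downgraded, or rejected
    evidence_refs: List[str] = field(default_factory=list)


def apply_contradiction(hypothesis: Hypothesis, evidence_ref: str) -> Hypothesis:
    """Lower confidence by a fixed factor and record the contradicting evidence."""
    hypothesis.evidence_refs.append(evidence_ref)
    hypothesis.confidence = round(hypothesis.confidence * CONTRADICTION_FACTOR, 6)
    if hypothesis.confidence < REJECT_BELOW:
        hypothesis.status = "rejected"
    else:
        hypothesis.status = "downgraded"
    return hypothesis
```

A fixed multiplicative rule has no randomness and no dependence on wall-clock time, so replaying the same evidence in the same order always yields the same confidence.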

Milestone 5: Replay and Verification

Build:

  • replay step store
  • divergence detection
  • weighted fidelity scoring
  • hard gate evaluation
  • perceptual mismatch placeholders

Acceptance criteria:

  • replay steps include timestamps, state hashes, agent ids, evidence refs, and confidence
  • verification produces a machine-readable report
  • a high fidelity score cannot pass verification while any critical journey is broken
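
The interaction between weighted fidelity scoring and hard gates might look like the following. The dimension names, weights, default threshold, and report fields are assumptions made for illustration; only the rule that a broken critical journey blocks a pass regardless of score comes from the roadmap.

```python
from typing import Dict


def verify(
    dimension_scores: Dict[str, float],
    weights: Dict[str, float],
    critical_journeys: Dict[str, bool],
    threshold: float = 0.9,  # assumed pass threshold
) -> dict:
    """Return a JSON-serializable report combining a weighted score with hard gates."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(dimension_scores[name] * w for name, w in weights.items()) / total_weight
    broken = sorted(name for name, ok in critical_journeys.items() if not ok)
    return {
        "score": round(score, 4),
        "threshold": threshold,
        "broken_critical_journeys": broken,
        # Hard gate: a single broken critical journey fails the run.
        "passed": score >= threshold and not broken,
    }
```

The report contains only dicts, lists, numbers, and booleans, so `json.dumps` can serialize it directly for the machine-readable report criterion.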

Milestone 6: Orchestrator and Dashboard Backend

Build:

  • orchestrator phase scheduler
  • agent dispatch wrappers
  • dashboard read models
  • live run status streaming contract

Acceptance criteria:

  • orchestrator can walk a run through setup, observation, modeling, verification, and finalization states
  • dashboard backend can serve run overview, approvals, replay windows, and evidence index
  • failures emit structured events and are surfaced by severity
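
A phase scheduler that satisfies the first and third criteria could be as small as the loop below. The phase names come from the criterion; the event fields (`type`, `severity`, `message`), the `handlers` mapping, and the `walk_run` function are hypothetical.

```python
from typing import Callable, Dict

# Phase order as listed in the Milestone 6 acceptance criteria.
PHASES = ["setup", "observation", "modeling", "verification", "finalization"]


def walk_run(
    run_id: str,
    handlers: Dict[str, Callable[[], None]],
    emit: Callable[[dict], None],
) -> str:
    """Run each phase in order, emitting one structured event per transition."""
    for phase in PHASES:
        emit({"run_id": run_id, "type": "phase_entered", "phase": phase, "severity": "info"})
        handler = handlers.get(phase, lambda: None)  # missing handlers are no-ops
        try:
            handler()
        except Exception as exc:
            emit({
                "run_id": run_id,
                "type": "phase_failed",
                "phase": phase,
                "severity": "error",
                "message": str(exc),
            })
            return "failed"
    emit({"run_id": run_id, "type": "run_completed", "phase": PHASES[-1], "severity": "info"})
    return "finalized"
```

Passing `emit` as a callback keeps the scheduler independent of where events go; in practice it might append a line to events.jsonl or push to the status stream.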

Milestone 7: Repair Engine and MVP Closure

Build:

  • defect clustering
  • repair plans
  • repair validation
  • final evidence export

Acceptance criteria:

  • verification mismatches can be clustered into repair units
  • repair results capture confidence delta and regression risk
  • final evidence pack can be assembled deterministically
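
Clustering mismatches into repair units and recording a repair result might be sketched as follows. Grouping by `(component, kind)` is an assumed clustering key, and the mismatch fields and function names are hypothetical; the roadmap only states that clusters, confidence delta, and regression risk must exist.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_mismatches(mismatches: List[Dict[str, str]]) -> List[dict]:
    """Group mismatches by (component, kind); output order is sorted and stable."""
    groups = defaultdict(list)
    for mismatch in mismatches:
        groups[(mismatch["component"], mismatch["kind"])].append(mismatch["id"])
    return [
        {"unit": f"{component}:{kind}", "mismatch_ids": sorted(ids)}
        for (component, kind), ids in sorted(groups.items())
    ]


def repair_result(unit: str, before: float, after: float, regression_risk: str) -> dict:
    """Record how a repair changed confidence, plus its assessed regression risk."""
    return {
        "unit": unit,
        "confidence_delta": round(after - before, 6),
        "regression_risk": regression_risk,
    }
```

Sorting both the groups and the ids inside each group means the same set of mismatches always produces the same repair units, which also helps when assembling the final evidence pack deterministically.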

V2

  • desktop lane implementation
  • richer verification and live diff overlays
  • stronger state modeling and confidence calibration
  • broader dashboard interactivity

V3

  • mobile lane implementation
  • advanced repair automation
  • richer artifact fusion
  • robust archive and audit exports