Hand the agent a Rust binary built with coverage instrumentation. Its job: find inputs that reach branches the corpus hasn't covered yet, and trip crashes byte-wise fuzzers would never reach. The agent reads the symbol table, reasons about likely format invariants, proposes mutated inputs, executes them through a sandboxed runner, observes the coverage bitmap, and updates its hypotheses — all coordinated by ARCP.
See PROMPT.md for the full design narrative.
- `make up` brings four containers green: `ollama`, `arcp-runtime`, `harness-runner`, `arcp-client`.
- Within ~10s the dashboard shows `status: analyzing_binary` → `seeding_corpus` → `mutating`.
- The coverage gauge climbs as the agent finds inputs that reach new edges.
- A `thought` event appears with the model's hypothesis ("Symbol `decode_array_len` reads 4 bytes; try varying that prefix.").
- First crash on the bundled `target_cbor` demo typically within 5–15 minutes on a laptop. `artifact_ref` event fires; the crash byte stream appears in the `work-crashes` volume.
- Stop with `make down`. `make up` resumes from the saved corpus + coverage map.
- Let it run to exhaustion: `Status::budget_exhausted` fires and the campaign closes cleanly.
                      ┌───────────────────────┐
                      │        ollama         │
                      └───────────▲───────────┘
                                  │ http
   ┌────────────────┐             │
   │  arcp-client   │             │
   │     (TUI)      │             │
   └───────┬────────┘             │
           │ ws/arcp              │
           ▼                      │
┌──────────────────────────────────────────┐
│               arcp-runtime               │
│   hosts `fuzz.explore` agent             │
│   persists corpus + coverage bitmap      │
└───────────────────┬──────────────────────┘
                    │ tool.call: harness.run
                    ▼
┌──────────────────────────────────────────┐
│              harness-runner              │
│   POST /run executes the target binary   │
│   returns coverage bitmap + stderr       │
└──────────────────────────────────────────┘
cp .env.example .env
make up

rust/
├── crates/
│   ├── fuzzcommon/    shared types: Bitmap, FuzzState, RunArgs, Persona, ...
│   ├── arcp-stubs/    thin SDK surface; replace with the real `arcp` crate
│   ├── runtime/       hosts the `fuzz.explore` agent (PROMPT §5)
│   ├── runner/        POST /run HTTP sandbox (PROMPT §6)
│   └── client/        ratatui dashboard (PROMPT §7)
└── examples/
    └── target_cbor/   deliberately-buggy CBOR-ish decoder used as the target
The workspace ships a fast, hermetic test suite — no docker, no Ollama, no network:
make test
# or
cargo test --workspace --no-fail-fast

17 tests covering `Bitmap::absorb`, `FuzzState::seed_from`, crash dedup canonicalisation, strategy-prompt construction, idempotency-key derivation, and the `target_cbor` length-confusion bug at depth 6 (`should_panic`).
| Variable | Default | Effect |
|---|---|---|
| `FUZZ_RUN_TIMEOUT_MS` | `250` | Per-execution wall-clock cap. |
| `FUZZ_BUDGET_USD` | `1.00` | Inference cost cap. Fuzzer halts when reached. |
| `FUZZ_BUDGET_EXECS` | `50000` | Total harness executions. Not money; see below. |
| `FUZZ_WALLTIME_HOURS` | `2` | Outer wall-clock kill switch. |
| `RUNNER_SECCOMP` | `strict` | Seccomp profile (see "Compromises"; currently unused). |
| `RUNTIME_STORE` | `/data/fuzz.db` | Persistent corpus + bitmap. Lets `make down && make up` resume. |
| `OLLAMA_MODEL` | `qwen2.5:1.5b-instruct` | See top-level README for alternatives. |
| `ARCP_SDK_VERSION` | `latest` | Pin a specific crates.io release for reproducible builds. |
`credits` is a runtime-defined currency (PROMPT §8) representing harness executions, not USD. Each `harness.run` decrements it. When either USD or credits hit zero the fuzzer halts cleanly with `BUDGET_EXHAUSTED`.
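The two-budget rule can be sketched roughly as below. This is an illustration only, not the runtime's actual accounting: the `Budget` and `BudgetStatus` types and their method names are hypothetical, and tracking USD as integer micro-dollars is an assumption made here to avoid floating-point drift.

```rust
/// Hypothetical status type; the real runtime reports `BUDGET_EXHAUSTED`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetStatus {
    Ok,
    Exhausted,
}

/// Hypothetical dual budget: inference spend (USD) plus harness executions (credits).
#[derive(Debug, Clone)]
pub struct Budget {
    /// Remaining inference budget in micro-dollars (1 USD = 1_000_000).
    usd_micros: u64,
    /// Remaining harness executions.
    credits: u64,
}

impl Budget {
    pub fn new(usd_micros: u64, credits: u64) -> Self {
        Self { usd_micros, credits }
    }

    /// Charge one `harness.run` call against the credit budget.
    pub fn charge_exec(&mut self) -> BudgetStatus {
        self.credits = self.credits.saturating_sub(1);
        self.status()
    }

    /// Charge one inference call against the USD budget.
    pub fn charge_inference(&mut self, cost_micros: u64) -> BudgetStatus {
        self.usd_micros = self.usd_micros.saturating_sub(cost_micros);
        self.status()
    }

    /// Exhausted as soon as either budget reaches zero.
    pub fn status(&self) -> BudgetStatus {
        if self.usd_micros == 0 || self.credits == 0 {
            BudgetStatus::Exhausted
        } else {
            BudgetStatus::Ok
        }
    }
}
```

Under these assumptions, the defaults in the table would correspond to `Budget::new(1_000_000, 50_000)`, and the agent loop would stop at the first `BudgetStatus::Exhausted` it sees from either charge.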
These deltas from PROMPT.md are intentional so `cargo test --workspace` stays fast and CI-friendly:
- `arcp-stubs` crate. The PROMPT uses ergonomic types (`Runtime::builder`, `JobContext::tool_call`, etc.) that aren't published on crates.io today. We provide a thin local stub with the same shape. Search for `// TODO: replace with real SDK API when published`.
- No seccomp. `seccompiler` is Linux-only and CI-flaky. The runner currently uses `tokio::process` + wall-clock timeout. Real seccomp is a one-file diff.
- No libFuzzer counters. Re-instrumenting the target with `-Cinstrument-coverage` and mmap-sharing the counter region is real-libFuzzer territory. We use a stub coverage proxy that hashes each distinct stderr line into a 1024-bit bitmap, which correlates well enough with reaching new code paths to drive the agent loop against `target_cbor`.
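The stderr-hashing proxy in the last item could look something like the following. Treat it as a sketch under assumptions: the choice of `DefaultHasher`, the `[u64; 16]` layout, and the `absorb_stderr` / `count_ones` names are all hypothetical and not taken from the `fuzzcommon` crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Bitmap width from the README: 1024 bits.
const BITS: usize = 1024;

/// Hypothetical coverage-proxy bitmap (not the real `fuzzcommon::Bitmap`).
#[derive(Clone, PartialEq, Eq)]
pub struct Bitmap {
    words: [u64; BITS / 64],
}

impl Bitmap {
    pub fn new() -> Self {
        Self { words: [0; BITS / 64] }
    }

    /// Set one bit; returns true if it was not set before.
    fn set(&mut self, bit: usize) -> bool {
        let word = bit / 64;
        let mask = 1u64 << (bit % 64);
        let fresh = (self.words[word] & mask) == 0;
        self.words[word] |= mask;
        fresh
    }

    /// Hash every stderr line into the bitmap and return how many
    /// previously unseen bits this run contributed.
    pub fn absorb_stderr(&mut self, stderr: &str) -> usize {
        let mut new_bits = 0;
        for line in stderr.lines() {
            let mut hasher = DefaultHasher::new();
            line.hash(&mut hasher);
            let bit = (hasher.finish() % BITS as u64) as usize;
            if self.set(bit) {
                new_bits += 1;
            }
        }
        new_bits
    }

    /// Number of bits set so far; a rough stand-in for a coverage gauge.
    pub fn count_ones(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }
}
```

A run whose `absorb_stderr` result is greater than zero would count as reaching something new. With only 1024 slots, distinct lines can collide, so the count is a lower bound on new behaviour rather than an exact measure.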
The JSON wire shape between runtime and runner is unchanged, so swapping in a real sandbox + real coverage is mechanical.
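For readers who want a feel for the "process + wall-clock timeout" approach the runner currently relies on, here is a standard-library-only approximation. It is not the runner's code: the real runner uses `tokio::process`, while this sketch polls `std::process::Child::try_wait`, and the `Outcome` type and `run_with_timeout` function are hypothetical names. Output streams are discarded here for simplicity; capturing stderr would need a reader thread to avoid blocking on a full pipe.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result of one bounded execution.
#[derive(Debug)]
pub enum Outcome {
    /// The child exited on its own (normally or via a crash signal).
    Exited(ExitStatus),
    /// The wall-clock cap was hit and the child was killed.
    TimedOut,
}

/// Spawn `cmd` and wait at most `timeout` for it to finish.
pub fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Outcome> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    let deadline = Instant::now() + timeout;

    loop {
        // Non-blocking check: Some(status) once the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Outcome::Exited(status));
        }
        if Instant::now() >= deadline {
            // Ignore the race where the child exits just before the kill.
            let _ = child.kill();
            // Reap the process so it does not linger as a zombie.
            child.wait()?;
            return Ok(Outcome::TimedOut);
        }
        thread::sleep(Duration::from_millis(5));
    }
}
```

With the default `FUZZ_RUN_TIMEOUT_MS` of 250, a caller would pass `Duration::from_millis(250)`. A timeout like this bounds run time only; it provides none of the syscall filtering that seccomp would.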
- `make up`: full stack.
- `make submit`: kick a one-shot job inside the runtime container.
- `make report`: list crashes + corpus volumes.
- `make test`: run the workspace test suite.
- `make verify`: `docker build` only.
- `make down`: stop everything.