adveloop

GAN-inspired adversarial development loop — Planner directs Generator + Evaluator in fresh cmux panes until the work actually passes


Ever merged AI-written code that passed every check but was subtly wrong?
That's what happens when the same agent writes and judges its own work.

Installation · Usage · How It Works · Code Quality · Examples

adveloop runs a three-role adversarial loop for every deliverable you approve. A Planner (your interactive session) drafts concrete, testable outcomes; a Generator in a fresh cmux pane builds or fixes them; an Evaluator in another fresh pane exercises the result and returns pass/fail with evidence. Each pane is a new claude session — isolated context, skeptical review, hard gate per deliverable. Adapted from Anthropic's Harness design for long-running agentic apps.

Installation

Platform      How to install
Claude Code   claude plugin marketplace add ph3on1x/adveloop
              claude plugin install adveloop

Note

Requires cmux (macOS 14.0+) and the /cmux skill — adveloop delegates every pane/signal operation to it. The context7 MCP server is strongly recommended: both panes are instructed to query current library/framework/API docs through context7 before writing or auditing. Without it, they fall back to training-data knowledge and stale API usage may slip through.

Usage

/adveloop [product brief, or path to a spec file; empty to resume]

The argument can be an inline brief, a path to a spec file (.md, .txt, etc.) whose contents become the brief, or empty to resume an existing run.
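
All three forms, concretely (the inline brief is illustrative):

> /adveloop "add pagination to GET /items"     # inline brief
> /adveloop docs/specs/checkout-flow.md        # spec file becomes the brief
> /adveloop                                    # resume an existing run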

When to Use What

  • "Build this new feature and actually test it"
    Use: /adveloop "minimal URL shortener with SQLite"
    The Planner drafts build-mode deliverables; the Generator writes code in a fresh pane; the Evaluator exercises each result and returns pass/fail with evidence.

  • "Audit this existing code and fix what's broken"
    Use: /adveloop "review /login for XSS and fix issues found"
    The Evaluator runs first against your code. If it passes, no Generator runs; if it fails, the verdict becomes the Generator's first feedback round.

  • "I have a longer spec already written up"
    Use: /adveloop docs/specs/checkout-flow.md
    The Planner reads the file, drafts deliverables from it, and asks you to approve.

  • "Pick up where I left off"
    Use: /adveloop
    Reads .adveloop/deliverables.md, reports state per deliverable, and offers Resume / Rewrite / Abort.

Modes

  • build: the Generator runs first and the Evaluator verifies. Use for greenfield features, new endpoints, new modules.
  • review: the Evaluator audits the existing code first; the Generator only runs if the audit fails. Use for audits, hardening passes, bug hunts, and fixing existing code.

The Planner infers the mode per deliverable from verbs in your brief. When a mode isn't clearly implied, it asks before the approval screen — no silent default. A single run can mix both modes.
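For example, a brief like "add a rate limiter to /api and review /login for XSS" (illustrative) would yield one build deliverable for the rate limiter and one review deliverable for the audit.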

How It Works

flowchart TD
    A["/adveloop [brief]"] --> B["Planner drafts<br>3–8 deliverables"]
    B --> C{"Mode per<br>deliverable"}
    C -- build --> D["Generator pane<br>(fresh claude)"]
    C -- review --> E["Evaluator pane<br>(audit existing code)"]
    D --> F["Evaluator pane<br>(verify Generator's claim)"]
    E -- pass --> I["Advance"]
    E -- fail --> G["feedback-0.json"]
    G --> D
    F -- pass --> I
    F -- fail --> H{"retry &lt; 3?"}
    H -- yes --> D
    H -- no --> J["Ask user:<br>retry / edit / skip / abort"]
    I --> K{"More<br>deliverables?"}
    K -- yes --> C
    K -- no --> L["Done"]

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style L fill:#1a1a2e,stroke:#0f3460,color:#fff
    style H fill:#16213e,stroke:#e94560,color:#fff
    style J fill:#16213e,stroke:#e94560,color:#fff
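
In shell terms, the gate on a build-mode deliverable reduces to roughly this sketch. run_generator, run_evaluator, and ask_user are hypothetical stand-ins for the skill's pane orchestration; only the jq call and the file paths mirror the layout described below.

# Illustrative control flow only, not the skill's implementation
for round in 0 1 2; do                          # retry < 3
  run_generator "$n" "$feedback"                # fresh Generator pane
  run_evaluator "$n"                            # fresh Evaluator pane
  passed=$(jq -r .passed ".adveloop/tasks/$n/eval-result.json")
  [ "$passed" = true ] && break                 # pass -> advance
  feedback=".adveloop/tasks/$n/feedback-$round.json"
done
[ "$passed" = true ] || ask_user "retry / edit / skip / abort"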

Every pane is a fresh claude --dangerously-skip-permissions session spawned via cmux. Signals are namespaced with a run_id so stale signals from prior runs cannot unblock the current one. Dynamic content (deliverables, feedback) is written to disk first; the pane bootstrap only says "read this file and follow it" — shell injection via backticks or $(…) is impossible.
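
A minimal sketch of that handoff, assuming spawn_pane and wait_signal wrappers around the /cmux skill (both names are hypothetical; the real skill drives cmux directly):

RUN_ID=$(date +%s)-$$                           # namespaces this run's signals
TASK=".adveloop/tasks/2/gen-task.md"
SIGNAL="adveloop-$RUN_ID-task2-gen"

# Dynamic content is written to disk first; the bootstrap prompt only
# names the file, so backticks or $(...) in the brief never execute.
spawn_pane claude --dangerously-skip-permissions \
  "Read $TASK and follow it. Signal $SIGNAL when done."
wait_signal "$SIGNAL"                           # stale runs carry a different RUN_ID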

All harness state lives under .adveloop/ at your project root:

.adveloop/
├── deliverables.md            # approved list (Planner's ground truth)
└── tasks/
    └── <N>/
        ├── gen-task.md        # deliverable + prior feedback + signal name
        ├── gen-result.md      # Generator's summary
        ├── eval-task.md       # Generator summary + prior rounds + signal name
        ├── eval-result.json   # {"passed": bool, "evidence": str, "notes": str}
        └── feedback-<R>.json  # Evaluator verdict from failed round R
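
For illustration, a failing eval-result.json might look like this (values invented for a hypothetical URL-shortener deliverable):

{
  "passed": false,
  "evidence": "POST /shorten with a 2 MB body returned 200; the deliverable requires 413",
  "notes": "no request-size limit on the route; enforce one before the handler runs"
}

On a failed round R, this verdict is carried into feedback-<R>.json for the next Generator round.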

Add .adveloop/ to your .gitignore if you don't want harness metadata tracked — the skill offers to do this on first run.

Code Quality Stance

Both panes operate under an opinionated quality bar — these are failure conditions for the Evaluator, not stylistic preferences:

  • Native patterns only. Follow each framework's built-in mechanisms (routing, validation, config, migrations, testing, logging). Do not hand-roll parallels.
  • No hacks, workarounds, or monkey-patches. No suppressing type errors, lint warnings, or exceptions to make code pass. Fix root causes, not symptoms.
  • Simple, concise, robust. Less code is better when it's correct.
  • Look it up. The Generator queries context7 for library/framework/API docs before writing; the Evaluator cross-checks against current docs while auditing. WebSearch and Explore / general-purpose subagents handle other unknowns. Deprecated or incorrect API usage fails the deliverable.

Real-World Scenarios

Build a new feature with actual verification

> /adveloop "minimal URL shortener with SQLite"

Planner drafts 5 build-mode deliverables and shows them for approval.
You approve. For each deliverable:

  Generator pane (fresh session):
    writes routes, schema, tests; runs them; writes a summary.

  Evaluator pane (fresh session):
    curls every endpoint, feeds malformed input, verifies status codes
    match the deliverable, kills the dev server, returns pass/fail JSON.

Fail → feedback loops back to a new Generator round (up to 3).
Pass → advance to the next deliverable.

Audit existing code before shipping

> /adveloop "review /login for XSS and input validation; fix issues"

Planner drafts a review-mode deliverable.

Evaluator pane runs first against the current code:
  tries <script> payloads, malformed JSON, oversized input;
  returns: passed=false, evidence="Reflected <script> in error page",
           notes="src/auth/login.ts:42 — renders userInput without escaping".

That verdict becomes feedback-0.json.

Generator pane spawns with the verdict as feedback:
  patches the handler, adds tests, re-runs.

Evaluator spawns again, re-exercises, confirms pass — deliverable done.

Resume after a crash

> /adveloop

Planner reads .adveloop/deliverables.md and scans .adveloop/tasks/:
  1. Persist layer       [build]  — passed
  2. Auth middleware     [build]  — partial (gen-result.md exists, no eval)
  3. Rate limiter        [build]  — pending
  4. Harden /login       [review] — pending

Ask: Resume / Rewrite / Abort.
On Resume, #1 is skipped, #2 jumps straight to the Evaluator,
#3 and #4 run from their mode-appropriate entry points.


License

MIT
