code-loops

Multi-agent development pipeline orchestrator. Turns a one-line task description into shipped, reviewed, documented code via 27 specialized AI agents running through a deterministic Python pipeline.

Status: Pre-1.0. Battle-tested on a personal Python project (Claude CLI integration). Schema designed to be project-agnostic — should work on any git repo with appropriate config; cross-project validation is in progress.

What it does

You hand code-loops a task (free-text or markdown file). It runs that task through a 12-stage pipeline of specialized agents:

PRD          → Business Analyst writes a structured product brief with NFR gate
Research     → 5 specialists scan codebase / prompts / incidents / data / AI surface in parallel
Design       → Software Architect drafts an RFC; perspective lenses critique;
               debate-arbiter judges convergence (theme-based, not bug-count)
Design Review→ Safety + Elegance + Hallucination + AI critics review;
               Architect responds; Review-Arbiter emits verdict
               (approved / needs_revision / redesign_needed)
Impl Plan    → Tech Lead decomposes design into atomic, file-disjoint subtasks
               (TDD-ordered, dependency-aware, optional wave grouping)
Implementation
             → For each subtask: optional Prompt Engineer / Dataset Curator / Eval Engineer →
               QA Engineer writes failing tests (locked chmod 444) →
               Software Engineer implements →
               Code Reviewer audits diff →
               Triage Engineer routes failures (max 3 attempts/target)
Validation   → Programmatic gate: pytest, ruff, file-coverage check (no LLM)
Regression   → Conditional eval-bench gate (no LLM). Off by default; opt-in
               via project.yaml. Runs the project's bench, compares each
               metric vs saved baseline, fails if any drops > threshold_pct.
               First run captures baseline.
Release Review → Release Manager gates semantic compliance vs PRD/RFC.
                 Can issue corrective_subtasks → engine re-enters Implementation
Release Docs → Tech Writer produces changelog + ADR + maintenance notes
               (flags brief.md staleness)
Auto Resurvey → If maintenance notes flagged staleness, project-surveyor
                regenerates projects/<name>/brief.md automatically (else skip)
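
The Regression gate's comparison can be sketched roughly as below. This is a minimal illustration, not the code in regression_check.py: the function name `regression_failures`, the higher-is-better assumption, and the handling of missing metrics are all hypothetical.

```python
def regression_failures(baseline, current, threshold_pct):
    """Return names of metrics that dropped more than threshold_pct vs baseline.

    Assumes every metric is higher-is-better (an assumption of this sketch).
    """
    failures = []
    for name, base in baseline.items():
        if name not in current:
            failures.append(name)  # assumption: a missing metric counts as a failure
            continue
        if base == 0:
            continue  # a relative drop is undefined for a zero baseline; skipped here
        drop_pct = (base - current[name]) / abs(base) * 100.0
        if drop_pct > threshold_pct:
            failures.append(name)
    return failures


baseline = {"accuracy": 0.90, "recall": 0.80}
current = {"accuracy": 0.89, "recall": 0.70}
print(regression_failures(baseline, current, threshold_pct=5.0))
```

Here accuracy drops about 1.1% and passes, while recall drops 12.5% and fails the gate.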

At 4 stages you get a human-review checkpoint (approve / abort / revise with comment). Auto-loops handle two failure modes: critique detecting a patching anti-pattern bubbles back to design with a redesign signal; release-review detecting missing implementation appends corrective subtasks and re-enters Implementation.
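
The two auto-loops described above amount to a small routing rule. The sketch below is illustrative only: the stage ids in `STAGES` and the `next_stage` helper are hypothetical, and only the verdict names `redesign_needed` and `needs_more_work` come from this README.

```python
# Hypothetical stage ids, in pipeline order (a subset, for illustration).
STAGES = [
    "design", "design_review", "impl_plan", "implementation",
    "validation", "release_review", "release_docs",
]


def next_stage(current, verdict):
    """Pick the next stage name given the current stage and its verdict.

    Only the two auto-loops are modeled; any other verdict simply advances.
    """
    if current == "design_review" and verdict == "redesign_needed":
        return "design"  # bubble back to design with a redesign signal
    if current == "release_review" and verdict == "needs_more_work":
        return "implementation"  # corrective subtasks re-enter Implementation
    idx = STAGES.index(current)
    return STAGES[idx + 1] if idx + 1 < len(STAGES) else None


print(next_stage("design_review", "redesign_needed"))
print(next_stage("release_review", "approved"))
```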

⚠️ Python-only today. The validation stage and several agent prompts hardcode pytest + ruff. The orchestrator itself, project config, worktree management, and TDD loop are language-agnostic in shape — but a NodeJS / Go / Rust user would hit failures in the validation gate. Multi-language support (configurable test_command / lint_command per project) is the top roadmap item; see Roadmap.

Install

Prerequisites

Python ≥ 3.11
uv — Python package manager
Anthropic Claude CLI on your $PATH, authenticated. The orchestrator shells out to claude --print for every agent invocation.
git (worktrees require ≥ 2.5)

Option 1: install as a tool (recommended)

uv tool install git+https://github.com/icyberdeveloper/code-loops.git
code-loops --help

Now code-loops is on your $PATH everywhere. pipeline.yaml and the 27 agent prompts ship inside the wheel as package data.

Option 2: clone for development

git clone https://github.com/icyberdeveloper/code-loops.git
cd code-loops
uv sync
uv run code-loops --help

Use this if you want to modify agent prompts or pipeline.yaml.

Workspace

code-loops creates tasks/ and projects/ subdirectories in your current working directory. Run it from a directory you have write access to (e.g. ~/code-loops-workspace). Override via $CODE_LOOPS_WORKSPACE env var.
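
As a rough illustration of the workspace rule above (environment override first, current directory otherwise), here is a minimal sketch. `resolve_workspace` is a hypothetical helper, not a function exported by the package.

```python
import os
from pathlib import Path


def resolve_workspace(env=None):
    """Return (root, tasks_dir, projects_dir) for the workspace.

    $CODE_LOOPS_WORKSPACE wins when set; otherwise the current directory is used.
    """
    env = os.environ if env is None else env
    override = env.get("CODE_LOOPS_WORKSPACE")
    root = Path(override).expanduser() if override else Path.cwd()
    return root, root / "tasks", root / "projects"


root, tasks_dir, projects_dir = resolve_workspace({"CODE_LOOPS_WORKSPACE": "/tmp/ws"})
print(tasks_dir, projects_dir)
```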

That's it for the orchestrator itself. Real cost lives in Anthropic API usage during pipeline runs (see Costs below).

Setup — bootstrap your project

code-loops operates on a target project — the codebase you want the pipeline to evolve. You bootstrap it once:

uv run code-loops init /absolute/path/to/your/project

This:

Creates projects/<name>/project.yaml with name + base_repo.
Invokes the project-surveyor agent against your repo (~$0.30–3.00, 1–6 min — depends on project size). Surveyor scans README / CLAUDE.md / source tree and writes projects/<name>/brief.md documenting architecture, layout, key modules, storage layer, RAG/vector search (if any), conventions, domain glossary, external integrations, and rules every downstream agent should follow.

The brief is auto-loaded into every code-loops agent via the {PROJECT_BRIEF} placeholder — agents work with full project context without you having to feed it manually.
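
Conceptually, the injection is a plain text substitution. The sketch below assumes a literal `{PROJECT_BRIEF}` token in the prompt file; `inject_brief` is a hypothetical name, and the real loader in project_loader.py may differ.

```python
def inject_brief(prompt_template: str, brief: str) -> str:
    """Substitute the project brief into an agent prompt.

    str.replace is used rather than str.format so that other literal braces
    in the prompt (JSON examples, code snippets) are left untouched.
    """
    return prompt_template.replace("{PROJECT_BRIEF}", brief)


template = "You are the QA Engineer.\n\n## Project context\n{PROJECT_BRIEF}\n"
print(inject_brief(template, "Flask app; tests live in tests/."))
```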

If you have multiple projects, pass --project <name> on subsequent commands; with one project it's auto-selected.

To skip the surveyor LLM call (dev/cheap path):

uv run code-loops init /path/to/project --no-survey

The brief then contains only a placeholder. Edit it manually, or run uv run code-loops resurvey <name> later to generate it.

Usage

Create a task

Pass either a free-text description or a path to a .md file. The CLI auto-detects:

uv run code-loops new "Add /export-data command for weekly meeting export"
uv run code-loops new path/to/postmortems/2026-05-06_timeout_incident.md
uv run code-loops new ~/notes/feature_idea.md

Mode (feature vs from_problem) is auto-detected from path keywords (problem, postmortem, incident) and content markers (## Postmortem, ## Incident, ## Problem, ## Symptoms, etc.). Russian equivalents are also recognized for bilingual input.

Output: tasks/<NNNN>_<slug>/ with task.md and meta.yaml.
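
For orientation, a directory name of the form <NNNN>_<slug> could be derived roughly like this. The slug rules here (lowercase, underscores, 40-character cap) are assumptions for illustration, and `task_dir_name` is a hypothetical helper.

```python
import re


def task_dir_name(number: int, description: str, max_len: int = 40) -> str:
    """Build a '<NNNN>_<slug>' directory name from a task number and description."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    slug = slug[:max_len].rstrip("_")
    return f"{number:04d}_{slug}"


print(task_dir_name(7, "Add /export-data command"))
```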

Run the pipeline

uv run code-loops run <task_id>

To target a specific project, add --project <name>. The pipeline streams progress to your terminal and pauses at human-review checkpoints.

Other commands

uv run code-loops projects              # list configured projects
uv run code-loops list                  # list all tasks (status + cost)
uv run code-loops status <task_id>      # single-task progress + cost
uv run code-loops commit <task_id>      # print branch + push instructions for a completed task
uv run code-loops cancel <task_id>      # mark task cancelled (artifacts preserved)
uv run code-loops resurvey <name>       # refresh brief.md after material project changes
uv run code-loops eval                  # pipeline-evaluator: meta-analysis over recent runs

Architecture

Layout

code-loops/
├── pyproject.toml          # package metadata + `code-loops` entry point
├── src/code_loops/         # the package (ships in the wheel as installed data)
│   ├── pipeline.yaml       #   ⭐ stage definitions (12 stages, types, role bindings)
│   ├── agents/             #   27 agent prompts in 6 family folders:
│   │   ├── strategy/       #     business-analyst, tech-lead
│   │   ├── research/       #     research-lead + 5 researchers
│   │   ├── architects/     #     software-architect + 7 architect-* (perspective, arbiters, 4 critics)
│   │   ├── engineering/    #     qa, software, code-reviewer, triage, prompt, eval engineers
│   │   ├── release/        #     release-manager, tech-writer
│   │   └── meta/           #     pipeline-evaluator, project-surveyor
│   ├── engine.py           #   orchestrator: loads pipeline, dispatches by stage type,
│   │                       #   handles auto-loops (redesign_needed, needs_more_work)
│   ├── runner.py           #   ClaudeRunner — claude --print subprocess wrapper
│   ├── project_loader.py   #   project.yaml loader + brief injection
│   ├── meta.py             #   per-task meta.yaml (status, cost, durations)
│   ├── worktree.py         #   git worktree mgmt + configurable test-file protection
│   ├── isolation.py        #   research-question slicing (one researcher = one tag)
│   ├── human_review.py     #   checkpoint UI (approve/abort/revise)
│   ├── eval_aggregator.py  #   cross-run aggregation for pipeline-evaluator
│   ├── cli.py              #   typer entry point (exposed as `code-loops` command)
│   └── stages/             #   stage handlers (one per `type:` in pipeline.yaml):
│       ├── prompt.py, parallel.py, debate_writer.py, debate_critique.py,
│       ├── impl_planner.py, subtask_iterator.py, action.py,
│       ├── final_validation.py, regression_check.py, final_review.py,
│       ├── tech_writer.py,
│       └── auto_resurvey.py  # Stage 12 — conditional brief.md refresh
├── examples/               # starter templates (project.yaml)
├── scripts/                # CI helpers (e.g. check_no_leakage.sh)
└── tests/                  # 214 pytest tests (orchestrator only)

# WORKSPACE (created in user's CWD when running code-loops):
<workspace>/
├── projects/<name>/        # per-project config + auto-generated brief
│   ├── project.yaml        #   name + base_repo + optional test_infrastructure
│   └── brief.md            #   auto-generated project knowledge
├── tasks/<NNNN>_<slug>/    # per-task workspaces
│   ├── task.md, meta.yaml
│   ├── prd/, research_plan/, research/, design/, design_review/,
│   │   impl_plan/, implementation/, validation/, release_review/, docs/
│   └── worktree/wt/        # git worktree off base_repo
└── _eval/                  # pipeline-evaluator reports

Pipeline definition (`pipeline.yaml`)

Each stage declares name (semantic id), type (engine handler), prompts, inputs, outputs, and optional human_review: true, max_rounds: N. See pipeline.yaml for the full 12-stage definition; top of file documents the schema. A defaults: block sets model + effort for all stages (override per-role if needed).

Project profile (`projects/<name>/project.yaml`)

Minimal schema (defaults preserve sensible behavior):

project:
  name: my-project
  base_repo: /absolute/path/to/your/project

# Optional — markdown file with project-specific architecture/conventions.
# Auto-generated by `code-loops resurvey` (project-surveyor agent).
brief_file: brief.md

# Optional — test infrastructure config (defaults below preserve prior behavior).
test_infrastructure:
  enabled: true                    # false → skip test_writer entirely (manual-QA projects)
  test_paths: [tests]              # dirs the coder MUST NOT touch
  lock_strategy: chmod_444_dir     # | none (no chmod, only git-diff guard)

See examples/project.yaml for full annotated template.
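
The default handling for test_infrastructure can be pictured as a dictionary merge over an already-parsed config. This is a sketch only: `with_defaults` is a hypothetical helper, and the default values are the ones shown in the schema above.

```python
ALLOWED_LOCK_STRATEGIES = {"chmod_444_dir", "none"}


def default_test_infra() -> dict:
    """Fresh copy of the documented defaults (avoids sharing the list)."""
    return {"enabled": True, "test_paths": ["tests"], "lock_strategy": "chmod_444_dir"}


def with_defaults(config: dict) -> dict:
    """Return a copy of the parsed project.yaml with test_infrastructure filled in."""
    merged = dict(config)
    infra = {**default_test_infra(), **config.get("test_infrastructure", {})}
    if infra["lock_strategy"] not in ALLOWED_LOCK_STRATEGIES:
        raise ValueError(f"unknown lock_strategy: {infra['lock_strategy']!r}")
    merged["test_infrastructure"] = infra
    return merged


minimal = {"project": {"name": "my-project", "base_repo": "/absolute/path/to/your/project"}}
print(with_defaults(minimal)["test_infrastructure"])
```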

Agent prompts (`agents/`)

27 markdown files, one per agent. Each has:

Role identity opening line
## Project context block with {PROJECT_BRIEF} placeholder (auto-substituted at load time)
Domain-specific scan plan / output schema / rules

Customize freely: agents are markdown, not code. pipeline.yaml binds agents to roles by file path.

Customize for your project

Different test infrastructure

Edit projects/<name>/project.yaml:

test_infrastructure:
  enabled: true
  test_paths: [src/test, e2e]      # multiple test dirs
  lock_strategy: chmod_444_dir     # | none

For projects with embedded tests (Rust #[cfg(test)], Go *_test.go colocated): set lock_strategy: none — git-diff guard remains active as a safety net even without chmod. Glob-based locking (chmod_444_glob) and embedded-test detection are deferred until real non-Python projects exercise the pipeline.

Refresh project brief

After material project changes (new modules, new dependencies, new conventions, renamed dirs), regenerate the brief:

uv run code-loops resurvey <name>

The tech-writer stage automatically flags resurvey need in tasks/<id>/docs/maintenance_notes.md after each task ships.

Tune agents

Every agent prompt is a markdown file under agents/. To change agent behavior: edit the file, run a task, observe. No engine restart needed. For systematic A/B comparison run code-loops eval — pipeline-evaluator detects prompt diffs in git and computes Cohen's d / p-values across recent runs.
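
For readers unfamiliar with the statistics involved, here is a minimal sketch of the effect-size side of such a comparison, using only the standard library. The sample values are made up, the function names are hypothetical, and turning the Welch statistic into a p-value (which needs the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(before, after):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(before), len(after)
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    return (mean(after) - mean(before)) / sqrt(pooled_var)


def welch_t(before, after):
    """Welch's t statistic (unequal variances); no p-value is computed here."""
    std_err = sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / std_err


# Made-up per-run scores before and after a prompt edit.
before = [2.0, 4.0, 6.0]
after = [4.0, 6.0, 8.0]
print(cohens_d(before, after), welch_t(before, after))
```

With only a handful of runs on each side, both numbers are noisy, so treat them as a hint rather than a verdict.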

Add a new project

uv run code-loops init /path/to/another/project --name backend-api

Each project lives in its own projects/<name>/ dir. Use --project backend-api on subsequent commands to disambiguate.

Costs

Typical per-task spend (Opus-4.7 at max effort, 2026 pricing):

Stage	Cost (typical)	Notes
PRD	$0.05–0.15	Single Opus call
Research plan	$0.05–0.10	Single Opus call
Research (5 parallel)	$0.30–1.00	5 specialists, deep scans
Design (RFC debate)	$1–4	2–5 rounds × (writer + perspectives + arbiter)
Design Review (critique)	$0.60–2.50	1–3 rounds × (4 critics + responder + arbiter)
Impl Plan	$0.20–0.50	Tech-lead decomposition
Implementation	$1–5	per subtask × subtask count, depends on fix-loop iterations
Validation	$0.00	Programmatic only
Regression	$0.00	Programmatic; off by default. When on, runs project's eval bench.
Release Review	$0.30–1	Single release-manager call
Release Docs	$0.05–0.20	tech-writer
Auto Resurvey	$0–3	$0 if brief stays accurate (typical); $0.30–3 if tech-writer flagged staleness
Total per task	$3–18	Varies hugely with task complexity
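
As a sanity check on the table, the per-stage ranges can be summed directly. The sketch below copies the figures above and assumes the Implementation row scales linearly with subtask count (one reading of "per subtask × subtask count"); `total_range` is a hypothetical helper.

```python
# (low, high) in USD, copied from the table above.
STAGE_COST_USD = {
    "PRD": (0.05, 0.15),
    "Research plan": (0.05, 0.10),
    "Research": (0.30, 1.00),
    "Design": (1.00, 4.00),
    "Design Review": (0.60, 2.50),
    "Impl Plan": (0.20, 0.50),
    "Implementation": (1.00, 5.00),
    "Validation": (0.00, 0.00),
    "Regression": (0.00, 0.00),
    "Release Review": (0.30, 1.00),
    "Release Docs": (0.05, 0.20),
    "Auto Resurvey": (0.00, 3.00),
}


def total_range(costs, subtasks=1):
    """Sum low and high bounds; Implementation is multiplied by the subtask count."""
    low = high = 0.0
    for stage, (lo, hi) in costs.items():
        factor = subtasks if stage == "Implementation" else 1
        low += lo * factor
        high += hi * factor
    return round(low, 2), round(high, 2)


print(total_range(STAGE_COST_USD))              # (3.55, 17.45)
print(total_range(STAGE_COST_USD, subtasks=3))  # (5.55, 27.45)
```

With a single subtask the sum lands at about $3.55–17.45, in line with the quoted $3–18. Under the linear-scaling assumption, tasks that decompose into several subtasks would exceed that range.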

Project-surveyor on init: $0.30–3.00 once per project (re-runs only when the Stage 12 auto-resurvey fires or you call code-loops resurvey manually).

To reduce cost: override model to claude-sonnet-4-6 per-stage in pipeline.yaml for stages where Opus is overkill (research / debate critics / facilitator). The pipeline-evaluator (Mode B) helps identify which stages tolerate downgrade.

Observability & quality monitoring

Every run writes tasks/<id>/meta.yaml with per-stage cost, duration, attempts, and verdicts. Run:

uv run code-loops eval --last 20

…to invoke the pipeline-evaluator agent (Mode B). It aggregates recent runs and produces a report covering:

Convergence rate — % runs reaching release_review.approved first-pass
Per-stage retry rate — debate rounds, fix-router bounces
Code-quality scorecard trend — 5-axis weighted (correctness / maintainability / performance / security / best-practices)
A/B prompt comparison — when agents/<role>.md changed in git, computes χ² + Welch's t-test + Cohen's d across before/after runs
Hallucination rate per agent — scans cited file paths and grep-verifies they exist in the target project
Context-length × degradation tracking — flags stages exceeding 70% of model's safe context range (RULER thresholds for Claude Opus 4.5 ~100K, Sonnet 4.5 ~80K)
Pass@k tracking — for AI-touching subtasks with golden eval files
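
One of the metrics above, the hallucination rate, reduces to checking whether cited paths exist. The sketch below is a simplified stand-in, assuming agents cite paths in backticks with a file extension; `PATH_RE` and `hallucination_rate` are hypothetical, and the evaluator's real extraction rules are not documented here.

```python
import re
import tempfile
from pathlib import Path

# Assumption: cited paths appear in backticks and end in a file extension.
PATH_RE = re.compile(r"`([\w./-]+\.\w+)`")


def hallucination_rate(agent_output: str, repo_root: Path) -> float:
    """Fraction of distinct cited paths that do not exist under repo_root."""
    cited = set(PATH_RE.findall(agent_output))
    if not cited:
        return 0.0
    missing = [p for p in cited if not (repo_root / p).exists()]
    return len(missing) / len(cited)


# Demo against a throwaway repo with one real file.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "src").mkdir()
    (repo / "src" / "app.py").write_text("print('hi')\n")
    text = "See `src/app.py` and `src/ghost.py` for details."
    print(hallucination_rate(text, repo))
```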

Reports land at _eval/report_<timestamp>.md.

Roadmap

Done:

Full 12-stage pipeline with auto-loops (redesign_needed, needs_more_work, auto-resurvey) and conditional regression gate
27 agents in 6 families with {PROJECT_BRIEF} injection
Configurable test infrastructure (Python tests/ default; pluggable)
init / resurvey / projects / eval CLI commands
214 pytest tests covering orchestrator + worktree + agents

Next:

Multi-language support: today the validation stage and several agent prompts hardcode pytest + ruff. Move test/lint/typecheck commands into project.yaml so Node.js, Go, Rust, and other projects can configure their own. (#1)
chmod_444_glob (Go-style *_test.go), git_diff_only (Rust embedded tests), surveyor auto-detect of test paths into project.yaml
Real parallel subtask execution (currently sequential even for wave-marked subtasks; deferred until measured wall-clock pain)
Shared prompt blocks (language rule, Iron Law, revision mode) — deferred until first production runs measure real drift

Not planned:

Built-in support for non-Anthropic LLM providers — we use Claude CLI exclusively. PRs welcome if there's interest.

Contributing

This is a personal tool that grew into something potentially useful. PRs welcome but expect:

Strict pre-commit gate: uv run pytest && uv run ruff check .
New stage handlers / agent prompts must include tests + a rationale in the PR description (why this addition vs simpler alternatives).
Backward-compat preserved by default — additive changes only unless there's a clear migration path.

License

MIT — see LICENSE.

Acknowledgments

Pipeline shape inspired by patterns from obra/superpowers, awesome-ai-dev-prompts, xfstudio/skills, Fandry96/k3-agentic-skills, and Agent Skills for Context Engineering.
