Composable Claude Code skills that enforce a correctness-oriented development workflow. Spec before you code. Test before you implement. Never let an agent grade its own work.
AI coding assistants are fast but sloppy. They write code that works for the happy path, skip edge cases, and silently introduce bugs that don't surface until production. The same model that wrote the code will review it and say "looks good" — because it's confirming its own decisions.
Correctless fixes this by structuring the workflow so that every phase is executed by a different agent with a different lens:
- The spec agent asks "what does correct mean?" before any code exists
- The test agent writes tests from the spec without knowing the implementation plan
- The implementation agent makes the tests pass without having written them
- The QA agent hunts for bugs with neither the test author's nor the implementer's blind spots
- The verification agent checks spec-to-code correspondence without insider knowledge of the implementation
Same model, same weights — but the framing determines what the agent finds. A "review this code" prompt produces weaker results than "find the ways this code fails under concurrent access."
Lite is for web apps, APIs, CLI tools, and everyday development. Five skills, lightweight specs, enforced TDD with agent separation.
/cspec → /creview → /ctdd → /cverify → /cdocs
~5 minutes of overhead per feature. You get specs before code, enforced TDD, a skeptical review pass, and living documentation. No formal methods, no convergence audits, no threat modeling.
Full is for security-critical infrastructure, network proxies, financial systems, and anything where a bug is a vulnerability. Ten skills, formal modeling, multi-agent adversarial review, convergence-based auditing.
/cspec → /cmodel → /creview-spec → /ctdd → /cverify → /cupdate-arch → /cdocs → /caudit
~15-30 minutes of overhead per feature. You get everything in Lite plus: formal Alloy modeling, STRIDE threat analysis, multi-agent adversarial spec review, mutation testing, drift debt tracking, external model cross-checking, and a postmortem feedback loop that makes the workflow improve over time.
| Building... | Use |
|---|---|
| A SaaS dashboard, API, CLI tool, content site | Lite |
| Something that handles user auth or payments | Lite, upgrade to Full when scope grows |
| A network proxy, security tool, or infrastructure | Full |
| A prototype or exploration | Neither — just code |
You can upgrade from Lite to Full incrementally. Existing specs, antipatterns, and architecture docs carry over.
Every feature starts with a short spec that defines what "correct" means — testable rules, not vague goals. The spec agent reads your architecture docs and known bug patterns, asks directed questions, then produces a structured document.
A fresh agent that didn't write the spec reads it cold and looks for what's missing: unstated assumptions, untestable rules, edge cases, known antipatterns. In Full, this is a four-agent adversarial team.
Hooks block source code edits until tests exist. The test agent writes from the spec's perspective. A separate implementation agent makes the tests pass. A third QA agent reviews both. Phase transitions are gated — you can't skip RED or advance with failing tests.
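The gating described above can be sketched as a small shell function. This is a hypothetical illustration, not the contents of the real workflow-gate.sh: the function names, the `tdd-qa` phase name, and the file-name patterns used to recognise test files are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of phase-based edit gating (not the real workflow-gate.sh).
# Phase names other than tdd-tests and tdd-impl are assumed for illustration.

# is_test_file PATH: succeeds when PATH looks like a test file (assumed conventions).
is_test_file() {
  case "$1" in
    *.test.ts|*_test.go|test_*.py|*/test_*.py|*/tests/*) return 0 ;;
    *) return 1 ;;
  esac
}

# gate_edit PHASE PATH: prints "allow" or "block" for an edit attempted in PHASE.
gate_edit() {
  phase="$1"
  path="$2"

  # Specs and other Markdown docs are assumed to be editable in every phase.
  case "$path" in
    *.md) echo "allow"; return 0 ;;
  esac

  case "$phase" in
    spec|review)
      # No code of any kind until the spec has been reviewed.
      echo "block" ;;
    tdd-tests)
      # RED: only test files may change.
      if is_test_file "$path"; then echo "allow"; else echo "block"; fi ;;
    *)
      # GREEN, QA, or no active workflow: source edits are allowed.
      echo "allow" ;;
  esac
}
```

For example, `gate_edit tdd-tests src/api/register.ts` prints `block`, while the same call with `src/api/register.test.ts` prints `allow`.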
A fresh agent checks that the implementation actually matches the spec. Rule coverage matrix. Dependency audit. Architecture compliance. In Full: mutation testing, prohibition checks, drift detection.
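To make the rule coverage matrix concrete, here is a minimal sketch of the idea in shell. It assumes rule IDs look like `R-001` and that tests mention the ID of the rule they cover somewhere in the file; both are assumptions for the example, and the real skill is an agent prompt rather than a script like this. The function name `rule_coverage` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: report which spec rules are referenced by at least one test.
# Assumes rule IDs of the form R-<digits> and tests that cite those IDs literally.

# rule_coverage SPEC_FILE TEST_DIR: prints one line per rule plus a summary.
# Exits non-zero when any rule has no matching test reference.
rule_coverage() {
  spec_file="$1"
  test_dir="$2"
  total=0
  covered=0

  for rule in $(grep -oE 'R-[0-9]+' "$spec_file" | sort -u); do
    total=$((total + 1))
    # -w avoids counting R-0011 as a match for R-001.
    if grep -rqwF "$rule" "$test_dir"; then
      covered=$((covered + 1))
      echo "$rule covered"
    else
      echo "$rule MISSING"
    fi
  done

  echo "Rule Coverage: $covered/$total covered"
  [ "$covered" -eq "$total" ]
}
```

A text match like this only shows that a test claims to cover a rule. Whether the test would actually catch a violation is the question mutation testing answers in Full.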
Architecture docs, agent context files, and antipattern checklists stay current. The antipatterns list grows from real bugs — each post-merge issue feeds back into future spec reviews.
In Claude Code:
/plugin marketplace add joshft/correctless
Then install the version you want:
# Lite — 5 skills, lightweight specs, enforced TDD
/plugin install correctless-lite
# OR Full — 10 skills, formal modeling, adversarial review, auditing
/plugin install correctless
Then run setup:
/csetup
The /csetup skill auto-detects your language (Go, TypeScript, Python, Rust), test runner, and available tools. It registers hooks, creates templates, and generates a config file.
git clone https://github.com/joshft/correctless.git .claude/skills/workflow
cd .claude/skills/workflow && ./setup
This installs everything (both Lite and Full skills). The config determines which mode is active, and Lite is the default. To enable Full mode, add "intensity": "standard" (or "high" / "critical") to the workflow section of .claude/workflow-config.json and re-run ./setup.
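If you want to check which mode a config file would select, something like the following sketch may help. It is not part of Correctless: `active_mode` is a hypothetical helper, it finds the `"intensity"` key with a plain text match rather than a JSON parser, and it assumes that a missing or unrecognised value means Lite.

```shell
#!/bin/sh
# Hypothetical helper: print the mode implied by a workflow-config.json file.
# Assumes the key is written as "intensity": "<value>" and that anything other
# than standard, high, or critical leaves the project in Lite mode.

active_mode() {
  config_file="$1"
  intensity=$(sed -n 's/.*"intensity"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
  case "$intensity" in
    standard|high|critical) echo "full ($intensity)" ;;
    *) echo "lite" ;;
  esac
}
```

Usage: `active_mode .claude/workflow-config.json`.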
Plugin install: Claude Code's plugin update doesn't always pull the latest code. To update reliably:
/plugin uninstall correctless
/plugin marketplace remove correctless
/plugin marketplace add joshft/correctless
/plugin install correctless
Then restart Claude Code.
Git clone install: Just pull:
cd .claude/skills/workflow && git pull && ./setup
git checkout -b feature/my-feature
/cspec
Via plugin (recommended):
/plugin marketplace add joshft/correctless
/plugin install correctless-lite # or: /plugin install correctless
/csetup
Via git clone:
git clone https://github.com/joshft/correctless.git .claude/skills/workflow
cd .claude/skills/workflow && ./setup
Review the generated .claude/workflow-config.json and fill in ARCHITECTURE.md with at least a few entries about your project's patterns.
Run these in Claude Code (the CLI or IDE extension). Each command is a slash command:
/csetup # Initialize project (detect language, register hooks, create templates)
/cspec # Start a new feature — creates a spec with testable rules
/creview # Skeptical review of the spec by a fresh agent
/ctdd # Enforced TDD: RED (tests) → GREEN (impl) → QA
/cverify # Check implementation matches spec
/cdocs # Update project documentation
/cstatus # Show current phase and suggested next steps
The state machine enforces ordering. You can't run /ctdd without a reviewed spec, and you can't implement without failing tests.
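The ordering rule can be pictured as a transition table with two preconditions on the test suite. The sketch below is an assumption-laden illustration, not the real workflow-advance.sh: the function names and the `tdd-qa` phase name are hypothetical, and the real script tracks state per branch rather than taking it as arguments.

```shell
#!/bin/sh
# Hypothetical sketch of ordered phase transitions (not the real workflow-advance.sh).

# next_phase CURRENT: prints the phase that follows CURRENT, or fails if there is none.
next_phase() {
  case "$1" in
    spec) echo "review" ;;
    review) echo "tdd-tests" ;;
    tdd-tests) echo "tdd-impl" ;;
    tdd-impl) echo "tdd-qa" ;;
    tdd-qa) echo "done" ;;
    *) return 1 ;;
  esac
}

# advance CURRENT TARGET TESTS: TESTS is "failing" or "passing".
# Only the single legal transition is allowed, and the suite must be in the
# right state: failing at the end of RED, passing at the end of GREEN.
advance() {
  current="$1"
  target="$2"
  tests="$3"

  expected=$(next_phase "$current") || { echo "blocked: no transition from $current"; return 1; }
  if [ "$target" != "$expected" ]; then
    echo "blocked: $current must advance to $expected, not $target"
    return 1
  fi
  if [ "$current" = "tdd-tests" ] && [ "$tests" != "failing" ]; then
    echo "blocked: tests must fail before implementation starts"
    return 1
  fi
  if [ "$current" = "tdd-impl" ] && [ "$tests" != "passing" ]; then
    echo "blocked: cannot advance with failing tests"
    return 1
  fi
  echo "advanced: $current -> $target"
}
```

With this table, `advance spec tdd-tests passing` is rejected because review was skipped, and `advance tdd-tests tdd-impl passing` is rejected because the tests never failed.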
Enable Full mode by adding "intensity": "standard" to your config's workflow section and re-running /csetup.
/cspec # Enhanced spec with typed invariants, STRIDE, templates
/cmodel # Generate and run Alloy formal model (if formal_model: true)
/creview-spec # Multi-agent adversarial review (replaces /creview)
/ctdd # Same TDD + mutation testing + tdd-verify phase
/cverify # Enhanced: mutation testing, drift detection, cross-spec impact
/caudit # Multi-round convergence audit (run on audit/* branch)
/cupdate-arch # Maintain ARCHITECTURE.md after features land
/cdocs # Enhanced: Mermaid diagrams, fact-checking subagent
/cpostmortem # Analyze post-merge bugs, feed back into workflow
The workflow state machine runs through a shell script. You rarely need to call it directly — the skills handle transitions — but it's useful for checking status and recovering from issues:
# Check where you are
.claude/skills/workflow/hooks/workflow-advance.sh status
# See all active workflows across branches
.claude/skills/workflow/hooks/workflow-advance.sh status-all
# If the gate is blocking a legitimate edit (edge case, pattern matching bug)
.claude/skills/workflow/hooks/workflow-advance.sh override "reason"
# Understand why a file is blocked
.claude/skills/workflow/hooks/workflow-advance.sh diagnose "src/api/register.ts"
# Spec turned out to be wrong mid-implementation
.claude/skills/workflow/hooks/workflow-advance.sh spec-update "R-002 was ambiguous about Unicode"
# Nuclear option — remove all workflow state for current branch
.claude/skills/workflow/hooks/workflow-advance.sh reset
Here's what a real session looks like in Claude Code:
you: git checkout -b feature/user-registration
you: /cspec
claude: I'll read ARCHITECTURE.md and the antipatterns list first...
What are you building? I have a few questions:
1. What does the registration endpoint accept and return?
2. What happens on invalid input or duplicate emails?
3. How is the password stored?
you: POST /register takes email + password, returns 201 with JWT.
Invalid input → 400 with field errors. Duplicate → 409.
Bcrypt, cost 12.
claude: Here's the spec with 5 rules: [writes docs/specs/user-registration.md]
R-001: valid registration returns 201 + JWT
R-002: invalid email returns 400 with field error
...
Does this look right?
you: Add a rule for what happens when the DB is down.
claude: Added R-006: database unavailable returns 503.
Spec approved? [advances to review phase]
you: /creview
claude: [fresh agent reads the spec cold]
Three findings:
1. R-001 doesn't specify JWT expiry — untestable as written
2. Missing edge case: email with unicode characters
3. R-005 says bcrypt but no minimum cost factor
Proposed rewrites: [shows changes]
you: Accept 1 and 3. Skip the unicode one for now.
claude: Spec updated. [advances to tdd-tests phase]
you: /ctdd
claude: Starting RED phase with a separate test agent...
[test agent writes register.test.ts with tests for R-001 through R-006]
Tests written. Running tests — they fail as expected.
[advances to tdd-impl]
Starting GREEN phase with a separate implementation agent...
[impl agent writes register.ts, user.repo.ts]
All tests pass.
Tests pass. Consider running /simplify to clean up before QA.
you: skip simplify, go to QA
claude: Starting QA with a third agent...
[QA agent reviews implementation against spec]
1 finding (BLOCKING): R-005 test checks bcrypt prefix but not cost factor.
Test would pass with cost 1.
you: fix it
claude: [returns to GREEN, strengthens test, re-runs QA]
QA clean. Workflow complete.
you: /cverify
claude: Rule Coverage: 6/6 covered
Dependencies: +bcrypt@5.1.0 (password hashing)
Architecture: ✓ follows validation-at-boundary pattern
Overall: PASS
you: /cdocs
claude: Updated AGENT_CONTEXT.md with /register endpoint.
Suggested ARCHITECTURE.md entry for auth pattern. Approve?
you: yes
claude: Done. Branch is ready to merge.
Full mode at high intensity on a Go security proxy:
you: git checkout -b feature/localhost-inspection
you: /cspec
claude: Reading ARCHITECTURE.md, antipatterns, drift debt...
This feature touches trust boundary TB-001. Loading network-protocol
and concurrency invariant templates.
Questions (batched — you clearly know this domain):
1. What's the adversary model for localhost inspection?
2. How do you prevent routing loops when intercepting loopback traffic?
3. What's the failure mode — fail-open or fail-closed?
you: Adversary can send traffic with spoofed source IP. Loops prevented
via SO_MARK on all outbound. Fail-closed — if inspection fails,
drop the connection.
claude: [writes full spec with INV-001 through INV-006, PRH-001, STRIDE analysis]
Recommending intensity: critical (touches trust boundary, security invariants)
you: keep it at high
claude: Spec ready. Since formal_model is false, advancing to review-spec.
you: /creview-spec
claude: [spawns 4 agents: red team, assumptions, testability, design contracts]
Self-assessment agent: INV-003 (buffer containment) is hardest to test.
Red Team: "INV-001 doesn't cover IPv6 loopback (::1). Add INV-007."
Assumptions: "Spec assumes CAP_NET_ADMIN — not in ARCHITECTURE.md."
Testability: "INV-003 needs a concrete test approach — propose PBT."
Design Contracts: "New goroutine needs shutdown path per ABS-002."
2 agents agree on the IPv6 gap. Accept?
you: accept all findings
claude: Spec updated with INV-007 and INV-008. [advances to tdd-tests]
you: /ctdd
claude: RED phase (test agent, Opus)...
[writes tests for all 8 invariants + prohibition grep check]
Tests fail as expected. [advances to impl]
GREEN phase (impl agent, Sonnet)...
[implements inspection, dialer integration, buffer lifecycle]
All tests + race detector pass.
Consider running /simplify before QA.
you: /simplify
claude: [simplify agents clean up 3 issues]
you: continue
claude: QA round 1 (third agent)...
2 findings: 1 high (mutation: removing SO_MARK check doesn't fail a test),
1 medium (error message includes buffer address).
[fix round → QA round 2]
QA round 2: clean. [advances to tdd-verify]
Verification phase: tests pass, coverage 84% on new packages,
all findings resolved, no test weakening detected.
[advances to done]
you: /cverify
claude: Invariant Coverage: 8/8 covered (2 weak — INV-003, INV-006)
Mutation Testing: 10 mutations, 9 killed, 1 survivor on INV-006
Prohibition: ✓ no direct net.Dial outside pkg/dialer/
Cross-spec impact: detection-rules INV-002 references inspection buffer — still holds
Complexity: 180 LOC actual vs 150 estimated (within budget)
PASS with 1 finding (strengthen INV-006 test)
you: /cupdate-arch
claude: Proposing TB-003: Loopback Interception boundary.
[presents entry] Approve?
you: yes
you: /cdocs
claude: Updated AGENT_CONTEXT.md, created docs/features/localhost-inspection.md,
updated docs/diagrams/data-flow.mermaid. Cleaned up state file.
Ready to merge.
Later, before a release:
you: git checkout -b audit/security-2026-04-15
you: /caudit security
claude: Audit round 1: spawning 5 agents (encoding bypass, protocol abuse,
config manipulation, detection bypass, regression hunter)...
4 findings: 1 critical (race in buffer cleanup under concurrent close),
2 high, 1 medium.
you: fix all critical and high
claude: [fix round with TDD for each finding, auto-commits]
Audit round 2: fresh agent team...
0 critical, 0 high, 1 medium (same as round 1 — persistent).
Converged. Writing regression tests for fixed findings.
Updating antipatterns: AP-007 (buffer cleanup race).
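The audit transcript stops once a round turns up nothing new above medium severity. One way to express such a stopping rule is sketched below. The exact criterion Correctless uses is not stated here, so treat this as an assumption: a round counts as converged when it has no critical or high findings and every remaining finding was already reported in the previous round. The `converged` function, the one-finding-per-line file format, and the `F-1` style IDs are hypothetical.

```shell
#!/bin/sh
# Hypothetical convergence check between two audit rounds.
# Each file lists one finding per line as "<severity> <id>", e.g. "medium F-4".

# converged PREV_FILE CURR_FILE: succeeds when the audit can stop.
converged() {
  prev="$1"
  curr="$2"

  # Any critical or high finding in the current round means another fix round.
  if grep -qE '^(critical|high) ' "$curr"; then
    return 1
  fi

  # Every remaining finding must already have appeared in the previous round.
  while read -r severity id; do
    if [ -z "$id" ]; then
      continue
    fi
    grep -q " $id\$" "$prev" || return 1
  done < "$curr"

  return 0
}
```

Under this rule, a round that reports only a medium finding carried over from the previous round converges, while a new medium finding triggers another round.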
| Language | Test Runner | Mutation Tool | PBT Library |
|---|---|---|---|
| Go | go test | go-mutesting | rapid |
| TypeScript | jest/vitest | Stryker | fast-check |
| Python | pytest | mutmut | hypothesis |
| Rust | cargo test | cargo-mutants | proptest |
Mutation testing and property-based testing are Full-only features. Lite works with any language that has a test runner.
| | Lite | Full |
|---|---|---|
| Skills | 5 | 10 |
| Spec format | 5 sections, simple rules | 12+ sections, typed invariants |
| Review | Single-pass, one agent | 4-agent adversarial team |
| TDD enforcement | Hooks + agent separation | Hooks + allowed-tools + agent separation |
| Formal modeling | No | Alloy (optional) |
| Threat analysis | No | STRIDE at high/critical intensity |
| Mutation testing | No | Deterministic tools + LLM fallback |
| Convergence audit | No | Multi-round with fresh agents |
| External model review | No | Configurable (Codex, Gemini CLIs) |
| Feedback loop | Manual antipatterns | Postmortem skill + meta-verification |
| Overhead per feature | ~5 min | ~15-30 min |
- Claude Code CLI
- A Claude Max subscription ($100/mo or $200/mo plan). Correctless spawns multiple agents per feature — a spec review alone can use 4+ parallel agents, and the TDD workflow spawns separate agents for test writing, implementation, and QA. The standard Pro plan will hit rate limits quickly. The $200/mo Max plan with higher rate limits is recommended, especially for Full mode.
- A project with a test runner
Optional (Full only):
- Alloy Analyzer for formal modeling
- Mutation testing tool for your language
- External model CLIs (Codex, Gemini) for cross-checking
correctless/
├── .claude-plugin/
│ └── marketplace.json # Marketplace manifest (lists both plugins)
│
├── correctless-lite/ # Lite plugin (install via /plugin install)
│ ├── .claude-plugin/plugin.json
│ ├── setup # Install script
│ ├── hooks/
│ │ ├── workflow-gate.sh # PreToolUse hook — phase-based edit gating
│ │ └── workflow-advance.sh # State machine
│ ├── skills/
│ │ ├── c-setup/SKILL.md # /csetup — initialize project
│ │ ├── c-spec/SKILL.md # /cspec — feature specification
│ │ ├── c-review/SKILL.md # /creview — skeptical single-pass review
│ │ ├── c-tdd/SKILL.md # /ctdd — enforced TDD with agent separation
│ │ ├── c-verify/SKILL.md # /cverify — post-implementation verification
│ │ ├── c-docs/SKILL.md # /cdocs — living documentation
│ │ └── c-status/SKILL.md # /cstatus — show phase and next steps
│ └── templates/ # Config and doc templates
│
├── correctless-full/ # Full plugin (install via /plugin install)
│ ├── .claude-plugin/plugin.json
│ ├── setup # Install script (Full mode)
│ ├── hooks/ # Same hooks, support both modes
│ ├── skills/
│ │ ├── c-setup/SKILL.md # /csetup — initialize with intensity selection
│ │ ├── c-spec/SKILL.md # /cspec — typed invariants, STRIDE, templates
│ │ ├── c-model/SKILL.md # /cmodel — Alloy formal modeling
│ │ ├── c-review-spec/SKILL.md # /creview-spec — multi-agent adversarial review
│ │ ├── c-tdd/SKILL.md # /ctdd — TDD + mutation testing
│ │ ├── c-verify/SKILL.md # /cverify — mutation testing, drift detection
│ │ ├── c-audit/SKILL.md # /caudit — convergence-based auditing
│ │ ├── c-update-arch/SKILL.md # /cupdate-arch — maintain ARCHITECTURE.md
│ │ ├── c-docs/SKILL.md # /cdocs — living documentation
│ │ ├── c-postmortem/SKILL.md # /cpostmortem — post-merge bug analysis
│ │ └── c-status/SKILL.md # /cstatus — show phase and next steps
│ ├── templates/
│ │ └── invariants/ # 6 invariant templates for /cspec
│ └── helpers/ # PBT library guides (Go, Python, TS, Rust)
│
├── hooks/ # Source hooks (used by git-clone install)
├── skills/ # Source skills (used by git-clone install)
├── templates/ # Source templates
├── helpers/ # Source PBT helpers
├── setup # Source setup script
├── README.md
├── correctless.md # Full spec
└── correctless-lite.md # Lite spec
Early release. Both Lite and Full implementations are functional. Setup, hooks, state machine, and all skill prompts are complete and tested. Real-world usage will surface rough edges — file issues as you find them.
MIT