Correctless

Composable Claude Code skills that enforce a correctness-oriented development workflow. Spec before you code. Test before you implement. Never let an agent grade its own work.

The Problem

AI coding assistants are fast but sloppy. They write code that works for the happy path, skip edge cases, and silently introduce bugs that don't surface until production. The same model that wrote the code will review it and say "looks good" — because it's confirming its own decisions.

Correctless fixes this by structuring the workflow so that every phase is executed by a different agent with a different lens:

  • The spec agent asks "what does correct mean?" before any code exists
  • The test agent writes tests from the spec without knowing the implementation plan
  • The implementation agent makes the tests pass without having written them
  • The QA agent hunts for bugs with neither the test author's nor the implementer's blind spots
  • The verification agent checks spec-to-code correspondence without insider knowledge of the implementation

Same model, same weights — but the framing determines what the agent finds. A "review this code" prompt produces weaker results than "find the ways this code fails under concurrent access."

Two Versions

Correctless Lite

For web apps, APIs, CLI tools, and everyday development. Five skills, lightweight specs, enforced TDD with agent separation.

/cspec → /creview → /ctdd → /cverify → /cdocs

~5 minutes of overhead per feature. You get specs before code, enforced TDD, a skeptical review pass, and living documentation. No formal methods, no convergence audits, no threat modeling.

Full spec: correctless-lite.md

Correctless (Full)

For security-critical infrastructure, network proxies, financial systems, and anything where a bug is a vulnerability. Ten skills, formal modeling, multi-agent adversarial review, convergence-based auditing.

/cspec → /cmodel → /creview-spec → /ctdd → /cverify → /cupdate-arch → /cdocs → /caudit

~15-30 minutes of overhead per feature. You get everything in Lite plus: formal Alloy modeling, STRIDE threat analysis, multi-agent adversarial spec review, mutation testing, drift debt tracking, external model cross-checking, and a postmortem feedback loop that makes the workflow improve over time.

Full spec: correctless.md

Which One?

Building...                                        Use
A SaaS dashboard, API, CLI tool, content site      Lite
Something that handles user auth or payments       Lite, upgrade to Full when scope grows
A network proxy, security tool, or infrastructure  Full
A prototype or exploration                         Neither; just code

You can upgrade from Lite to Full incrementally. Existing specs, antipatterns, and architecture docs carry over.

How It Works

1. Spec Before Code

Every feature starts with a short spec that defines what "correct" means — testable rules, not vague goals. The spec agent reads your architecture docs and known bug patterns, asks directed questions, then produces a structured document.

2. Skeptical Review

A fresh agent that didn't write the spec reads it cold and looks for what's missing: unstated assumptions, untestable rules, edge cases, known antipatterns. In Full, this is a four-agent adversarial team.

3. Enforced TDD

Hooks block source code edits until tests exist. The test agent writes from the spec's perspective. A separate implementation agent makes the tests pass. A third QA agent reviews both. Phase transitions are gated — you can't skip RED or advance with failing tests.
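The gating idea can be sketched roughly as follows. This is a minimal illustration, not the actual workflow-gate.sh: the function names, the file-extension list, and the test-file patterns are all assumptions, and only the phase names tdd-tests and tdd-impl are taken from the sessions shown later in this README.

```shell
#!/bin/sh
# Hypothetical sketch of a phase-based edit gate (NOT the real workflow-gate.sh).
# Usage: gate_edit <phase> <file-path>; exit status 0 = allow, 1 = block.

# Assumed: these extensions count as "source code".
is_source_file() {
  case "$1" in
    *.go|*.ts|*.py|*.rs) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed test-file naming conventions.
is_test_file() {
  case "$1" in
    *.test.*|*_test.*|test_*|*/test_*|tests/*|*/tests/*) return 0 ;;
    *) return 1 ;;
  esac
}

gate_edit() {
  phase="$1"
  file="$2"
  # Docs, specs, and config are never gated in this sketch.
  is_source_file "$file" || return 0
  case "$phase" in
    tdd-tests) is_test_file "$file" ;;  # RED: only test files may change
    tdd-impl)  return 0 ;;              # GREEN: failing tests exist, source edits allowed
    *)         return 1 ;;              # any earlier phase: no tests yet, block
  esac
}
```

With these assumptions, `gate_edit tdd-tests src/api/register.ts` returns 1 (blocked), while the same path is allowed once the phase is tdd-impl.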

4. Verification

A fresh agent checks that the implementation actually matches the spec, producing a rule coverage matrix, a dependency audit, and an architecture compliance check. Full adds mutation testing, prohibition checks, and drift detection.
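A crude version of the rule coverage check can be approximated with grep. The sketch below assumes rule IDs look like R-001 (as in the Lite session in this README) and that each test mentions the ID of the rule it covers in its name or a comment. That convention and the function name `rule_coverage` are assumptions, not documented behavior of /cverify.

```shell
#!/bin/sh
# Hypothetical sketch: report which spec rule IDs are mentioned in a test file.
# Usage: rule_coverage <spec-file> <test-file>
rule_coverage() {
  spec="$1"
  tests="$2"
  # Collect the unique rule IDs of the form R-NNN that appear in the spec.
  grep -oE 'R-[0-9]{3}' "$spec" | sort -u | while read -r rule; do
    # A rule counts as covered if its ID appears anywhere in the test file.
    if grep -qF -- "$rule" "$tests"; then
      echo "$rule covered"
    else
      echo "$rule MISSING"
    fi
  done
}
```

A textual match only shows that a test claims to cover a rule, not that the test is strong enough, which is the gap the QA and mutation-testing steps are meant to close.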

5. Living Documentation

Architecture docs, agent context files, and antipattern checklists stay current. The antipatterns list grows from real bugs — each post-merge issue feeds back into future spec reviews.

Quick Start

Via Plugin Marketplace (recommended)

In Claude Code:

/plugin marketplace add joshft/correctless

Then install the version you want:

# Lite — 5 skills, lightweight specs, enforced TDD
/plugin install correctless-lite

# OR Full — 10 skills, formal modeling, adversarial review, auditing
/plugin install correctless

Then run setup:

/csetup

The /csetup skill auto-detects your language (Go, TypeScript, Python, Rust), test runner, and available tools. It registers hooks, creates templates, and generates a config file.
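Language detection of this kind is commonly done by looking for marker files. The sketch below is an illustration only: the marker files are the standard ones for each ecosystem, but whether /csetup checks exactly these is an assumption, and `detect_language` is a hypothetical name. The runner names come from the Language Support table in this README.

```shell
#!/bin/sh
# Hypothetical sketch of project language detection by marker file.
# Usage: detect_language <project-dir>; prints "<language>: <test runner>".
detect_language() {
  dir="${1:-.}"
  if   [ -f "$dir/go.mod" ];        then echo "go: go test"
  elif [ -f "$dir/Cargo.toml" ];    then echo "rust: cargo test"
  elif [ -f "$dir/tsconfig.json" ]; then echo "typescript: jest/vitest"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/setup.py" ]; then
    echo "python: pytest"
  else
    # No known marker found; the caller must ask the user instead.
    echo "unknown"
    return 1
  fi
}
```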

Via Git Clone (alternative)

git clone https://github.com/joshft/correctless.git .claude/skills/workflow
cd .claude/skills/workflow && ./setup

This installs everything (both Lite and Full skills). The config determines which mode is active — Lite by default. To enable Full mode, add "intensity": "standard" (or "high" / "critical") to the workflow section of .claude/workflow-config.json and re-run ./setup.
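The relationship between the intensity setting and the active mode can be summarized as a small lookup. This is a sketch under assumptions: the JSON shape shown in the comment is one plausible reading of "the workflow section", and `mode_for_intensity` is a hypothetical helper, not part of the setup script.

```shell
#!/bin/sh
# Assumed shape of the relevant part of .claude/workflow-config.json:
#   { "workflow": { "intensity": "standard" } }
#
# Hypothetical helper: map an intensity value to the active mode.
# Usage: mode_for_intensity <value>   (pass "" when the key is absent)
mode_for_intensity() {
  case "$1" in
    "")                     echo "lite" ;;  # key absent: Lite is the default
    standard|high|critical) echo "full" ;;  # any of these enables Full
    *)
      echo "invalid intensity: $1" >&2
      return 1
      ;;
  esac
}
```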

Updating

Plugin install: Claude Code's plugin update doesn't always pull the latest code. To update reliably:

/plugin uninstall correctless
/plugin marketplace remove correctless
/plugin marketplace add joshft/correctless
/plugin install correctless

Then restart Claude Code.

Git clone install: Just pull:

cd .claude/skills/workflow && git pull && ./setup

After Install

git checkout -b feature/my-feature
/cspec

Usage

Installation

Via plugin (recommended):

/plugin marketplace add joshft/correctless
/plugin install correctless-lite          # or: /plugin install correctless
/csetup

Via git clone:

git clone https://github.com/joshft/correctless.git .claude/skills/workflow
cd .claude/skills/workflow && ./setup

Review the generated .claude/workflow-config.json and fill in ARCHITECTURE.md with at least a few entries about your project's patterns.

Lite Commands

Run these in Claude Code (the CLI or IDE extension). Each command is a slash command:

/csetup               # Initialize project (detect language, register hooks, create templates)
/cspec                # Start a new feature — creates a spec with testable rules
/creview              # Skeptical review of the spec by a fresh agent
/ctdd                 # Enforced TDD: RED (tests) → GREEN (impl) → QA
/cverify              # Check implementation matches spec
/cdocs                # Update project documentation
/cstatus              # Show current phase and suggested next steps

The state machine enforces ordering. You can't run /ctdd without a reviewed spec, and you can't implement without failing tests.
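The ordering rules can be pictured as a small transition table with guards. The sketch below is not the real workflow-advance.sh. The phase names review, tdd-tests, tdd-impl, and done appear in the sessions in this README, while spec and tdd-qa, along with the function names, are assumed for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of gated phase transitions (Lite).

# Fixed ordering: each phase has exactly one successor.
next_phase() {
  case "$1" in
    spec)      echo "review" ;;
    review)    echo "tdd-tests" ;;
    tdd-tests) echo "tdd-impl" ;;
    tdd-impl)  echo "tdd-qa" ;;
    tdd-qa)    echo "done" ;;
    *)         return 1 ;;
  esac
}

# Guard: <tests-status> is "failing" or "passing".
can_advance() {
  case "$1" in
    tdd-tests)       [ "$2" = "failing" ] ;;  # RED must end with failing tests
    tdd-impl|tdd-qa) [ "$2" = "passing" ] ;;  # cannot advance with failing tests
    *)               return 0 ;;
  esac
}

# Usage: advance <current-phase> <tests-status>; prints the new phase.
advance() {
  if can_advance "$1" "$2"; then
    next_phase "$1"
  else
    echo "blocked in $1: tests are $2" >&2
    return 1
  fi
}
```

For example, `advance tdd-tests failing` prints `tdd-impl`, while `advance tdd-tests passing` is refused because the RED phase has not produced a failing test.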

Full Commands (in addition to Lite)

Enable Full mode by adding "intensity": "standard" to your config's workflow section and re-running /csetup.

/cspec                # Enhanced spec with typed invariants, STRIDE, templates
/cmodel               # Generate and run Alloy formal model (if formal_model: true)
/creview-spec         # Multi-agent adversarial review (replaces /creview)
/ctdd                 # Same TDD + mutation testing + tdd-verify phase
/cverify              # Enhanced: mutation testing, drift detection, cross-spec impact
/caudit               # Multi-round convergence audit (run on audit/* branch)
/cupdate-arch         # Maintain ARCHITECTURE.md after features land
/cdocs                # Enhanced: Mermaid diagrams, fact-checking subagent
/cpostmortem          # Analyze post-merge bugs, feed back into workflow

State Management

The workflow state machine runs through a shell script. You rarely need to call it directly — the skills handle transitions — but it's useful for checking status and recovering from issues:

# Check where you are
.claude/skills/workflow/hooks/workflow-advance.sh status

# See all active workflows across branches
.claude/skills/workflow/hooks/workflow-advance.sh status-all

# If the gate is blocking a legitimate edit (edge case, pattern matching bug)
.claude/skills/workflow/hooks/workflow-advance.sh override "reason"

# Understand why a file is blocked
.claude/skills/workflow/hooks/workflow-advance.sh diagnose "src/api/register.ts"

# Spec turned out to be wrong mid-implementation
.claude/skills/workflow/hooks/workflow-advance.sh spec-update "R-002 was ambiguous about Unicode"

# Nuclear option — remove all workflow state for current branch
.claude/skills/workflow/hooks/workflow-advance.sh reset

Typical Session (Lite)

Here's what a real session looks like in Claude Code:

you:    git checkout -b feature/user-registration

you:    /cspec

claude: I'll read ARCHITECTURE.md and the antipatterns list first...
        What are you building? I have a few questions:
        1. What does the registration endpoint accept and return?
        2. What happens on invalid input or duplicate emails?
        3. How is the password stored?

you:    POST /register takes email + password, returns 201 with JWT.
        Invalid input → 400 with field errors. Duplicate → 409.
        Bcrypt, cost 12.

claude: Here's the spec with 5 rules: [writes docs/specs/user-registration.md]
        R-001: valid registration returns 201 + JWT
        R-002: invalid email returns 400 with field error
        ...
        Does this look right?

you:    Add a rule for what happens when the DB is down.

claude: Added R-006: database unavailable returns 503.
        Spec approved? [advances to review phase]

you:    /creview

claude: [fresh agent reads the spec cold]
        Three findings:
        1. R-001 doesn't specify JWT expiry — untestable as written
        2. Missing edge case: email with unicode characters
        3. R-005 says bcrypt but no minimum cost factor
        Proposed rewrites: [shows changes]

you:    Accept 1 and 3. Skip the unicode one for now.

claude: Spec updated. [advances to tdd-tests phase]

you:    /ctdd

claude: Starting RED phase with a separate test agent...
        [test agent writes register.test.ts with tests for R-001 through R-006]
        Tests written. Running tests — they fail as expected.
        [advances to tdd-impl]

        Starting GREEN phase with a separate implementation agent...
        [impl agent writes register.ts, user.repo.ts]
        All tests pass. Consider running /simplify to clean up before QA.

you:    skip simplify, go to QA

claude: Starting QA with a third agent...
        [QA agent reviews implementation against spec]
        1 finding (BLOCKING): R-005 test checks bcrypt prefix but not cost factor.
        Test would pass with cost 1.

you:    fix it

claude: [returns to GREEN, strengthens test, re-runs QA]
        QA clean. Workflow complete.

you:    /cverify

claude: Rule Coverage: 6/6 covered
        Dependencies: +bcrypt@5.1.0 (password hashing)
        Architecture: ✓ follows validation-at-boundary pattern
        Overall: PASS

you:    /cdocs

claude: Updated AGENT_CONTEXT.md with /register endpoint.
        Suggested ARCHITECTURE.md entry for auth pattern. Approve?

you:    yes

claude: Done. Branch is ready to merge.

Typical Session (Full)

Full mode at high intensity on a Go security proxy:

you:    git checkout -b feature/localhost-inspection

you:    /cspec

claude: Reading ARCHITECTURE.md, antipatterns, drift debt...
        This feature touches trust boundary TB-001. Loading network-protocol
        and concurrency invariant templates.

        Questions (batched — you clearly know this domain):
        1. What's the adversary model for localhost inspection?
        2. How do you prevent routing loops when intercepting loopback traffic?
        3. What's the failure mode — fail-open or fail-closed?

you:    Adversary can send traffic with spoofed source IP. Loops prevented
        via SO_MARK on all outbound. Fail-closed — if inspection fails,
        drop the connection.

claude: [writes full spec with INV-001 through INV-006, PRH-001, STRIDE analysis]
        Recommending intensity: critical (touches trust boundary, security invariants)

you:    keep it at high

claude: Spec ready. Since formal_model is false, advancing to review-spec.

you:    /creview-spec

claude: [spawns 4 agents: red team, assumptions, testability, design contracts]
        Self-assessment agent: INV-003 (buffer containment) is hardest to test.

        Red Team: "INV-001 doesn't cover IPv6 loopback (::1). Add INV-007."
        Assumptions: "Spec assumes CAP_NET_ADMIN — not in ARCHITECTURE.md."
        Testability: "INV-003 needs a concrete test approach — propose PBT."
        Design Contracts: "New goroutine needs shutdown path per ABS-002."

        2 agents agree on the IPv6 gap. Accept?

you:    accept all findings

claude: Spec updated with INV-007 and INV-008. [advances to tdd-tests]

you:    /ctdd

claude: RED phase (test agent, Opus)...
        [writes tests for all 8 invariants + prohibition grep check]
        Tests fail as expected. [advances to impl]

        GREEN phase (impl agent, Sonnet)...
        [implements inspection, dialer integration, buffer lifecycle]
        All tests + race detector pass.
        Consider running /simplify before QA.

you:    /simplify

claude: [simplify agents clean up 3 issues]

you:    continue

claude: QA round 1 (third agent)...
        2 findings: 1 high (mutation: removing SO_MARK check doesn't fail a test),
        1 medium (error message includes buffer address).
        [fix round → QA round 2]
        QA round 2: clean. [advances to tdd-verify]

        Verification phase: tests pass, coverage 84% on new packages,
        all findings resolved, no test weakening detected.
        [advances to done]

you:    /cverify

claude: Invariant Coverage: 8/8 covered (2 weak — INV-003, INV-006)
        Mutation Testing: 10 mutations, 9 killed, 1 survivor on INV-006
        Prohibition: ✓ no direct net.Dial outside pkg/dialer/
        Cross-spec impact: detection-rules INV-002 references inspection buffer — still holds
        Complexity: 180 LOC actual vs 150 estimated (within budget)
        PASS with 1 finding (strengthen INV-006 test)

you:    /cupdate-arch

claude: Proposing TB-003: Loopback Interception boundary.
        [presents entry] Approve?

you:    yes

you:    /cdocs

claude: Updated AGENT_CONTEXT.md, created docs/features/localhost-inspection.md,
        updated docs/diagrams/data-flow.mermaid. Cleaned up state file.
        Ready to merge.

Later, before a release:

you:    git checkout -b audit/security-2026-04-15

you:    /caudit security

claude: Audit round 1: spawning 5 agents (encoding bypass, protocol abuse,
        config manipulation, detection bypass, regression hunter)...
        4 findings: 1 critical (race in buffer cleanup under concurrent close),
        2 high, 1 medium.

you:    fix all critical and high

claude: [fix round with TDD for each finding, auto-commits]
        Audit round 2: fresh agent team...
        0 critical, 0 high, 1 medium (same as round 1 — persistent).
        Converged. Writing regression tests for fixed findings.
        Updating antipatterns: AP-007 (buffer cleanup race).
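The stopping rule suggested by the audit transcript above (a round with no critical or high findings) can be sketched as a loop over per-round counts. The exact convergence criterion used by /caudit is not stated in this README, so treat both the rule and the `converged_round` function as assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a convergence check for a multi-round audit.
# Input on stdin: one line per round, "<critical> <high> <medium>" counts.
# Prints the first round that has no critical and no high findings.
converged_round() {
  round=0
  while read -r critical high medium; do
    round=$((round + 1))
    if [ "$critical" -eq 0 ] && [ "$high" -eq 0 ]; then
      echo "converged after round $round ($medium medium remaining)"
      return 0
    fi
  done
  echo "not converged after $round rounds"
  return 1
}

# Finding counts taken from the transcript: round 1 then round 2.
printf '1 2 1\n0 0 1\n' | converged_round
```

Under this rule a persistent medium finding does not block convergence, which matches the transcript, where round 2 still reports one medium.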

Language Support

Language    Test Runner  Mutation Tool  PBT Library
Go          go test      go-mutesting   rapid
TypeScript  jest/vitest  Stryker        fast-check
Python      pytest       mutmut         hypothesis
Rust        cargo test   cargo-mutants  proptest

Mutation testing and property-based testing are Full-only features. Lite works with any language that has a test runner.

Comparison

Feature                Lite                      Full
Skills                 5                         10
Spec format            5 sections, simple rules  12+ sections, typed invariants
Review                 Single-pass, one agent    4-agent adversarial team
TDD enforcement        Hooks + agent separation  Hooks + allowed-tools + agent separation
Formal modeling        No                        Alloy (optional)
Threat analysis        No                        STRIDE at high/critical intensity
Mutation testing       No                        Deterministic tools + LLM fallback
Convergence audit      No                        Multi-round with fresh agents
External model review  No                        Configurable (Codex, Gemini CLIs)
Feedback loop          Manual antipatterns       Postmortem skill + meta-verification
Overhead per feature   ~5 min                    ~15-30 min

Requirements

  • Claude Code CLI
  • A Claude Max subscription ($100/mo or $200/mo plan). Correctless spawns multiple agents per feature — a spec review alone can use 4+ parallel agents, and the TDD workflow spawns separate agents for test writing, implementation, and QA. The standard Pro plan will hit rate limits quickly. The $200/mo Max plan with higher rate limits is recommended, especially for Full mode.
  • A project with a test runner

Optional (Full only):

  • Alloy Analyzer for formal modeling
  • Mutation testing tool for your language
  • External model CLIs (Codex, Gemini) for cross-checking

What's in the Repo

correctless/
├── .claude-plugin/
│   └── marketplace.json              # Marketplace manifest (lists both plugins)
│
├── correctless-lite/                  # Lite plugin (install via /plugin install)
│   ├── .claude-plugin/plugin.json
│   ├── setup                          # Install script
│   ├── hooks/
│   │   ├── workflow-gate.sh           # PreToolUse hook — phase-based edit gating
│   │   └── workflow-advance.sh        # State machine
│   ├── skills/
│   │   ├── c-setup/SKILL.md          # /csetup — initialize project
│   │   ├── c-spec/SKILL.md           # /cspec — feature specification
│   │   ├── c-review/SKILL.md         # /creview — skeptical single-pass review
│   │   ├── c-tdd/SKILL.md            # /ctdd — enforced TDD with agent separation
│   │   ├── c-verify/SKILL.md         # /cverify — post-implementation verification
│   │   ├── c-docs/SKILL.md           # /cdocs — living documentation
│   │   └── c-status/SKILL.md         # /cstatus — show phase and next steps
│   └── templates/                     # Config and doc templates
│
├── correctless-full/                  # Full plugin (install via /plugin install)
│   ├── .claude-plugin/plugin.json
│   ├── setup                          # Install script (Full mode)
│   ├── hooks/                         # Same hooks, support both modes
│   ├── skills/
│   │   ├── c-setup/SKILL.md          # /csetup — initialize with intensity selection
│   │   ├── c-spec/SKILL.md           # /cspec — typed invariants, STRIDE, templates
│   │   ├── c-model/SKILL.md          # /cmodel — Alloy formal modeling
│   │   ├── c-review-spec/SKILL.md    # /creview-spec — multi-agent adversarial review
│   │   ├── c-tdd/SKILL.md            # /ctdd — TDD + mutation testing
│   │   ├── c-verify/SKILL.md         # /cverify — mutation testing, drift detection
│   │   ├── c-audit/SKILL.md          # /caudit — convergence-based auditing
│   │   ├── c-update-arch/SKILL.md    # /cupdate-arch — maintain ARCHITECTURE.md
│   │   ├── c-docs/SKILL.md           # /cdocs — living documentation
│   │   ├── c-postmortem/SKILL.md     # /cpostmortem — post-merge bug analysis
│   │   └── c-status/SKILL.md         # /cstatus — show phase and next steps
│   ├── templates/
│   │   └── invariants/                # 6 invariant templates for /cspec
│   └── helpers/                       # PBT library guides (Go, Python, TS, Rust)
│
├── hooks/                             # Source hooks (used by git-clone install)
├── skills/                            # Source skills (used by git-clone install)
├── templates/                         # Source templates
├── helpers/                           # Source PBT helpers
├── setup                              # Source setup script
├── README.md
├── correctless.md                     # Full spec
└── correctless-lite.md                # Lite spec

Status

Early release. Both Lite and Full implementations are functional. Setup, hooks, state machine, and all skill prompts are complete and tested. Real-world usage will surface rough edges — file issues as you find them.

License

MIT
