Skip to content

joshft/correctless

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

151 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Correctless

OpenSSF Scorecard CI License: MIT Skills: 28 Version: 3.0.0

Composable Claude Code skills that enforce a correctness-oriented development workflow. Spec before you code. Test before you implement. Never let an agent grade its own work.

Built with Correctless.

The Problem

AI coding assistants are fast but sloppy. They write code that works for the happy path, skip edge cases, and silently introduce bugs that don't surface until production. The same model that wrote the code will review it and say "looks good" — because it's confirming its own decisions.

Correctless fixes this by structuring the workflow so that every phase is executed by a different agent with a different lens:

  • The spec agent asks "what does correct mean?" and researches current best practices before any code exists
  • The review agent reads the spec cold and checks for security gaps, unstated assumptions, and untestable rules
  • The test agent writes tests from the spec without knowing the implementation plan
  • The test auditor checks whether those tests would actually catch bugs or just pass against mocks
  • The implementation agent makes the tests pass without having written them
  • The QA agent hunts for bugs with neither the test author's nor the implementer's blind spots
  • The verification agent checks spec-to-code correspondence without insider knowledge

Same model — but the framing determines what the agent finds.

Quick Start

You need Claude Code and a Claude Max subscription ($100-200/mo).

Install

/plugin marketplace add joshft/correctless
/plugin install correctless
/csetup
Alternative: Git clone
git clone https://github.com/joshft/correctless.git .claude/skills/workflow
.claude/skills/workflow/setup
/csetup

Standard intensity by default. To increase: add "intensity": "high" or "critical" to the workflow section of .correctless/config/workflow-config.json.

First Feature

git checkout -b feature/my-feature
/cspec

Update

/plugin uninstall correctless
/plugin marketplace remove correctless
/plugin marketplace add joshft/correctless
/plugin install correctless

Then restart Claude Code. Git clone users: cd .claude/skills/workflow && git pull && ./setup

One Plugin, Three Intensity Levels

Correctless ships as a single plugin with 28 skills. You choose the intensity that matches your project's risk profile. Seven skills are gated behind intensity thresholds — they check your project's workflow.intensity setting and warn if invoked below their minimum.

Intensity Overhead What You Get Best For
standard ~10-15 min 19 core skills: spec, review, TDD, verify, docs, debug, refactor, release SaaS, APIs, CLI tools, content sites
high ~30-60 min + adversarial spec review, convergence auditing, architecture tracking Auth, payments, sensitive data
critical ~1-2 hours + Alloy formal modeling, live red team assessment Security infrastructure, crypto, proxies

Skills like /cpostmortem and /cdevadv are available at all intensity levels — they're about learning from the past, not adding rigor to the present.

Put another way: Standard intensity is like having someone next to you going through a checklist to make sure your project has some sanity. Critical intensity is like taking your Claude Max subscription tokens, setting them on fire, collecting the ash, and using it to create a tiny diamond.

How It Works

The Standard Workflow

graph LR
    A["/cspec<br/>Write spec"] --> B["/creview<br/>Skeptical review"]
    B --> C["/ctdd"]
    C --> D["/cverify<br/>Rule coverage"]
    D --> E["/cdocs<br/>Update docs"]
    E --> F["Merge"]

    subgraph "/ctdd — Enforced TDD"
        direction LR
        C1["RED<br/>Write tests"] --> C2["Test Audit<br/>Would tests catch bugs?"]
        C2 --> C3["GREEN<br/>Implement"]
        C3 --> C4["/simplify"]
        C4 --> C5["QA<br/>Hostile review"]
        C5 -.->|"Issues found"| C3
    end

    C --- C1

    style A fill:#339af0,color:#fff
    style B fill:#e599f7,color:#000
    style C1 fill:#ff6b6b,color:#fff
    style C3 fill:#51cf66,color:#fff
    style C5 fill:#ffd43b,color:#000
    style F fill:#51cf66,color:#fff
Loading

Each box is a separate agent. The test writer doesn't know the implementation plan. The QA agent didn't write the tests. A PreToolUse hook blocks source code edits until tests exist — this isn't a suggestion, it's enforced by bash. See the Standard Workflow Guide for state machine diagrams, hook architecture, and phase gating details.

The Critical Workflow

graph LR
    A["/cspec<br/>Typed invariants"] --> B["/cmodel<br/>Alloy modeling"]
    B --> C["/creview-spec<br/>5-agent adversarial"]
    C --> D["/ctdd<br/>+ mutation testing"]
    D --> E["/cverify<br/>+ drift detection"]
    E --> F["/cupdate-arch"]
    F --> G["/cdocs"]
    G --> H["/caudit<br/>Olympics"]

    style B fill:#ff922b,color:#fff
    style C fill:#e599f7,color:#000
    style H fill:#ff922b,color:#fff
Loading

Intensity Detection

You don't have to pick intensity manually for every feature. /cspec evaluates signals in your spec — file paths touching auth/payments, STRIDE threat keywords, compliance references, antipattern history — and recommends the right intensity. You confirm or override.

graph LR
    A["Feature request"] --> B["/cspec"]
    B --> C{"Intensity<br/>detection"}
    C -->|"CRUD endpoint"| D["standard"]
    C -->|"Auth + payments"| E["high"]
    C -->|"Crypto + HIPAA"| F["critical"]
    D --> G["Review + TDD"]
    E --> H["+ /caudit, /creview-spec"]
    F --> I["+ /cmodel, /credteam"]

    style D fill:#51cf66,color:#fff
    style E fill:#ffd43b,color:#000
    style F fill:#ff6b6b,color:#fff
Loading

Defense in Depth

Prompt-level instructions fade as context fills — enforcement that depends on the model is a suggestion. Correctless uses four independent layers:

graph TD
    A["Agent wants to<br/>edit source file<br/>during QA phase"] --> B["Layer 1: Gate<br/>PreToolUse hook"]
    B -->|"BLOCK"| C["Edit prevented<br/>Model can't bypass bash"]
    B -->|"ALLOW<br/>(Bash slip-through)"| D["File modified"]
    D --> E["Layer 2: Audit Trail<br/>PostToolUse hook"]
    E --> F["Logged with phase context<br/>Alert shown to user"]
    F --> G["/cwtf reads trail<br/>reports deviations"]

    P["Layer 3: Path-scoped rules<br/>(higher-adherence advisory)"] -.->|"Loaded when a<br/>scoped file is opened"| A
    H["Layer 4: Skill Instructions<br/>(prompt-level, advisory)"] -.->|"Subject to<br/>context fade"| A

    style B fill:#ff6b6b,color:#fff
    style C fill:#ff6b6b,color:#fff
    style E fill:#ffd43b,color:#000
    style P fill:#ffa94d,color:#000
    style H fill:#dee2e6,color:#000
Loading

Fresh agents per phase add resilience: each phase spawns a new agent at 0% context with fresh instructions via context: fork. A QA agent at 0% follows hostile-lens instructions perfectly. A single agent at 70% may have forgotten it was supposed to be hostile.

The Compounding Effect

Escaped bugs become antipatterns. Antipatterns become spec rules. Spec rules become tests. Six months in, the workflow knows your project's failure modes better than any individual developer.

graph LR
    A["Bug escapes"] --> B["/cpostmortem"]
    B --> C["antipatterns.md<br/>(class fix)"]
    B --> D["CLAUDE.md<br/>(learning)"]
    C --> E["/cspec reads<br/>antipatterns"]
    D --> F["Every future<br/>session loads"]
    E --> G["Feature N+1<br/>prevents<br/>same bug class"]
    F --> G

    style B fill:#ff922b,color:#fff
    style G fill:#51cf66,color:#fff
Loading

Skills

Core Workflow

Skill When to Use What It Does
/csetup First run, or re-run for health check 19-point health check, convention mining, project scaffolding
/cspec Starting a new feature Testable rules with research agent, intensity detection
/creview After /cspec Skeptical review + OWASP security checklist
/ctdd After review approves spec RED, test audit, GREEN, /simplify, QA — all enforced
/cverify After /ctdd completes Spec-to-code verification, drift detection
/cdocs After /cverify Update README, AGENT_CONTEXT, ARCHITECTURE, feature docs
/carchitect New project or missing architecture doc Structured ARCHITECTURE.md — reverse-engineer or greenfield, entrypoints YAML
/cauto After spec review approved Orchestrate full pipeline: TDD, verify, docs, PR — with flexible phase resume
/crelease Ready to tag a version Version bump, changelog, sanity checks, annotated tag

Code Quality

Skill When to Use What It Does
/cquick Small, well-understood changes TDD without the ceremony — scope-guarded at 50 LOC / 3 files
/crefactor Restructuring without behavior change Characterization tests, behavioral equivalence, agent separation
/cdebug Stuck on a bug Root cause, hypothesis, git bisect, TDD fix, class fix
/cpr-review Reviewing an incoming PR Architecture, security, tests, antipatterns, dep bumps

Open Source

Skill When to Use What It Does
/ccontribute Contributing to another project Learn conventions first, match patterns, pre-flight, generate PR
/cmaintain Reviewing a contribution Scope check, conventions, maintenance burden, pre-written comments

Observability

Skill When to Use What It Does
/cstatus Anytime Current phase, next steps, problem detection
/chelp Need a quick reference Workflow pipeline, all commands
/csummary After a feature or mid-feature What the workflow caught, by phase
/cmetrics Monthly or for ROI analysis Token cost, bugs caught, session analytics, trends
/cwtf Suspect agents took shortcuts Did agents actually follow instructions?
/cexplain Onboarding or exploring a codebase Guided mermaid diagrams, prose walkthroughs, HTML export

Analysis

Skill When to Use What It Does
/cpostmortem After a bug escapes Trace which phase missed it, add antipattern + class fix
/cdevadv Periodic deep analysis Devil's advocate — challenge architecture and strategy

Intensity-Gated

Skill Min Intensity What It Does
/caudit high Olympics convergence audit (QA / Hacker / Performance presets)
/creview-spec high 5-agent adversarial spec review
/cupdate-arch high Keep ARCHITECTURE.md current after features land
/cmodel critical Alloy formal modeling for state machines and protocols
/credteam critical Live adversarial red team with source code access

Platform Integration

Correctless hooks into Claude Code's infrastructure for real-time feedback and long-term learning. All features below are automatic after /csetup.

Hooks

graph TB
    subgraph "Claude Code Hooks"
        A["PreToolUse"] --> H["sensitive-file-guard.sh<br/>Secret protection"]
        A --> B["workflow-gate.sh<br/>Phase enforcement"]
        C["PostToolUse"] --> D["audit-trail.sh<br/>Adherence feedback"]
        C --> G["auto-format.sh<br/>Auto-formatting"]
        E["Statusline"] --> F["statusline.sh<br/>Phase + cost + context"]
    end

    H -->|"block/allow"| M[".env, keys, credentials"]
    B -->|"block/allow"| I["Every file edit"]
    D -->|"alerts"| J["Real-time violations"]
    G -->|"formats"| L["Edited files"]
    F -->|"live display"| K["Always visible"]

    style H fill:#e64980,color:#fff
    style B fill:#ff6b6b,color:#fff
    style D fill:#ffd43b,color:#000
    style G fill:#74c0fc,color:#000
    style F fill:#51cf66,color:#fff
Loading
Hook Runs Purpose
sensitive-file-guard.sh Before every file edit Blocks writes to .env, credentials, keys, certificates — fail-closed, no overrides
workflow-gate.sh Before every file edit Blocks writes that violate the current phase (RED blocks source, QA blocks everything)
audit-trail.sh After every tool call Logs modifications with phase context, alerts on violations
token-tracking.sh After Agent tool completion Logs subagent token usage, cost, and duration to JSONL for /cmetrics analysis
auto-format.sh After Edit/Write/MultiEdit Runs project formatter (Prettier, Black, gofmt, etc.) with allowlist validation
statusline.sh Continuously Shows phase, QA round, cost, context %, lines delta
workflow-advance.sh On command State machine — validates transitions, enforces gates

Statusline

The statusline shows your workflow state at a glance:

project/  feature/auth  Opus  34%  RED  QA:R0  $0.42  +87/-12

Phase (color-coded), QA round count, session cost, feature cost (background-cached), lines delta, context usage with warnings at 70%.

Real-Time Adherence Feedback

The audit trail hook monitors every modification and alerts immediately:

  • tdd-qa: Source file modified — middleware.ts (this phase should be read-only)
  • GREEN: Test file edited — auth.test.ts (logged in test-edit-log)
  • QA: Read middleware.ts (3 of 7 modified files reviewed) (high+ intensity)

Session Analytics

/cmetrics reads Claude Code's session data for exact token costs, outcome rates, and a Correctless vs Freeform comparison table.

Compounding Learning

Postmortem findings, conventions, and audit learnings append to CLAUDE.md and load into every future session automatically. The spec agent just knows that "auth features in this project need middleware ordering checks" without being told.

Git Integration (opt-in)

  • Git trailers in commit messages: Spec:, Rules-covered:, Verified-by:
  • Git notes attaching verification summaries to commits
  • Git bisect in /cdebug for automated regression finding

MCP Servers (opt-in)

/csetup offers to configure two MCP servers that improve analysis across all skills:

  • Serena — symbol-level code queries (call graphs, references, symbol lookup). 15 skills use it for precise analysis with 40-60% token savings on larger projects.
  • Context7 — current library documentation on demand. The /cspec research agent gets real docs for real versions.

Both are free, open source, run locally, and fall back silently when unavailable.

Output Redaction

External-facing skills (/cpr-review, /ccontribute, /cmaintain) automatically redact paths, credentials, hostnames, and session IDs before posting.

Project Health Check

/csetup runs 19 checks across 6 categories on first run:

Category Checks
Security Hardcoded secrets, API keys in source, .env committed, missing .gitignore patterns
Code Quality Linter configured, formatter configured, dependency audit
Testing Test runner detected, test files exist, coverage configured
CI/CD CI pipeline exists, runs tests, runs linter
Documentation README exists, ARCHITECTURE.md, CONTRIBUTING.md
Git Hygiene .gitignore present, no large binaries, branch protection

For existing projects, setup also mines your codebase for conventions (commit message style, test patterns, import ordering) and bootstraps architecture documentation.

State Management

Check your workflow with /cstatus or the statusline. For advanced debugging:

.correctless/hooks/workflow-advance.sh diagnose "file" # Why a file is blocked
.correctless/hooks/workflow-advance.sh override "why"  # Temporary gate bypass (10 tool calls)
.correctless/hooks/workflow-advance.sh spec-update "why"  # Spec was wrong mid-TDD
.correctless/hooks/workflow-advance.sh reset           # Remove all state for current branch

Quick Fixes During an Active Workflow

If the gate is blocking a typo fix:

.correctless/hooks/workflow-advance.sh override "quick bugfix: fixing typo in error message"

Bypasses the gate for 10 tool calls. When no workflow is active, the gate allows all edits freely.

Language Support

Language Test Runner Mutation Tool PBT Library
Go go test go-mutesting rapid
TypeScript jest/vitest Stryker fast-check
Python pytest mutmut hypothesis
Rust cargo test cargo-mutants proptest

Mutation testing and PBT helpers are available at high+ intensity. Standard intensity works with any language that has a test runner.

Requirements

  • Claude Code CLI
  • A Claude Max subscription ($100/mo or $200/mo plan). Correctless spawns multiple agents per feature — $200/mo is recommended at high+ intensity.
  • A project with a test runner
  • jq (JSON processor) — required for all hooks. Install: brew install jq (macOS), apt install jq (Ubuntu)
  • Bash 4+ — required for hooks. macOS ships Bash 3.2 by default; install modern bash: brew install bash

Optional (high+/critical):

  • Alloy Analyzer for formal modeling
  • Mutation testing tool for your language
  • Isolated environment (Docker/VPS) for red team assessments

Good to Know

Opt-in per feature. Correctless is passive when no workflow is active. Start a workflow with /cspec on a feature branch — skip it on branches where you don't need it. All normal Claude Code behavior is unchanged outside active workflows.

CI unchanged. Correctless runs entirely inside Claude Code sessions via local hooks. It does not modify your CI pipeline, add CI steps, or require any CI changes.

After merge. Delete the feature branch and start fresh with a new branch + /cspec. Workflow state files in .correctless/artifacts/ are branch-scoped and harmless — they provide history for /cmetrics and /csummary.

Uninstall

To fully remove Correctless from a project:

# Remove the plugin
/plugin uninstall correctless

# Clean up project files
rm -rf .correctless/

# Remove hook entries from .claude/settings.json (edit manually or delete the file)
# Remove the "## Correctless" section from CLAUDE.md if present

Glossary

Term Meaning
Agent separation Each workflow phase runs in a fresh Claude session. The test writer doesn't know the implementation plan; the QA agent didn't write the tests. Prevents confirmation bias.
Instance fix Fix the one bug here and now.
Class fix Fix the entire category of this bug — add a structural test that prevents recurrence.
Convergence Run multiple audit rounds until findings stabilize (no new critical/high issues).
Drift Code that no longer matches documented architecture. Detected by /cverify, tracked in drift-debt.json.
Antipattern A known bug class from your project's history. Stored in .correctless/antipatterns.md, checked by every future spec and review.
Spec A document defining what "correct" means for a feature: testable rules, edge cases, security assumptions. A spec that can't be tested is incomplete.
Invariant A rule that must always be true: "auth tokens expire after 24 hours." Specs are lists of invariants.
Intensity The configured rigor level: standard, high, or critical. Higher intensity unlocks more skills but costs more tokens and time.
Mini-audit After QA clears, four adversarial specialist agents (cross-component interaction, hostile input, resource bounds, upgrade compatibility) hunt for issues the QA lens misses. Runs at the end of /ctdd.
Mutation testing Introduce small bugs into code and check if tests catch them. If a test passes with a mutation, that test is weak. Available at high+ intensity.
STRIDE Threat modeling framework: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege.
RED / GREEN TDD phases. RED = write tests that fail. GREEN = write code to make tests pass.

Status

Correctless 3.0.0 — Early release. 28 skills, 3 intensity levels, ~5,000 automated tests, 8 hooks. Real-world usage ongoing — file issues as you find them.

License

MIT

About

The agent that writes the code never reviews it. Spec-driven TDD with agent separation, adversarial QA, and dynamic rigor for Claude Code.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages