The hard part of software was never the code.
A research-driven, spec-first, SDK-first, local-first development methodology for the agentic era. Copy one directory into any project and the methodology is active: research informs decisions, specifications first, SDK second, tests third, code last. Build for the smallest viable model, expand to frontier as needed. Documentation is the primary engineering artifact. The SDK is the constraint surface. Tests are the mold. Code is the casting.
Codex Automata is a development methodology for the agentic era. It inverts the traditional pipeline to Research → Documentation → SDK → Tests → Code:
- Research informs decisions. Agents investigate technologies, patterns, and the landscape before specifying.
- Documentation comes first. Specify the system before implementing it.
- SDK comes second. Express the architecture as compilable building blocks. The SDK is the constraint surface.
- Tests come third. Derive tests from the specification, written against SDK interfaces. The tests are the mold.
- Code comes last. Agents fill the mold within the SDK boundary. The code is the casting.
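The ordering above can be sketched in a few lines of Python. This is a minimal illustration, not harness code: the task-store domain and the names `TaskStore`, `mold_add_then_get`, and `InMemoryTaskStore` are hypothetical, and the one-line "spec" in the docstring is invented for the example.

```python
from typing import Protocol


class TaskStore(Protocol):
    """SDK (constraint surface): the only operations downstream code may rely on."""

    def add(self, title: str) -> int: ...

    def get(self, task_id: int) -> str: ...


def mold_add_then_get(store: TaskStore) -> None:
    """Test (the mold), derived from a hypothetical spec line:
    'a stored task can be retrieved by the id that add() returned'."""
    task_id = store.add("write spec")
    assert store.get(task_id) == "write spec"


class InMemoryTaskStore:
    """Code (the casting): written last, shaped by the mold above."""

    def __init__(self) -> None:
        self._tasks = {}  # maps task id -> title

    def add(self, title: str) -> int:
        task_id = len(self._tasks) + 1
        self._tasks[task_id] = title
        return task_id

    def get(self, task_id: int) -> str:
        return self._tasks[task_id]


# The mold is written against the SDK interface, so any conforming casting fits it.
mold_add_then_get(InMemoryTaskStore())
```

Because the test depends only on the `TaskStore` interface, an agent can replace the implementation without touching the mold.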
The methodology rests on ten core principles: Specification First, SDK as Constraint Surface, Local-First, Research as Foundation, Tests as Molds, Code as Casting, Modularity and Bounded Contexts, Continuous Flow, Quality Gates, and Intentional Divergence. Build local-first: design for the smallest viable model, expand to frontier as needed.
This repository contains the harness (the thing you copy into projects) and reference material (the methodology documentation you read).
Clone this repo somewhere permanent, then initialize any project:
# Clone once
git clone https://github.com/0xhackerfren/Codex-Automata.git D:\tools\Codex-Automata
# Initialize a new project
D:\tools\Codex-Automata\scripts\init.ps1 -TargetPath D:\projects\my-new-app
On Linux or macOS:
git clone https://github.com/0xhackerfren/Codex-Automata.git ~/tools/Codex-Automata
~/tools/Codex-Automata/scripts/init.sh ~/projects/my-new-app
To set up manually instead, copy the contents of harness/ into your project root:
xcopy /s /e /h harness\* D:\projects\my-new-app\
On Linux or macOS:
cp -r harness/. ~/projects/my-new-app/
After initialization, your project contains:
my-project/
  AGENTS.md     Agent instructions (active in Cursor automatically)
  PLAYBOOK.md   Phase-by-phase methodology guide
  .cursor/      Cursor IDE rules, skills, subagents, hooks
  .github/      PR template, issue templates, CI workflow
  agent/        Detailed agent operating rules
  templates/    22+ templates: spec, test plan, ADR, contract, task, review, gap assessment, design identity,
                block registry, SDK design, deployment checklist, incident postmortem, retrospective, security audit,
                brownfield audit, guardrail config, and more
                Design identity: aesthetic direction, typography, color system, copy voice, accessibility commitments, anti-patterns (user-facing surfaces)
                Block registry: project-level index of SDK building blocks
  docs/         Empty, ready for your project specifications
  sdk/          Empty, ready for SDK constraint surface (types, interfaces)
  tests/        Empty, ready for test plans and test code
  tasks/        Empty, ready for agent task definitions
  review/       Empty, ready for human review records
  src/          Empty, ready for source code
  scripts/      Enforcement scripts (divergence gate, spec check, commit lint)
Open the project in Cursor IDE and the methodology enforces itself:
- Rules (.cursor/rules/) load automatically into every agent session. They enforce spec-first development, test-before-code constraints, and interface contract discipline.
- Skills (.cursor/skills/) provide ten guided workflows across all pipeline phases: /project-intake, /spec-writing, /sdk-design, /test-molding, /code-casting, /review, /product-testing, /recovery, /brownfield-onboarding, and /quick-change (an abbreviated path for small changes within existing coverage).
- Subagents (.cursor/agents/) handle six specialized tasks: spec review, test derivation, code casting, SDK design, product testing, and security audit. They run in isolated contexts and can work in parallel.
- Hooks (.cursor/hooks.json) enforce methodology constraints: prompt hooks remind agents to check spec/test coverage, while command hooks warn on hardcoded visual values and require human approval for SDK/contract changes.
- Enforcement scripts (scripts/) run in CI or locally: divergence-gate scans for slop fingerprints, spec-check verifies spec-before-code, and commit-lint validates commit traceability.
- Guardrails classify agent actions into three tiers: AUTO (proceed autonomously), LOG (proceed with audit trail), and APPROVE (require human approval before execution). They complement quality gates and human oversight.
- AGENTS.md provides root-level instructions that any AI agent picks up automatically.
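The three guardrail tiers can be pictured as a small classifier. The sketch below is an assumption-laden illustration, not the harness's guardrail config: the `classify` function, the action names, and the path prefixes are hypothetical. Only the tier names and the rule that SDK/contract changes need human approval come from the description above.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto"        # proceed autonomously
    LOG = "log"          # proceed, but leave an audit trail
    APPROVE = "approve"  # wait for human approval before execution


# Hypothetical path prefixes; a real project would define its own policy.
APPROVAL_PREFIXES = ("sdk/", "contracts/")
LOGGED_PREFIXES = ("src/", "tests/")


def classify(action: str, path: str) -> Tier:
    """Map an agent action on a path to a guardrail tier."""
    if action == "read":
        return Tier.AUTO      # assumed: reads are always safe
    if path.startswith(APPROVAL_PREFIXES):
        return Tier.APPROVE   # SDK/contract changes need a human
    if path.startswith(LOGGED_PREFIXES):
        return Tier.LOG       # assumed: ordinary writes are audited
    return Tier.APPROVE       # anything unrecognized fails closed
```

Defaulting unknown paths to APPROVE is a design choice in this sketch: an action the policy does not recognize is treated as the riskiest case rather than the safest.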
While .cursor/ provides Cursor-specific integration (rules, skills, subagents, hooks), the AGENTS.md file and agent/ directory work with any tool that reads agent instructions—Claude Code, GitHub Copilot, OpenAI Codex, Windsurf, and others.
- Read PLAYBOOK.md for the phase-by-phase guide.
- Copy templates/project-intake-template.md to docs/intake.md and fill it in.
- Or type /project-intake in Cursor chat to start guided setup.
- Follow the phases: Phase 0 Intake, Phase 1 Architecture, Phase 2 Specification, Phase 3 SDK Design, Phase 4 Test Molding, Phase 5 Code Casting, Phase 6 Review, Phase 6b Product Testing, Phase 7 Deployment. For bug fixes and small changes already covered by spec, SDK, and tests, use the quick-change workflow or /quick-change in Cursor.
codex-automata/
|
|-- README.md This file
|-- MANIFESTO.md The full Codex Automata philosophy
|
|-- harness/ THE HARNESS (copy into your projects)
| |-- AGENTS.md Root agent operating instructions
| |-- PLAYBOOK.md Phase-by-phase methodology guide
| |-- .cursor/ Cursor IDE integration
| | |-- rules/ Auto-applied rules (.mdc)
| | |-- skills/ Invocable workflows (/skill-name)
| | |-- agents/ Custom subagent definitions
| | |-- hooks.json Event-driven enforcement
| |-- .github/ GitHub CI and templates
| |-- agent/ Detailed agent operating rules (spec writing, SDK design, test molding, code casting, review)
| |-- scripts/ Enforcement scripts (divergence gate, spec check, commit lint)
| |-- templates/ 22+ project templates (spec, test, ADR, contract, task, review, gap assessment, design identity,
| | block registry, SDK design, deployment checklist, incident postmortem, retrospective, security audit,
| | brownfield audit, guardrail config, and more)
| |-- docs/ Empty project docs directory
| |-- sdk/ Empty SDK constraint surface directory
| |-- tests/ Empty project tests directory
| |-- tasks/ Empty agent tasks directory
| |-- review/ Empty human review directory
| |-- src/ Empty source code directory
|
|-- reference/ METHODOLOGY DOCS (read, don't copy)
| |-- principles.md Ten core principles explained
| |-- adoption-profiles.md Essential, Standard, and Complete adoption profiles
| |-- workflow.md End-to-end workflow reference
| |-- architecture.md Architecture patterns and guidance
| |-- kanban.md Flow-based project management
| |-- agent-operating-model.md How agents operate
| |-- recovery.md Recovery protocol for closing gaps
| |-- product-testing.md Agentic product testing reference
| |-- quick-change.md Abbreviated workflow for small changes within existing coverage
| |-- multi-agent.md Multi-agent orchestration and SDK-as-coordination-surface
| |-- iteration.md Iteration protocol: loops, budgets, and stop conditions
| |-- brownfield-onboarding.md Adopting the methodology on existing codebases
| |-- guardrails.md Three-tier agent action classification (AUTO/LOG/APPROVE)
| |-- property-based-testing.md Advanced molds: specification-derived invariants with PBT
| |-- cost-awareness.md Token budgets, model tiering, and cost optimization
| |-- accessibility.md WCAG constraints, accessibility molds, and testing profiles
| |-- glossary.md Terminology reference (18 reference documents)
|
|-- examples/ WORKED EXAMPLES (read, don't copy)
| |-- task-manager/ Greenfield example: spec, tests, tasks, review, design identity, SDK types, context state, block registry
| |-- brownfield-api/ Brownfield example: Express.js bookmarks API—audit, retroactive spec, SDK extraction, tests, gap assessment
|
|-- scripts/ AUTOMATION
| |-- init.ps1 PowerShell init script
| |-- init.sh Bash init script
|
|-- .github/ CI and templates for THIS repo
Note: .github/ exists at both the repo root (for the Codex Automata project itself) and inside harness/ (the copy shipped into initialized projects).
Research --> Documentation --> SDK --> Tests --> Code
(docs) (constraint) (mold) (casting)
- Research grounds specifications and architectural decisions in evidence before documentation is written.
- Specifications are the primary engineering artifact. They define what the system must do.
- SDK is the constraint surface. It expresses the architecture as compilable building blocks that constrain all downstream work.
- Tests are the mold. Derived from specifications and written against SDK interfaces, they constrain the shape of the implementation. Property-based testing extends molds with specification-derived invariants verified against random inputs (fast-check, Hypothesis, proptest).
- Code is the casting. Agents pour implementation into the mold, within the SDK boundary, until all tests pass.
If the casting is defective, fix the mold. If the mold is wrong, fix the specification. If new building blocks are needed, extend the SDK through the specification pipeline. Do not debug the implementation directly.
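To make the property-based extension of a mold concrete, here is a minimal sketch that checks specification-derived invariants against random inputs. It deliberately uses only the standard library's `random` module as a stand-in for a real PBT library such as Hypothesis, which would also shrink failing inputs. The function `normalize_title` and its three invariants are hypothetical examples, not taken from the harness.

```python
import random


def normalize_title(raw: str) -> str:
    """Casting under test: collapse whitespace runs and trim both ends."""
    return " ".join(raw.split())


def check_invariants(trials: int = 500, seed: int = 0) -> int:
    """Check the invariants on random inputs; return how many inputs were checked."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    alphabet = "ab \t\n"
    for _ in range(trials):
        length = rng.randint(0, 20)
        raw = "".join(rng.choice(alphabet) for _ in range(length))
        out = normalize_title(raw)
        assert normalize_title(out) == out  # invariant 1: idempotent
        assert out == out.strip()           # invariant 2: no edge whitespace
        assert "  " not in out and "\t" not in out and "\n" not in out  # invariant 3: single spaces only
    return trials


check_invariants()
```

Each invariant is a statement about all inputs rather than one example, which is what lets a mold constrain behavior the example-based tests never enumerated.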
Phase 6b runs after review and before deployment. After the code is assembled, AI agents verify the product by operating it as real users. Each agent receives a user profile (technical literacy, domain knowledge, constraints) and a goal-oriented objective ("as a first-time user, create an account and reach the dashboard"). The agent navigates the application through the UI, and the journey produces measurable signals:
- Click count and navigation depth measure friction.
- Backtracking and dead ends measure discoverability.
- Error encounters measure input guidance quality.
- Abandonment flags critical usability failures.
UX budgets set quantitative thresholds for each metric. Product tests run as quality gates in CI, staging, and production canaries. User profiles can include accessibility constraints (screen reader, keyboard-only, reduced motion) per accessibility guidance. See reference/product-testing.md for the full reference.
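A UX budget check can be sketched as a comparison between a recorded journey and its thresholds. The `Journey` and `UxBudget` types, their field names, and the numbers in the usage example are hypothetical; the actual signal format is defined in reference/product-testing.md, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    """Signals recorded while an agent pursued one objective."""
    clicks: int
    backtracks: int
    errors: int
    abandoned: bool


@dataclass(frozen=True)
class UxBudget:
    """Quantitative thresholds for one objective."""
    max_clicks: int
    max_backtracks: int
    max_errors: int


def violations(journey: Journey, budget: UxBudget) -> list[str]:
    """Return the budget violations for a journey; an empty list means the gate passes."""
    found = []
    if journey.abandoned:
        found.append("abandoned")  # abandonment always fails the gate
    if journey.clicks > budget.max_clicks:
        found.append(f"clicks {journey.clicks} > {budget.max_clicks}")
    if journey.backtracks > budget.max_backtracks:
        found.append(f"backtracks {journey.backtracks} > {budget.max_backtracks}")
    if journey.errors > budget.max_errors:
        found.append(f"errors {journey.errors} > {budget.max_errors}")
    return found
```

A CI gate could then fail the build whenever `violations(...)` returns a non-empty list for any profile and objective pair.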
Real projects discover gaps after code exists: a production incident exposes an unspecified failure mode, a review reveals missing tests, or a new team member finds a module with no contract tests. Codex Automata defines a formal recovery protocol for these situations.
Recovery follows the same pipeline as forward work (Research → Documentation → SDK → Tests → Code), applied retroactively:
- Audit the gap using templates/gap-assessment-template.md.
- Patch the spec from domain knowledge (not from the existing code).
- Patch the SDK if the constraint surface lacks types for the affected behavior.
- Patch the mold by deriving tests from the specification against SDK interfaces.
- Recast the implementation if the new tests fail.
- Re-review the complete recovery unit.
Recovery tasks are first-class kanban work items, not invisible tech debt. See reference/recovery.md for the full protocol.
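The recovery sequence is ordered, with two steps that apply only when needed (patching the SDK and recasting). One way to picture that is a small helper that reports the next outstanding step. The step identifiers and the `next_step` function are hypothetical names for this sketch, not part of the protocol's documentation.

```python
from typing import Optional

# Recovery steps in pipeline order (identifiers are illustrative).
STEPS = ("audit", "patch_spec", "patch_sdk", "patch_mold", "recast", "re_review")

# Steps the protocol describes as conditional: patch the SDK only if it lacks
# types for the behavior, and recast only if the new tests fail.
CONDITIONAL = frozenset({"patch_sdk", "recast"})


def next_step(done, not_needed=frozenset()) -> Optional[str]:
    """Return the next outstanding recovery step, or None when the unit is complete."""
    unexpected = set(not_needed) - CONDITIONAL
    if unexpected:
        raise ValueError(f"steps cannot be skipped: {sorted(unexpected)}")
    for step in STEPS:
        if step not in done and step not in not_needed:
            return step
    return None
```

For example, once the audit and spec patch are done and the SDK already covers the behavior, `next_step({"audit", "patch_spec"}, {"patch_sdk"})` points at patching the mold.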
Agents working in a Codex Automata project follow strict rules (enforced by .cursor/rules/ and AGENTS.md):
- Do not write tests or implementation before the specification and SDK exist.
- Do not introduce abstractions outside the SDK constraint surface.
- Keep context lean. Do not assume frontier capabilities or unlimited context windows.
- Do not expand scope without updating the specification.
- Do not silently change interface contracts or SDK interfaces.
- Do not bypass failing tests.
- Prefer small, atomic commits traceable to specification sections.
- Surface ambiguity instead of guessing.
See agent/AGENT_RULES.md (inside the harness) for the complete operating manual.
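The rule about commits traceable to specification sections lends itself to a mechanical check. The sketch below assumes a hypothetical trailer convention, `Spec: docs/<file>.md#<section>`; the harness's own commit-lint script may use a different format, so treat the pattern and function names as illustrative only.

```python
import re

# Hypothetical trailer convention: "Spec: docs/<file>.md#<section-anchor>"
SPEC_TRAILER = re.compile(r"^Spec: (docs/[\w./-]+\.md)#([\w-]+)$", re.MULTILINE)


def spec_references(message: str) -> list[tuple[str, str]]:
    """Return (spec file, section anchor) pairs found in a commit message."""
    return SPEC_TRAILER.findall(message)


def is_traceable(message: str) -> bool:
    """A commit is traceable when it cites at least one spec section."""
    return bool(spec_references(message))
```

Usage: `is_traceable("Add task archiving\n\nSpec: docs/tasks.md#archiving")` is true, while a bare `"fix stuff"` message is rejected.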
Humans own the upstream work:
- Architecture and decomposition. Break the system into bounded contexts.
- Specification writing. Define what the system must do, precisely. For user-facing surfaces, the design identity captures aesthetic direction and accessibility commitments (WCAG as specification constraint).
- SDK design. Express the architecture as compilable building blocks.
- Test design. Derive the mold from the specification against SDK interfaces.
- Review. Verify castings match the mold, SDK, and intent.
- Product testing. Define user profiles, objectives, and UX budgets; agents exercise the assembled product as real users (Phase 6b).
- Deployment decisions. Ship and observe.
Teams can adopt Codex Automata at different depths. Three profiles—Essential, Standard, and Complete—match team size and project scale. All profiles include the quick-change workflow for bug fixes and small changes within existing spec, SDK, and test coverage. See reference/adoption-profiles.md for guidance on which profile fits your situation.
Read MANIFESTO.md for the full philosophy, or browse reference/ (18 documents) for detailed methodology documentation—including multi-agent orchestration, the iteration protocol, agent guardrails, and brownfield onboarding.
MIT. See LICENSE.