Codex Automata

The hard part of software was never the code.

A research-driven, spec-first, SDK-first, local-first development methodology for the agentic era. Copy one directory into any project and the methodology is active: research informs decisions, specifications first, SDK second, tests third, code last. Build for the smallest viable model, expand to frontier as needed. Documentation is the primary engineering artifact. The SDK is the constraint surface. Tests are the mold. Code is the casting.

Read the Manifesto | Browse the Playbook | MIT License

What Is This?

Codex Automata is a development methodology for the agentic era. It inverts the traditional pipeline to Research → Documentation → SDK → Tests → Code:

Research informs decisions. Agents investigate technologies, patterns, and the landscape before specifying.
Documentation comes first. Specify the system before implementing it.
SDK comes second. Express the architecture as compilable building blocks. The SDK is the constraint surface.
Tests come third. Derive tests from the specification, written against SDK interfaces. The tests are the mold.
Code comes last. Agents fill the mold within the SDK boundary. The code is the casting.

The methodology rests on ten core principles: Specification First, SDK as Constraint Surface, Local-First, Research as Foundation, Tests as Molds, Code as Casting, Modularity and Bounded Contexts, Continuous Flow, Quality Gates, and Intentional Divergence. Build local-first: design for the smallest viable model, expand to frontier as needed.

This repository contains the harness (the thing you copy into projects) and reference material (the methodology documentation you read).

Quickstart

Option 1: Init script (recommended)

Clone this repo somewhere permanent, then initialize any project:

# Clone once
git clone https://github.com/0xhackerfren/Codex-Automata.git D:\tools\Codex-Automata

# Initialize a new project
D:\tools\Codex-Automata\scripts\init.ps1 -TargetPath D:\projects\my-new-app

On Linux or macOS:

git clone https://github.com/0xhackerfren/Codex-Automata.git ~/tools/Codex-Automata
~/tools/Codex-Automata/scripts/init.sh ~/projects/my-new-app

Option 2: Manual copy

Copy the contents of harness/ into your project root:

xcopy /s /e /h harness\* D:\projects\my-new-app\

On Linux or macOS:

cp -r harness/. ~/projects/my-new-app/

What You Get

After initialization, your project contains:

my-project/
  AGENTS.md              Agent instructions (active in Cursor automatically)
  PLAYBOOK.md            Phase-by-phase methodology guide
  .cursor/               Cursor IDE rules, skills, subagents, hooks
  .github/               PR template, issue templates, CI workflow
  agent/                 Detailed agent operating rules
  templates/             22+ templates: spec, test plan, ADR, contract, task, review, gap assessment, design identity,
                         block registry, SDK design, deployment checklist, incident postmortem, retrospective, security audit,
                         brownfield audit, guardrail config, and more
                         Design identity: aesthetic direction, typography, color system, copy voice, accessibility commitments, anti-patterns (user-facing surfaces)
                         Block registry: project-level index of SDK building blocks
  docs/                  Empty, ready for your project specifications
  sdk/                   Empty, ready for SDK constraint surface (types, interfaces)
  tests/                 Empty, ready for test plans and test code
  tasks/                 Empty, ready for agent task definitions
  review/                Empty, ready for human review records
  src/                   Empty, ready for source code
  scripts/               Enforcement scripts (divergence gate, spec check, commit lint)

Open the project in Cursor IDE and the methodology enforces itself:

Rules (.cursor/rules/) load automatically into every agent session. They enforce spec-first development, test-before-code constraints, and interface contract discipline.
Skills (.cursor/skills/) provide ten guided workflows across all pipeline phases: /project-intake, /spec-writing, /sdk-design, /test-molding, /code-casting, /review, /product-testing, /recovery, /brownfield-onboarding, and /quick-change (abbreviated path for small changes within existing coverage).
Subagents (.cursor/agents/) handle six specialized tasks: spec review, test derivation, code casting, SDK design, product testing, and security audit. They run in isolated contexts and can work in parallel.
Hooks (.cursor/hooks.json) enforce methodology constraints: prompt hooks remind agents to check spec/test coverage; command hooks warn on hardcoded visual values and require human approval for SDK/contract changes.
Enforcement scripts (scripts/) run in CI or locally: divergence-gate scans for slop fingerprints, spec-check verifies spec-before-code, commit-lint validates commit traceability.
Guardrails classify agent actions into three tiers—AUTO (proceed autonomously), LOG (proceed with audit trail), and APPROVE (require human approval before execution)—complementing quality gates and human oversight.
AGENTS.md provides root-level instructions that any AI agent picks up automatically.
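
The three guardrail tiers can be pictured as a small lookup. The sketch below is illustrative only and is not the harness's guardrail config: the Tier enum, the POLICY table, and classify are hypothetical names, and only the rule that SDK changes need human approval comes from the description above. The other rows are assumptions.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto"        # proceed autonomously
    LOG = "log"          # proceed, but leave an audit trail
    APPROVE = "approve"  # stop until a human approves


# Hypothetical policy: path prefix -> tier.
# Only the sdk/ row mirrors the README; the rest are illustrative assumptions.
POLICY = [
    ("sdk/", Tier.APPROVE),
    ("docs/", Tier.LOG),
    ("tests/", Tier.LOG),
    ("src/", Tier.AUTO),
]


def classify(path: str) -> Tier:
    """Return the tier for a file an agent wants to modify.

    Paths that match no rule fall back to APPROVE, so the default is the
    most conservative tier.
    """
    for prefix, tier in POLICY:
        if path.startswith(prefix):
            return tier
    return Tier.APPROVE


if __name__ == "__main__":
    for candidate in ("src/app.py", "docs/intake.md", "sdk/types.py", "Makefile"):
        print(candidate, "->", classify(candidate).name)
```

Falling back to APPROVE for unmatched paths is a design choice in this sketch: an action nobody has classified yet is treated as one that needs a human.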

While .cursor/ provides Cursor-specific integration (rules, skills, subagents, hooks), the AGENTS.md file and agent/ directory work with any tool that reads agent instructions—Claude Code, GitHub Copilot, OpenAI Codex, Windsurf, and others.

First Steps After Init

Read PLAYBOOK.md for the phase-by-phase guide.
Copy templates/project-intake-template.md to docs/intake.md and fill it in.
Or type /project-intake in Cursor chat to start guided setup.
Follow the phases: Phase 0 Intake, Phase 1 Architecture, Phase 2 Specification, Phase 3 SDK Design, Phase 4 Test Molding, Phase 5 Code Casting, Phase 6 Review, Phase 6b Product Testing, Phase 7 Deployment. For bug fixes and small changes already covered by spec, SDK, and tests, use the quick-change workflow or /quick-change in Cursor.

Repository Structure

codex-automata/
|
|-- README.md                This file
|-- MANIFESTO.md             The full Codex Automata philosophy
|
|-- harness/                 THE HARNESS (copy into your projects)
|   |-- AGENTS.md            Root agent operating instructions
|   |-- PLAYBOOK.md          Phase-by-phase methodology guide
|   |-- .cursor/             Cursor IDE integration
|   |   |-- rules/           Auto-applied rules (.mdc)
|   |   |-- skills/          Invocable workflows (/skill-name)
|   |   |-- agents/          Custom subagent definitions
|   |   |-- hooks.json       Event-driven enforcement
|   |-- .github/             GitHub CI and templates
|   |-- agent/               Detailed agent operating rules (spec writing, SDK design, test molding, code casting, review)
|   |-- scripts/             Enforcement scripts (divergence gate, spec check, commit lint)
|   |-- templates/           22+ project templates (spec, test, ADR, contract, task, review, gap assessment, design identity,
|   |                        block registry, SDK design, deployment checklist, incident postmortem, retrospective, security audit,
|   |                        brownfield audit, guardrail config, and more)
|   |-- docs/                Empty project docs directory
|   |-- sdk/                 Empty SDK constraint surface directory
|   |-- tests/               Empty project tests directory
|   |-- tasks/               Empty agent tasks directory
|   |-- review/              Empty human review directory
|   |-- src/                 Empty source code directory
|
|-- reference/               METHODOLOGY DOCS (read, don't copy)
|   |-- principles.md        Ten core principles explained
|   |-- adoption-profiles.md Essential, Standard, and Complete adoption profiles
|   |-- workflow.md          End-to-end workflow reference
|   |-- architecture.md      Architecture patterns and guidance
|   |-- kanban.md            Flow-based project management
|   |-- agent-operating-model.md  How agents operate
|   |-- recovery.md          Recovery protocol for closing gaps
|   |-- product-testing.md   Agentic product testing reference
|   |-- quick-change.md      Abbreviated workflow for small changes within existing coverage
|   |-- multi-agent.md       Multi-agent orchestration and SDK-as-coordination-surface
|   |-- iteration.md         Iteration protocol: loops, budgets, and stop conditions
|   |-- brownfield-onboarding.md  Adopting the methodology on existing codebases
|   |-- guardrails.md        Three-tier agent action classification (AUTO/LOG/APPROVE)
|   |-- property-based-testing.md  Advanced molds: specification-derived invariants with PBT
|   |-- cost-awareness.md    Token budgets, model tiering, and cost optimization
|   |-- accessibility.md     WCAG constraints, accessibility molds, and testing profiles
|   |-- glossary.md          Terminology reference (18 reference documents)
|
|-- examples/                WORKED EXAMPLES (read, don't copy)
|   |-- task-manager/        Greenfield example: spec, tests, tasks, review, design identity, SDK types, context state, block registry
|   |-- brownfield-api/      Brownfield example: Express.js bookmarks API—audit, retroactive spec, SDK extraction, tests, gap assessment
|
|-- scripts/                 AUTOMATION
|   |-- init.ps1             PowerShell init script
|   |-- init.sh              Bash init script
|
|-- .github/                 CI and templates for THIS repo

Note: .github/ exists at both the repo root (for the Codex Automata project itself) and inside harness/ (the copy shipped into initialized projects).

The Core Pipeline

Research --> Documentation --> SDK          --> Tests  --> Code
             (docs)            (constraint)     (mold)     (casting)

Research grounds specifications and architectural decisions in evidence before documentation is written.
Specifications are the primary engineering artifact. They define what the system must do.
SDK is the constraint surface. It expresses the architecture as compilable building blocks that constrain all downstream work.
Tests are the mold. Derived from specifications and written against SDK interfaces, they constrain the shape of the implementation. Property-based testing extends molds with specification-derived invariants verified against random inputs (fast-check, Hypothesis, proptest).
Code is the casting. Agents pour implementation into the mold, within the SDK boundary, until all tests pass.

If the casting is defective, fix the mold. If the mold is wrong, fix the specification. If new building blocks are needed, extend the SDK through the specification pipeline. Do not debug the implementation directly.
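
A minimal sketch of the SDK, mold, and casting layers in one file, assuming Python 3.9+ and a toy task store. Every name here (Task, TaskStore, check_store, check_invariant, InMemoryTaskStore) is hypothetical and not taken from the repository's examples. The invariant check is a hand-rolled stand-in that uses the standard library's random module; a real project would more likely use one of the property-based testing libraries named above.

```python
import random
from dataclasses import dataclass
from typing import Callable, Protocol


# --- SDK: the constraint surface (types and interfaces only) ---
@dataclass(frozen=True)
class Task:
    id: int
    title: str
    done: bool = False


class TaskStore(Protocol):
    def add(self, title: str) -> Task: ...
    def complete(self, task_id: int) -> Task: ...
    def open_tasks(self) -> list[Task]: ...


# --- Tests: the mold, written against the SDK interface only ---
def check_store(store: TaskStore) -> None:
    first = store.add("write spec")
    second = store.add("design SDK")
    assert first.id != second.id
    assert [t.title for t in store.open_tasks()] == ["write spec", "design SDK"]
    assert store.complete(first.id).done
    assert [t.title for t in store.open_tasks()] == ["design SDK"]


def check_invariant(make_store: Callable[[], TaskStore],
                    rounds: int = 50, seed: int = 0) -> None:
    """Invariant from the (assumed) spec: open count == added - completed."""
    rng = random.Random(seed)
    for _ in range(rounds):
        store = make_store()
        ids = [store.add(f"task {i}").id for i in range(rng.randint(0, 10))]
        to_complete = rng.sample(ids, k=rng.randint(0, len(ids)))
        for task_id in to_complete:
            store.complete(task_id)
        assert len(store.open_tasks()) == len(ids) - len(to_complete)


# --- Code: the casting, one implementation that fills the mold ---
class InMemoryTaskStore:
    def __init__(self) -> None:
        self._tasks: dict[int, Task] = {}
        self._next_id = 1

    def add(self, title: str) -> Task:
        task = Task(id=self._next_id, title=title)
        self._tasks[task.id] = task
        self._next_id += 1
        return task

    def complete(self, task_id: int) -> Task:
        old = self._tasks[task_id]
        done = Task(id=old.id, title=old.title, done=True)
        self._tasks[task_id] = done
        return done

    def open_tasks(self) -> list[Task]:
        return [t for t in self._tasks.values() if not t.done]


# The casting is accepted only if it passes every check in the mold.
check_store(InMemoryTaskStore())
check_invariant(InMemoryTaskStore)
```

Neither check function imports or names InMemoryTaskStore, so a different casting can be swapped in without touching the mold.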

Product Testing

Phase 6b runs after review and before deployment. After the code is assembled, AI agents verify the product by operating it as real users would. Each agent receives a user profile (technical literacy, domain knowledge, constraints) and a goal-oriented objective ("as a first-time user, create an account and reach the dashboard"). The agent navigates the application through the UI, and the journey produces measurable signals:

Click count and navigation depth measure friction.
Backtracking and dead ends measure discoverability.
Error encounters measure input guidance quality.
Abandonment flags critical usability failures.

UX budgets set quantitative thresholds for each metric. Product tests run as quality gates in CI, staging, and production canaries. User profiles can include accessibility constraints (screen reader, keyboard-only, reduced motion), following the accessibility guidance in reference/accessibility.md. See reference/product-testing.md for the full reference.
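
As a rough illustration of a UX budget acting as a quality gate, the sketch below compares one journey's signals against thresholds. The JourneyMetrics and UxBudget field names and the budget_violations function are assumptions made for this example; the actual schema is defined in reference/product-testing.md and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneyMetrics:
    """Signals recorded from one agent-driven user journey (hypothetical shape)."""
    clicks: int
    navigation_depth: int
    backtracks: int
    errors: int
    abandoned: bool


@dataclass(frozen=True)
class UxBudget:
    """Upper bounds for each signal (hypothetical shape)."""
    max_clicks: int
    max_navigation_depth: int
    max_backtracks: int
    max_errors: int


def budget_violations(metrics: JourneyMetrics, budget: UxBudget) -> list[str]:
    """Return a human-readable message for every exceeded threshold."""
    violations = []
    if metrics.abandoned:
        # Abandonment is treated as a failure regardless of the other numbers.
        violations.append("journey abandoned")
    checks = [
        ("clicks", metrics.clicks, budget.max_clicks),
        ("navigation depth", metrics.navigation_depth, budget.max_navigation_depth),
        ("backtracks", metrics.backtracks, budget.max_backtracks),
        ("errors", metrics.errors, budget.max_errors),
    ]
    for name, actual, limit in checks:
        if actual > limit:
            violations.append(f"{name}: {actual} exceeds budget {limit}")
    return violations


if __name__ == "__main__":
    budget = UxBudget(max_clicks=8, max_navigation_depth=3, max_backtracks=1, max_errors=1)
    journey = JourneyMetrics(clicks=11, navigation_depth=2, backtracks=0, errors=0, abandoned=False)
    for message in budget_violations(journey, budget):
        print(message)
```

A gate built on this would fail the run whenever the returned list is non-empty.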

When Gaps Are Discovered

Real projects discover gaps after code exists: a production incident exposes an unspecified failure mode, a review reveals missing tests, or a new team member finds a module with no contract tests. Codex Automata defines a formal recovery protocol for these situations.

Recovery follows the same pipeline as forward work (Research → Documentation → SDK → Tests → Code), applied retroactively:

Audit the gap using templates/gap-assessment-template.md.
Patch the spec from domain knowledge (not from the existing code).
Patch the SDK if the constraint surface lacks types for the affected behavior.
Patch the mold by deriving tests from the specification against SDK interfaces.
Recast the implementation if the new tests fail.
Re-review the complete recovery unit.

Recovery tasks are first-class kanban work items, not invisible tech debt. See reference/recovery.md for the full protocol.

How Agents Operate

Agents working in a Codex Automata project follow strict rules (enforced by .cursor/rules/ and AGENTS.md):

Do not write tests or implementation before the specification and SDK exist.
Do not introduce abstractions outside the SDK constraint surface.
Keep context lean. Do not assume frontier capabilities or unlimited context windows.
Do not expand scope without updating the specification.
Do not silently change interface contracts or SDK interfaces.
Do not bypass failing tests.
Prefer small, atomic commits traceable to specification sections.
Surface ambiguity instead of guessing.

See agent/AGENT_RULES.md (inside the harness) for the complete operating manual.
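
To make "commits traceable to specification sections" concrete, here is a small sketch of a traceability check. It does not reproduce the harness's commit-lint script. The "Spec: docs/<file>.md#<section>" trailer format and the spec_references and is_traceable names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical trailer format: a line of the form "Spec: docs/<file>.md#<section>".
SPEC_TRAILER = re.compile(r"^Spec: (docs/[\w./-]+\.md)#([\w-]+)$", re.MULTILINE)


def spec_references(message: str) -> list[tuple[str, str]]:
    """Return every (spec file, section) pair referenced in a commit message."""
    return SPEC_TRAILER.findall(message)


def is_traceable(message: str) -> bool:
    """A commit counts as traceable if it references at least one spec section."""
    return bool(spec_references(message))


if __name__ == "__main__":
    message = "Add login rate limit\n\nSpec: docs/auth.md#rate-limiting\n"
    print(spec_references(message))
    print(is_traceable("fix stuff"))
```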

How Humans Operate

Humans own the upstream work:

Architecture and decomposition. Break the system into bounded contexts.
Specification writing. Define what the system must do, precisely. For user-facing surfaces, the design identity captures aesthetic direction and accessibility commitments (WCAG as specification constraint).
SDK design. Express the architecture as compilable building blocks.
Test design. Derive the mold from the specification against SDK interfaces.
Review. Verify castings match the mold, SDK, and intent.
Product testing. Define user profiles, objectives, and UX budgets; agents exercise the assembled product as real users would (Phase 6b).
Deployment decisions. Ship and observe.

Adoption Profiles

Teams can adopt Codex Automata at different depths. Three profiles—Essential, Standard, and Complete—match team size and project scale. All profiles include the quick-change workflow for bug fixes and small changes within existing spec, SDK, and test coverage. See reference/adoption-profiles.md for guidance on which profile fits your situation.

Read MANIFESTO.md for the full philosophy, or browse reference/ (18 documents) for detailed methodology documentation—including multi-agent orchestration, the iteration protocol, agent guardrails, and brownfield onboarding.

License

MIT. See LICENSE.
