Claude Code Toolkit is a complete agent-driven workflow system for Claude Code and Codex. It gives Claude and Codex domain-specific expertise, step-by-step workflows, and automated quality gates. The result is Claude working like a team of Go, Python, Kubernetes, review, and content specialists instead of one generalist.
It starts from the moment you type a request. You don't pick agents, configure workflows, or learn any internal concepts. You just say what you want done.
```
/do debug this Go test
```
In Claude Code, the smart router command is `/do`. In Codex, use `$do`.
A router reads your intent and selects a Go specialist agent paired with a systematic debugging methodology. The agent creates a branch, gathers evidence before guessing, runs through phased diagnosis, applies a fix, executes tests, reviews its own work, and presents a PR. You describe the problem. The system handles everything else.
This works because the toolkit separates what you know from what the system knows. Agents carry domain expertise (Go idioms, Python conventions, Kubernetes patterns). Skills enforce process methodology (TDD cycles, debugging phases, review waves). Hooks automate quality gates that fire on lifecycle events. Scripts handle deterministic operations where you want predictable output, not LLM judgment. The router ties it all together, classifying requests, selecting the right combination, and dispatching.
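The routing idea can be sketched in a few lines. This is an illustrative toy, not the toolkit's actual implementation; every name below (`ROUTES`, `AGENTS`, `route`) is hypothetical:

```python
# Toy sketch of intent routing: classify the request, then pair a domain
# agent with a workflow skill before dispatching. All names are invented
# for illustration; the real router is far richer.
ROUTES = {
    "debug": "systematic-debugging",
    "review": "parallel-code-review",
}
DEFAULT_SKILL = "test-driven-development"

AGENTS = {"go": "go-specialist", "python": "python-specialist"}
DEFAULT_AGENT = "general-engineer"

def route(request: str) -> tuple[str, str]:
    """Pick (agent, skill) from keywords in the request."""
    text = request.lower()
    skill = next((s for key, s in ROUTES.items() if key in text), DEFAULT_SKILL)
    agent = next((a for key, a in AGENTS.items() if key in text), DEFAULT_AGENT)
    return agent, skill
```

So a request like "debug this Go test" would pair a Go agent with a debugging methodology, exactly the separation of concerns described above.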
The result: consistent, domain-specific output across Go, Python, TypeScript, infrastructure, content, and more. No configuration required. A first-time user and a power user get the same quality results. The power user just understands why.
A game built entirely by Claude Code using these agents, skills, and pipelines. Every step from design through implementation.
(Demo video: Kapture.2026-03-25.at.07.40.47.mp4)
Requires Claude Code installed and working (`claude --version` should print a version number). Codex CLI is also supported: the installer mirrors toolkit skills into `~/.codex/skills` and toolkit agents into `~/.codex/agents`, so Codex can use the same skill and agent library (`codex --version` should print a version number if you want Codex support too).
```
git clone https://github.com/notque/claude-code-toolkit.git ~/claude-code-toolkit
cd ~/claude-code-toolkit
./install.sh --symlink
```
The installer links agents, skills, hooks, commands, and scripts into `~/.claude/`, where Claude Code loads extensions. It also mirrors skills into `~/.codex/skills` and agents into `~/.codex/agents` for Codex. Use `--symlink` to get updates via `git pull`, or run without it for a stable copy.
Verify the install with:
```
python3 ~/.claude/scripts/install-doctor.py check
python3 ~/.claude/scripts/install-doctor.py inventory
```
If you update the repo later and want Codex to see newly added skills, rerun `./install.sh --symlink` from the repo root.
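The kind of check such a doctor script performs can be sketched as follows. This is a hypothetical miniature, not the real `install-doctor.py`; the `EXPECTED` tuple and `check_install` function are assumptions for illustration:

```python
# Minimal sketch of an install check: verify that each expected extension
# directory exists under the Claude Code config root (~/.claude in practice).
from pathlib import Path

EXPECTED = ("agents", "skills", "hooks", "commands", "scripts")

def check_install(root: Path) -> list[str]:
    """Return the names of expected directories missing under root."""
    return [name for name in EXPECTED if not (root / name).is_dir()]

# In practice: missing = check_install(Path.home() / ".claude")
```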
Command entry points:
- Claude Code: `/do`
- Codex: `$do`
Detailed setup: docs/start-here.md
The toolkit mirrors agents, skills, and a curated subset of hooks into `~/.codex/` so they work with the OpenAI Codex CLI alongside Claude Code. The mirror runs automatically on every `install.sh` run; no flags required.
What mirrors
- Agents: every agent under `agents/` (and `private-agents/` if present) is copied or symlinked to `~/.codex/agents/`.
- Skills: every skill under `skills/`, `private-skills/`, and `private-voices/*/skill/` goes to `~/.codex/skills/`.
- Hooks: a Phase 1 allowlist of 6 hooks goes to `~/.codex/hooks/` with a matching generated `~/.codex/hooks.json`. The allowlist covers `SessionStart` injectors (KAIROS briefing, operator context, team config, rules distill), a `Stop` recorder (session learning), and a `PostToolUse` Bash scanner. See `scripts/codex-hooks-allowlist.txt`.
- Feature flag: `install.sh` sets `[features] codex_hooks = true` in `~/.codex/config.toml` using a TOML merge that preserves existing sections.
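A section-preserving merge of that kind can be sketched as below. This is a simplified, hypothetical illustration (a real implementation would use a proper TOML parser and writer, not line matching); `set_feature_flag` is an invented name:

```python
# Sketch of a section-preserving TOML merge: add codex_hooks = true under
# [features] without disturbing any other section. Line-based for brevity.
def set_feature_flag(toml_text: str) -> str:
    """Ensure [features] contains codex_hooks = true, keeping other sections."""
    lines = toml_text.splitlines()
    if "[features]" in lines:
        idx = lines.index("[features]")
        if "codex_hooks = true" not in lines:
            lines.insert(idx + 1, "codex_hooks = true")
    else:
        lines += ["", "[features]", "codex_hooks = true"]
    return "\n".join(lines) + "\n"
```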
What does not mirror (yet)
- Edit/Write interceptors (ADR enforcement, creation protocol, config protection, plan gate, rename sweep, and similar) are Phase 2 and blocked on upstream openai/codex#16732. Codex `PreToolUse` and `PostToolUse` only fire for the `Bash` tool today, so any hook guarding `Write` or `Edit` would register but never run.
- Codex does not support `PreCompact`, `SubagentStop`, `Notification`, or `SessionEnd` events; hooks on those events stay Claude Code only.
- Windows: Codex hook support is disabled upstream.
- Codex CLI v0.114.0 or later is required for hook activation. `install.sh` warns if the installed version is below that, and prints a neutral note if the `codex` binary is not found.
Opting out
There is no opt-out flag. The mirror is harmless when Codex CLI is not installed: `~/.codex/` entries sit unused until you install Codex. To skip the hooks portion, delete `~/.codex/hooks/` and `~/.codex/hooks.json` after install; the toolkit does not recreate them during normal use.
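That cleanup is two deletions, sketched here as a small helper (hypothetical; `remove_codex_hooks` is an invented name, and in practice plain `rm` works just as well):

```python
# Sketch: remove the mirrored hook files so Codex never loads them,
# leaving the agents and skills mirrors untouched.
import shutil
from pathlib import Path

def remove_codex_hooks(codex_dir: Path) -> None:
    """Delete hooks/ and hooks.json under the given ~/.codex directory."""
    shutil.rmtree(codex_dir / "hooks", ignore_errors=True)
    (codex_dir / "hooks.json").unlink(missing_ok=True)

# In practice: remove_codex_hooks(Path.home() / ".codex")
```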
Reference: adr/182-codex-hooks-mirror.md. Upstream Phase 2 tracker: openai/codex#16732.
- Routing. You type a request. The router entry point is `/do` in Claude Code and `$do` in Codex. It classifies intent, selects a domain agent and a workflow skill, and dispatches. No menus, no configuration.
- Planning. For non-trivial work, the system creates a plan before touching code. Plans have phases, gates, and saved artifacts at each step.
- Execution. A domain-specific agent handles the work using its skill's methodology. Go work gets Go idioms. Python work gets Python conventions. Reviews get multi-wave specialist panels.
- Quality gates. Hooks fire automatically: anti-rationalization checks on code modifications, error learning after failures, context injection at session start. Quality is structural, not advisory.
- Verification. Tests run. Deterministic scripts validate what LLM judgment cannot. The system does not claim completion without evidence.
- Delivery. Changes land on a feature branch. PRs include lint checks, test runs, and review gates. Nothing merges without CI passing.
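The "phases with gates" shape of this workflow can be sketched generically. This is an illustrative model, not toolkit code; `run_pipeline` and the phase names are assumptions:

```python
# Sketch of a gated pipeline: each phase's gate must pass before the next
# phase starts, so a red gate stops the run instead of cascading forward.
def run_pipeline(phases):
    """Run (name, gate) pairs in order; raise at the first failing gate."""
    completed = []
    for name, gate in phases:
        if not gate():
            raise RuntimeError(f"gate failed at phase: {name}")
        completed.append(name)
    return completed

phases = [
    ("plan", lambda: True),
    ("execute", lambda: True),
    ("verify", lambda: True),    # e.g. tests passed
    ("deliver", lambda: False),  # e.g. CI red: nothing merges
]
```

Running `run_pipeline(phases)` above would complete plan, execute, and verify, then stop at deliver because its gate fails.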
Agents carry domain-specific expertise. Not thin wrappers that say "you are an expert," but concrete knowledge: version-specific idiom tables, anti-pattern catalogs with detection commands, error-to-fix mappings from real incidents.
Software Engineering
- Go, Python, TypeScript, PHP, Kotlin, Swift, Node.js, React Native
- Database design, data pipelines, SQLite/Peewee ORM
- Kubernetes, Ansible, Prometheus/Grafana, OpenSearch, RabbitMQ, OpenStack
Code Review
- Multi-perspective review (newcomer, senior, pedant, contrarian, user advocate)
- Domain-specific review (ADR compliance, business logic, structural)
- Playbook-enhanced review with adversarial verification
Frontend & Creative
- React portfolios, Next.js e-commerce, UI/UX design
- PixiJS combat rendering, Rive skeletal animation, combat visual effects
- Performance optimization, TypeScript debugging
Infrastructure
- Pipeline orchestration, project coordination, research coordination
- System upgrades, toolkit governance, technical documentation
- MCP server development, Perses observability platform
Skills enforce methodology. They are phased workflows with gates that prevent skipping steps.
Development Workflows: test-driven development, systematic debugging, feature lifecycle, subagent-driven development, pair programming
Code Quality: parallel code review (3 simultaneous reviewers), systematic code review, code cleanup, universal quality gates, linting
Content & Research: voice-validated writing, research pipelines, content calendars, SEO optimization, topic brainstorming, content repurposing
Operations: PR workflow with quality gates, git commit flow, GitHub Actions checks, cron job auditing, service health checks, Kubernetes debugging
Meta: skill evaluation, agent comparison, A/B testing, toolkit evolution, reference enrichment, routing table management
Hooks are event-driven automation that fires on session start, before/after tool use, at compaction, and on stop. Error learning, context injection, quality enforcement, and anti-rationalization all run automatically.
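A hook script is typically a small program that reads an event payload and signals the result through its exit code. The sketch below follows the documented Claude Code hook convention (JSON on stdin; exit code 2 reports a blocking error), but the payload field names and the check itself should be treated as assumptions:

```python
# Sketch of a PostToolUse scanner for Bash commands: read the event payload
# from stdin, flag a problem via exit code 2, otherwise exit 0.
import json
import sys

def scan(payload: dict) -> int:
    """Return 0 if the recorded command looks fine, 2 to flag it."""
    command = payload.get("tool_input", {}).get("command", "")
    if "curl | sh" in command:  # toy check for demonstration
        print("blocked: piping remote scripts into a shell", file=sys.stderr)
        return 2
    return 0

if __name__ == "__main__":
    sys.exit(scan(json.load(sys.stdin)))
```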
Scripts are deterministic Python utilities for mechanical operations: INDEX generation, learning database management, voice validation, routing manifests, reference validation. LLMs orchestrate; programs execute.
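"Deterministic" here means the same input always yields byte-identical output, so the result can be checked mechanically. A hypothetical miniature of an INDEX generator (not the toolkit's actual script):

```python
# Sketch of a deterministic utility: build an INDEX listing from a
# directory. Sorting makes the output reproducible for the same tree.
from pathlib import Path

def build_index(root: Path, suffix: str = ".md") -> str:
    """Return a sorted markdown list of files under root with the suffix."""
    names = sorted(p.name for p in root.glob(f"*{suffix}"))
    return "\n".join(f"- {name}" for name in names)
```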
| Who you are | What you get |
| --- | --- |
| I just want to use it | Install in 2 minutes, learn a few commands. Done. |
| I do knowledge work | Content pipelines, research workflows, community moderation. No code required. |
| I'm a developer | Architecture, extension points, how to add your own agents and skills. |
| I'm an AI power user | Routing tables, pipeline architecture, hook system, the learning database. |
| I'm an AI agent | Machine-dense component inventory. Tables, file paths, schemas, routing rules. |
| I'm on LinkedIn | 🚀 Thought leadership. Agree? 👇 |
The toolkit is built on tested principles, not aspirations. Key ideas:
- Zero-expertise operation. The system requires no specialized knowledge from the user. Say what you want done. The system handles the rest.
- LLMs orchestrate, programs execute. If a process is deterministic and measurable, use a script. Reserve LLM judgment for design decisions and contextual diagnosis.
- Tokens are cheap, quality is expensive. Dispatch parallel review agents, run validation scripts, create plans before executing. Quality gates run on every invocation regardless of token budget.
- Agents carry the knowledge, not the model. Agent quality is proportional to the specificity of attached knowledge, not the confidence of attached tone. We tested this empirically with A/B experiments.
- Anti-rationalization as infrastructure. The toolkit enforces verification structurally. Quality gates are built into the pipeline, not left to discipline.
- Everything should be a pipeline. Complex work decomposes into phases. Phases have gates. Gates prevent cascading failures.
Read the full design philosophy: PHILOSOPHY.md
MIT. See LICENSE.
