Turn Claude Code and OpenAI Codex into two host surfaces over one auditable Thoth runtime with persistent state, mechanical verification, and dashboard visibility.
Persistent truth + validation scripts + autonomous execution loops = agent work you can actually inspect, recover, and trust.
Thoth bridges a human-facing audit dashboard with an AI execution engine through shared persistent project state.
Thoth is an Agent Project OS for research and engineering workflows. It does not stop at prompting conventions: it installs a persistent project layer with state documents, validation scripts, generated structure, and a dashboard humans can use to see what is actually true.
At the center of Thoth are two connected systems:
- Audit System: truth, evidence, decisions, recoverability
- Execution System: tasks, loops, verification, delegation
Thoth now treats .thoth as the only runtime authority and exposes two
official host surfaces:
- Claude: /thoth:*
- Codex: $thoth <command>
Both surfaces project from the same host-neutral command specification and write through the same runtime ledger shape.
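As a rough illustration of how one host-neutral command could project onto both surfaces, the sketch below renders the same command in each host's syntax. The `render_command` helper and its host names are hypothetical and are not part of Thoth; only the `/thoth:<command>` and `$thoth <command>` shapes come from this README.

```python
def render_command(host: str, command: str, *args: str) -> str:
    """Render one host-neutral command in the syntax of a given host surface.

    Hypothetical helper for illustration; Thoth's real projection logic may differ.
    """
    if host == "claude":
        head = f"/thoth:{command}"
    elif host == "codex":
        head = f"$thoth {command}"
    else:
        raise ValueError(f"unknown host surface: {host!r}")
    # Arguments are shared verbatim across hosts; only the command head changes.
    return " ".join([head, *args])


if __name__ == "__main__":
    for host in ("claude", "codex"):
        print(render_command(host, "run", "--task-id", "<task_id>"))
```

The point of the sketch is that arguments such as `--task-id` stay identical across hosts, so both invocations can resolve to the same runtime operation.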
Most agent workflows disappear into chat history:
- plans drift
- decisions are lost
- evidence is hard to recover
- execution and review happen in different places
- humans cannot easily see current project truth
Thoth solves that by materializing agent work into a real operating layer:
- persistent project state in files
- mechanical validation and consistency checks
- explicit execution and governance modes
- dashboard-visible progress and health
- recoverable project memory outside the chat window
Add the Thoth marketplace and install the plugin:
claude plugin marketplace add SeeleAI/Thoth --scope user
claude plugin install thoth@thoth --scope user
Use project or local scope instead of user if you want a narrower install.
Claude /thoth:* commands execute the repo-local Thoth CLI through the plugin
bridge before Claude summarizes the result. On the first run, Claude may ask
you to approve scripts/thoth-claude-command.sh; you can either approve it
once, add a project-local allow rule in .claude/settings.local.json, or set a
global allow rule in ~/.claude/settings.json if you want Thoth to work
without approval prompts across all projects.
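If you prefer to script the project-local allow rule, a minimal sketch could look like the following. It assumes the settings file uses a `permissions.allow` list of rule strings. The `add_allow_rule` helper is hypothetical, and the exact rule string for the Thoth bridge script is a guess, so copy the command shown in Claude's approval prompt rather than trusting the example value.

```python
import json
from pathlib import Path


def add_allow_rule(settings_path: Path, rule: str) -> dict:
    """Add `rule` to permissions.allow in a settings file, keeping other keys intact."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        settings = json.loads(text) if text.strip() else {}
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    if rule not in allow:  # stay idempotent across repeated runs
        allow.append(rule)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (assumed rule string; verify it against the approval prompt):
# add_allow_rule(Path(".claude/settings.local.json"),
#                "Bash(scripts/thoth-claude-command.sh:*)")
```

Pointing the same helper at ~/.claude/settings.json would give the global variant described above.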
Codex uses the marketplace source as the install and enable step:
codex plugin marketplace add SeeleAI/Thoth
Update an existing Codex install:
codex plugin marketplace upgrade thoth
To install from a local clone instead:
git clone https://github.com/SeeleAI/Thoth.git
cd Thoth
claude plugin add "$(pwd)"
codex plugin marketplace add "$(pwd)"
Open the repository you want to manage with Thoth and run:
/thoth:init
$thoth init
This audits the current repository first, then adopts or scaffolds the project
operating layer with a recorded migration bundle. It can start from a blank
repository or a drifted repository that already contains docs, .agent-os/,
or partial Thoth state. The managed layer includes:
- .agent-os/ state and governance documents
- .thoth/ runtime authority tree
- strict Decision -> Contract -> generated Task planning authority
- repo-level verdict ledger under .thoth/project/verdicts/
- strict sync / doctor validation scripts
- tools/dashboard/ backend and frontend
- project-local helper scripts and tests
Use /thoth:init inside Claude Code and $thoth init inside Codex. Both routes
write the same .thoth authority tree.
Typical first actions:
/thoth:status
/thoth:run --task-id <task_id>
/thoth:loop --task-id <task_id>
/thoth:dashboard
$thoth status
$thoth run --task-id <task_id>
$thoth loop --task-id <task_id>
$thoth dashboard
| Capability | Commands | What it does |
|---|---|---|
| Bootstrap | /thoth:init | Audits the current repository and adopts/scaffolds the managed Thoth project layer |
| Single-task execution | /thoth:run | Executes one focused task with validation, sync, and commit discipline |
| Autonomous iteration | /thoth:loop | Runs task-mode or metric-mode loops with verification and rollback logic |
| Governance | /thoth:discuss, /thoth:review | Separates planning and review from code execution while preserving conclusions |
| Visibility | /thoth:status, /thoth:dashboard, /thoth:report | Surfaces current truth through structured output, dashboard views, and reports |
| Integrity checks | /thoth:doctor, /thoth:sync | Audits project persistence, reference health, and synchronization |
| Plugin evolution | /thoth:extend | Safely evolves the plugin itself under test gates |
Claude still supports Codex delegation on the main public commands:
/thoth:run --executor codex ...
/thoth:loop --executor codex ...
/thoth:review --executor codex ...
Codex also has its own official single-entry public surface:
$thoth init
$thoth run
$thoth loop
$thoth review
$thoth status
The Codex plugin is packaged through .codex-plugin/plugin.json and exposes one
public skill bundle rooted at .agents/skills/thoth/.
Both surfaces share the same runtime rules:
- .thoth is the only authority
- execution planning is strict: Decision -> Contract -> compiler-generated Task
- run and loop execute only by --task-id; free-form execution is intentionally rejected
- run and loop are durable by default
- attach / resume / watch / stop operate on the same run ledger
- dashboard reads .thoth/runs/*, not host session state
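To make the task-id rule concrete, here is a small sketch of a command parser that accepts `run` and `loop` only with `--task-id` and rejects free-form text. It is an illustration built on Python's standard `argparse`, not Thoth's actual CLI; the `thoth-sketch` program name and the `build_parser` and `parse` helpers are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: run and loop require --task-id and take no free-form prompt."""
    parser = argparse.ArgumentParser(prog="thoth-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("run", "loop"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--task-id", required=True,
                         help="ID of a compiler-generated task")
    return parser


def parse(argv: Sequence[str]) -> Optional[argparse.Namespace]:
    """Return parsed arguments, or None when argparse rejects the invocation."""
    try:
        return build_parser().parse_args(list(argv))
    except SystemExit:  # argparse reports the error on stderr and exits
        return None
```

With this sketch, `parse(["run", "--task-id", "T-1"])` yields a namespace whose `task_id` is `"T-1"`, while `parse(["run", "fix the bug"])` and `parse(["loop"])` both return `None`.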
Thoth operates in two layers.
The Thoth repository provides:
- public command definitions in commands/
- internal contracts in contracts/
- internal agents in agents/
- automation hooks in hooks/
- management scripts in scripts/
- deployable project templates in templates/
This layer defines how the operating system behaves.
When you run /thoth:init in a target repository, Thoth generates a persistent
project layer with:
- state docs: .agent-os/
- runtime authority: .thoth/
- planning authority: .thoth/project/decisions, .thoth/project/contracts, .thoth/project/tasks
- repo-level verdict authority: .thoth/project/verdicts
- strict sync / doctor validation tooling
- dashboard backend and frontend
- project-local scripts and tests
That is the core idea: Thoth does not only tell the agent what to do; it installs the project substrate that makes the work inspectable and recoverable.
Thoth is designed around distinct operating modes rather than one overloaded assistant surface.
- /thoth:run for one focused change
- /thoth:loop for iterative execution with decision logic
- --executor codex for delegated Codex work under Thoth control

- /thoth:discuss for docs, config, and task-state changes without touching code
- /thoth:review for first-principles critique outside the active implementation path

- /thoth:status prints a structured project snapshot
- /thoth:dashboard starts the human-facing dashboard
- /thoth:doctor audits project health and consistency
- /thoth:sync aligns generated views and references
- /thoth:report builds progress reports from recorded state

- Audit-first: no silent completion claims without evidence
- Execution with verification: loops must validate, not just act
- Recoverable state: important truth must live in files, not only in chat
- Dashboard visibility: humans need an operating view, not raw agent traces
- Script-backed behavior: the system relies on contracts and scripts, not pure improvisation
- Tested infrastructure: golden-data-driven tests protect the operating layer
Thoth now ships a heavy self-test orchestration entrypoint that exercises the real CLI, real temporary repositories, real dashboard processes, fault injection, and optional host-native Codex / Claude matrices.
Run the daily process-real gate:
python scripts/selftest.py --tier hard
Run the heavy gate with dashboard browser validation and host-real matrices:
python scripts/selftest.py --tier heavy --hosts auto
The runner writes a machine-readable summary plus artifacts for command transcripts, ledger snapshots, dashboard payloads, and browser traces.
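For wiring these gates into automation, a thin wrapper along the following lines may be enough. It uses only the `--tier` and `--hosts` flags shown above. The `selftest_command` and `run_selftest` helpers are hypothetical, and treating exit code 0 as a passing gate is an assumption based on common CLI convention, not something this README confirms.

```python
import subprocess
import sys
from typing import List, Optional


def selftest_command(tier: str, hosts: Optional[str] = None) -> List[str]:
    """Build the argv for scripts/selftest.py using only the flags shown above."""
    argv = [sys.executable, "scripts/selftest.py", "--tier", tier]
    if hosts is not None:
        argv += ["--hosts", hosts]
    return argv


def run_selftest(tier: str, hosts: Optional[str] = None) -> bool:
    """Run the gate from the repository root.

    Assumption: a zero exit code means the gate passed.
    """
    return subprocess.run(selftest_command(tier, hosts)).returncode == 0
```

Calling `run_selftest("hard")` corresponds to the daily gate, and `run_selftest("heavy", "auto")` to the heavy gate.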
Thoth is currently:
- generated from a host-neutral public command spec
- published as both a Claude plugin surface and an official Codex plugin surface
- backed by a durable .thoth/runs/* ledger plus machine-local supervisor registry
- observable in the dashboard through shared runtime summaries
Run the test suite from the repository root:
pytest -q
Branch policy:
- Do day-to-day development on dev
- Treat main as the stable integration and release branch
- Do not directly modify main for normal feature or code development
- Promote changes from dev into main deliberately, with cherry-pick as the default path
- Do not commit normal development work straight onto main; land it on dev first and promote reviewed code later
Current repository contents include:
- plugin metadata in .codex-plugin/
- plugin metadata in .claude-plugin/
- Codex skill metadata in .agents/skills/
- public commands in commands/
- internal contracts in contracts/
- internal agents in agents/
- scripts in scripts/
- dashboard and project templates in templates/
- unit and integration tests in tests/
Thoth is now a standalone open-source project. Contributions that improve the operating model, validation logic, dashboard experience, runtime abstractions, or documentation are welcome.
When changing behavior, update tests and contracts together so the system stays trustworthy as it evolves.
MIT. See LICENSE.
