Open
24 commits
1116b84
gm: Phase 0 foundations -- harness config, agent, session, pkg/harnes…
mcavage-docker May 13, 2026
80b452a
gm: Phase 1 -- runtime harness branch + Claude Code adapter
mcavage-docker May 13, 2026
916141f
gm: Phase 2 -- Codex CLI harness adapter
mcavage-docker May 13, 2026
0c3dbd4
gm: Phase 2 -- OpenCode CLI harness adapter
mcavage-docker May 13, 2026
60f442e
gm: Phase 2 -- ACP adapter (Copilot + OpenClaw)
mcavage-docker May 13, 2026
f580bc8
gm: merge Phase 2 Codex adapter
mcavage-docker May 13, 2026
ae90477
gm: merge Phase 2 OpenCode adapter
mcavage-docker May 13, 2026
1b89bfc
gm: merge Phase 2 ACP adapter
mcavage-docker May 13, 2026
f9fc41a
gm: Phase 3 -- sandbox hardening, example YAML
mcavage-docker May 13, 2026
35f8fb3
gm: security + code review fixes
mcavage-docker May 13, 2026
166cc66
gm: docs -- harness-backed agents, CHANGELOG, config version bump
mcavage-docker May 13, 2026
15a5c45
gm: fix 3 bugs found during UAT
mcavage-docker May 13, 2026
a70951e
gm: fix codex adapter flags and error code inference
mcavage-docker May 13, 2026
4269acf
gm: UAT fixes -- cost tracking, AgentInfo from RunStart, codex flags
mcavage-docker May 14, 2026
57c3cda
gm: partial -- add stream_event types and state to claude adapter (in…
mcavage-docker May 14, 2026
4fd2dda
gm: streaming text deltas + codex cost sentinel + TUI sidebar fix
mcavage-docker May 14, 2026
79f4a6c
gm: align pkg/harness types to github.com/rumpl/harness
mcavage-docker May 14, 2026
7aea896
gm: rewrite claude adapter to implement rumpl/harness.Provider
mcavage-docker May 14, 2026
b3f43bb
gm: rewrite codex adapter to implement rumpl/harness.Provider
mcavage-docker May 14, 2026
9305a19
gm: fix replay/record.go for new harness.Event type
mcavage-docker May 14, 2026
33cd48c
gm: fix copilot + openclaw adapters for new harness types
mcavage-docker May 14, 2026
529c48e
gm: fix harness_delegation.go for new harness types
mcavage-docker May 14, 2026
5fafa39
gm: go mod tidy after adding github.com/rumpl/harness
mcavage-docker May 14, 2026
9a0b80c
gm: isolate harness subprocesses in new process group to prevent TUI …
mcavage-docker May 14, 2026
719 changes: 719 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/arch-review.md


1,030 changes: 1,030 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/arch-spec-v2.md


940 changes: 940 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/arch-spec.md


229 changes: 229 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/consistency-check.md


696 changes: 696 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/dx-review-arch.md


754 changes: 754 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/dx-review.md


903 changes: 903 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/impl-plan-v2.md


680 changes: 680 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/impl-plan.md


979 changes: 979 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/prd-v2.md


636 changes: 636 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/prd.md


30 changes: 30 additions & 0 deletions .gm-agent-team/eng/cross-harness-orchestration/status.json
@@ -0,0 +1,30 @@
{
  "feature": "cross-harness-orchestration",
  "task": "Cross-harness orchestration layer: dispatch tasks to Claude Code, Codex, OpenCode, Copilot CLI, OpenClaw as subagents from docker-agent orchestrator",
  "branch": "gm/cross-harness-orchestration",
  "is_git": true,
  "repo_root": "/Users/mcavage/dev/docker-agent",
  "baseline": {
    "builds": true,
    "tests_pass": false,
    "pre_existing_failures": ["pkg/config TestCheckRequiredEnvVars", "pkg/teamloader TestLoadExamples (dmr/unload_on_switch)"],
    "test_count_approx": 800
  },
  "stages": {
    "prd": { "status": "complete", "artifact": "prd-v2.md" },
    "arch-review": { "status": "complete", "artifact": "arch-review.md", "verdict": "revise" },
    "dx-review": { "status": "complete", "artifact": "dx-review.md", "verdict": "suggestions" },
    "prd-v2": { "status": "complete", "artifact": "prd-v2.md" },
    "arch-spec": { "status": "complete", "artifact": "arch-spec-v2.md" },
    "impl-plan": { "status": "complete", "artifact": "impl-plan-v2.md" },
    "consistency": { "status": "complete", "artifact": "consistency-check.md", "verdict": "pass-after-revision" },
    "dx-review-2": { "status": "complete", "artifact": "dx-review-arch.md", "verdict": "suggestions-incorporated" },
    "impl": { "status": "pending", "units_done": 0, "units_total": null },
    "code-review": { "status": "pending", "verdict": null },
    "qa": { "status": "pending", "verdict": null },
    "security": { "status": "pending", "verdict": null },
    "verification": { "status": "pending" }
  },
  "escalated": false,
  "pr_url": null
}
235 changes: 235 additions & 0 deletions .gm-agent-team/strategy/opencode-vs-docker-agent/assessment.md
@@ -0,0 +1,235 @@
# docker-agent vs opencode: Migration Assessment

**Date:** 2026-05-13
**Scope:** Can docker-agent replace our opencode GM team setup? Can it wrap CLI harnesses as agent backends?

---

## TL;DR

docker-agent is architecturally superior for multi-agent orchestration and distribution. Today it cannot wrap CLI harnesses (codex, claude CLI, aider) as *model backends*, only as tools. Migration is feasible but costs 3-4 weeks without the CLI harness feature and an estimated 2-4 weeks more with it (see Section 6). Recommendation: don't migrate the GM team now; build one greenfield docker-agent for a new use case to learn the system, then decide.

---

## 1. Feature Parity

### What docker-agent has that opencode doesn't

- **OCI registry**: push/pull agent configs as OCI artifacts — `docker agent run agentcatalog/pirate`
- **serve mcp**: expose any agent as an MCP tool callable by other agents or systems
- **serve a2a**: A2A protocol server for agent-to-agent interop
- **serve api**: HTTP API server
- **background_agents**: true parallel sub-agent dispatch (non-blocking)
- **Fallback models**: automatic failover to secondary models on error
- **Rule-based routing**: route to different models based on example phrases
- **Hooks system**: before_llm_call, pre_tool_use, post_tool_use, on_agent_switch, subagent_stop — all scriptable
- **Context compaction**: automatic summarization when context fills
- **Session persistence**: SQLite, resumable sessions
- **Lifecycle management**: MCP server auto-restart with backoff
- **Deferred toolsets**: OAuth flows deferred until first user interaction
- **A2A + MCP composability**: agents can be both consumers and producers of MCP/A2A

### What opencode has that docker-agent doesn't (or does worse)

- **First-class file editing primitives**: Read (with line numbers), Edit (exact-match replacement), Write (with safety guards), Grep, Glob — tuned for code. docker-agent's `filesystem` toolset is coarser.
- **Mature TUI**: opencode's TUI is purpose-built for coding workflows. docker-agent's TUI is more general.
- **CLI harness as model**: opencode's provider abstraction is more open to extension in practice. docker-agent has no subprocess/CLI model provider.
- **Implicit skill injection**: opencode's skill system auto-injects based on task context. docker-agent's skills require explicit `run_skill` tool calls.

---

## 2. Multi-Agent Orchestration

**Yes, docker-agent can replicate our GM → specialist pattern.** The primitives are actually better:

- `transfer_task`: sequential delegation, blocks parent, child gets fresh context window
- `run_background_agent`: parallel dispatch, non-blocking, coordinator polls for results
- `handoff`: linear pipeline (agent A → agent B → agent C)
- Each agent gets its own model, instruction, toolsets
- Sub-agents inherit excluded tools from parent (no recursive loops)
- `subagent_stop` hook fires on every child completion — good for logging/routing

**What maps cleanly:**
- GM agent with `sub_agents: [architect, engineer, code-reviewer, ...]` — direct equivalent
- Different model per agent — direct equivalent
- Fresh context window per delegation — direct equivalent (newSubSession builds a clean session)
- Skills as system prompt injection — direct equivalent via `run_skill` tool

**What doesn't map cleanly:**
- Our GM's "auto-load skill based on task" pattern requires explicit `run_skill` calls in docker-agent. The skill isn't auto-injected; the GM has to decide to call it.
- Our parallel wave execution (multiple subagents in parallel) maps to `run_background_agent`, but the coordinator pattern is different — the GM has to poll, not await.
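The poll-versus-await difference can be illustrated with a small, self-contained Go sketch. It is not docker-agent's implementation: `Background`, `Start`, and `Poll` are hypothetical names, and goroutines stand in for real sub-agents. The point is only that a non-blocking dispatch returns immediately, so the coordinator needs its own loop to collect results.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Background tracks results of tasks started in the background.
// Hypothetical type for illustration; not part of docker-agent.
type Background struct {
	mu      sync.Mutex
	results map[string]string
}

func NewBackground() *Background {
	return &Background{results: map[string]string{}}
}

// Start launches work in a goroutine and returns immediately (non-blocking dispatch).
func (b *Background) Start(id string, work func() string) {
	go func() {
		out := work()
		b.mu.Lock()
		b.results[id] = out
		b.mu.Unlock()
	}()
}

// Poll reports whether the task has finished, and its result if so.
func (b *Background) Poll(id string) (string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	out, ok := b.results[id]
	return out, ok
}

func main() {
	bg := NewBackground()
	pending := map[string]bool{"architect": true, "engineer": true}
	for id := range pending {
		id := id // capture a per-iteration copy for the closure
		bg.Start(id, func() string { return id + " done" })
	}
	// The coordinator owns the collection loop: it polls until every task reports.
	for len(pending) > 0 {
		for id := range pending {
			if out, ok := bg.Poll(id); ok {
				fmt.Println(out)
				delete(pending, id)
			}
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```

In an await-style API the loop in `main` would collapse into a single blocking call per task; with polling, the GM prompt has to carry that loop's logic itself.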

---

## 3. CLI Harness Wrapping — The Key Question

**Short answer: Not possible today. Possible with ~2-4 weeks of Go work.**

### What "CLI harness as model" means

Instead of docker-agent calling `https://api.openai.com/v1/chat/completions`, it would:
1. Serialize the chat history to stdin (or a temp file)
2. Spawn `codex --exec` (or `claude`, `aider`, etc.)
3. Read the response from stdout
4. Parse tool calls from the CLI's output format
5. Return the result to the runtime loop
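Steps 1 to 3 above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not docker-agent code: `Message`, `encodeHistory`, and `runHarness` are hypothetical names, the JSON-array wire format is an assumption, and `cat` stands in for a real harness binary so the example runs without any CLI installed. Tool-call parsing (step 4) is deliberately left out because it is specific to each CLI.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// Message is a hypothetical chat-history entry; docker-agent's real types differ.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// encodeHistory serializes the chat history as a JSON array (step 1).
func encodeHistory(history []Message) ([]byte, error) {
	return json.Marshal(history)
}

// runHarness spawns the command, feeds the payload on stdin,
// and returns whatever the process wrote to stdout (steps 2 and 3).
func runHarness(ctx context.Context, timeout time.Duration, payload []byte, name string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdin = bytes.NewReader(payload)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("harness %q failed: %w (stderr: %s)", name, err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	payload, err := encodeHistory([]Message{{Role: "user", Content: "List the files in this repo."}})
	if err != nil {
		panic(err)
	}
	// `cat` echoes stdin back; a real adapter would name the harness binary here.
	out, err := runHarness(context.Background(), 5*time.Second, payload, "cat")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```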

### Why it doesn't exist today

The model provider system (`pkg/model/provider/`) has concrete implementations for:
- OpenAI (chat completions + responses API)
- Anthropic
- Gemini
- Bedrock
- Vertex AI
- DMR (Docker Model Runner)
- Custom OpenAI-compatible (via `base_url`)

There is no `subprocess` or `cli` provider. The provider interface requires implementing a streaming chat completion contract — it's not designed for subprocess I/O.

### Workarounds available today

**Option A: `script` toolset** — expose `codex --exec "..."` as a *tool* the LLM can call. The LLM (e.g., Claude) decides when to invoke codex, gets the output back as a tool result. This is NOT the same as codex being the model — Claude is still the reasoning engine, codex is just a tool it can use.

**Option B: `base_url` custom provider** — if the CLI harness exposes an OpenAI-compatible HTTP server (e.g., `codex serve --port 8080`), point docker-agent at it. This works if the CLI supports it. Most don't out of the box.

**Option C: `serve mcp` composition** — run a separate docker-agent instance configured with the CLI harness as a tool, expose it as an MCP server, call it from the main agent. Adds latency and complexity.

### Implementation path for true CLI harness support

Add `pkg/model/provider/cli/` implementing the provider interface:

```go
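// CLIProvider is a proposed provider that shells out to a CLI harness.
type CLIProvider struct {
	Command      string        // executable to spawn, e.g. "codex"
	Args         []string      // extra arguments passed on every call
	StdinFormat  string        // "prompt", "json", "openai_messages"
	StdoutFormat string        // "text", "json", "openai_response"
	Timeout      time.Duration // hard limit per invocation
}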
```

Config would look like:
```yaml
models:
  codex-model:
    provider: cli
    model: codex
    provider_opts:
      command: codex
      args: ["--exec", "--json-output"]
      stdin_format: openai_messages
      stdout_format: openai_response
      timeout: 600s

agents:
  codex-engineer:
    model: codex-model
    description: "Codex CLI as the engineer"
    toolsets:
      - type: filesystem

**Complexity:** Medium. The provider interface is well-defined. The hard parts are:
1. Tool call serialization/deserialization (each CLI has its own format)
2. Streaming (most CLIs don't stream in a parseable way)
3. Error handling (exit codes, stderr, timeouts)
4. Session state (some CLIs are stateful, some aren't)

Estimate: 1-2 weeks for a basic working implementation, 3-4 weeks for production-quality with tool interop.
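Hard part 3 (exit codes, stderr, timeouts) mostly comes down to telling failure modes apart after the subprocess returns. The sketch below shows one way to do that with the standard library only. `FailureKind` and `classify` are hypothetical names rather than anything in docker-agent, and `sh -c "exit 3"` stands in for a harness that exits non-zero.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// FailureKind is a coarse, hypothetical classification of subprocess failures.
type FailureKind string

const (
	FailureNone     FailureKind = "none"
	FailureTimeout  FailureKind = "timeout"
	FailureNotFound FailureKind = "not_found"
	FailureExit     FailureKind = "exit"
	FailureOther    FailureKind = "other"
)

// classify maps the error from (*exec.Cmd).Run, plus the context state,
// to a failure kind and an exit code (-1 when no exit code applies).
func classify(ctx context.Context, runErr error) (FailureKind, int) {
	if runErr == nil {
		return FailureNone, 0
	}
	// Check the deadline first: a process killed on timeout also yields an ExitError.
	if ctx != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return FailureTimeout, -1
	}
	// The binary is not installed or not on PATH.
	if errors.Is(runErr, exec.ErrNotFound) {
		return FailureNotFound, -1
	}
	// The process ran and exited with a non-zero status.
	var exitErr *exec.ExitError
	if errors.As(runErr, &exitErr) {
		return FailureExit, exitErr.ExitCode()
	}
	return FailureOther, -1
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	err := exec.CommandContext(ctx, "sh", "-c", "exit 3").Run()
	kind, code := classify(ctx, err)
	fmt.Println(kind, code)
}
```

Keeping the classification separate from the spawn logic lets each adapter map the same kinds to its own user-facing error messages.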

---

## 4. Skills System

**Highly compatible. Port is mostly mechanical.**

Our skills are markdown files. docker-agent's skills system:
- Loads markdown files from disk (`skills: true` or `skills: ["local"]`)
- Injects the skill content as the system prompt for a sub-session
- Agent calls `run_skill` tool with the skill name
- Sub-session runs with the skill as its system prompt

**What maps directly:**
- Skill markdown content — unchanged
- Skill directory structure — minor path adjustments
- Skill invocation — GM calls `run_skill("architect")` instead of `Task(subagent_type="architect")`

**What doesn't map:**
- Our skills include `<skill_files>` sections that reference scripts. docker-agent skills don't have a native "bundled resources" concept — you'd need to reference absolute paths or use the `add_prompt_files` config.
- Our skill system auto-loads based on task context. docker-agent requires explicit `run_skill` calls.
- The `skill` tool in opencode is a first-class primitive. In docker-agent, `run_skill` is a built-in tool that the GM must know to call.

---

## 5. Model Diversity

**Yes, fully supported.** Each agent gets its own model:

```yaml
agents:
  gm:
    model: anthropic/claude-opus-4-7
  engineer:
    model: openai/gpt-5.5
  code-reviewer:
    model: google/gemini-3.1-pro
  architect:
    model: anthropic/claude-opus-4-7
  data-analyst:
    model: openai/gpt-5.5

Inline `provider/model` shorthand works, as does a full `ModelConfig` with temperature, max_tokens, etc. Fallback chains and rule-based routing can also be set per agent.

**Caveat:** Verify that docker-agent's OpenAI provider supports GPT-5.5's specific API parameters (reasoning effort, etc.) before committing. Provider implementations lag model releases.

---

## 6. Migration Complexity

| Workstream | Effort | Notes |
|---|---|---|
| Translate ~70 skills to docker-agent layout | 1-2 days | Mechanical. Content unchanged. |
| Build GM + specialist agent YAMLs | 2-3 days | One multi-agent YAML. Test each handoff. |
| Wire MCP toolsets (Notion, Slack, Granola, Snowflake, Opine, Chorus, BambooHR) | 2-3 days | Each MCP server needs a config entry. |
| Replace opencode's file editing primitives | 3-5 days | **Underestimated cost.** Need custom MCP server or use coarser `filesystem` toolset. |
| Test TUI workflows | 1 week | Real-use shakedown. Things will break. |
| Build CI/CD for agent configs | 2-3 days | YAML linting, OCI push. |
| **Total without CLI harness** | **3-4 weeks** | One engineer, focused. |
| **+ CLI harness model provider** | **+2-4 weeks** | If you want codex/claude CLI as model backend. |

**The underestimated cost:** opencode's Read/Edit/Write/Grep/Glob are first-class, line-number-aware, safety-guarded file editing tools tuned for code. docker-agent's `filesystem` toolset is coarser. To replicate the editing quality, you'd write a custom MCP server exposing the same primitives. That's real work.

---

## 7. Recommendation

**Don't migrate the GM team now. Revisit in 2-3 months.**

### Why not now

1. **CLI harness as model doesn't exist.** If this is the central reason to evaluate docker-agent, you're paying migration cost *plus* feature development cost. Not a win.

2. **Your current setup works.** The skill library is mature, the GM/specialist pattern is tuned, and the team has muscle memory. The marginal gain from docker-agent's orchestration primitives doesn't outweigh the switching costs.

3. **File editing quality gap.** The opencode file primitives are better for coding workflows today. Closing this gap requires building a custom MCP server.

### When to change this recommendation

- **You want to deploy agents to non-engineering users** (sales, finance, support). docker-agent's binary distribution and OCI packaging are much cleaner. **Migrate.**
- **You need agents callable by other systems** (API, A2A, MCP). docker-agent wins clearly. **Migrate.**
- **You're hitting opencode-specific blockers.** **Migrate to whichever unblocks fastest.**
- **Strategic alignment** — you work at Docker, you want to standardize on your own product. **Migrate, accept the cost as investment.**

### Suggested path if you do migrate

1. Build the file-tools MCP server first (the underestimated cost).
2. Port one specialist (engineer) end-to-end before doing the rest.
3. Decide on CLI harness model provider explicitly — Phase 1 or post-migration?
4. Don't try to do this and ship anything else in the same month.

### The honest one-liner

docker-agent is the better *architecture* for where multi-agent systems are heading. opencode is the better *coding tool* for where you are today. Migrate when the architecture benefit (distribution, composability, A2A) becomes load-bearing, not before.

14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,20 @@
All notable changes to this project will be documented in this file.


## [Unreleased]

### Added

- **Cross-harness orchestration.** Agents can now be backed by external agent runtimes instead of a model. Declare `harness: { type: claude-code }` on any subagent and the orchestrator dispatches tasks to the Claude Code CLI, Codex CLI, OpenCode, Copilot CLI, or OpenClaw — getting results back through docker-agent's normal event stream, TUI, and session model.
- Supported harnesses: `claude-code`, `codex`, `opencode`, `copilot` (via ACP), `openclaw` (via ACP).
- Config version bumped to `10`. Version `9` configs upgrade automatically.
- New `harness:` key on agent config, mutually exclusive with `model:`.
- Multi-turn sessions: native resume for Claude Code; simulated history prepend for Codex/OpenCode.
- Sandbox: ACP file operations confined to the session working directory.
- Permission policy: `ask` (default), `auto_allow` (requires `i_understand_the_risk: true`), `deny_all`.
- See `examples/harness-team.yaml` and [Agent Configuration docs](docs/configuration/agents/index.md#harness-backed-agents).


## [v1.59.0] - 2026-05-13

This release adds XML tool call parsing for better model compatibility, performance improvements for TUI rendering, and enhanced remote runtime capabilities.