---
name: debug
description: >
  Diagnose a running or completed Druids execution. Pulls agent traces,
  activity logs, and diffs, then produces a structured diagnostic covering
  communication health, errors, goal progress, agent performance, and
  behavioral bottlenecks.
user-invocable: true
---

# Debug an Execution

The user wants to understand what is happening (or what went wrong) inside a Druids execution. This skill produces a diagnostic report by pulling every available signal and analyzing it systematically.

## 1. Identify the execution

The user may pass a slug directly (`/debug gentle-nocturne`) or say something like "debug the current run". If no slug is given, call `list_executions` with `active_only=true` and pick the most recent one. If there are multiple active executions, ask which one.

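The selection logic above can be sketched as follows. This is a minimal illustration, not the real tool interface: `active` stands in for the result of `list_executions(active_only=true)`, and the `slug` field on each record is an assumed shape.

```python
def resolve_slug(slug_arg, active):
    """Pick the execution to debug.

    slug_arg: the user-supplied slug, or None.
    active: records from list_executions(active_only=True) -- the
    record shape here is an assumption, not the real schema.
    Returns (slug, question) where question is what to ask the user
    when the slug cannot be resolved automatically.
    """
    if slug_arg:
        return slug_arg, None
    if not active:
        return None, "no active executions found"
    if len(active) > 1:
        return None, "multiple active executions -- ask the user which one"
    return active[0]["slug"], None

slug, question = resolve_slug(None, [{"slug": "gentle-nocturne"}])
```
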
## 2. Gather all available data

Make these calls in parallel where possible:

**a. Execution state** -- `get_execution` for the slug. Record: status, agent names, agent types, connections, topology edges, exposed services, PR URL, branch name.

**b. Full activity log** -- `get_execution_activity` with `n=200` and `compact=false`. This is the richest signal. It contains every tool call, message, error, connection event, and response across all agents. Request the full (non-compact) version so you can see tool arguments and outputs.

**c. Per-agent traces** -- For each agent in the execution, call `get_agent_trace`. This gives you the coalesced view: messages, thoughts, tool calls with status, and plans. Pull traces for all agents in parallel.

**d. Diff** -- `get_execution_diff`. If no diff exists yet, note that.

**e. Spec** -- from the execution data. You need this to evaluate whether agents are achieving the goal.

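The fan-out above can be sketched with `asyncio`. The `call_tool` stub below is hypothetical -- it only marks where the real Druids tool calls (`get_execution`, `get_execution_activity`, `get_execution_diff`, `get_agent_trace`) would go; the point is that all of them are independent and can be awaited together.

```python
import asyncio

async def call_tool(name, **args):
    # Stub standing in for a real tool invocation; in practice each of
    # these is one of the Druids tool calls named in step 2.
    await asyncio.sleep(0)
    return {"tool": name, "args": args}

async def gather_signals(slug, agents):
    # Every signal in step 2 is independent, so issue the calls
    # concurrently instead of awaiting them one by one.
    named = {
        "execution": call_tool("get_execution", slug=slug),
        "activity": call_tool("get_execution_activity", slug=slug,
                              n=200, compact=False),
        "diff": call_tool("get_execution_diff", slug=slug),
        **{f"trace:{a}": call_tool("get_agent_trace", agent=a)
           for a in agents},
    }
    results = await asyncio.gather(*named.values())
    return dict(zip(named, results))

signals = asyncio.run(gather_signals("gentle-nocturne",
                                     ["builder", "reviewer"]))
```
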
## 3. Analyze

Work through each dimension below. Do not skip dimensions even if they seem fine -- explicitly confirming health is part of the diagnostic.

### 3a. Communication health

Questions to answer:

- Are all agents connected? Check for `connected` and `disconnected` events. An agent that connected and then disconnected has a problem.
- Is the topology correct for the program? Do agents that need to talk to each other have edges between them?
- Are messages actually flowing? Look for `message` tool calls in the activity. Check that messages sent by one agent show up as received by the target.
- How long between a message being sent and the receiver acting on it? Gaps longer than 30 seconds suggest the receiver is stuck or not listening.
- Are any agents talking to themselves or sending messages that go nowhere?

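The 30-second check can be done mechanically: for each `message` event, find the receiver's next activity and measure the gap. The event fields below (`t`, `agent`, `type`, `to`) are illustrative, not the real activity-log schema.

```python
from datetime import datetime

def send_to_action_gaps(events, threshold_s=30):
    """Return (receiver, gap_seconds) for sends the receiver was slow to act on.

    events: a time-ordered list of dicts; the field names are
    illustrative, not the real activity-log schema.
    """
    def parse(t):
        return datetime.strptime(t, "%H:%M:%S")

    slow = []
    for i, ev in enumerate(events):
        if ev["type"] != "message":
            continue
        # The receiver's first event after the send counts as "acting on it".
        for nxt in events[i + 1:]:
            if nxt["agent"] == ev["to"]:
                gap = (parse(nxt["t"]) - parse(ev["t"])).total_seconds()
                if gap > threshold_s:
                    slow.append((ev["to"], gap))
                break
    return slow

events = [
    {"t": "14:30:00", "agent": "builder",  "type": "message", "to": "reviewer"},
    {"t": "14:30:45", "agent": "reviewer", "type": "tool_call", "to": None},
]
slow = send_to_action_gaps(events)   # reviewer took 45 s to react
```
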
### 3b. Errors

Questions to answer:

- Are there any `error` type events in the activity? What do they say?
- Are there tool calls that returned errors? Look at `tool_result` events with error indicators.
- Did any agent disconnect unexpectedly?
- Are there repeated failures on the same tool call? This usually means the agent is stuck in a retry loop.
- Did any agent hit a timeout?
- Are there permission errors (git push failures, file access denied, port already in use)?

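Retry loops are runs of consecutive identical failures, which `itertools.groupby` finds directly. The `(agent, tool, failed)` tuples below are an assumed distillation of `tool_result` events, not the real schema.

```python
from itertools import groupby

def retry_loops(results, min_repeats=3):
    """Find runs of >= min_repeats consecutive failed calls to the same tool.

    results: a time-ordered list of (agent, tool, failed) tuples
    distilled from tool_result events; the shape is illustrative.
    """
    loops = []
    # groupby with no key groups consecutive identical tuples.
    for (agent, tool, failed), run in groupby(results):
        n = len(list(run))
        if failed and n >= min_repeats:
            loops.append((agent, tool, n))
    return loops

results = [
    ("builder", "git_push", True),
    ("builder", "git_push", True),
    ("builder", "git_push", True),
    ("builder", "Write", False),
]
```
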
### 3c. Agent performance

For each agent, characterize:

- **Activity level**: How many tool calls has it made? Is it actively working or idle?
- **Focus**: What is it spending its time on? (e.g., 80% file edits, 10% git, 10% messages)
- **Progress**: Based on its trace, what has it accomplished relative to its role?
- **Stuck indicators**: Is it repeating the same action? Has it gone silent? Is it producing long stretches of thinking without action?
- **Tool usage patterns**: Which tools does it use most? Are there tools it should be using but isn't?

Then compare across agents:

- Which agent is furthest along?
- Which agent is the weakest link (blocking others or making no progress)?
- Is any agent doing redundant work that another agent already did?

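The activity-level and focus numbers come from a simple tally over the activity log. A sketch, assuming the log has been reduced to `(agent, tool)` pairs (a hypothetical distillation, not the real format):

```python
from collections import Counter, defaultdict

def focus_profile(calls):
    """Per-agent tool-call totals and percentage breakdown by tool."""
    per_agent = defaultdict(Counter)
    for agent, tool in calls:
        per_agent[agent][tool] += 1
    profile = {}
    for agent, counts in per_agent.items():
        total = sum(counts.values())
        profile[agent] = {
            "total": total,
            # Percentage of this agent's calls spent on each tool.
            "focus": {t: round(100 * n / total) for t, n in counts.items()},
        }
    return profile

calls = [
    ("builder", "Write"), ("builder", "Write"), ("builder", "git_commit"),
    ("builder", "message"), ("reviewer", "Read"),
]
profile = focus_profile(calls)
# builder: 4 calls, 50% Write; reviewer: 1 call, 100% Read
```

A low `total` relative to the other agents is a hint toward the weakest link, though only a hint -- an agent blocked on a dependency can be idle through no fault of its own.
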
### 3d. Goal progress

Compare the current state against the spec:

- What did the spec ask for?
- What has actually been built? (Use the diff and exposed services.)
- Has anyone attempted the demo from the spec?
- What percentage of the requirements are met?
- What is left to do?

### 3e. Behavioral bottlenecks

These are the pragmatic, structural problems that slow executions down:

- **File sharing**: Are agents trying to read files another agent is still writing? Look for file-not-found errors or stale reads.
- **Info aggregation**: In programs with sub-agents, is the orchestrator actually collecting and using sub-agent output? Or is information getting lost?
- **Messaging timeliness**: Are there long gaps where an agent should have sent a message but didn't? Calculate the longest gap between activity events for each agent.
- **Hanging**: Is any agent completely silent for more than 2 minutes? This usually means it's stuck waiting for something or has crashed.
- **Serialization**: Are agents doing work sequentially that could be parallel? (e.g., one agent waiting for another to finish before starting its own independent work)
- **Scope creep**: Is any agent doing work outside its assigned role? (e.g., the reviewer starting to implement instead of reviewing)
- **Thrashing**: Is any agent undoing and redoing work? Look for patterns like edit-revert-edit on the same files.

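The longest-gap and hanging checks reduce to the same computation: per agent, the maximum silence between consecutive events. A sketch, assuming events have been reduced to `(agent, "HH:MM:SS")` pairs (an illustrative shape, not the real schema):

```python
from datetime import datetime

def longest_gaps(events):
    """Longest silence (seconds) between consecutive events, per agent."""
    by_agent = {}
    for agent, t in events:
        by_agent.setdefault(agent, []).append(
            datetime.strptime(t, "%H:%M:%S"))
    gaps = {}
    for agent, times in by_agent.items():
        times.sort()
        gaps[agent] = max(
            ((b - a).total_seconds() for a, b in zip(times, times[1:])),
            default=0.0,   # a single event means no measurable gap
        )
    return gaps

events = [
    ("builder", "14:30:00"), ("builder", "14:30:10"), ("builder", "14:33:30"),
    ("reviewer", "14:30:05"),
]
# Agents silent for over 2 minutes are hanging candidates.
hanging = [a for a, g in longest_gaps(events).items() if g > 120]
```
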
## 4. Present the diagnostic

Structure the output as follows. Be concrete and specific -- cite actual tool names, file paths, message contents, and timestamps from the trace. Do not hedge or generalize.

```
## Diagnostic: {slug}

**Status**: {status} | **Agents**: {count} | **Duration**: {time since start}
**Spec**: {one-line summary of what was asked for}
**Branch**: {branch} | **PR**: {url or "none yet"} | **Diff**: +{added}/-{removed} lines

### Communication
{2-4 sentences on topology health, message flow, latency. Flag any issues.}

### Errors
{List each error with agent name and context. Or "No errors detected."}

### Agent Performance

#### {agent_name} ({agent_type})
- Activity: {active/idle/stuck} -- {N} tool calls, last active {time}
- Focus: {what it's spending time on}
- Progress: {what it's accomplished}
- Issues: {any problems, or "none"}

(repeat for each agent)

### Weakest link
{Which agent is the bottleneck and why. Be direct.}

### Goal Progress
- Spec asks for: {requirements list}
- Completed: {what's done}
- Remaining: {what's left}
- Estimated completion: {close / far / stuck}

### Bottlenecks
{List each bottleneck found, with evidence from the trace. Or "No structural bottlenecks detected."}

### Recommended actions
{Concrete next steps. Examples:
- "Send builder a message: the tests are failing because X, try Y"
- "Stop agent Z, it's been hanging for 5 minutes"
- "The reviewer hasn't received the builder's submission -- check topology"
- "Everything looks healthy, just needs more time"}
```

## 5. Offer to act

After presenting the diagnostic, ask the user if they want to take any of the recommended actions. You can:

- Send a message to an agent via `send_message`
- Stop a stuck agent via `stop_agent`
- Run a command on an agent's VM via `remote_exec` to inspect state
- Check specific files or processes on the VM
- SSH into the VM for the user via `get_agent_ssh`

Do not take action without the user's confirmation.

## Notes

- For running executions, the trace is live. If the user asks to "keep watching", re-pull activity after a minute and report changes.
- If the execution has already completed or failed, the diagnostic is a post-mortem. Shift language accordingly: "what happened" instead of "what's happening".
- The activity log is the primary signal. Agent traces are secondary -- they show the agent's perspective but miss inter-agent dynamics.
- When citing evidence, include the agent name and a brief quote or description of the event. "builder called `Write` on `src/app.py` at 14:32" is better than "an agent edited a file".
- If you see an agent in a retry loop (same tool call 3+ times with errors), that is the highest-priority finding. Flag it first.
- Token usage from the execution record can indicate whether an agent is doing real work (high token usage) or stuck early (low token usage).