Skip to content

fix(orchestrating-agent-relay): correct ACK target + broker-lifecycle troubleshooting#52

Merged
khaliqgant merged 1 commit into
mainfrom
fix/orchestrator-ack-and-broker-troubleshooting
May 19, 2026
Merged

fix(orchestrating-agent-relay): correct ACK target + broker-lifecycle troubleshooting#52
khaliqgant merged 1 commit into
mainfrom
fix/orchestrator-ack-and-broker-troubleshooting

Conversation

@khaliqgant

Copy link
Copy Markdown
Member

Summary

Hardens the orchestrating-agent-relay skill against friction observed repeatedly in fresh runs.

1. Wrong ACK target (the recurring one)

The skill led orchestrators to tell workers to DM broker for ACK. broker is the broker's internal routing self-name (crates/broker workspace.rs/routing.rs in the relay repo), not a spawnable/DM-able agent — every codex/claude worker hit Agent "broker" not found and recovered via #general. Protocol now states ACK/DONE go to orchestrator (the auto-registered spawning identity) or #general, never broker.

2. Broker-lifecycle troubleshooting (Common Mistakes rows)

  • Half-started broker: process alive but status says STOPPED and Failed to read broker connection metadataup spawned a broker whose readiness timed out and was never reaped; retrying up doesn't clean the orphan. Documents the pkill/down --force + clean .agent-relay/ recovery.
  • status vs who contradiction: status (state file) says RUNNING/Agents:N while who/send/replies (live RPC) fail — stale .agent-relay/connection.json / wrong port / second broker. Recovery + agent-relay doctor.
  • Invalid agent token from orchestrator CLI while broker+workers keep working — unresolved ${RELAY_API_KEY} template used as a literal key.

Docs only. (A relay-side PR #914 hardens the corresponding broker/doctor behavior; this is the skill-guidance counterpart.)

🤖 Generated with Claude Code

…ycle troubleshooting

Across multiple fresh runs, orchestrators told workers to DM `broker` for
ACK, but `broker` is the broker's internal routing self-name, not a
DM-able agent (workers hit `Agent "broker" not found`). Make ACK/DONE
target explicitly `orchestrator`/`#general`. Add Common Mistakes rows
for: half-started broker (process alive, status STOPPED, missing
connection metadata), the status-vs-who stale-connection contradiction,
and unresolved ${RELAY_API_KEY} template auth failures — each pointing
at `agent-relay doctor`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c31def17-26d1-4be6-a478-677427a60198

📥 Commits

Reviewing files that changed from the base of the PR and between 2131172 and 0f727a7.

📒 Files selected for processing (1)
  • skills/orchestrating-agent-relay/SKILL.md

📝 Walkthrough

Walkthrough

This PR updates the orchestrating-agent-relay SKILL.md documentation to clarify the correct protocol targets for worker ACK/DONE responses (orchestrator or #general, never broker) and adds four new "Common Mistakes" troubleshooting entries addressing half-started broker states, worker DM failures, stale broker connection metadata, and unresolved token template issues.

Changes

Orchestrator Relay Protocol and Troubleshooting Guide

Layer / File(s) Summary
Protocol clarification: ACK/DONE recipient targets
skills/orchestrating-agent-relay/SKILL.md
Workers must respond to orchestrator or #general channel. Explicitly forbids DM targets to broker since broker is an internal routing self-name, not a DM-able agent identity.
Common troubleshooting scenarios
skills/orchestrating-agent-relay/SKILL.md
Four new troubleshooting rows address: half-started broker state with metadata write failures and cleanup steps; expected worker DM failures when tasks target broker; stale/wrong broker connection state where status shows RUNNING but live commands fail; and "Invalid agent token" caused by unresolved ${RELAY_API_KEY} template in orchestrator environment.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • AgentWorkforce/skills#48: Updates to the same orchestrating-agent-relay/SKILL.md file covering CLI-first guidance and protocol framing.
  • AgentWorkforce/skills#43: Related agent-relay orchestration and troubleshooting guidance updates addressing broker state readiness and fixing stale broker connection conditions.

Poem

🐰 A broker that spoke but was never quite there,
No DM for shadows, just orchestrators fair!
With tokens and metadata, now we are clear—
The relay guides workers to destinations dear. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main changes: correcting the ACK target (orchestrator vs. broker) and adding broker-lifecycle troubleshooting guidance to the orchestrating-agent-relay skill documentation.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context for the documentation updates including specific issues addressed (wrong ACK target, broker-lifecycle scenarios) and their resolutions.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/orchestrator-ack-and-broker-troubleshooting

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

@khaliqgant khaliqgant merged commit 3f81c2a into main May 19, 2026
3 checks passed
@khaliqgant khaliqgant deleted the fix/orchestrator-ack-and-broker-troubleshooting branch May 19, 2026 12:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant