Add Bridge & Staffing design for multi-project orchestration #9

Merged: khaliqgant merged 35 commits into main from claude/multi-project-agent-sockets-RCkNs on Dec 25, 2025
Conversation

@khaliqgant

Member

Introduces the "bridge" command for cross-project agent coordination:

  • Architect/Principal role as orchestrator connecting multiple projects
  • Leads with spawn capability to dynamically create worker agents
  • Standup protocol for daily work coordination
  • Multi-project dashboard visibility

- bridge: just project paths as args
- lead: just your name
- System handles all complexity underneath
- Add bridge module with MultiProjectClient for multi-socket connections
- Add AgentSpawner for lead agents to spawn/release workers
- Add 'bridge' command: agent-relay bridge ~/project1 ~/project2
- Add 'lead' command: agent-relay lead Alice claude
- Support config file for project defaults (~/.agent-relay/bridge.json)
- Support --cli override for all projects
Add Multi-Project Orchestration section with:
- Bridge command usage
- Lead command usage
- Cross-project messaging syntax
- Spawn/release worker patterns
- Link to full design doc
- agent-relay-spawn-handler: Wire up spawn/release in lead mode
- agent-relay-cross-project-parser: Add project:agent syntax
- agent-relay-multi-project-dashboard: Bridge view for dashboard
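
The `project:agent` addressing listed above could be parsed along these lines. This is a minimal sketch, not the parser from this PR: the `ParsedTarget` type and the `parseTarget` name are hypothetical, and treating a bare name as an agent in the current project is an assumption.

```typescript
// Hypothetical shape for a parsed message target; not taken from the PR.
interface ParsedTarget {
  project?: string; // undefined is assumed to mean "the current project"
  agent: string;
}

// Parse "project:agent" or a bare "agent". A leading "@" is tolerated.
function parseTarget(raw: string): ParsedTarget {
  const target = raw.startsWith("@") ? raw.slice(1) : raw;
  const idx = target.indexOf(":");
  if (idx === -1) {
    if (target.length === 0) throw new Error("empty target");
    return { agent: target };
  }
  const project = target.slice(0, idx);
  const agent = target.slice(idx + 1);
  // Reject "project:" and ":agent" rather than guessing the missing half.
  if (!project || !agent) throw new Error(`malformed target: ${raw}`);
  return { project, agent };
}
```

Splitting on the first colon only keeps any later colons inside the agent part, which may or may not match the behaviour the real parser wants.
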
@khaliqgant khaliqgant requested a review from Copilot December 21, 2025 10:42

Copilot AI left a comment

Pull request overview

This PR introduces a multi-project orchestration layer that enables a single "Architect" agent to coordinate work across multiple projects, each managed by a "Lead" who can dynamically spawn worker agents.

Key Changes:

  • New bridge command for cross-project agent coordination via multi-socket connections
  • New lead command for project leads with worker spawning capabilities
  • Infrastructure for dynamic agent spawning and release via tmux

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| src/cli/index.ts | Added bridge and lead CLI commands with multi-project connection and spawn management |
| src/bridge/config.ts | Configuration loading and project resolution logic for bridge mode |
| src/bridge/multi-project-client.ts | Client for connecting to multiple project daemons simultaneously |
| src/bridge/spawner.ts | Agent spawning service for creating/releasing worker agents in tmux windows |
| src/bridge/types.ts | Type definitions for bridge, spawn, and worker management |
| src/bridge/utils.ts | Utility functions for shell escaping, sleep, and target parsing |
| src/bridge/index.ts | Module exports for bridge functionality |
| docs/DESIGN_BRIDGE_STAFFING.md | Comprehensive design document for the bridge and staffing feature |
| README.md | Updated with multi-project orchestration usage examples |
| .beads/issues.jsonl | Added tracking issues for remaining implementation work |

Comment thread src/cli/index.ts

// Handle messages from projects
client.onMessage = (projectId, from, payload, messageId) => {
console.log(`[${projectId}] ${from}: ${payload.body.substring(0, 80)}...`);

Copilot AI Dec 21, 2025

Potential error if payload.body is undefined or not a string. Add a check or default value before calling substring().

Suggested change
-    console.log(`[${projectId}] ${from}: ${payload.body.substring(0, 80)}...`);
+    const bodyText =
+      payload && typeof payload.body === 'string'
+        ? payload.body
+        : String(payload?.body ?? '');
+    console.log(`[${projectId}] ${from}: ${bodyText.substring(0, 80)}...`);

Copilot uses AI. Check for mistakes.
Comment thread src/cli/index.ts Outdated
Comment on lines +389 to +393
const [mainCommand, ...commandArgs] = cli.split(':');

const wrapper = new TmuxWrapper({
name,
command: mainCommand,

Copilot AI Dec 21, 2025

The variable name mainCommand is ambiguous as it could be confused with a system command. Consider renaming to cliTool or baseCli to clarify it represents the CLI tool portion.

Suggested change
-    const [mainCommand, ...commandArgs] = cli.split(':');
-    const wrapper = new TmuxWrapper({
-      name,
-      command: mainCommand,
+    const [cliTool, ...commandArgs] = cli.split(':');
+    const wrapper = new TmuxWrapper({
+      name,
+      command: cliTool,

Comment thread src/bridge/utils.ts
Comment on lines +39 to +46
export function escapeForShell(str: string): string {
return str
.replace(/\\/g, '\\\\')
.replace(/"/g, '\\"')
.replace(/\$/g, '\\$')
.replace(/`/g, '\\`')
.replace(/!/g, '\\!');
}

Copilot AI Dec 21, 2025

The shell escaping implementation may not handle all edge cases for shell injection. Consider using a well-tested library or adding single quote escaping and newline handling for more robust protection.
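
For comparison, here is a sketch of the single-quote approach the comment alludes to. It assumes a POSIX shell; the function name `quoteForPosixShell` is hypothetical and is not part of `src/bridge/utils.ts`. Inside single quotes the shell treats every character literally (including `$`, backticks, `!`, and newlines), so only the single quote itself needs handling.

```typescript
// Quote a value for a POSIX shell by wrapping it in single quotes.
// Each embedded ' becomes '\'' (close the quote, emit an escaped quote,
// reopen the quote). Hypothetical helper, not taken from the PR.
function quoteForPosixShell(value: string): string {
  if (value.includes("\0")) {
    // NUL cannot appear in a process argument at all.
    throw new Error("NUL bytes cannot be passed in shell arguments");
  }
  return "'" + value.replace(/'/g, "'\\''") + "'";
}
```

Unlike `escapeForShell`, the result already includes its surrounding quotes, so it would be interpolated bare rather than inside `"..."`.
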

Comment thread src/bridge/spawner.ts
Comment on lines +25 to +30
try {
await execAsync(`tmux has-session -t ${this.tmuxSession} 2>/dev/null`);
} catch {
// Session doesn't exist, create it
await execAsync(
`tmux new-session -d -s ${this.tmuxSession} -c "${this.projectRoot}"`

Copilot AI Dec 21, 2025

The tmux session name is not escaped before being used in a shell command. If this.tmuxSession contains shell metacharacters, it could lead to command injection. Use proper escaping or validate the session name format.

Suggested change
-    try {
-      await execAsync(`tmux has-session -t ${this.tmuxSession} 2>/dev/null`);
-    } catch {
-      // Session doesn't exist, create it
-      await execAsync(
-        `tmux new-session -d -s ${this.tmuxSession} -c "${this.projectRoot}"`
+    const safeSession = escapeForTmux(this.tmuxSession);
+    try {
+      await execAsync(`tmux has-session -t ${safeSession} 2>/dev/null`);
+    } catch {
+      // Session doesn't exist, create it
+      await execAsync(
+        `tmux new-session -d -s ${safeSession} -c "${this.projectRoot}"`

Comment thread src/bridge/spawner.ts
} catch {
// Session doesn't exist, create it
await execAsync(
`tmux new-session -d -s ${this.tmuxSession} -c "${this.projectRoot}"`

Copilot AI Dec 21, 2025

Both this.tmuxSession and this.projectRoot are interpolated into shell commands without proper escaping. Use the escapeForShell utility function to prevent command injection.
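
An alternative to escaping is to avoid the shell entirely. The sketch below uses Node's `execFile`, which passes arguments as an array so nothing is parsed by a shell. The names `ensureTmuxSession` and `isSafeSessionName` are hypothetical, and the allow-list is an assumed policy rather than a tmux rule (colons and periods are excluded because they have meaning in tmux target syntax).

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Assumed policy: letters, digits, "_" and "-" only.
function isSafeSessionName(name: string): boolean {
  return /^[A-Za-z0-9_-]+$/.test(name);
}

// Create the tmux session if it does not already exist. Arguments are
// passed as an array, so neither value is interpreted by a shell.
async function ensureTmuxSession(session: string, projectRoot: string): Promise<void> {
  if (!isSafeSessionName(session)) {
    throw new Error(`invalid tmux session name: ${session}`);
  }
  try {
    // Rejects when tmux exits non-zero, i.e. the session is missing.
    await execFileAsync("tmux", ["has-session", "-t", session]);
  } catch {
    await execFileAsync("tmux", ["new-session", "-d", "-s", session, "-c", projectRoot]);
  }
}
```

With this shape, `projectRoot` needs no quoting at all, which sidesteps the question of whether `escapeForShell` covers every edge case.
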

Comment thread src/bridge/spawner.ts
).catch(() => {});

// Wait a bit for graceful shutdown
await sleep(2000);

Copilot AI Dec 21, 2025

Another hardcoded sleep duration. Define this as a named constant (e.g., GRACEFUL_SHUTDOWN_DELAY_MS) for better maintainability.

type: 'PONG',
id: uuid(),
ts: Date.now(),
payload: (envelope.payload as { nonce?: string }) ?? {},

Copilot AI Dec 21, 2025

The nullish coalescing operator (??) here is unnecessary because envelope.payload as { nonce?: string } will never be null or undefined from a type cast. If the intent is to provide a default when payload is missing, check envelope.payload directly.

Suggested change
-    payload: (envelope.payload as { nonce?: string }) ?? {},
+    payload: envelope.payload ? (envelope.payload as { nonce?: string }) : {},

Comment thread src/bridge/config.ts
Comment on lines +153 to +154
await execAsync(`cd "${project.path}" && agent-relay up &`, {
timeout: 5000,

Copilot AI Dec 21, 2025

The project.path is interpolated into a shell command without escaping. Use proper escaping to prevent command injection if the path contains shell metacharacters.

Suggested change
-    await execAsync(`cd "${project.path}" && agent-relay up &`, {
-      timeout: 5000,
+    await execAsync('agent-relay up &', {
+      timeout: 5000,
+      cwd: project.path,

Comment thread src/bridge/config.ts
Comment on lines +152 to +153
// Start daemon in background
await execAsync(`cd "${project.path}" && agent-relay up &`, {

Copilot AI Dec 21, 2025

Using & to background the process in a shell command passed to execAsync may not work as intended. The parent process will complete immediately, and the backgrounded process may not be properly managed. Consider using a proper process spawning mechanism or handling the async nature differently.

Suggested change
-    // Start daemon in background
-    await execAsync(`cd "${project.path}" && agent-relay up &`, {
+    // Start daemon; let agent-relay manage its own backgrounding if needed
+    await execAsync(`cd "${project.path}" && agent-relay up`, {
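
One way to read "a proper process spawning mechanism" is Node's `spawn` with `detached` and `unref()`. The following is a sketch under that assumption, not the fix that landed in this PR; `startDetached` is a hypothetical helper name.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Start a long-running command without a shell and without tying its
// lifetime to the parent. `detached: true` makes the child a process
// group leader on POSIX, `stdio: "ignore"` avoids shared pipes, and
// `unref()` lets the parent exit while the child keeps running.
function startDetached(command: string, args: string[], cwd: string): ChildProcess {
  const child = spawn(command, args, { cwd, detached: true, stdio: "ignore" });
  // Without a listener, a failed spawn (e.g. command not found) would
  // surface as an unhandled "error" event.
  child.once("error", (err) => {
    console.error(`failed to start ${command}: ${err.message}`);
  });
  child.unref();
  return child;
}
```

For the daemon above, the call would look like `startDetached("agent-relay", ["up"], project.path)`. Passing `cwd` as an option also removes the interpolated `cd`, so the path never reaches a shell.
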

│ auth-service │ │ frontend │ │ api-service │
│ Project Daemon │ │ Project Daemon │ │ Project Daemon │
│ │ │ │ │ │
│ Socket: │ │ Socket: │ │ Socket: │

Copilot AI Dec 21, 2025

The ASCII diagram uses inconsistent line styles (box-drawing characters). While functional, using consistent Unicode box-drawing characters throughout would improve visual consistency.

claude and others added 22 commits December 21, 2025 10:45
- config.test.ts: Tests for path resolution, lead naming, project resolution
- utils.test.ts: Tests for target parsing, shell escaping

26 tests passing
Tests actual socket connections with real daemons for:
- Connection management across multiple projects
- Message routing to specific project agents
- Lead alias resolution
- Broadcast to all leads
Track implementation of Slack-style thread replies with parent_message_id
tracking, nested UI display, and notification badges.
Frontend changes:
- Complete redesign with Slack-inspired layout (sidebar + main panel)
- Channel-based navigation (#general, #broadcasts, agent DMs)
- Command palette with Cmd+K for quick actions and search
- Presence indicators (online/offline) on agent avatars
- Message bubbles with avatars, timestamps, and hover actions
- Date dividers and thread indicators (UI ready for threads feature)
- Auto-resizing composer with target selector
- Typing indicator UI (infrastructure ready)

Backend changes:
- Add presence table for real-time status tracking
- Add read_state table for unread message counts
- Add presence management methods (updatePresence, getAllPresence)
- Add typing indicator support (setTypingIndicator)
- Add read state tracking (updateReadState, getUnreadCounts)

Design system:
- CSS custom properties for theming
- Inter font for UI, JetBrains Mono for code
- Slack-accurate color palette and spacing
Frontend TypeScript modules:
- types.ts: Type definitions for Agent, Message, AppState, DOMElements
- utils.ts: Utility functions (escapeHtml, formatTime, getAvatarColor, etc.)
- state.ts: Reactive state management with subscribe/notify pattern
- websocket.ts: WebSocket connection with reconnection logic
- components.ts: UI rendering (agents, messages, command palette)
- app.ts: Application entry point and event listeners
- index.ts: Module exports
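
The subscribe/notify pattern named for `state.ts` can be illustrated with a small generic store. This is a sketch of the general pattern only; the `Store` class and its method names are hypothetical and are not taken from the dashboard code.

```typescript
type Listener<S> = (state: S) => void;

// Minimal observable store: setState merges a patch into the current
// state and then notifies every subscriber with the new state.
class Store<S extends object> {
  private state: S;
  private listeners = new Set<Listener<S>>();

  constructor(initial: S) {
    this.state = initial;
  }

  getState(): S {
    return this.state;
  }

  setState(patch: Partial<S>): void {
    this.state = { ...this.state, ...patch };
    this.listeners.forEach((listener) => listener(this.state));
  }

  // Register a listener; the returned function removes it again.
  subscribe(listener: Listener<S>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```

Returning an unsubscribe function from `subscribe` keeps cleanup next to registration, which tends to matter for UI components that are re-rendered often.
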

Testing:
- utils.test.ts: Comprehensive tests for utility functions
- state.test.ts: Tests for state management and message filtering
- vitest.config.ts: Added jsdom environment for frontend tests

Build configuration:
- Added esbuild for bundling frontend TypeScript
- Added build:frontend script to package.json
- Added jsdom dev dependency for DOM testing
- Created frontend-specific tsconfig.json

HTML updated to load bundled /js/app.js module
- Exclude src/dashboard/frontend from main tsconfig to prevent DOM type errors
- Fix project-namespace test to check for .project marker instead of just data dir
- Add built frontend bundle
- .env.example: Environment variables reference
- cli-usage.sh: CLI command examples
- programmatic-usage.ts: Library usage in TypeScript
- docker-compose.yml: Docker deployment example
- agent-relay.service: Systemd service file
- README.md: Examples documentation
…:khaliqgant/agent-relay into claude/review-pr-9-6HyEY
Add beads task for threaded conversations feature
- computeNeedsAttention: derive agents needing attention from message history
- findAgentConfig: auto-detect agent role from .claude/agents/ or .openagents/
- Includes full test coverage for both utilities
- Enables agent-relay-9w0 (auto-detect role) implementation
- Tests passing: 452/453 ✓
Major Changes:
- Cross-project message support (@project:agent syntax)
- Bridge interface for project-based messaging
- Dashboard display with project badges
- Agent auto-detection from .claude/agents config
- Needs-attention indicators for pending messages

Features:
- computeNeedsAttention: Heuristic for pending message detection
- Agent config detection from frontmatter
- Project namespace support in relay messages
- Enhanced bridge header and navigation

Tests:
- 452/453 tests passing ✓
- Full coverage for new utilities
- Integration tests complete

Issues Created:
- agent-relay-9uq: Project chat targeting
- agent-relay-1t7: Project name display
- agent-relay-vxc: Navigation consistency
- agent-relay-290: Interface parity

[bd-agent-relay] Dashboard + Bridge improvements complete
khaliqgant and others added 7 commits December 24, 2025 12:01
Bridge Features Completed:
- Project connection status (Online/Reconnecting/Offline with pulsing)
- Project chat targeting (message composer with project/agent dropdowns)
- Project name display in bridge header (with back link + highlighting)
- Navigation consistency (Dashboard ↔ Bridge seamless transitions)
- Interface parity audit (bridge vs dashboard features locked)
- Multi-project client reconnection (exponential backoff, configurable)
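
The reconnection item above mentions configurable exponential backoff. A minimal sketch of that idea follows; the option names, default values, and function names are assumptions, since the commit message does not spell them out.

```typescript
// Hypothetical option names; the commit only says backoff is configurable.
interface BackoffOptions {
  baseDelayMs: number;
  maxDelayMs: number;
  factor: number;
}

const DEFAULT_BACKOFF: BackoffOptions = { baseDelayMs: 500, maxDelayMs: 30000, factor: 2 };

// Delay before retrying after failed attempt `attempt` (0-based),
// growing geometrically and capped at maxDelayMs.
function backoffDelay(attempt: number, opts: BackoffOptions = DEFAULT_BACKOFF): number {
  const raw = opts.baseDelayMs * Math.pow(opts.factor, attempt);
  return Math.min(raw, opts.maxDelayMs);
}

// Call `connect` until it succeeds or maxAttempts is exhausted,
// sleeping for the backoff delay between attempts.
async function connectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts: number,
  opts: BackoffOptions = DEFAULT_BACKOFF,
): Promise<void> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return;
    } catch (err) {
      // Re-throw the last failure instead of sleeping once more.
      if (attempt === maxAttempts - 1) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, opts)));
    }
  }
}
```

Real clients often add random jitter to the delay so that several projects do not reconnect in lockstep; it is left out here to keep the delays deterministic.
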

Dashboard Features:
- needsAttention indicators (pulsing badge, real-time updates)
- Project-based communication (selective targeting)
- Cross-project messaging (@project:agent syntax)
- Agent auto-detection from .claude/agents config

All tests passing (457+). Bridge UI production-ready.
- POST /api/spawn - Create new agent with name, cli, task
- GET /api/workers - List active spawned workers
- DELETE /api/workers/:name - Release a worker

Enable with: agent-relay up --spawn
- Remove --spawn flag requirement, API always enabled
- Rename /api/workers to /api/spawned for consistency
- Update response field from 'workers' to 'agents'
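
Taken together, the endpoints after the rename could be addressed as sketched below. The request bodies follow the fields named above (`name`, `cli`, `task`). The base URL is left to the caller because the dashboard host and port are not stated here, and the `DELETE /api/spawned/:name` path is an assumption: the commit only says `/api/workers` was renamed. All type and function names are hypothetical.

```typescript
// Fields listed for POST /api/spawn in the commit message.
interface SpawnBody {
  name: string;
  cli: string;
  task: string;
}

// Plain description of an HTTP request, independent of any client library.
interface HttpRequest {
  method: "GET" | "POST" | "DELETE";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

function spawnAgentRequest(baseUrl: string, body: SpawnBody): HttpRequest {
  return {
    method: "POST",
    url: `${baseUrl}/api/spawn`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// The response is described as carrying an 'agents' field after the rename.
function listSpawnedRequest(baseUrl: string): HttpRequest {
  return { method: "GET", url: `${baseUrl}/api/spawned`, headers: {} };
}

// Assumed path: the DELETE route is presumed to have been renamed too.
function releaseAgentRequest(baseUrl: string, name: string): HttpRequest {
  return {
    method: "DELETE",
    url: `${baseUrl}/api/spawned/${encodeURIComponent(name)}`,
    headers: {},
  };
}
```

Each descriptor can be handed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.
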
@khaliqgant khaliqgant merged commit e7ae9a7 into main Dec 25, 2025
6 checks passed
@khaliqgant khaliqgant deleted the claude/multi-project-agent-sockets-RCkNs branch December 25, 2025 13:49
khaliqgant added a commit that referenced this pull request Feb 3, 2026
…tart

Three bug fixes reported during MCP testing:

1. **Spawn race condition fix** (Bug #10): Added spawningAgents mutex to
   prevent concurrent spawn requests for the same agent from both passing
   the activeWorkers.has() check before either completes.

2. **SIGKILL diagnostics** (Bug #7, #10): Added gatherSigkillDiagnostics()
   to capture memory usage, process count, and OOM killer messages when
   exit code 137 or SIGKILL is detected. This helps diagnose resource
   exhaustion issues.

3. **Orphan cleanup** (Bug #8, #9): Added cleanupOrphanedWorkers() that
   runs on spawner startup to kill relay-pty processes from a previous
   daemon run. This ensures a clean slate after daemon restarts.
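
The race described in item 1 can be illustrated with an in-flight set. In this sketch only the names `spawningAgents` and `activeWorkers` come from the commit message; the `SpawnGuard` class, its methods, and the worker shape are hypothetical, and the real spawner may be structured differently.

```typescript
interface Worker {
  pid: number;
}

class SpawnGuard {
  private spawningAgents = new Set<string>();
  private activeWorkers = new Map<string, Worker>();

  // True while the agent is running or a spawn for it is in flight.
  isBusy(name: string): boolean {
    return this.activeWorkers.has(name) || this.spawningAgents.has(name);
  }

  async spawn(name: string, start: () => Promise<Worker>): Promise<Worker> {
    // The check and the add() both run before the first await, so a
    // second concurrent call for the same name sees the in-flight entry.
    if (this.isBusy(name)) {
      throw new Error(`agent "${name}" is already running or being spawned`);
    }
    this.spawningAgents.add(name);
    try {
      const worker = await start();
      this.activeWorkers.set(name, worker);
      return worker;
    } finally {
      // Clear the in-flight marker whether the spawn succeeded or failed.
      this.spawningAgents.delete(name);
    }
  }
}
```

Checking `activeWorkers.has()` alone is not enough because the map is only updated after the asynchronous start completes; the set covers that window.
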

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>