fix(broker): fail fast when a persona worker exits before harness-ready; repair installed persona skill paths #223

Merged
khaliqgant merged 7 commits into main from fix/persona-spawn-fail-fast
Jun 11, 2026

@khaliqgant

Member

Problem

Spawning linear-dispatcher (and repo-router) failed with:

Error: Timed out waiting for Workforce persona linear-dispatcher to prepare its harness. Last output:
...
cp: /Users/.../pear/skills/persona-relayfile-mount.md: No such file or directory

Two independent bugs compounded:

  1. Broken installed personas. The installed linear-dispatcher.json and repo-router.json carry raw ./skills/*.md skill sources, written by an agentworkforce installer version without skill-asset support. That version is the local 3.0.52, which npx prefers over the registry, so today's personas:refresh used it. The workforce CLI resolves those sources against the project root, so the cp fails and the &&-chained skill install aborts the spawn. Because the chain aborts, the workforce skill-cache marker is never written, and every retry re-downloads all prpm skills from scratch.
  2. No fail-fast. When the workforce CLI process died during setup, pear's harness-ready wait kept listening for output until the full 120s PERSONA_HARNESS_READY_TIMEOUT_MS expired, turning an instant failure into a 2-minute mystery timeout.

Fix

Broker (src/main/broker.ts): waitForPersonaHarnessReadyOutput now rejects as soon as the persona's worker is gone:

  • watches agent_exit / agent_exited / agent_released events for the agent, and
  • runs a 2s listAgents liveness poll as a catch-all for exits that slip past the event stream (e.g. between spawn and subscription).

The error includes the captured output tail, so failures like the cp above surface in ~2 seconds with the actual cause.
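
A minimal sketch of the fail-fast wait described above, not the actual broker code. It assumes a Node EventEmitter as the event source, a hypothetical 'output' event carrying output chunks, and a hypothetical function name (waitForHarnessReady) and options shape. Only the agent_exited event name, the listAgents poll, and the "include the output tail in the error" behaviour come from the PR description.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical shapes for illustration; the real broker types are not shown in this PR.
interface AgentInfo { name: string }
interface ReadyWaitOptions {
  name: string
  events: EventEmitter // assumed to emit 'output' (string chunk) and 'agent_exited' (agent name)
  listAgents: () => Promise<AgentInfo[]>
  isReady: (output: string) => boolean
  timeoutMs: number
  pollMs: number
}

function waitForHarnessReady(opts: ReadyWaitOptions): Promise<void> {
  const { name, events, listAgents, isReady, timeoutMs, pollMs } = opts
  return new Promise<void>((resolve, reject) => {
    let settled = false
    let output = ''
    const timers: {
      poll?: ReturnType<typeof setTimeout>
      timeout?: ReturnType<typeof setTimeout>
    } = {}

    // Keep only the end of the output so the error stays readable (2000 is arbitrary).
    const tail = (): string => output.slice(-2000)
    const exitedError = (): Error =>
      new Error(`Persona ${name} exited before its harness was ready. Last output:\n${tail()}`)

    // Single exit point: settles the promise once and releases timers and listeners.
    const finish = (error?: Error): void => {
      if (settled) return
      settled = true
      clearTimeout(timers.timeout)
      clearTimeout(timers.poll)
      events.off('output', onOutput)
      events.off('agent_exited', onExit)
      if (error) reject(error)
      else resolve()
    }

    const onOutput = (chunk: string): void => {
      output += chunk
      if (isReady(output)) finish()
    }
    const onExit = (exited: string): void => {
      if (exited === name) finish(exitedError())
    }

    // Catch-all for exits the event stream missed: reject when the agent is no longer listed.
    const poll = (): void => {
      listAgents()
        .then((agents) => {
          if (settled) return
          if (!agents.some((agent) => agent.name === name)) finish(exitedError())
          else timers.poll = setTimeout(poll, pollMs)
        })
        .catch(() => {
          // Transient errors are tolerated; the overall timeout still bounds the wait.
          if (!settled) timers.poll = setTimeout(poll, pollMs)
        })
    }

    timers.timeout = setTimeout(
      () => finish(new Error(`Timed out waiting for ${name}. Last output:\n${tail()}`)),
      timeoutMs
    )
    events.on('output', onOutput)
    events.on('agent_exited', onExit)
    timers.poll = setTimeout(poll, pollMs)
  })
}
```

Routing every outcome through one finish function is what keeps the three signals (ready output, exit event, liveness poll) from racing each other: whichever fires first settles the promise and cancels the rest.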

Installed personas: copied the skill files into __assets/<id>/skills/ and rewrote the JSON sources to the convention the fixed installer emits (same shape terminal-renderer.json already uses). The 3.0.52 runtime resolves these via repoRoot; verified the exact generated install command now succeeds end-to-end.

⚠️ Don't re-run personas:refresh until a workforce CLI with the installer fix is released and the agentworkforce dep here is bumped. (AgentWorkforce/workforce#226 is the runtime half; the installer skill-asset support is already in workforce main but unpublished.) Until then, npx will keep preferring the local 3.0.52, which regenerates the broken JSONs with --overwrite.

Testing

  • npm run typecheck:node clean.
  • npm test: 120/120 pass.
  • Verified agentworkforce show linear-dispatcher --json resolves the rewritten sources and the generated skill-install cp succeeds against the repaired paths.

🤖 Generated with Claude Code

khaliqgant and others added 7 commits June 11, 2026 07:15
…team spawn → implement

Adds the complete software factory pipeline so issues marked `ready-for-agent`
flow automatically through scoping, team spawning, and implementation:

- writeRemoteFile IPC: renderer/agents can now create/update Linear issues
- ready-for-agent 4th band in Attention Inbox (expanded by default)
- spawnTeamForIssue: spawns codex-impl + claude-review pairs with task prompts
- issue-scoping module: detectRepo, suggestTeamSize, labelIssueWithRepo
- board-steward persona: proactive agent that watches and drives the pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…line

# Conflicts:
#	src/renderer/src/components/issues/AttentionInbox.tsx

Install @agentworkforce/persona-linear-dispatcher and
@agentworkforce/persona-repo-router, and append both to the
personas:refresh npm script so all personas stay current via one command.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

board-steward overlaps with the linear-dispatcher + repo-router pair —
both poll ready-for-agent Linear issues and spawn impl/review teams, so
running them together double-dispatches. Keep linear-dispatcher (board
watcher) + repo-router (per-issue routing) as the single dispatch path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…dy; repair installed persona skill paths

A workforce CLI that dies during setup (e.g. a failed skill install
calls process.exit) previously burned the entire 120s
PERSONA_HARNESS_READY_TIMEOUT_MS before surfacing an error. The
harness-ready wait now rejects as soon as the worker is gone: it watches
agent_exit/agent_exited/agent_released events and runs a 2s listAgents
liveness poll as a catch-all for exits that slip past the event stream
(e.g. between spawn and subscription).

Also repairs the installed linear-dispatcher and repo-router personas,
which carried raw `./skills/*.md` sources from an installer version
without skill-asset support. The cp for those paths resolved against the
project root, failed, aborted every spawn, and kept the workforce skill
cache marker from being written — forcing a full prpm re-download on
each attempt. Skill files now live under __assets/<id>/skills/ with
sources rewritten to the convention the fixed installer emits
(AgentWorkforce/workforce#226). Do not re-run personas:refresh until a
CLI with that fix is published and the agentworkforce dep is bumped —
npx prefers the local 3.0.52 binary, which would regenerate the broken
JSONs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Warning

Review limit reached

@khaliqgant, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 59 minutes and 22 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 50434ab2-37eb-440f-945c-b05e06ddb250

📥 Commits

Reviewing files that changed from the base of the PR and between bec8311 and f88f25a.

📒 Files selected for processing (23)
  • .agentworkforce/workforce/personas/__assets/linear-dispatcher/linear-dispatcher.md
  • .agentworkforce/workforce/personas/__assets/linear-dispatcher/skills/persona-relayfile-mount.md
  • .agentworkforce/workforce/personas/__assets/repo-router/repo-router.md
  • .agentworkforce/workforce/personas/__assets/repo-router/skills/agentworkforce-repo-map.md
  • .agentworkforce/workforce/personas/__assets/repo-router/skills/persona-relayfile-mount.md
  • .agentworkforce/workforce/personas/__assets/slack-comms/slack-comms.md
  • .agentworkforce/workforce/personas/linear-dispatcher.json
  • .agentworkforce/workforce/personas/repo-router.json
  • .agentworkforce/workforce/personas/slack-comms.json
  • package.json
  • src/main/broker.ts
  • src/main/integrations.test.ts
  • src/main/integrations.ts
  • src/main/ipc-handlers.ts
  • src/preload/index.ts
  • src/renderer/src/components/issues/AttentionInbox.tsx
  • src/renderer/src/lib/ipc-mock.ts
  • src/renderer/src/lib/issue-scoping.test.ts
  • src/renderer/src/lib/issue-scoping.ts
  • src/renderer/src/lib/spawn-agent.test.ts
  • src/renderer/src/lib/spawn-agent.ts
  • src/renderer/src/stores/issues-store.ts
  • src/shared/types/ipc.ts

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the linear-dispatcher and repo-router personas, adds support for writing remote files through Relayfile, and implements a "Ready for Agent" band in the Attention Inbox UI to allow manual spawning of agent teams. It also adds a liveness poll to the broker to fail fast if a persona process exits during setup. Feedback on these changes suggests replacing the setInterval liveness poll with a recursive setTimeout pattern to avoid overlapping executions, expanding the repository keyword map to include core repos like cloud and agents, and improving error reporting in the UI when spawning an agent team fails.

Comment thread src/main/broker.ts
Comment on lines +2990 to +3003
    livenessTimer = setInterval(() => {
      void session.client
        .listAgents()
        .then((agents) => {
          if (settled) return
          if (!agents.some((agent) => agent.name === name)) {
            finish(exitedError())
          }
        })
        .catch(() => {
          // Transient broker errors are fine — the readiness timeout still
          // bounds the wait.
        })
    }, PERSONA_LIVENESS_POLL_MS)

medium

Using setInterval with asynchronous operations can lead to overlapping executions if a call takes longer than the interval duration. It is highly recommended to use a recursive setTimeout pattern instead to ensure that a new poll is only scheduled after the previous one has completed. Additionally, please update the type of livenessTimer to ReturnType<typeof setTimeout> and clear it using clearTimeout.

    const pollLiveness = (): void => {
      if (settled) return
      void session.client
        .listAgents()
        .then((agents) => {
          if (settled) return
          if (!agents.some((agent) => agent.name === name)) {
            finish(exitedError())
          } else {
            livenessTimer = setTimeout(pollLiveness, PERSONA_LIVENESS_POLL_MS)
          }
        })
        .catch(() => {
          if (!settled) {
            livenessTimer = setTimeout(pollLiveness, PERSONA_LIVENESS_POLL_MS)
          }
        })
    }
    livenessTimer = setTimeout(pollLiveness, PERSONA_LIVENESS_POLL_MS)
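
The same pattern can be factored into a small reusable helper. This is a sketch under assumptions, not code from the PR: pollSequentially is a hypothetical name, and the helper treats every task failure as transient. It schedules the next run only after the previous one has settled, so at most one run is ever in flight.

```typescript
// Runs `task` repeatedly, waiting `delayMs` after each run settles before starting
// the next one. Returns a function that stops the loop.
function pollSequentially(task: () => Promise<void>, delayMs: number): () => void {
  let stopped = false
  let timer: ReturnType<typeof setTimeout> | undefined

  const scheduleNext = (): void => {
    if (!stopped) timer = setTimeout(run, delayMs)
  }

  const run = (): void => {
    if (stopped) return
    // Schedule the next run on success and on failure alike, so one failed
    // poll does not end the loop.
    task().then(scheduleNext, scheduleNext)
  }

  timer = setTimeout(run, delayMs)

  return () => {
    stopped = true
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

With setInterval, a task that takes longer than the interval starts again while the earlier call is still pending. Here the effective period is the task duration plus delayMs, which trades a slightly slower poll for the guarantee of no overlap.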

Comment thread src/renderer/src/lib/issue-scoping.ts
Comment on lines +1 to +5
    const REPO_KEYWORDS: Record<string, string[]> = {
      relay: ['relay', 'agent-relay', 'broker', 'mcp', 'workspace', 'webhook', 'mount', 'relayfile'],
      pear: ['pear', 'electron', 'renderer', 'terminal', 'ui', 'inbox', 'sidebar', 'dialog', 'ipc'],
      workforce: ['workforce', 'persona', 'autonomous', 'proactive', 'agent-workforce', 'skill'],
    }

medium

The REPO_KEYWORDS map is missing several core repositories of the organization, such as cloud and agents. Adding keywords for these core repos will significantly improve the accuracy of the repository detection logic.

Suggested change
- const REPO_KEYWORDS: Record<string, string[]> = {
-   relay: ['relay', 'agent-relay', 'broker', 'mcp', 'workspace', 'webhook', 'mount', 'relayfile'],
-   pear: ['pear', 'electron', 'renderer', 'terminal', 'ui', 'inbox', 'sidebar', 'dialog', 'ipc'],
-   workforce: ['workforce', 'persona', 'autonomous', 'proactive', 'agent-workforce', 'skill'],
- }
+ const REPO_KEYWORDS: Record<string, string[]> = {
+   relay: ['relay', 'agent-relay', 'broker', 'mcp', 'workspace', 'webhook', 'mount', 'relayfile'],
+   pear: ['pear', 'electron', 'renderer', 'terminal', 'ui', 'inbox', 'sidebar', 'dialog', 'ipc'],
+   workforce: ['workforce', 'persona', 'autonomous', 'proactive', 'agent-workforce', 'skill'],
+   cloud: ['cloud', 'worker', 'api', 'nango', 'd1', 'kv', 'durable'],
+   agents: ['agents', 'reviewer', 'harness', 'bot']
+ }
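
To illustrate how such a keyword map might drive repository detection, here is a hedged sketch. The PR names a detectRepo function but does not show its body, so the function below (detectRepoSketch) and its scoring rule, counting whole-token keyword hits and picking the highest, are assumptions for illustration only.

```typescript
const REPO_KEYWORDS: Record<string, string[]> = {
  relay: ['relay', 'agent-relay', 'broker', 'mcp', 'workspace', 'webhook', 'mount', 'relayfile'],
  pear: ['pear', 'electron', 'renderer', 'terminal', 'ui', 'inbox', 'sidebar', 'dialog', 'ipc'],
  workforce: ['workforce', 'persona', 'autonomous', 'proactive', 'agent-workforce', 'skill'],
  cloud: ['cloud', 'worker', 'api', 'nango', 'd1', 'kv', 'durable'],
  agents: ['agents', 'reviewer', 'harness', 'bot']
}

// Hypothetical scoring: split the text into lowercase tokens, count how many of each
// repo's keywords appear as whole tokens, and return the repo with the most hits.
// Returns null when nothing matches. Ties go to the repo listed first.
function detectRepoSketch(text: string): string | null {
  const tokens = new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9-]+/)
      .filter((token) => token.length > 0)
  )
  let best: string | null = null
  let bestScore = 0
  for (const [repo, keywords] of Object.entries(REPO_KEYWORDS)) {
    const score = keywords.filter((keyword) => tokens.has(keyword)).length
    if (score > bestScore) {
      best = repo
      bestScore = score
    }
  }
  return best
}
```

Matching whole tokens rather than substrings matters for short keywords: a substring test for 'ui' would also hit words such as "build". Whether the real detectRepo matches this way is not stated in the PR.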

Comment thread src/renderer/src/components/issues/AttentionInbox.tsx
Comment on lines +633 to +635
    } catch {
      setNavNotice('Failed to spawn team')
    }

medium

Displaying a generic error message like 'Failed to spawn team' makes debugging difficult when spawning fails due to specific issues (e.g., missing project root, broker startup failure). Catching the error and displaying its actual message provides much better feedback to the user.

Suggested change
-     } catch {
-       setNavNotice('Failed to spawn team')
-     }
+     } catch (error) {
+       setNavNotice(error instanceof Error ? error.message : 'Failed to spawn team')
+     }
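
If the same fallback logic is needed in more than one catch block, it could be pulled into a helper. The sketch below is an assumption, not part of the PR: toNoticeMessage is a hypothetical name, and handling thrown strings and empty messages goes slightly beyond the suggested change.

```typescript
// Turns an unknown thrown value into a user-facing message, falling back to a
// generic text when the value carries no usable message.
function toNoticeMessage(error: unknown, fallback: string): string {
  if (error instanceof Error && error.message.trim() !== '') return error.message
  if (typeof error === 'string' && error.trim() !== '') return error
  return fallback
}

// Usage inside a catch block (setNavNotice is the setter from the snippet above):
//   } catch (error) {
//     setNavNotice(toNoticeMessage(error, 'Failed to spawn team'))
//   }
```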

@agent-relay-code

Contributor

ℹ️ pr-reviewer: review only — no file changes were applied to the PR (nothing to commit after review). The notes below are advisory and were not pushed.

Reviewed PR #223 against .workforce/pr.diff, changed files, and current checkout. I did not find a validated breakage that needed a code edit.

Addressed comments

  • No bot or reviewer comments were present in .workforce/context.json or other .workforce files, so there were no external review threads to validate or fix.

Verification

Passed:

  • npm install
  • npm run typecheck
  • npx vitest run src/main/integrations.test.ts src/renderer/src/lib/issue-scoping.test.ts src/renderer/src/lib/spawn-agent.test.ts src/main/broker.test.ts
  • npm test
  • npx vitest run
  • npm run build
  • npm run lint with warnings only, 0 errors

Note: one earlier parallel run caused lint to be killed and Vitest to time out under load; both passed when rerun normally.

@khaliqgant khaliqgant merged commit af3027e into main Jun 11, 2026
5 checks passed
@khaliqgant khaliqgant deleted the fix/persona-spawn-fail-fast branch June 11, 2026 10:25
@agent-relay-code

Contributor

Implemented two fixes:

  • Fixed issue banding so ready-for-agent labels alone no longer move a live To do issue into the Ready for agent lane. The live Linear state now wins. Fixed in issues-store.ts, covered by issues-store.dom.test.ts.
  • Cleaned up the PR-added broker liveness timer lint warning without changing behavior. Fixed in broker.ts.

Addressed comments

  • No bot or reviewer comments were available in .workforce/context.json or as separate .workforce files, so there were no external comment threads to validate.

Advisory Notes

None.

Local verification passed:

  • npm run verify:mcp-resources-drift
  • npm run lint with existing warnings only, 0 errors
  • npm run typecheck
  • npm test
  • npx vitest run
  • npm run build
  • npm run build:web
  • npx playwright test --config playwright.fidelity.config.ts
  • npx playwright test --config playwright.redraw.config.ts

I installed the missing Playwright Chromium browser locally before rerunning the Playwright gates.

@agent-relay-code

Contributor

Reviewed PR #223 and made one small verified fix: cleaned up the new persona liveness timer in src/main/broker.ts so it satisfies lint without changing behavior, and added regression coverage for agent_exited during persona harness readiness in src/main/broker.test.ts.

Addressed comments

  • No bot or reviewer comments were available in .workforce/context.json or the .workforce files, so there were no external review threads to validate or fix.

Advisory Notes

  • None.

Validation run:

  • npm run lint passed with existing warnings only.
  • npm run typecheck passed.
  • npx vitest run src/main/broker.test.ts passed.
  • npm run test:all passed: 120 Node tests, 400 Vitest tests.
  • npm run build passed.
  • npm run verify:mcp-resources-drift passed.

I’m not printing READY because I cannot verify remote GitHub CI status or mergeability from this harness without using gh, which the task forbids.

@agent-relay-code

Contributor

Implemented one review fix: the persona harness liveness timer in src/main/broker.ts now uses a cleanup-safe timer holder, removing the new lint warning without risking cleanup before assignment.

Addressed comments

  • No bot/reviewer comment threads were available in .workforce/context.json or other .workforce sidecar files, so there were no external comments to validate or fix.
  • Self-review: PR introduced an ESLint prefer-const warning for the broker liveness timer; fixed in src/main/broker.ts.

Advisory Notes

  • npm run lint still reports existing warnings in unrelated files, but exits with 0 errors. I left those unchanged as out of scope.

Local validation run:

  • npm run typecheck
  • npm run test:all - 120 Node tests and 399 Vitest tests passed
  • npm run build
  • npm run lint - 0 errors, warnings only

@agent-relay-code

Contributor

Implemented scoped fixes from review and one validated self-review issue:

Addressed comments

  • coderabbitai[bot]: review was rate-limited and raised no code finding; no code change needed.
  • gemini-code-assist[bot] top-level review: raised liveness polling, repo keyword coverage, and spawn error reporting; fixed in broker.ts, issue-scoping.ts, and AttentionInbox.tsx.
  • gemini-code-assist[bot] on src/main/broker.ts: replaced overlapping async setInterval with recursive setTimeout; fixed in broker.ts.
  • gemini-code-assist[bot] on src/renderer/src/lib/issue-scoping.ts: added cloud and agents keyword maps; fixed in issue-scoping.ts.
  • gemini-code-assist[bot] on AttentionInbox.tsx: surfaced actual spawn errors; fixed in AttentionInbox.tsx.
  • agent-relay-code[bot] earlier review comments: advisory/stale relative to current checkout; I revalidated the current code and applied the still-needed fixes above.

Advisory Notes

  • Existing lint warnings remain in unrelated files; lint exits successfully with 0 errors.
  • I did not print READY because I cannot verify remote GitHub check status or mergeability from the available tools.

Local verification passed:

  • npm run verify:mcp-resources-drift
  • npm run lint
  • npm run typecheck
  • npm test
  • npx vitest run
  • npm run build
  • npm run build:web
  • npx playwright test --config playwright.fidelity.config.ts
  • npx playwright test --config playwright.redraw.config.ts
