Fix factory SDK agent exit latch #238

Merged
kjgbot merged 1 commit into main from factory-sdk/fleet-exit-latch
Jun 11, 2026
@kjgbot kjgbot commented Jun 11, 2026

Contributor

Summary

Replaces the factory SDK InternalFleetClient agent-exit dedup window with a latch keyed by agent name.

Root Cause

The broker emits a single agent_exit, but InternalFleetClient can observe the same logical exit through overlapping listener fan-out paths (onEvent, deliveryUpdate, and typed agentExited). The previous 5-second dedup window was insufficient: in live runs, duplicate callbacks arrived 10 seconds or more apart, so the later duplicates fell outside the window and were emitted again.
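
To make the failure mode concrete, here is a minimal sketch of a time-window dedup, assuming it worked roughly as described above. The class name `WindowedExitDedup`, the constant `DEDUP_WINDOW_MS`, the agent name `worker-1`, and the timestamps are all hypothetical; this is not the code that was removed from InternalFleetClient.

```typescript
// Hypothetical reconstruction of a time-window dedup. Names are illustrative
// and not taken from InternalFleetClient.
const DEDUP_WINDOW_MS = 5_000; // the 5-second window described above

class WindowedExitDedup {
  // agent name -> timestamp (ms) of the last exit that was emitted
  private readonly lastEmittedAt = new Map<string, number>();

  /** Returns true when the exit should be forwarded to onAgentExit. */
  shouldEmit(name: string, nowMs: number): boolean {
    const last = this.lastEmittedAt.get(name);
    if (last !== undefined && nowMs - last < DEDUP_WINDOW_MS) {
      return false; // duplicate inside the window: suppressed
    }
    this.lastEmittedAt.set(name, nowMs);
    return true;
  }
}

const dedup = new WindowedExitDedup();
const emitted: string[] = [];

// One logical exit observed through three listener paths at different times.
const observations: Array<[string, number]> = [
  ["onEvent", 0],
  ["deliveryUpdate", 1_000], // inside the window: suppressed
  ["agentExited", 10_000], // lagged duplicate: falls outside the window
];

for (const [path, atMs] of observations) {
  if (dedup.shouldEmit("worker-1", atMs)) emitted.push(path);
}

console.log(emitted); // [ 'onEvent', 'agentExited' ]
```

The third observation is emitted a second time because no fixed window can bound how late a duplicate arrives, which is the motivation for dropping the time component entirely.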

Changes

  • Track exited agent names in a latch instead of a time-bounded map.
  • Suppress all further onAgentExit callbacks for the same name until that name is spawned or resumed again.
  • Clear the latch after successful spawn() and resume() for the returned agent name.
  • Add regression coverage for:
    • same-name exits 10 seconds apart collapsing to one callback
    • different names still emitting independently
    • spawn/resume clearing the latch so a later lifecycle exit emits again
    • typed agentExited duplicate fan-out suppression
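
The latch described in the list above can be sketched as a set of names with a remember step and a clear step. This is an illustration under assumptions, not the PR's implementation: the class `AgentExitLatch` and the `onExit` helper are hypothetical, and the method names only mirror the private `#rememberAgentExit` and `#clearAgentExitLatch` mentioned in the review summary.

```typescript
// Minimal latch sketch. AgentExitLatch and onExit are hypothetical names.
class AgentExitLatch {
  private readonly exitedAgentNames = new Set<string>();

  /** Returns true only for the first exit seen for `name` since the last clear. */
  rememberAgentExit(name: string): boolean {
    if (this.exitedAgentNames.has(name)) return false;
    this.exitedAgentNames.add(name);
    return true;
  }

  /** Re-arms the latch for `name`, as a successful spawn or resume would. */
  clearAgentExitLatch(name: string): void {
    this.exitedAgentNames.delete(name);
  }
}

const latch = new AgentExitLatch();
const callbacks: string[] = [];
const onExit = (name: string): void => {
  if (latch.rememberAgentExit(name)) callbacks.push(name);
};

onExit("worker-1"); // first exit: emitted
onExit("worker-1"); // duplicate: suppressed regardless of how late it arrives
onExit("worker-2"); // different name: emitted independently
latch.clearAgentExitLatch("worker-1"); // stands in for a successful spawn()/resume()
onExit("worker-1"); // new lifecycle: emitted again

console.log(callbacks); // [ 'worker-1', 'worker-2', 'worker-1' ]
```

The trade-off is that suppression no longer expires on its own, so correctness depends on every spawn and resume path clearing the latch.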

Validation

  • npx vitest run packages/factory-sdk
  • npx tsc --noEmit -p tsconfig.node.json

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1cc8595c-b5fc-4806-a243-399566a9f18e

📥 Commits

Reviewing files that changed from the base of the PR and between d05b8d9 and 92c9f40.

📒 Files selected for processing (2)
  • packages/factory-sdk/src/fleet/internal-fleet-client.test.ts
  • packages/factory-sdk/src/fleet/internal-fleet-client.ts

📝 Walkthrough

InternalFleetClient replaces time-window-based agent exit deduplication with a persistent latch-based approach using a set of exited agent names. The latch clears whenever an agent is spawned or resumed, allowing subsequent exit events for that agent to be re-emitted. Tests validate latch behavior across duplicate callbacks, different agent names, and lifecycle events.

Changes

Agent Exit Deduplication Refactor

| Layer / File(s) | Summary |
| --- | --- |
| Exit latch data structure and core deduplication logic (packages/factory-sdk/src/fleet/internal-fleet-client.ts) | Time-window constant AGENT_EXIT_DEDUP_WINDOW_MS is removed and replaced with a private #exitedAgentNames set. Core deduplication method #rememberAgentExit now tracks exited agent names in the set, and a new #clearAgentExitLatch(name) method removes agents from tracking. |
| Latch clearing on agent lifecycle events (packages/factory-sdk/src/fleet/internal-fleet-client.ts) | After spawning agents via spawnPty in both spawn and resume methods, the latch for that agent name is cleared, allowing future exit events to be re-emitted on re-spawn rather than suppressed. |
| Test coverage for latch behavior (packages/factory-sdk/src/fleet/internal-fleet-client.test.ts) | Exit deduplication test is renamed to reflect "latch one agent death by name across lagged exit callbacks" semantics. Coverage expanded to validate single-latch emission across duplicate callbacks, non-suppression for different agent names, latch clearing on spawn and resume, and typed duplicate suppression until next spawn. Lifecycle-restart vs time-passing test adjusted to match revised suppression timing. |
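
As a rough illustration of how the pieces in this summary might fit together, the sketch below funnels three listener paths into one latch check and clears the latch only after a spawn succeeds. Everything here is an assumption for illustration: `SketchFleetClient`, the `SpawnPty` type, and the `handle*` method names are hypothetical, the real `spawnPty` signature is not shown in this PR, and the real client uses `#`-private members rather than the `private` keyword.

```typescript
// Hypothetical stand-in: the real spawnPty signature is not shown in this PR.
type SpawnPty = (name: string) => { name: string };

class SketchFleetClient {
  private readonly exitedAgentNames = new Set<string>();

  constructor(
    private readonly spawnPty: SpawnPty,
    private readonly onAgentExit: (name: string) => void,
  ) {}

  spawn(name: string): string {
    const agent = this.spawnPty(name); // a throw here leaves the latch untouched
    this.exitedAgentNames.delete(agent.name); // clear for the returned name, after success
    return agent.name;
  }

  // The three fan-out paths named in the PR all funnel into one latch check.
  handleOnEvent(name: string): void { this.emitExitOnce(name); }
  handleDeliveryUpdate(name: string): void { this.emitExitOnce(name); }
  handleTypedAgentExited(name: string): void { this.emitExitOnce(name); }

  private emitExitOnce(name: string): void {
    if (this.exitedAgentNames.has(name)) return; // already latched: suppress
    this.exitedAgentNames.add(name);
    this.onAgentExit(name);
  }
}

const exits: string[] = [];
let failNextSpawn = false;
const client = new SketchFleetClient(
  (name) => {
    if (failNextSpawn) throw new Error("spawn failed");
    return { name };
  },
  (name) => { exits.push(name); },
);

client.spawn("worker-1");
client.handleOnEvent("worker-1"); // emitted
client.handleDeliveryUpdate("worker-1"); // suppressed
client.handleTypedAgentExited("worker-1"); // suppressed

failNextSpawn = true;
try { client.spawn("worker-1"); } catch { /* failed spawn: latch stays set */ }
client.handleOnEvent("worker-1"); // still suppressed

failNextSpawn = false;
client.spawn("worker-1"); // successful spawn clears the latch
client.handleTypedAgentExited("worker-1"); // emitted for the new lifecycle

console.log(exits); // [ 'worker-1', 'worker-1' ]
```

Keeping the latch set when a spawn fails is one reading of "after successful spawn()" in the PR description; the actual behavior on failure is not stated in this thread.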

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • AgentWorkforce/pear#232: Adds agent exit/delivery event handling logic to InternalFleetClient, directly related to the deduplication refactor in this PR.

Poem

🐰 A latch replaces the ticking clock,
One death per agent name we lock.
Spawn anew, the gate swings wide,
Fresh exits now will not hide!
Time-windows fade—the latch stands true! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Fix factory SDK agent exit latch' directly summarizes the main change: replacing agent exit deduplication with a latch mechanism. |
| Description check | ✅ Passed | The description comprehensively explains the root cause, changes, and validation steps related to the agent exit latch fix. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed due to a network error.


@kjgbot kjgbot merged commit 3c8fe63 into main Jun 11, 2026
5 checks passed
@kjgbot kjgbot deleted the factory-sdk/fleet-exit-latch branch June 11, 2026 21:20
@agent-relay-code

Contributor

Review Result

I found no breakage in the current checkout and made no code changes. The PR is scoped to InternalFleetClient agent-exit latching, and the downstream factory restart/resume paths are covered by the existing and added tests.

Addressed Comments

  • gemini-code-assist[bot]: quota-limit status message only, no code finding to address.
  • coderabbitai[bot]: review-in-progress/status comment only, no actionable finding. Inline review threads list is empty.

Advisory Notes

  • The first full Playwright fidelity run hit a timeout while taking a screenshot in tests/playwright/rendering-corruption.spec.ts. The same spec passed on isolated rerun, and the full fidelity suite passed on rerun.

Local Validation

Passed:

  • npm ci
  • npm run verify:mcp-resources-drift
  • npm run lint (warnings only)
  • npm run typecheck:web
  • npm run typecheck:node
  • npm test
  • npx vitest run
  • npm run build
  • npm run build:web
  • npx playwright test --config playwright.redraw.config.ts
  • npx playwright test --config playwright.fidelity.config.ts (passed on rerun)

I did not run the macOS-only dist:mac packaged smoke locally.
