Terminate factory agent processes on release #263
📝 Walkthrough
PID-aware termination: ports add process-control hooks and `AgentPidResolution`; `InternalFleetClient` resolves broker/connection/roster PIDs; a new `findAgentProcessByName` resolves anchored agent processes; `terminatePids()` kills PID trees child-first with protected-PID handling; the factory integrates termination into release/completion flows; and tests exercise many PID edge cases.
Changes: Agent Process Termination via PID
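The broker-side liveness contract summarized above (present with a PID, present without one, absent) can be illustrated with a small classifier. This is a sketch, not the PR's code: the roster entry shape mirrors the `listAgents()` call quoted later in the review, the three statuses come from the PR's V0 coverage notes, and `classifyAgentLiveness` and `PidResolution` are hypothetical names rather than the real `AgentPidResolution` definition.

```typescript
// Assumed roster entry shape, modeled on the listAgents() usage in the review.
interface RosterEntry {
  name: string
  pid?: number
}

// Hypothetical three-way result; the PR's real AgentPidResolution may differ.
type PidResolution =
  | { status: 'found'; pid: number } // present with a usable pid: terminate / persist
  | { status: 'missing' } // absent from the roster: already exited, not an orphan
  | { status: 'unresolved' } // present but no usable pid: report loudly

function classifyAgentLiveness(roster: RosterEntry[], name: string): PidResolution {
  const agent = roster.find((candidate) => candidate.name === name)
  if (!agent) return { status: 'missing' }
  const pid = agent.pid
  // Only positive integers are safe to hand to signaling code.
  if (typeof pid === 'number' && Number.isInteger(pid) && pid > 0) {
    return { status: 'found', pid }
  }
  return { status: 'unresolved' }
}
```

Keeping `missing` and `unresolved` distinct is what lets a caller skip an exited agent quietly while still failing loudly on an agent that is alive but has no known PID.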
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
ℹ️ pr-reviewer: review only. No file changes were applied to the PR (nothing to commit after review), and the notes below are advisory and were not pushed. No code changes were needed: I reviewed the PR diff, traced the factory cleanup/reaper changes through callers, types, registry writing, and tests, and did not find a validated PR-scoped defect to fix.
Validation
Passed locally: …
I could not reproduce the macOS-only …
Branch updated: d116fbc to e4d9fbf, then e4d9fbf to 5c62ecd.
Implemented one PR-scoped fix: factory agent process trees are now terminated before broker sessions are released, so child PID discovery cannot be lost if … Added a regression test that simulates …
Advisory Notes
Validation run locally: …
Fixed a PR-scoped type break in `internal-fleet-client.ts`: …
Validation
Passed: …
Not fully verified in this sandbox: …
Fixed one current PR breakage: …
Advisory Notes
Validation
Passed locally: …
I did not run the macOS …
No code changes were needed in the current checkout. I validated the prior bot notes against the current code: the …
Advisory Notes
Validation run locally: …
Fixed one PR-scoped issue: factory agent PID trees are now terminated before the broker session is released, so child PID discovery is not lost when release tears down broker-side state. The helper now processes agents sequentially to preserve release ordering. Fixed in `factory.ts`. Added a regression test, in `factory.test.ts`, that makes the child PID lookup return empty after release and verifies that children are still terminated.
Advisory Notes: None.
Validation
Passed locally: …
I did not run the macOS-only …
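The ordering fix described above can be sketched as follows. Everything here is hypothetical (`AgentHandle`, `ReleaseDeps`, `terminateThenRelease`, and the three callbacks are illustrative, not the `factory.ts` API); the point is only the ordering: each agent's tree is terminated, children before the root, before its session is released, and agents are handled one at a time with `await` inside a `for...of` loop rather than `Promise.all`.

```typescript
// Hypothetical shapes; the real factory.ts types are not shown in this thread.
interface AgentHandle {
  name: string
  pid: number
}

interface ReleaseDeps {
  // Child lookup that may depend on broker-side session state.
  resolveChildPids: (agent: AgentHandle) => Promise<number[]>
  // Signals the given PIDs in array order (children first, root last).
  terminateTree: (pids: number[]) => Promise<void>
  // Tears down the broker session, including state the child lookup relied on.
  releaseSession: (agent: AgentHandle) => Promise<void>
}

// Terminate each agent's process tree before releasing its session, one agent
// at a time, so release ordering is preserved and child discovery still sees
// the broker state it needs.
async function terminateThenRelease(agents: AgentHandle[], deps: ReleaseDeps): Promise<void> {
  for (const agent of agents) {
    const children = await deps.resolveChildPids(agent)
    await deps.terminateTree([...children, agent.pid])
    await deps.releaseSession(agent)
  }
}
```

If release ran first, a lookup backed by broker state would return an empty list and only the root would be signaled, which is the failure the regression test guards against.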
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/factory-sdk/src/fleet/internal-fleet-client.test.ts (1)
22-25: 📐 Maintainability & Code Quality | ⚡ Quick win
Make the fake spawn handle return `pid` when configured.
`FakeHarnessDriverClient.spawnPty()` records `pid` in the roster but always omits it from the returned handle, so the direct `handle.pid` path is not testable from this double.
Suggested fix
```diff
 async spawnPty(input: SpawnPtyInput): Promise<{ name: string; session_ref: string; pid?: number }> {
   this.spawned.push(input)
-  this.agents.push({ name: input.name, pid: this.nextPid })
-  return { name: input.name, session_ref: this.nextSessionRef }
+  const pid = this.nextPid
+  this.agents.push({ name: input.name, pid })
+  return { name: input.name, session_ref: this.nextSessionRef, pid }
 }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/factory-sdk/src/fleet/internal-fleet-client.test.ts` around lines 22 - 25, FakeHarnessDriverClient.spawnPty currently records pid in agents but never includes it in the returned handle; update spawnPty to include a pid property in the returned object (e.g., return { name: input.name, session_ref: this.nextSessionRef, pid: this.nextPid }), ensuring pid can be undefined when not configured so tests can exercise the direct handle.pid path; locate the method spawnPty and add the pid field to the returned handle.
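For illustration, a self-contained version of the suggested test double is sketched below. Only `spawnPty`, `SpawnPtyInput`, `spawned`, `agents`, `nextPid`, and `nextSessionRef` come from the review snippet; the minimal `SpawnPtyInput` shape, the `listAgents` method, and the optional `nextPid` are assumptions. Unlike the suggested diff, which returns `pid: undefined` when no PID is configured, this version omits the key entirely, which matters if the project compiles with `exactOptionalPropertyTypes`.

```typescript
// Hypothetical minimal input shape; the real SpawnPtyInput has more fields.
interface SpawnPtyInput {
  name: string
}

interface SpawnedHandle {
  name: string
  session_ref: string
  pid?: number
}

class FakeHarnessDriverClient {
  spawned: SpawnPtyInput[] = []
  agents: { name: string; pid?: number }[] = []
  nextPid: number | undefined = undefined
  nextSessionRef = 'session-1'

  async spawnPty(input: SpawnPtyInput): Promise<SpawnedHandle> {
    this.spawned.push(input)
    const pid = this.nextPid
    // Record the same PID in the roster that the handle will carry.
    this.agents.push({ name: input.name, pid })
    const handle: SpawnedHandle = { name: input.name, session_ref: this.nextSessionRef }
    // Attach pid only when configured, so the "no pid on handle" path stays testable.
    if (pid !== undefined) handle.pid = pid
    return handle
  }

  async listAgents(): Promise<{ name: string; pid?: number }[]> {
    return this.agents
  }
}
```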
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/factory-sdk/src/fleet/internal-fleet-client.ts`:
- Around line 149-151: The PID validation currently accepts any number
(agent?.pid) which allows 0, negatives or non-integers; update the checks after
this.#client.listAgents() and the equivalent roster/handle PID extraction to
require a positive integer by replacing typeof agent?.pid === 'number' with a
stricter check such as Number.isInteger(agent?.pid) && agent!.pid > 0, and apply
the same validation where PIDs are read for the termination flow so only >0
integer PIDs are passed to signaling/kill logic.
In `@packages/factory-sdk/src/orchestrator/reaper.ts`:
- Around line 124-128: The current catch around await readChildren(pid) swallows
enumeration failures and treats them as empty children which breaks
pidTreePostOrder() guarantees; instead, modify the catch to capture the thrown
error, add an entry to the existing skipped array (or call the module logger)
with the pid and error details, and preserve the error context so the caller
knows child discovery failed for that root; do the same for the other block (the
catch around readChildPids()/lines 141-147), referencing readChildren(pid),
readChildPids(), children, pidTreePostOrder(), and skipped so failures are
surfaced rather than downgraded to "no children".
- Around line 75-88: The catch block currently treats any error as a skipped
PID, but if the error is an ESRCH (no such process) that can occur after the
liveness probe and should be treated as already-exited; change the catch handler
in the reaper logic to detect error.code === 'ESRCH' (or equivalent on the
thrown Error) and, in that case, record the PID as terminated (push into
terminated with the same signals array) instead of pushing to skipped; for other
errors keep the existing skipped behavior and preserve the existing reason
string logic. Use the existing local symbols kill, signals, termGraceMs,
terminated and skipped to implement the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8c29c23c-ffb5-4eb2-9791-c92f7597b4d7
📒 Files selected for processing (8)
- packages/factory-sdk/src/fleet/internal-fleet-client.test.ts
- packages/factory-sdk/src/fleet/internal-fleet-client.ts
- packages/factory-sdk/src/orchestrator/factory.test.ts
- packages/factory-sdk/src/orchestrator/factory.ts
- packages/factory-sdk/src/orchestrator/reaper.test.ts
- packages/factory-sdk/src/orchestrator/reaper.ts
- packages/factory-sdk/src/ports/fleet.ts
- packages/factory-sdk/src/types.ts
```ts
const agent = (await this.#client.listAgents()).find((candidate) => candidate.name === name)
if (typeof agent?.pid === 'number') {
  return agent.pid
```
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Harden PID ingress validation for roster/handle sources.
Line 150 and Line 425 currently accept any number, unlike Line 170, which requires a positive integer. This can propagate invalid PIDs (e.g., 0, negative, or non-integer values) into the termination flow and risk signaling the wrong target.
Suggested fix
```diff
 async resolveAgentPid(name: string): Promise<number | undefined> {
   try {
     for (let attempt = 1; attempt <= PID_RESOLVE_ATTEMPTS; attempt += 1) {
       const agent = (await this.#client.listAgents()).find((candidate) => candidate.name === name)
-      if (typeof agent?.pid === 'number') {
+      // typeof narrows agent and agent.pid; Number.isInteger alone is not a type guard
+      if (typeof agent?.pid === 'number' && Number.isInteger(agent.pid) && agent.pid > 0) {
         return agent.pid
       }
       if (attempt < PID_RESOLVE_ATTEMPTS) {
         await sleep(PID_RESOLVE_BACKOFF_MS)
       }
@@
 function spawnResultFrom(handle: SpawnedHandleLike, resolvedPid = handle.pid): SpawnResult {
   const result: SpawnResult = { name: handle.name }
   const sessionRef = sessionRefFrom(handle)
   if (sessionRef) result.sessionRef = sessionRef
-  if (typeof resolvedPid === 'number') result.pid = resolvedPid
+  if (typeof resolvedPid === 'number' && Number.isInteger(resolvedPid) && resolvedPid > 0) result.pid = resolvedPid
   return result
 }
```
Also applies to: 421-426
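One way to keep the roster and handle paths consistent is a single type guard shared by both call sites. This is a sketch; `isValidPid`, `RosterAgent`, and `pidFromRoster` are hypothetical names. Because TypeScript types `Number.isInteger` as returning a plain `boolean` rather than a type predicate, a `value is number` guard is what lets `agent.pid` narrow to `number` after the check.

```typescript
// Hypothetical shared guard: accept only positive integer PIDs.
// On POSIX, kill(0, sig) targets the caller's process group and negative
// values target process groups (-1 means every process the caller may signal),
// so 0 and negatives must never reach signaling code.
function isValidPid(value: unknown): value is number {
  return typeof value === 'number' && Number.isInteger(value) && value > 0
}

interface RosterAgent {
  name: string
  pid?: number
}

// Roster path: return the PID only when it passes the guard.
function pidFromRoster(agents: RosterAgent[], name: string): number | undefined {
  const agent = agents.find((candidate) => candidate.name === name)
  return agent && isValidPid(agent.pid) ? agent.pid : undefined
}
```

The same guard can then back `spawnResultFrom`, so both ingress points reject `0`, negatives, `NaN`, and fractional values identically.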
```ts
try {
  kill(pid, 'SIGTERM')
  signals.push('SIGTERM')
  if (termGraceMs > 0) {
    await sleep(termGraceMs)
  }
  if (isPidLive(pid, kill)) {
    kill(pid, 'SIGKILL')
    signals.push('SIGKILL')
  }
  terminated.push({ pid, signals })
} catch (error) {
  skipped.push({ pid, reason: error instanceof Error ? error.message : String(error) })
}
```
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Treat post-probe ESRCH as an already-exited process, not a failed termination.
A PID can disappear between the liveness probe and the subsequent SIGTERM/SIGKILL. In that race, this catch block records the PID as skipped, even though the process is already gone and may have exited cleanly after SIGTERM. That will undercount real terminations and create false "missing PID" noise for the factory/reaper callers.
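A minimal sketch of the suggested catch-block change. It assumes `kill` throws errors carrying a string `code` property, as Node's `process.kill` does (`'ESRCH'` for no such process, `'EPERM'` for permission denied), and injects `kill` and `sleep` so it runs without real processes. `terminatePid`, `KillFn`, and the result shapes are illustrative; only `signals`, `termGraceMs`, `terminated`, `skipped`, and `isPidLive` echo the snippet above. The liveness probe treats `EPERM` as alive, since that error means the process exists but cannot be signaled.

```typescript
type Signal = 'SIGTERM' | 'SIGKILL'
// Signal 0 is the conventional "does this pid exist?" probe.
type KillFn = (pid: number, signal: Signal | 0) => void

interface Terminated { pid: number; signals: Signal[] }
interface Skipped { pid: number; reason: string }

// True when the error means "no such process".
function isEsrch(error: unknown): boolean {
  return typeof error === 'object' && error !== null && (error as { code?: unknown }).code === 'ESRCH'
}

function isPidLive(pid: number, kill: KillFn): boolean {
  try {
    kill(pid, 0)
    return true
  } catch (error) {
    // EPERM and other errors mean the process exists; only ESRCH means it is gone.
    return !isEsrch(error)
  }
}

async function terminatePid(
  pid: number,
  kill: KillFn,
  sleep: (ms: number) => Promise<void>,
  termGraceMs: number,
  terminated: Terminated[],
  skipped: Skipped[],
): Promise<void> {
  const signals: Signal[] = []
  try {
    kill(pid, 'SIGTERM')
    signals.push('SIGTERM')
    if (termGraceMs > 0) await sleep(termGraceMs)
    if (isPidLive(pid, kill)) {
      kill(pid, 'SIGKILL')
      signals.push('SIGKILL')
    }
    terminated.push({ pid, signals })
  } catch (error) {
    if (isEsrch(error)) {
      // Exited between the probe and the signal: count it as terminated.
      terminated.push({ pid, signals })
    } else {
      skipped.push({ pid, reason: error instanceof Error ? error.message : String(error) })
    }
  }
}
```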
```ts
try {
  children = await readChildren(pid)
} catch {
  children = []
}
```
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Don't silently downgrade tree termination to root-only termination when child discovery fails.
Both pidTreePostOrder() and the default readChildPids() implementation turn child-enumeration failures into []. If pgrep is unavailable or the injected reader fails for a live root, the helper loses its child-before-parent guarantee and kills only the root, which can leave descendants running after release/reap. Please surface that failure in skipped or logs instead of treating it as "no children".
Also applies to: 141-147
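A sketch of a post-order (children before parent) walk that records enumeration failures instead of treating them as "no children". The real `pidTreePostOrder` signature isn't shown in the thread; the injected `readChildren`, the `protectedPids` set, and the `TreeWalk` result type are assumptions. The root is still returned so the caller can decide whether to signal it, but the failure travels alongside it in `skipped`.

```typescript
interface SkippedEntry { pid: number; reason: string }
interface TreeWalk { order: number[]; skipped: SkippedEntry[] }

async function pidTreePostOrder(
  root: number,
  readChildren: (pid: number) => Promise<number[]>,
  protectedPids: ReadonlySet<number> = new Set(),
): Promise<TreeWalk> {
  const order: number[] = []
  const skipped: SkippedEntry[] = []
  const seen = new Set<number>()

  async function visit(pid: number): Promise<void> {
    if (seen.has(pid)) return // guard against cycles from stale or reused PIDs
    seen.add(pid)
    if (protectedPids.has(pid)) {
      // Never signal or descend into protected PIDs (e.g. the broker itself).
      skipped.push({ pid, reason: 'protected pid' })
      return
    }
    let children: number[] = []
    try {
      children = await readChildren(pid)
    } catch (error) {
      // Surface the failure: descendants of this pid may still be running.
      const message = error instanceof Error ? error.message : String(error)
      skipped.push({ pid, reason: `child discovery failed: ${message}` })
    }
    for (const child of children) await visit(child)
    order.push(pid) // parent only after all of its children
  }

  await visit(root)
  return { order, skipped }
}
```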
Branch updated: 10b650b to 1a76c1e, then 1a76c1e to e1ede3f.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/factory-sdk/src/orchestrator/process-identity.ts`:
- Around line 10-26: Add an 'unresolved' variant to the AgentProcessLookupResult
union and wire it through the scan path so failures to run pgrep/ps surface as {
status: 'unresolved' } instead of collapsing into 'missing'; specifically,
update the AgentProcessLookupResult type to include | { status: 'unresolved';
error?: Error | string } and modify the scan helpers (the function
findAgentProcessByName and any callers that use
AgentProcessLookupOptions.listPidsByCommand, readProcessIdentity, readParentPid)
to catch execution errors and return status:'unresolved' (propagating the error
message where helpful) so downstream logic in factory.ts can distinguish real
missing processes from scan failures.
In `@packages/factory-sdk/src/orchestrator/reaper.ts`:
- Around line 334-339: Replace the loose substring check on the process command
line with the same anchored --agent-name matcher used by findAgentProcessByName:
in the block around readProcessIdentity usage (the identity check that currently
does cmdline.includes(agent.name)) and the similar check at lines ~358-362,
validate identity.cmdline against the anchored `--agent-name` pattern (reuse the
exact matcher/regex or helper used by findAgentProcessByName) so only exact
agent-name tokens (not substrings) are accepted before pushing to skipped or
returning the identity.
- Around line 286-297: The loop treats agent.pids as "handled" via
agentHasPersistedPid() but never materializes ProcessInfo entries, causing
durable entries with only pids to be skipped; update the logic in the agent
iteration (the loop over registry.agents and the use of agentHasPersistedPid,
resolveRegistryAgentProcess, and processes Map) to also materialize process
records for any positive integers found in agent.pids (e.g., create
ProcessInfo-like entries or map the pid into processes with minimal metadata) or
else only consider agentHasPersistedPid true when a corresponding processes Map
entry exists; apply the same fix to the analogous code at the later block (lines
~302-304) that checks agentHasPersistedPid.
- Around line 312-333: When resolving an agent by name, prefer the broker's
answer (opts.fleet.resolveAgentPid) over a stale/ambiguous local scan: after
calling scanRegistryAgentProcess(agent, opts, protectedPids) do not immediately
convert scan ambiguity/identity-mismatch into terminal skips if the broker
returns a resolution; instead call opts.fleet.resolveAgentPid(agent.name)
whenever scan did not return status 'found', and if resolution exists and
resolution.status === 'found' return the resolved process, if resolution exists
and resolution.status !== 'found' push a skipped entry that preserves
resolution.status (e.g. 'unresolved') rather than rewriting it to "pid missing";
only when no resolution exists fall back to the original scan-based skipped
reasons (use scanned.status === 'ambiguous' / 'identity-mismatch' to push the
same skip reasons).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c20ceb4f-dc6f-448c-98d4-65939ff71871
📒 Files selected for processing (12)
- packages/factory-sdk/src/cli/fleet.ts
- packages/factory-sdk/src/fleet/internal-fleet-client.test.ts
- packages/factory-sdk/src/fleet/internal-fleet-client.ts
- packages/factory-sdk/src/orchestrator/factory.test.ts
- packages/factory-sdk/src/orchestrator/factory.ts
- packages/factory-sdk/src/orchestrator/process-identity.test.ts
- packages/factory-sdk/src/orchestrator/process-identity.ts
- packages/factory-sdk/src/orchestrator/reaper.test.ts
- packages/factory-sdk/src/orchestrator/reaper.ts
- packages/factory-sdk/src/ports/fleet.ts
- packages/factory-sdk/src/ports/index.ts
- packages/factory-sdk/src/types.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/factory-sdk/src/fleet/internal-fleet-client.test.ts
- packages/factory-sdk/src/orchestrator/factory.ts
- packages/factory-sdk/src/orchestrator/factory.test.ts
```ts
export type AgentProcessLookupResult =
  | { status: 'found'; identity: ProcessIdentity }
  | { status: 'missing' }
  | { status: 'ambiguous' }

export interface AgentProcessFinderOptions {
  protectedPids?: number[]
}

export type AgentProcessFinder = (agentName: string, opts?: AgentProcessFinderOptions) => Promise<AgentProcessLookupResult>

export interface AgentProcessLookupOptions {
  listPidsByCommand?: (pattern: string) => Promise<number[]>
  readProcessIdentity?: (pid: number) => Promise<ProcessIdentity | undefined>
  readParentPid?: (pid: number) => Promise<number | undefined>
  protectedPids?: number[]
}
```
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Add an unresolved result for process-scan failures.
This union leaves findAgentProcessByName() no way to surface pgrep/ps execution failures, so the helper stack collapses those cases into { status: 'missing' }. Downstream, packages/factory-sdk/src/orchestrator/factory.ts treats scan.status === 'missing' plus broker unresolved as "already exited", which can silently skip termination for a still-live agent when the local scan path is broken.
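A sketch of the proposed union extension and of a scan wrapper that maps execution failures to it. The `unresolved` variant's shape follows the review prompt; `ProcessIdentity` is reduced to `pid` and `cmdline`, and `lookupAgentProcess` is a hypothetical simplification of `findAgentProcessByName` (it treats every multi-match as ambiguous, whereas the PR also accepts one coherent process tree).

```typescript
interface ProcessIdentity { pid: number; cmdline: string }

type AgentProcessLookupResult =
  | { status: 'found'; identity: ProcessIdentity }
  | { status: 'missing' }
  | { status: 'ambiguous' }
  | { status: 'unresolved'; error?: Error | string }

const describeError = (error: unknown): Error | string => (error instanceof Error ? error : String(error))

async function lookupAgentProcess(
  agentName: string,
  listPidsByCommand: (pattern: string) => Promise<number[]>,
  readProcessIdentity: (pid: number) => Promise<ProcessIdentity | undefined>,
): Promise<AgentProcessLookupResult> {
  let pids: number[]
  try {
    pids = await listPidsByCommand(agentName)
  } catch (error) {
    // pgrep/ps could not run: we cannot tell whether the agent is alive.
    return { status: 'unresolved', error: describeError(error) }
  }
  if (pids.length === 0) return { status: 'missing' }
  if (pids.length > 1) return { status: 'ambiguous' } // simplification, see lead-in
  try {
    const identity = await readProcessIdentity(pids[0]!)
    // No identity means the process exited between listing and reading.
    return identity ? { status: 'found', identity } : { status: 'missing' }
  } catch (error) {
    return { status: 'unresolved', error: describeError(error) }
  }
}
```

With this split, the factory can treat only a genuine `missing` (plus a broker `unresolved`) as already exited, and escalate a scan `unresolved` instead of silently skipping termination.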
```ts
for (const agent of registry.agents ?? []) {
  for (const processInfo of agent.processes ?? []) {
    if (Number.isInteger(processInfo.pid) && processInfo.pid > 0) {
      processes.set(processInfo.pid, processInfo)
    }
  }
  if (!agentHasPersistedPid(agent)) {
    const resolved = await resolveRegistryAgentProcess(agent, opts, skipped, protectedPids)
    if (resolved) {
      processes.set(resolved.pid, resolved)
    }
  }
```
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Don't treat agent.pids as handled unless you also materialize a process record.
registryProcesses() only queues agent.processes, but agentHasPersistedPid() returns true for a positive PID in agent.pids too. A durable registry entry with pids populated and processes absent will therefore skip both termination and name re-resolution, leaking a live orphan.
Also applies to: 302-304
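A sketch of the first option in the comment: materialize a minimal process record for every valid entry in `agent.pids`, so an agent counts as "handled" only when at least one of its PIDs was actually queued. The registry shapes and `queueRegistryProcesses` are simplified, hypothetical versions of what `reaper.ts` uses.

```typescript
interface ProcessInfo { pid: number; agentName?: string; cmdline?: string }
interface RegistryAgent { name: string; pids?: number[]; processes?: ProcessInfo[] }

function isValidPid(value: unknown): value is number {
  return typeof value === 'number' && Number.isInteger(value) && value > 0
}

// Queue every persisted PID, whether it came from processes[] or pids[].
// Returns the agents that still need name-based resolution.
function queueRegistryProcesses(agents: RegistryAgent[], processes: Map<number, ProcessInfo>): RegistryAgent[] {
  const needsResolution: RegistryAgent[] = []
  for (const agent of agents) {
    let queued = false
    for (const info of agent.processes ?? []) {
      if (!isValidPid(info.pid)) continue
      processes.set(info.pid, info)
      queued = true
    }
    for (const pid of agent.pids ?? []) {
      if (!isValidPid(pid)) continue
      // Minimal record without cmdline; a richer entry from processes[] wins.
      if (!processes.has(pid)) processes.set(pid, { pid, agentName: agent.name })
      queued = true
    }
    if (!queued) needsResolution.push(agent)
  }
  return needsResolution
}
```

Because the bare record has no command line, a caller may still want to re-verify the process identity before signaling it.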
```ts
const scanned = await scanRegistryAgentProcess(agent, opts, protectedPids)
if (scanned.status === 'found') {
  return scanned.process
}
if (scanned.status === 'ambiguous') {
  skipped.push({ reason: `pid ambiguous for ${agent.name}` })
  return undefined
}
if (scanned.status === 'identity-mismatch') {
  skipped.push({ pid: scanned.pid, reason: 'pid identity mismatch' })
  return undefined
}

const resolution = await opts.fleet?.resolveAgentPid?.(agent.name)
if (!resolution) {
  skipped.push({ reason: `pid missing for ${agent.name}` })
  return undefined
}
if (resolution.status !== 'found') {
  skipped.push({ reason: resolution.status === 'unresolved' ? `pid missing for ${agent.name}` : `pid ${resolution.status} for ${agent.name}` })
  return undefined
}
```
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Keep broker liveness authoritative in the name-only resolution path.
This method trusts scanRegistryAgentProcess() before fleet.resolveAgentPid(), returns early on scan ambiguity/mismatch, and then rewrites broker unresolved to pid missing. That inverts the contract described in the PR: a stale or ambiguous local scan can block the broker's answer, and a present-without-PID agent is silently downgraded instead of being reported as unresolved.
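A sketch of the ordering the comment asks for: consult the local scan first, but whenever it does not return `found`, let a broker answer decide, and fall back to the scan's own skip reason only when there is no broker answer. The `ScanResult` and `BrokerResolution` shapes and `resolveByName` are simplified assumptions modeled on the statuses quoted above.

```typescript
type ScanResult =
  | { status: 'found'; pid: number }
  | { status: 'missing' }
  | { status: 'ambiguous' }
  | { status: 'identity-mismatch'; pid: number }

type BrokerResolution =
  | { status: 'found'; pid: number }
  | { status: 'missing' }
  | { status: 'unresolved' }

interface Skip { pid?: number; reason: string }

async function resolveByName(
  name: string,
  scan: () => Promise<ScanResult>,
  resolveAgentPid: ((name: string) => Promise<BrokerResolution | undefined>) | undefined,
  skipped: Skip[],
): Promise<number | undefined> {
  const scanned = await scan()
  if (scanned.status === 'found') return scanned.pid

  // The broker is authoritative whenever the scan is not conclusive.
  const resolution = await resolveAgentPid?.(name)
  if (resolution) {
    if (resolution.status === 'found') return resolution.pid
    // Preserve the broker's status instead of rewriting 'unresolved' to 'missing'.
    skipped.push({ reason: `pid ${resolution.status} for ${name}` })
    return undefined
  }

  // No broker answer: fall back to the scan-based reasons.
  if (scanned.status === 'ambiguous') skipped.push({ reason: `pid ambiguous for ${name}` })
  else if (scanned.status === 'identity-mismatch') skipped.push({ pid: scanned.pid, reason: 'pid identity mismatch' })
  else skipped.push({ reason: `pid missing for ${name}` })
  return undefined
}
```

A broker-found PID should still pass the identity check discussed in the next comment before it is signaled.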
```ts
const identity = await (opts.readProcessIdentity ?? readProcessIdentity)(resolution.pid)
if (!identity || !identity.cmdline.includes(agent.name)) {
  skipped.push({ pid: resolution.pid, reason: 'pid identity mismatch' })
  return undefined
}
return { ...identity, agentName: agent.name }
```
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Use anchored agent-name validation here, not substring matching.
cmdline.includes(agent.name) accepts ar-1-impl inside --agent-name ar-1-impl-extra, so both the broker fallback and an injected processFinder can bless the wrong PID for termination. Reuse the same anchored --agent-name matcher that findAgentProcessByName() already applies.
Also applies to: 358-362
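A sketch of token-anchored matching, assuming the agent name appears as `--agent-name <name>` in a whitespace-separated command line. `matchesAgentName` and `escapeRegExp` are hypothetical helpers; if `findAgentProcessByName` already exposes its matcher, reusing that one, as the comment suggests, keeps the two paths from drifting apart.

```typescript
// Escape regex metacharacters so names like "a.b" match literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// True only when cmdline contains "--agent-name <name>" with <name> as a whole token.
function matchesAgentName(cmdline: string, agentName: string): boolean {
  const pattern = new RegExp(`(?:^|\\s)--agent-name\\s+${escapeRegExp(agentName)}(?=\\s|$)`)
  return pattern.test(cmdline)
}
```

The lookahead `(?=\s|$)` is what rejects `ar-1-impl-extra` when looking for `ar-1-impl`, and the leading `(?:^|\s)` stops a longer flag such as `--xagent-name` from matching.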
Summary
- … `listAgents()` liveness as authoritative: present+pid means terminate/persist, present-without-pid is a loud unresolved error, absent means already exited and not an orphan.
- … `factory reap-orphans` resolve name-only registry entries by agent name via the fleet before terminating, closing the spawn-registration lag crash window.
- … `--agent-name <name>`) in both graceful termination and D9 reaping; broker worker PIDs are supplementary liveness data, not tree roots.
- … `stop()` rollback and issue completion.

Verification
- `npx vitest run packages/factory-sdk` -> 266 passed
- `npx tsc --noEmit -p tsconfig.node.json` -> exit 0

V0 coverage
- `resolveAgentPid(name)` backed by broker `listAgents()`.
- … `found`, `missing` (absent/exited), and `unresolved` (present but no pid).
- … `/api/spawned` registration lags.
- `--agent-name <name>` matching rejects loose substrings, excludes protected PIDs, fails closed on zero/disjoint multiple matches, and allows multiple matches only when they form one coherent agent tree.
- … `SpawnResult.pids[]`.
- … `unresolved` plus ps-found agent shell reaps; broker `unresolved` plus ps-missing skips; foreign ps identity is rejected.
- … `terminatePids`; a hermetic child-tree test proves a name-only entry resolving to a protected broker PID is skipped instead of signaled or child-walked.
- … `unresolved` plus ps-found agent shell terminates with no `agentTerminateMissingPid`; broker `unresolved` plus ps-missing is process-less and does not over-count.
- … `agentTerminateMissingPid`.
- `issue-done` releases and terminates tracked pair process trees, including child PIDs discovered at kill time.

Notes
- … `SpawnResult`.
- `node_modules` in the worktree is an untracked symlink only.
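The "one coherent agent tree" rule from the V0 coverage list can be read as: among the matched PIDs, exactly one has no matched ancestor, and every other match descends from it. A minimal sketch of that check follows; `coherentTreeRoot` is a hypothetical name, and it takes a synchronous `parentOf` lookup for brevity, whereas the PR's `AgentProcessLookupOptions.readParentPid` is async. It fails closed (returns `undefined`) on zero matches and on disjoint matches.

```typescript
// Returns the single root of the matched PIDs if they form one tree, else undefined.
function coherentTreeRoot(matches: number[], parentOf: (pid: number) => number | undefined): number | undefined {
  const matched = new Set(matches)
  const roots: number[] = []
  for (const pid of matched) {
    // Walk up the ancestry until we hit another matched PID or run out of parents.
    const visited = new Set<number>([pid])
    let ancestor = parentOf(pid)
    let hasMatchedAncestor = false
    while (ancestor !== undefined && !visited.has(ancestor)) {
      if (matched.has(ancestor)) {
        hasMatchedAncestor = true
        break
      }
      visited.add(ancestor)
      ancestor = parentOf(ancestor)
    }
    if (!hasMatchedAncestor) roots.push(pid)
  }
  // Exactly one root means every other match descends from it.
  return roots.length === 1 ? roots[0] : undefined
}
```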