Skip to content

Add factory orphan reaper#253

Merged
kjgbot merged 1 commit into
mainfrom
factory-sdk/reaper
Jun 12, 2026
Merged

Add factory orphan reaper#253
kjgbot merged 1 commit into
mainfrom
factory-sdk/reaper

Conversation

@kjgbot

@kjgbot kjgbot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds D9 factory orphan reaping for the unattended overnight safety path:

  • factory loop writes a durable in-flight registry sidecar alongside the D6 heartbeat
  • registry includes factory-dispatched agent names, roles, session refs, PIDs, and PID identity signatures (agentName, cmdline, start time)
  • external reapFactoryOrphansOnce only reaps when the heartbeat is stale and only for registry-confirmed PID identities
  • PID reuse guard re-reads current process identity before signaling and fails closed on mismatch/unreadable identity
  • reaper sends SIGTERM, waits a grace period, then SIGKILLs only if the PID is still live
  • fleet factory reap-orphans exposes the external watchdog path

Safety

  • No broad process sweep.
  • Foreign/non-registry PIDs are untouched.
  • Recycled PID with foreign cmdline/start-time is skipped.
  • Fresh heartbeat produces no reap.
  • Clean stop/issue completion clears the registry so stale clean-stop files cannot reap old PIDs.

Validation

  • npx vitest run packages/factory-sdk -> 224 passed (19 files)
  • npx tsc --noEmit -p tsconfig.node.json -> exit 0

No live writes, live dispatches, or live process kills were run.

@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR adds factory orphan reaping: a mechanism to detect stale factory loop instances via heartbeat staleness checks, persist a registry of in-flight agents with process identity metadata, verify process liveness and identity at reap time to prevent false positives from PID recycling, and terminate confirmed orphaned processes via SIGTERM/SIGKILL. It extends the fleet client to carry PIDs, injects process identity introspection, persists registry alongside heartbeat, and exposes the reaper as a CLI command and periodic scheduler.

Changes

Factory Orphan Reaping with In-Flight Process Registry

Layer / File(s) Summary
Type contracts and configuration
src/types.ts, src/config/schema.ts, src/config/schema.test.ts
New ProcessIdentity type, FactoryPorts.processIdentityReader hook, FactoryLoopRunOptions.registryPath, FactoryLoopHeartbeat.registryPath, and FactoryInFlightRegistry* types for agent and process metadata tracking. Config schema adds required non-empty registryPath with default /tmp/factory-run/factory-loop-registry.json.
Process identity introspection
src/orchestrator/process-identity.ts
New readProcessIdentity(pid) function shells out to ps, extracts lstart and command fields, validates output, and returns { pid, startTime, cmdline } or undefined on read/parse failures or missing processes.
Fleet client PID support
src/ports/fleet.ts, src/fleet/internal-fleet-client.ts
SpawnResult now optionally includes pid?: number and pids?: number[]. SpawnedHandleLike gains optional pid field. New spawnResultFrom helper unifies spawn/resume result construction with conditional PID extraction.
Factory loop in-flight registry
src/orchestrator/factory.ts
FactoryLoop maintains persistent in-flight registry alongside heartbeat: reads/writes registry path from config, gathers process identity via injected processIdentityReader, persists registry after each heartbeat write, conditionally writes on spawn/resume and issue completion when PIDs present, and clears registry on shutdown. New helpers recordHasPids, pidsFromSpawnResult, and #hasInFlightPids normalize PID extraction.
Reaper module and orphan detection
src/orchestrator/reaper.ts
New reapFactoryOrphansOnce(opts) reads heartbeat and registry, determines staleness, iterates PIDs, skips non-live or identity-mismatched processes, sends SIGTERM then grace-delay SIGKILL, and reports reaped/skipped outcomes. FactoryReaper scheduler manages periodic runs via start()/stop(). Helpers: registryProcesses, isPidLive, processIdentityMatches.
Reaper test suite
src/orchestrator/reaper.test.ts
Comprehensive tests: stale heartbeat triggers TERM→KILL sequence, PID recycling protection skips mismatched identity, fresh heartbeat produces no-op report, timer cleanup verifies no pending timers after stop.
Factory loop registry tests
src/orchestrator/factory.test.ts
Integration tests: PidFleetClient allocates deterministic PIDs, registry persists agent/process identity entries, stop() clears persisted entries.
CLI command for reap-orphans
src/cli/fleet.ts, src/cli/fleet.test.ts
New factory reap-orphans action: parsed by parseFactoryCommand, executed by runFactoryCommand invoking reapFactoryOrphansOnce with loop heartbeat/registry/stale paths. Tests verify parsing and fresh-heartbeat no-op output.
Public API exports
src/index.ts, src/orchestrator/index.ts
Expose DEFAULT_FACTORY_LOOP_REGISTRY_PATH, FactoryReaper, readFactoryInFlightRegistry, reapFactoryOrphansOnce, and in-flight registry types through orchestrator and main package indexes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • AgentWorkforce/pear#229: Scaffolds SDK configuration schema infrastructure (FactoryConfigSchema) and public exports that this PR extends with registryPath and reaper symbols.

Poem

🐰 The rabbit hops through stale heartbeats fast,
Reaps orphaned processes from the past,
With PIDs tracked and identities true,
No recycled ghosts shall slip through! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Add factory orphan reaper' directly and concisely describes the main change: introducing an orphan reaper feature for the factory system.
Description check ✅ Passed The PR description comprehensively explains the orphan reaper feature, implementation details, safety measures, and validation results, clearly relating to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch factory-sdk/reaper

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed due to a network error.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kjgbot kjgbot merged commit a85a09c into main Jun 12, 2026
4 of 5 checks passed
@kjgbot kjgbot deleted the factory-sdk/reaper branch June 12, 2026 15:49
@agent-relay-code

Copy link
Copy Markdown
Contributor

Implemented one PR-scope fix: FactoryLoop.runLoop() now keeps the active heartbeat/registry override paths through spawn-time and stop-time registry writes, so the heartbeat does not point the reaper at a stale or empty registry. Fix is in factory.ts and factory.ts. Added a regression in factory.test.ts.

Addressed comments

  • gemini-code-assist[bot]: quota-limit warning only; no actionable review finding to validate or fix.
  • coderabbitai[bot]: walkthrough/commentary plus “review failed: Pull request was closed or merged during review”; no actionable code finding.
  • coderabbitai[bot]: docstring coverage warning; left unchanged because it is not a demonstrated repo CI failure and adding broad docstrings would be style churn outside the narrow bug fix.
  • coderabbitai[bot]: ESLint tool install/network failure; no code change needed. Locally ran npm run lint successfully.

Advisory Notes

  • CodeRabbit’s docstring coverage warning may reflect an external bot policy, but this repo’s canonical local checks do not enforce it.

Local verification

  • npm ci
  • npx vitest run packages/factory-sdk/src/orchestrator/reaper.test.ts packages/factory-sdk/src/orchestrator/factory.test.ts packages/factory-sdk/src/cli/fleet.test.ts packages/factory-sdk/src/config/schema.test.ts
  • npm run verify:mcp-resources-drift
  • npm run lint
  • npm run typecheck:web
  • npm run typecheck:node
  • npm test
  • npx vitest run
  • npm run build
  • npm run build:web
  • npx playwright install --with-deps chromium
  • npx playwright test --config playwright.fidelity.config.ts
  • npx playwright test --config playwright.redraw.config.ts

I did not print READY because I could not independently confirm GitHub mergeability from the available tools, and the PR discussion now reports CodeRabbit saw the PR as closed or merged during review.

@agent-relay-code

Copy link
Copy Markdown
Contributor

Implemented one PR-scope fix: FactoryLoop.runLoop() now keeps the active heartbeat/registry override paths through spawn-time and stop-time registry writes, so the heartbeat does not point the reaper at a stale or empty registry. Fix is in factory.ts and factory.ts. Added a regression in factory.test.ts.

Addressed comments

  • gemini-code-assist[bot]: quota-limit warning only; no actionable review finding to validate or fix.
  • coderabbitai[bot]: walkthrough/commentary plus “review failed: Pull request was closed or merged during review”; no actionable code finding.
  • coderabbitai[bot]: docstring coverage warning; left unchanged because it is not a demonstrated repo CI failure and adding broad docstrings would be style churn outside the narrow bug fix.
  • coderabbitai[bot]: ESLint tool install/network failure; no code change needed. Locally ran npm run lint successfully.

Advisory Notes

  • CodeRabbit’s docstring coverage warning may reflect an external bot policy, but this repo’s canonical local checks do not enforce it.

Local verification

  • npm ci
  • npx vitest run packages/factory-sdk/src/orchestrator/reaper.test.ts packages/factory-sdk/src/orchestrator/factory.test.ts packages/factory-sdk/src/cli/fleet.test.ts packages/factory-sdk/src/config/schema.test.ts
  • npm run verify:mcp-resources-drift
  • npm run lint
  • npm run typecheck:web
  • npm run typecheck:node
  • npm test
  • npx vitest run
  • npm run build
  • npm run build:web
  • npx playwright install --with-deps chromium
  • npx playwright test --config playwright.fidelity.config.ts
  • npx playwright test --config playwright.redraw.config.ts

I did not print READY because I could not independently confirm GitHub mergeability from the available tools, and the PR discussion now reports CodeRabbit saw the PR as closed or merged during review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant