Fix burn subprocess leak in BurnLedger.capture#481
Conversation
Root cause of the runaway `burn summary` processes: capture() used a terminationHandler+semaphore that could return while the child was still alive (handler race), and only sent SIGTERM on timeout. Since BurnLedger is an actor, a capture() that returns early frees the actor to spawn the next poll's subprocess — so the live tab's 1.5s poll piled up dozens of orphaned `burn summary` processes. Rewrite capture() to read stdout to EOF + waitUntilExit on a background queue, bounded by the timeout, and SIGKILL (not just SIGTERM) on timeout. This blocks the actor until the process truly exits or is killed, so the actor's serialization guarantees at most one `burn` subprocess alive at a time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthrough
ChangesBurnLedger capture timeout refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77233d588b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| process.terminate() // SIGTERM… | ||
| usleep(200_000) | ||
| if process.isRunning { // …then SIGKILL if it ignores it | ||
| kill(process.processIdentifier, SIGKILL) |
There was a problem hiding this comment.
Kill the actual burn child on PATH timeouts
When resolveTool() falls back to .path, loginShell makes this Process be /bin/zsh -lc ..., so signaling process.processIdentifier here only targets the shell. If burn summary is the hung process, killing the shell can orphan the burn/node child while capture() returns and the actor accepts the next poll, so the subprocess pile-up still occurs for PATH fallback builds. Start the shell in its own process group and kill the group, or avoid the shell after resolving the executable.
Useful? React with 👍 / 👎.
Summary
The macOS app's live-burn tab was leaking
burn summarysubprocesses — dozens accumulated (one per ~1.5s poll, none exiting), which hammered the machine. This fix landed on the #480 branch after #480 was squash-merged, so it didn't reach main; this PR brings it in.Root cause
BurnLedgeris an actor, socapture()calls are serialized — the only wayburnprocesses pile up is ifcapture()returns while the child is still alive. The previous implementation used aterminationHandler+ semaphore (which has a race: if the process exits before the handler is wired, the wait misbehaves) and only sent SIGTERM on timeout, which a busyburncan ignore. So each poll's subprocess got orphaned and the next spawned on top.Fix
Rewrite
capture()to read stdout to EOF +waitUntilExiton a background queue, bounded by the timeout, and SIGKILL (not just SIGTERM) on timeout. It now blocks the actor until the process truly exits or is killed, so the actor's serialization guarantees at most oneburnsubprocess alive at a time — no pile-up possible.🤖 Generated with Claude Code