Skip to content

fix: headed browser auto-shuts down within 1-2 minutes#1020

Closed
rocke2020 wants to merge 1 commit intogarrytan:mainfrom
rocke2020:fix/headed-browser-watchdog-shutdown
Closed

fix: headed browser auto-shuts down within 1-2 minutes#1020
rocke2020 wants to merge 1 commit intogarrytan:mainfrom
rocke2020:fix/headed-browser-watchdog-shutdown

Conversation

@rocke2020
Copy link
Copy Markdown

Summary

  • The connect command (headed browser launch) sets BROWSE_PARENT_PID to the CLI process PID, then immediately exits (process.exit(0) at cli.ts:905)
  • The server's parent-process watchdog (server.ts:760-769) polls every 15s, detects the dead parent, and calls shutdown() — killing the browser within 15-75 seconds
  • This contradicts the server's own design: the idle timer already exempts headed mode with the comment "Headed mode: the user is looking at the browser. Never auto-die." (server.ts:744-745)
  • Fix: always set BROWSE_PARENT_PID=0 in headed mode serverEnv, disabling the watchdog. The previous code only passed through PID=0 when pair-agent set it, missing the default connect case

Test plan

  • Run /open-gstack-browser and verify the browser stays alive for more than 2 minutes without any interaction
  • Verify $B disconnect still works (explicit user-initiated shutdown)
  • Verify pair-agent flow still works (already set BROWSE_PARENT_PID=0)
  • Verify headless mode still auto-shuts down when parent exits (watchdog unaffected for non-headed)

🤖 Generated with Claude Code

The `connect` command spawns the server with BROWSE_PARENT_PID set to the
CLI process PID, then immediately exits (process.exit(0)). The server's
parent-process watchdog polls every 15s and detects the dead parent,
triggering shutdown within 15-75 seconds of launch.

This contradicts the server's own design intent — the idle timer already
exempts headed mode with the comment "Never auto-die. Only shut down when
the user explicitly disconnects."

Fix: always set BROWSE_PARENT_PID=0 in headed mode serverEnv, which
disables the watchdog. The previous code only passed through PID=0 when
pair-agent set it, missing the default connect case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan
Copy link
Copy Markdown
Owner

Closing — this fix is already live on main (landed via PR #847 in v0.15.13.0). The browse server now skips idle timeout in headed mode and only activates the watchdog when BROWSE_PARENT_PID > 0. Thank you for the contribution!

@garrytan garrytan closed this Apr 16, 2026
garrytan added a commit that referenced this pull request Apr 16, 2026
The parent-process watchdog in server.ts polls the spawning CLI's PID
every 15s and self-terminates if it is gone. The connect command in
cli.ts exits with process.exit(0) immediately after launching the server,
so the watchdog would reliably kill the headed browser within ~15s.

This contradicted the idle timer's own design: server.ts:745 explicitly
skips headed mode because "the user is looking at the browser. Never
auto-die." The watchdog had no such exemption.

Two-layer fix:
1. CLI layer: connect handler always sets BROWSE_PARENT_PID=0 (was only
   pass-through for pair-agent subprocesses). The user owns the headed
   browser lifecycle; cleanup happens via browser disconnect event or
   $B disconnect.
2. CLI layer: startServer() honors caller's BROWSE_PARENT_PID=0 in the
   headless spawn path too. Lets CI, non-interactive shells, and Claude
   Code Bash calls opt into persistent servers across short-lived CLI
   invocations.
3. Server layer: defense-in-depth. Watchdog now also skips when
   BROWSE_HEADED=1, so even if a future launcher forgets PID=0, headed
   browsers won't die. Adds log lines when the watchdog is disabled
   so lifecycle debugging is easier.

Four community contributors diagnosed variants of this bug independently.
Thanks for the clear analyses and reproductions.

Closes #1020 (rocke2020)
Closes #1018 (sanghyuk-seo-nexcube)
Closes #1012 (rodbland2021)
Closes #986 (jbetala7)
Closes #1006
Closes #943

Co-Authored-By: rocke2020 <noreply@github.com>
Co-Authored-By: sanghyuk-seo-nexcube <noreply@github.com>
Co-Authored-By: rodbland2021 <noreply@github.com>
Co-Authored-By: jbetala7 <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Apr 16, 2026
…1025)

* fix: headed browser no longer auto-shuts down after 15 seconds

The parent-process watchdog in server.ts polls the spawning CLI's PID
every 15s and self-terminates if it is gone. The connect command in
cli.ts exits with process.exit(0) immediately after launching the server,
so the watchdog would reliably kill the headed browser within ~15s.

This contradicted the idle timer's own design: server.ts:745 explicitly
skips headed mode because "the user is looking at the browser. Never
auto-die." The watchdog had no such exemption.

Two-layer fix:
1. CLI layer: connect handler always sets BROWSE_PARENT_PID=0 (was only
   pass-through for pair-agent subprocesses). The user owns the headed
   browser lifecycle; cleanup happens via browser disconnect event or
   $B disconnect.
2. CLI layer: startServer() honors caller's BROWSE_PARENT_PID=0 in the
   headless spawn path too. Lets CI, non-interactive shells, and Claude
   Code Bash calls opt into persistent servers across short-lived CLI
   invocations.
3. Server layer: defense-in-depth. Watchdog now also skips when
   BROWSE_HEADED=1, so even if a future launcher forgets PID=0, headed
   browsers won't die. Adds log lines when the watchdog is disabled
   so lifecycle debugging is easier.

Four community contributors diagnosed variants of this bug independently.
Thanks for the clear analyses and reproductions.

Closes #1020 (rocke2020)
Closes #1018 (sanghyuk-seo-nexcube)
Closes #1012 (rodbland2021)
Closes #986 (jbetala7)
Closes #1006
Closes #943

Co-Authored-By: rocke2020 <noreply@github.com>
Co-Authored-By: sanghyuk-seo-nexcube <noreply@github.com>
Co-Authored-By: rodbland2021 <noreply@github.com>
Co-Authored-By: jbetala7 <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: disconnect handler runs full cleanup before exiting

When the user closed the headed browser window, the disconnect handler
in browser-manager.ts called process.exit(2) directly, bypassing the
server's shutdown() function entirely. That meant:

- sidebar-agent daemon kept polling a dead server
- session state wasn't saved
- Chromium profile locks (SingletonLock, SingletonSocket, SingletonCookie)
  weren't cleaned — causing "profile in use" errors on next $B connect
- state file at .gstack/browse.json was left stale

Now the disconnect handler calls onDisconnect(), which server.ts wires
up to shutdown(2). Full cleanup runs first, then the process exits with
code 2 — preserving the existing semantic that distinguishes user-close
(exit 2) from crashes (exit 1).

shutdown() now accepts an optional exitCode parameter (default 0) so
the SIGTERM/SIGINT paths and the disconnect path can share cleanup code
while preserving their distinct exit codes.

Surfaced by Codex during /plan-eng-review of the watchdog fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pre-existing test flakiness in relink.test.ts

The 23 tests in this file all shell out to gstack-config + gstack-relink
(bash scripts doing subprocess work). Under parallel bun test load, those
subprocess spawns contend with other test suites and each test can drift
~200ms past Bun's 5s default timeout, causing 5+ flaky timeouts per run
in the gate-tier ship gate.

Wrap the `test` import to default the per-test timeout to 15s. Explicit
per-test timeouts (third arg) still win, so individual tests can lower
it if needed. No behavior change — only gives subprocess-heavy tests
more headroom under parallel load.

Noticed by /ship pre-flight test run. Unrelated to the main PR fix but
blocking the gate, so fixing as a separate commit per the test ownership
protocol.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: SIGTERM/SIGINT shutdown exit code regression

Node's signal listeners receive the signal name ('SIGTERM' / 'SIGINT')
as the first argument. When shutdown() started accepting an optional
exitCode parameter in the prior disconnect-cleanup commit, the bare
`process.on('SIGTERM', shutdown)` registration started silently calling
shutdown('SIGTERM'). The string passed through to process.exit(), Node
coerced it to NaN, and the process exited with code 1 instead of 0.

Wrap both listeners so they call shutdown() with no args — signal name
never leaks into the exitCode slot. Surfaced by /ship's adversarial
subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: onDisconnect async rejection leaves process running

The disconnect handler calls this.onDisconnect() without awaiting it,
but server.ts wires the callback to shutdown(2) — which is async. If
that promise rejects, the rejection drops on the floor as an unhandled
rejection, the browser is already disconnected, and the server keeps
running indefinitely with no browser attached.

Add a sync try/catch for throws and a .catch() chain for promise
rejections. Both fall back to process.exit(2) so a dead browser never
leaves a live server. Also widen the callback type from `() => void`
to `() => void | Promise<void>` to match the actual runtime shape of
the wired shutdown(2) call.

Surfaced by /ship's adversarial subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: honor BROWSE_PARENT_PID=0 with trailing whitespace

The strict string compare `process.env.BROWSE_PARENT_PID === '0'` meant
any stray newline or whitespace (common from shell `export` in a pipe or
heredoc) would fail the check and re-enable the watchdog against the
caller's intent.

Switch to parseInt + === 0, matching the server's own parseInt at
server.ts:760. Handles '0', '0\n', ' 0 ', and unset correctly; non-numeric
values (parseInt returns NaN, NaN === 0 is false) fail safe — watchdog
stays active, which is the safe default for unexpected input.

Surfaced by /ship's adversarial subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: preserve bun:test sub-APIs in relink test wrapper

The previous commit wrapped bun:test's `test` to bump the per-test
timeout default to 15s but cast the wrapper `as typeof _bunTest`
without copying the sub-properties (`.only`, `.skip`, `.each`,
`.todo`, `.failing`, `.if`) from the original. The cast was a lie:
the wrapper was a plain function, not the full callable with those
chained properties attached.

The file doesn't use any of them today, but a future test.only or
test.skip would fail with a cryptic "undefined is not a function."
Object.assign the original _bunTest's properties onto the wrapper so
sub-APIs chain correctly forever.

Surfaced by /ship's adversarial subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.18.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression tests for parent-process watchdog

End-to-end tests in browse/test/watchdog.test.ts that prove the three
invariants v0.18.1.0 depends on. Each test spawns the real server.ts
(not a mock), so any future change that breaks the watchdog logic fails
here — the thing /ship's adversarial review flagged as missing.

1. BROWSE_PARENT_PID=0 disables the watchdog
   Spawns server with PID=0, reads stdout, confirms the
   "watchdog disabled (BROWSE_PARENT_PID=0)" log line appears and
   "Parent process ... exited" does NOT. ~2s.

2. BROWSE_HEADED=1 disables the watchdog (server-side guard)
   Spawns server with BROWSE_HEADED=1 and a bogus parent PID (999999).
   Proves BROWSE_HEADED takes precedence over a present PID — if the
   server-side defense-in-depth regresses, the watchdog would try to
   poll 999999 and fire on the "dead parent." ~2s.

3. Default headless mode: watchdog fires when parent dies
   The regression guard for the original orphan-prevention behavior.
   Spawns a real `sleep 60` parent and a server watching its PID, then
   kills the parent and waits up to 25s for the server to exit. The
   watchdog polls every 15s so first tick is 0-15s after death, plus
   shutdown() cleanup. ~18s.

Total runtime: ~21s for all 3 tests. They catch the class of bug this
branch exists to fix: "does the process live or die when it should?"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: rocke2020 <noreply@github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JerkyJesse added a commit to JerkyJesse/cavestack that referenced this pull request Apr 18, 2026
Ports gstack garrytan#994 (joelgreen) + garrytan#1020 gating + SIGINT wrapper fix (garrytan#1025).

The browse server was dying between Claude Code Bash tool invocations because:
1. SIGTERM: Claude Code sandbox sends SIGTERM to child processes when a Bash
   command completes. Server called shutdown() and deleted state file.
2. Parent watchdog: 15s BROWSE_PARENT_PID poll shut down on parent death.

Both made every new $B call start a fresh server with no cookies or state.

Fix (scoped to normal headless mode):
- SIGTERM handler: in headless mode, log and ignore. Idle timeout (30min)
  handles eventual cleanup. Headed/tunnel modes respect SIGTERM and shutdown
  cleanly (idle timeout early-returns in those modes, so ignoring would leak).
- Parent watchdog: same three-way resolution. Active cookie picker always
  stays alive to avoid stranding picker UI with "Failed to fetch".
- SIGINT (Ctrl+C) always shuts down — intentional stop.
- Outer IS_HEADED_WATCHDOG gate skips watchdog entirely in headed mode.
- shutdown() called via arrow wrapper so signal name arg doesn't become
  NaN exitCode (fixes legacy exit-code-1 bug).

Imports hasActivePicker() from cookie-picker-routes (added in prior commit).

/open-cavestack-browser path used in place of /open-gstack-browser in inline docs.

Upstream: garrytan/gstack@1211b6b

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
JerkyJesse added a commit to JerkyJesse/cavestack that referenced this pull request Apr 18, 2026
Ports 5 upstream PRs from garrytan#1028 (community wave v0.18.1.0):

- garrytan#892 Windows cookie import (DPAPI + v20 CDP fallback)
- garrytan#996 Cookie picker survives CLI exit
- garrytan#994 + garrytan#1020 SIGTERM persistence + headless-mode gating
- garrytan#993 Tilde-in-assignment fix (design resolver + 5 templates)
- garrytan#864 OpenClaw native skill frontmatter normalization (codex-friendly)

Plus local change: caveman mode locked to full (lite/ultra removed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants