fix(dxt): OAuth-on-install — non-blocking bootstrap + PATH-independent browser launch (v0.21.2)#108

Merged
milstan merged 13 commits into main from ArtyETH06/dxt-oauth-browser-open
Jun 19, 2026
@ArtyETH06 ArtyETH06 commented Jun 17, 2026

Symptom

Installing the Leadbay .dxt in Claude Desktop fails OAuth-on-install. Three layered failures, all fixed here (each surfaced after fixing the prior one):

  1. "Unable to connect to extension server" — server never finished connecting.
  2. No browser opens — spawn failed under the sanitized launch env.
  3. MCP connects, gate fires, but still no sign-in — the spawned process can't open a GUI browser at all, and there was no link to fall back on.

Root causes (all reproduced against the built bundle)

1. Blocking startup. main() awaited the full 5-minute interactive OAuth flow before server.connect. Claude Desktop gives an extension only seconds to answer initialize, so it timed out.

2. PATH-only browser launch. spawn("xdg-open"|"open"|"cmd") used bare command names; Claude Desktop's sanitized PATH doesn't contain them → ENOENT, no browser.

3. No DISPLAY in the spawned env. Even with absolute launcher paths, the child process has no DISPLAY/WAYLAND_DISPLAY, so xdg-open exits 0 without launching anything. The spawn "succeeds", so no error fires — the user is left with a "browser should have opened" message and no link. Confirmed from real Claude Desktop logs (mcp-server-Leadbay.log) + a DISPLAY-stripped repro.

None are regressions in our OAuth code — the host's launch contract tightened (sanitized PATH + DISPLAY, short initialize timeout).

Fix

Non-blocking bootstrap: initialize is answered immediately with a real tokenless client (authState: "pending"); OAuth runs in the background; the live client is mutated (setBaseUrl + setToken) when the token lands, so the next tool call is authenticated with no rebuild.

Surface the sign-in link (the reliable path) — a spawned stdio server can't depend on opening a browser, so oauthLogin fires onAuthorizeUrl(url) the moment the URL is built (listener already live), the background bootstrap stashes it, and the CallTool gate returns it as a clickable AUTH_REQUIRED link ("Open this link to authorize Leadbay, then re-run this tool"). The loopback listener stays alive, so clicking it completes the flow. Browser auto-open is still attempted best-effort via absolute launcher paths for envs where it works.

Verification

  • Ran the built 0.21.2 bundle with PATH and DISPLAY stripped (the real Claude Desktop spawn env): initialize answered instantly; the tool call returns the live leadbay.app/oauth/authorize?...redirect_uri=127.0.0.1:<port>/callback link; the listener stays up to catch the callback.
  • Confirmed against Arty's actual Claude Desktop logs: server v0.21.2 connects, lists tools, and the gate fired the pending envelope (the pre-link build) — this PR replaces that with the clickable link.
  • New test files oauth-bootstrap-nonblocking.test.ts + oauth-browser-open.test.ts pin: pending client, the gate, the surfaced live URL, the openFailed note, the post-setToken flip, and absolute-path launch. Existing test files untouched.
  • pnpm -r build, pnpm -r typecheck, pnpm -r test all green (core 402, promptforge 16, mcp 432).

Versions: @leadbay/mcp 0.21.1 → 0.21.2, server.json 0.21.2, @leadbay/dxt 0.2.6 → 0.2.7. AuthState + StartupAuthState unions gain "pending".

Manual test

A built .dxt/.mcpb is on Arty's Desktop. The tool call now hands you a clickable Leadbay sign-in link; click it, authorize, re-run — you're connected.

…anitized PATH (v0.21.2)

The .dxt bundle runs the loopback OAuth flow at startup and opened the
browser via spawn("open" | "xdg-open" | "cmd", …) — bare command names
resolved through PATH. Claude Desktop spawns .dxt/.mcpb extensions with a
sanitized environment whose PATH does NOT contain those launchers, so the
spawn failed with ENOENT, no browser ever opened, and the only diagnostic
went to MCP stderr (invisible to the user) while the server dangled on the
5-minute callback wait. Net effect: "no browser opens at all" on install,
tools return AUTH_MISSING. Reproduced by running the built bundle with a
PATH containing only node.

openInBrowser now launches via the OS launcher's absolute path
(/usr/bin/open, %SystemRoot%\System32\cmd.exe, /usr/bin/xdg-open) with the
bare-command PATH lookup kept only as a trailing fallback — independent of
the inherited PATH, which restores the click-Install -> browser-opens UX.
If the launcher genuinely can't be found, oauthLogin({failFastOnOpenError})
throws BrowserOpenFailedError instead of blocking; bootstrap catches it,
lets the server come up immediately, and the first tool call returns an
AUTH_MISSING envelope telling the user to restart the extension to retry.
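
A minimal sketch of the candidate ordering described above, assuming a hypothetical helper named launcherCandidates (the real openInBrowser may be shaped differently). The absolute launcher path comes first and the bare command name is kept only as a trailing PATH fallback. The `start` arguments for cmd and the default systemRoot value are assumptions for illustration.

```typescript
// Hypothetical shape for one launch attempt; not taken from the PR's source.
type Launcher = { command: string; args: string[] };

// Returns launch attempts in priority order: absolute path first,
// bare command name (resolved through PATH) last.
function launcherCandidates(
  platform: string,
  url: string,
  systemRoot: string = "C:\\Windows", // assumption: normally read from %SystemRoot%
): Launcher[] {
  switch (platform) {
    case "darwin":
      return [
        { command: "/usr/bin/open", args: [url] },
        { command: "open", args: [url] },
      ];
    case "win32": {
      // `start` is a cmd builtin; the empty string is its window-title argument.
      const args = ["/c", "start", "", url];
      return [
        { command: `${systemRoot}\\System32\\cmd.exe`, args },
        { command: "cmd", args },
      ];
    }
    default:
      return [
        { command: "/usr/bin/xdg-open", args: [url] },
        { command: "xdg-open", args: [url] },
      ];
  }
}
```

A caller would try each candidate in turn and stop at the first spawn that does not fail with ENOENT. On Windows, a URL containing `&` generally needs escaping before it reaches cmd; that detail is left out of this sketch.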

New test file oauth-browser-open.test.ts pins the absolute-path candidate
ordering per platform and the fail-fast behaviour. Existing oauth.test.ts
untouched. Versions: @leadbay/mcp 0.21.1 -> 0.21.2, server.json 0.21.2,
@leadbay/dxt 0.2.6 -> 0.2.7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ArtyETH06 ArtyETH06 self-assigned this Jun 17, 2026
…ects (v0.21.2)

The bundled stdio server ran the full interactive browser OAuth flow (up to
5 min) at startup, BEFORE answering the MCP `initialize` handshake. Claude
Desktop gives a launched extension only a few seconds to respond, so it timed
out the connection and showed "Unable to connect to extension server". The
PATH-only browser launch (previous commit) made the browser open but couldn't
fix the connection because the handshake was still blocked.

Make bootstrap non-blocking:
- resolveClientFromEnv returns a REAL tokenless LeadbayClient with a new
  authState "pending" immediately — no await on OAuth. initialize is answered
  at once.
- main() kicks off bootstrapOAuthIfMissing in the background AFTER
  server.connect; on success it mutates the LIVE client the handler holds
  (setBaseUrl + setToken), identifies the user to telemetry (skipped at boot
  to avoid latching an anonymous identity), and starts the notifications WS.
- buildServer gains a bootstrapStatus() getter; the CallTool handler returns a
  transient AUTH_PENDING envelope (or AUTH_MISSING when the browser couldn't
  open) while unauthenticated, then executes normally the instant the token
  lands (gate reads client.isAuthenticated per call — no server rebuild).
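
The per-call gate can be sketched as follows. This is an illustration under stated assumptions, not the actual LeadbayClient or CallTool handler: SketchClient, callToolSketch, and the envelope shape are hypothetical names, and only the "ok" and "pending" members of AuthState are shown.

```typescript
// Assumption: only the two states mentioned in this PR are modelled here.
type AuthState = "ok" | "pending";

// Hypothetical stand-in for the live client the handler holds.
class SketchClient {
  baseUrl = "";
  private token: string | null = null;

  get authState(): AuthState {
    return this.token === null ? "pending" : "ok";
  }
  get isAuthenticated(): boolean {
    return this.token !== null;
  }
  setBaseUrl(url: string): void {
    this.baseUrl = url;
  }
  setToken(token: string): void {
    this.token = token;
  }
}

type ToolResult =
  | { ok: true; value: string }
  | { ok: false; code: "AUTH_PENDING"; message: string };

// The gate reads the client's state on every call, so the same server
// instance starts executing tools as soon as the token is set.
function callToolSketch(client: SketchClient, run: () => string): ToolResult {
  if (!client.isAuthenticated) {
    return {
      ok: false,
      code: "AUTH_PENDING",
      message: "Sign-in is still in progress; re-run this tool once it completes.",
    };
  }
  return { ok: true, value: run() };
}
```

Because the background bootstrap mutates the same object the handler already holds, no reference needs to be swapped and no server needs to be rebuilt when the token arrives.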

Verified end-to-end against the built bundle with a sanitized PATH: initialize
is answered immediately, a tool call returns the AUTH_PENDING envelope, and
OAuth runs in the background. New test file oauth-bootstrap-nonblocking.test.ts
pins the pending client, the gate, the post-setToken flip, and the
non-bootstrap path. Existing test files untouched.

AuthState + StartupAuthState unions gain "pending". @leadbay/mcp 0.21.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ArtyETH06 ArtyETH06 changed the title from "fix(dxt): OAuth-on-install opens the browser under Claude Desktop's sanitized PATH (v0.21.2)" to "fix(dxt): OAuth-on-install — non-blocking bootstrap + PATH-independent browser launch (v0.21.2)" Jun 17, 2026
ArtyETH06 and others added 11 commits June 17, 2026 14:04
…en a browser

Follow-up from real Claude Desktop logs: MCP connects and the AUTH_PENDING gate
fires, but no browser ever opens, so the user is stuck with "a browser window
should have opened" and no link. Root cause (reproduced): Claude Desktop strips
DISPLAY/WAYLAND (and PATH) from the spawned .dxt child env, so xdg-open/open
either ENOENTs or — worse — exits 0 WITHOUT launching anything. Our spawn
"succeeds", failFastOnOpenError never trips, and the auto-open silently no-ops.

A spawned stdio server can't reliably open a GUI browser. So stop depending on
it: capture the live authorize URL and surface it to the user.

- oauthLogin gains onAuthorizeUrl(url): fires the moment the URL is built (after
  the loopback listener is up, before blocking on the callback) so the URL is
  immediately clickable.
- bootstrapOAuthIfMissing stashes that URL in pendingSignInUrl and DROPS
  failFastOnOpenError — we now keep the listener alive in the background so a
  click completes the flow; cleared when the token lands.
- buildServer's bootstrapStatus getter now returns { done, signInUrl?, openFailed }
  and the CallTool gate renders the live URL as a clickable AUTH_REQUIRED link
  ("Open this link to authorize Leadbay, then re-run this tool"). Auto-open is
  still attempted best-effort (absolute launcher paths) for envs where it works.
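
The ordering that makes the link immediately available can be sketched like this. The function name loginSketch and its injected dependencies are hypothetical; the only point illustrated is that the callback fires after the URL is built and before the flow blocks on the loopback callback.

```typescript
// Hypothetical options; the real oauthLogin takes more than this.
interface LoginSketchOptions {
  buildAuthorizeUrl: () => string;            // assumed: listener is already up
  waitForCallback: () => Promise<string>;     // resolves with the auth code
  onAuthorizeUrl?: (url: string) => void;     // hook described in this commit
}

async function loginSketch(opts: LoginSketchOptions): Promise<string> {
  const url = opts.buildAuthorizeUrl();
  // Fire the hook before awaiting anything, so the caller can stash and
  // surface the URL even if the callback never arrives.
  opts.onAuthorizeUrl?.(url);
  return opts.waitForCallback();
}

// Caller side: stash the URL so a later tool call can render it as a link.
let pendingSignInUrlSketch: string | undefined;

function startBackgroundLogin(opts: Omit<LoginSketchOptions, "onAuthorizeUrl">): Promise<string> {
  return loginSketch({
    ...opts,
    onAuthorizeUrl: (url) => {
      pendingSignInUrlSketch = url;
    },
  }).then((code) => {
    pendingSignInUrlSketch = undefined; // cleared once the flow completes
    return code;
  });
}
```

Since everything before the first await in an async function runs synchronously, the stashed URL is readable as soon as startBackgroundLogin returns its promise.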

Verified end-to-end against the built bundle with DISPLAY + PATH stripped (the
real Claude Desktop spawn env): the tool call returns the live
leadbay.app/oauth/authorize?...redirect_uri=127.0.0.1:<port>/callback link, and
the listener stays up to catch the callback. Tests updated for the new object
shape + surfaced-link assertions; existing test files untouched. mcp 432 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…race

Real install logs showed the surfaced-link path works but the browser still
didn't auto-open. Root cause (confirmed in the logs): Claude Desktop probes a
freshly-installed extension with rapid connect→shutdown cycles — the first
spawned process lived 99ms (21:09:02.700 started → .799 "intentional
shutdown"). The background OAuth flow does discovery + client registration
(hundreds of ms) before reaching the browser-launch step, so the probe killed
the process first and no tab opened. On a stable session the auto-open works
(verified: a real tab opens with the intact desktop env — DISPLAY/DBUS/PATH are
all present in the spawned child; the earlier "no DISPLAY" theory was a test
artifact from env -i).

Make the auto-open win the race:
- bootstrap fires the browser-open the moment the authorize URL is known
  (inside onAuthorizeUrl), tracked in a module-level browserOpenInFlight handle.
  oauthLogin's own open is disabled (openBrowser no-op) to avoid a double tab.
- shutdown() awaits browserOpenInFlight (bounded 1.5s) before process.exit, so
  a teardown mid-dispatch still lets the detached launcher spawn; once spawned
  (detached + unref'd) it survives our exit on its own.

The surfaced clickable sign-in link stays as the backstop for envs where the
launcher genuinely can't run.
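
A bounded wait of this kind is commonly written with Promise.race. The sketch below assumes hypothetical names (withTimeout, trackBrowserOpen, shutdownSketch) and takes the exit function as a parameter so it can be exercised without ending the process; the 1.5s bound is the one quoted above.

```typescript
// Resolves with the task's value, or "timeout" if `ms` elapses first.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  return Promise.race([task, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer); // do not keep the loop alive
  });
}

// Module-level handle, set when the browser-open is dispatched.
let browserOpenInFlightSketch: Promise<void> | null = null;

function trackBrowserOpen(open: Promise<void>): void {
  browserOpenInFlightSketch = open;
}

async function shutdownSketch(exit: (code: number) => void): Promise<void> {
  if (browserOpenInFlightSketch !== null) {
    // Swallow launcher errors: shutdown must proceed either way.
    await withTimeout(browserOpenInFlightSketch.catch(() => undefined), 1500);
  }
  exit(0);
}
```

The bound matters in both directions: without the wait a fast teardown can beat the spawn, and without the timeout a hung launcher could block the host's shutdown indefinitely.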

New test: oauthLogin fires onAuthorizeUrl with the live URL before blocking.
resolveClientFromEnv tests now isolate HOME so credentials-file hydration on a
signed-in dev machine doesn't flip authState to "ok". mcp 433 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…me auto-open failure

Every launch layer (xdg-open → gio → snap Firefox) opens a tab from a shell with
the real desktop env, yet the browser still doesn't auto-open on a real Claude
Desktop install. Claude Desktop surfaces none of the spawned server's stderr, so
we can't see where it fails. Add a best-effort timestamped trace to
~/.leadbay/oauth-bootstrap-debug.log capturing: when the background bootstrap
starts, the env openInBrowser sees (DISPLAY/WAYLAND/DBUS), the candidate list,
the spawn result per candidate, auto-open dispatched/failed, bootstrap
complete/failed, and whether shutdown raced an in-flight open.

openInBrowser gains an optional debug(msg) sink (used only by the bootstrap).
Validated against the staged bundle: the log shows a ~1.8s gap between bootstrap
START and authorize-URL-ready (OAuth discovery + DCR latency), then
"spawn OK: /usr/bin/xdg-open" — so the install failure is a lifecycle/timing
issue the log will localize. No behavior change to the auth flow; mcp 433 green.
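
As a rough illustration of such a sink, assuming hypothetical helpers makeTraceSink and describeDisplayEnv: the appender is injected (in the bundle it would append to the log file), and any write error is swallowed so tracing can never affect the auth flow. The exact line format in the real log is not shown in this PR, so the format here is an assumption.

```typescript
type DebugSink = (msg: string) => void;

// Builds a best-effort sink that prefixes each message with an ISO timestamp.
function makeTraceSink(
  append: (line: string) => void,
  now: () => Date = () => new Date(),
): DebugSink {
  return (msg) => {
    try {
      append(`${now().toISOString()} ${msg}\n`);
    } catch {
      // Best-effort only: a failed write must not break sign-in.
    }
  };
}

// Summarises the display-related variables the launcher would inherit.
function describeDisplayEnv(env: Record<string, string | undefined>): string {
  return ["DISPLAY", "WAYLAND_DISPLAY", "DBUS_SESSION_BUS_ADDRESS"]
    .map((key) => `${key}=${env[key] ?? "<unset>"}`)
    .join(" ");
}
```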

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aunch

The diagnostic build pinpointed the real install-time failure: bootstrap died
with "OAuth client registration rate-limited (429) — ~10 registrations/IP/hour"
BEFORE it ever built a sign-in URL, which is why no browser opened and no link
appeared. Root cause: oauthLogin called registerClient (Dynamic Client
Registration) on EVERY launch, and Claude Desktop fires several launches per
install via its probe-restarts — so a handful of installs/tests exhausted the
backend's hourly registration cap. The incrementing client_id in the logs
(67,68,70,72,75,77…) was the tell.

DCR is a once-and-reuse operation. Fix: persist the registered client_id per
auth-server URL in ~/.leadbay/oauth-client.json and reuse it on every later
launch, skipping registration entirely. oauthLogin gains getCachedClientId() /
onClientRegistered() hooks (caller owns the file I/O, keeping oauth.ts pure);
bin.ts implements the cache. Loopback clients accept any 127.0.0.1 port per
RFC 8252 §7.3, so a fresh ephemeral port works against the cached id. Net: at
most one registration ever per machine — never approaches the 10/hr cap.
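
The hook pattern can be sketched as below. The hook names come from this commit; resolveClientId, the DcrHooks interface, and the injected register function are hypothetical, and the cache here is keyed by auth-server URL only, as it was at this point in the branch.

```typescript
// The caller owns storage; the login code only asks and reports.
interface DcrHooks {
  getCachedClientId?: (authServer: string) => string | undefined;
  onClientRegistered?: (authServer: string, clientId: string) => void;
}

async function resolveClientId(
  authServer: string,
  register: () => Promise<string>, // assumed: performs the /register POST
  hooks: DcrHooks = {},
): Promise<string> {
  const cached = hooks.getCachedClientId?.(authServer);
  if (cached !== undefined) return cached; // cache hit: no registration at all

  const clientId = await register();
  hooks.onClientRegistered?.(authServer, clientId);
  return clientId;
}
```

With no hooks supplied the function registers on every call, which matches the note that the existing end-to-end path is unchanged.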

Also keeps the diagnostic trace (~/.leadbay/oauth-bootstrap-debug.log) from the
previous commit — it's what found this and is cheap/bootstrap-only.

New tests: cached id skips the /register POST entirely (the 429 fix), and a
cache-miss registers + reports the id via onClientRegistered. Existing
oauth.test.ts end-to-end (no cache hooks) still registers, unchanged. mcp 435.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… install

A real Claude Desktop install log (not a simulation) showed the spawned server's
env had DISPLAY=<unset> WAYLAND=<unset> — the host strips them inconsistently
(present on some launches, absent on others, which matches "it worked before,
now it doesn't"). With no display var, xdg-open spawns "successfully" but can't
reach the display server, so no tab opens. This was the final piece of the
"nothing opens on install" puzzle, sitting underneath the client-id-429 fix.

openInBrowser now builds the launch env via browserLaunchEnv(): on Linux, when
DISPLAY/WAYLAND_DISPLAY are absent, it backfills them from XDG_RUNTIME_DIR (the
wayland-N socket) and /tmp/.X11-unix (lowest X display, default :0), then passes
that env to the spawned launcher. Already-set vars are untouched; non-Linux is a
no-op. Verified against the staged bundle with DISPLAY+WAYLAND stripped: the log
shows "injected WAYLAND_DISPLAY=wayland-0 / DISPLAY=:0" then a successful
xdg-open spawn — a tab opens where before nothing did.
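
A minimal sketch of the backfill, assuming a hypothetical browserLaunchEnvSketch with an injected directory lister (expected to return an empty array for a missing directory). It treats the two variables independently, which is an assumption; the source only states that already-set variables are left alone.

```typescript
type LaunchEnv = Record<string, string | undefined>;

function browserLaunchEnvSketch(
  platform: string,
  env: LaunchEnv,
  listDir: (dir: string) => string[],
): LaunchEnv {
  if (platform !== "linux") return env; // non-Linux is a no-op

  const out: LaunchEnv = { ...env };

  // Wayland: look for a wayland-N socket in XDG_RUNTIME_DIR.
  if (!out.WAYLAND_DISPLAY && out.XDG_RUNTIME_DIR) {
    const socket = listDir(out.XDG_RUNTIME_DIR)
      .filter((name) => /^wayland-\d+$/.test(name)) // skips wayland-N.lock
      .sort()[0];
    if (socket !== undefined) out.WAYLAND_DISPLAY = socket;
  }

  // X11: sockets are named X<N>; pick the lowest N, defaulting to :0.
  if (!out.DISPLAY) {
    const displays = listDir("/tmp/.X11-unix")
      .map((name) => /^X(\d+)$/.exec(name))
      .filter((m): m is RegExpExecArray => m !== null)
      .map((m) => Number(m[1]))
      .sort((a, b) => a - b);
    out.DISPLAY = `:${displays[0] ?? 0}`;
  }

  return out;
}
```

The returned object would be passed as the env option when spawning the launcher, leaving the server's own process environment untouched.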

New tests: non-Linux no-op, already-set left unchanged, missing DISPLAY
backfilled. mcp 438 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…authorizes

Real-device test surfaced "This app's redirect URL is not authorized" at the
authorize step. Root cause: the client-id cache (added to stop the 429) shipped
a wrong assumption — that the backend accepts any 127.0.0.1 port for a cached
client. It does NOT when the client was registered with a concrete port: the
Leadbay backend pins the registered redirect_uri, and the loopback listener
uses a fresh random port every launch, so the cached id was rejected on the
next launch.

Fix: register with the PORT-LESS loopback redirect (http://127.0.0.1/callback)
instead of the listener's concrete port. Verified against the live backend that
a specific per-launch port (127.0.0.1:42973) then authorizes with HTTP 200
against a port-less registration — RFC 8252 §7.3 loopback matching, which the
backend honors only when the registered URI is itself port-less. The token
exchange continues to send the same concrete port the authorize step used, so
the request-pair stays consistent. Net: one cached client_id is now reusable
across launches AND authorizes correctly.

Verified end-to-end: fresh run registers client_id=109 port-less, builds an
authorize URL with a real ephemeral port, backend returns 200. mcp build +
typecheck green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Regression guard for the "redirect URL is not authorized" bug: asserts the DCR
registration body carries http://127.0.0.1/callback (port-less, no :port) while
the authorize URL carries the real ephemeral loopback port. New test in the
existing new-in-branch file; no source change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… redirect-URL error)

Real-device test (and an honest re-check of my own verification) disproved the
port-less assumption: the Leadbay authorize page is a client-side JS app that
returns HTTP 200 regardless, then validates the redirect_uri IN THE BROWSER
against the EXACT registered string — it does NOT do RFC 8252 loopback-port
matching. So port-less registration still got rejected ("This app's redirect URL
is not authorized") because the per-launch random port didn't match.

Fix: bind the loopback listener to a STABLE port (LEADBAY_LOOPBACK_PORT=51789,
ephemeral fallback if busy) and register with that exact redirect_uri, so the
port in the registration equals the port at /authorize equals the port at token
exchange — an exact match every time. The client_id cache key now includes the
port, so a fallback to an ephemeral port forces a fresh registration for that
port rather than reusing a mismatched id. Keeps the once-per-machine
registration (no 429) AND authorizes correctly across launches.

startLoopbackListener gains preferredPort (+ exposes the bound `port`);
oauthLogin registers/authorizes the bound port and keys the cache by it; bin.ts
cache helpers are port-keyed.

Verified end-to-end against the live backend: register + authorize + cache all
show 127.0.0.1:51789 consistently (client_id=114, port 51789). Regression test
updated to assert register-port == authorize-port. mcp 439 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ephemeral

If the single fixed port (51789) was busy, we fell back to a random ephemeral
port — which works for sign-in but misses the client_id cache (port-keyed) and
forces a fresh registration that launch, drifting toward the ~10/hr 429 cap on
repeated collisions.

Now try [51789, 51790, 51791, 51792] in order; bind the first free one and only
fall back to an ephemeral port if ALL are taken. A transient collision (e.g. a
prior sign-in tab still holding the port) still lands on a STABLE, cacheable
port, preserving register-once. The exact-match guarantee is unchanged: whatever
port binds is the one registered, authorized, and exchanged.
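
The selection order can be sketched as follows, with the actual socket binding injected. bindFirstFree and the BindAttempt type are hypothetical names; the binder is assumed to resolve with the bound port, or null when the port is busy, and to treat port 0 as "any free port".

```typescript
// Assumed contract: resolves with the bound port, or null if it is taken.
type BindAttempt = (port: number) => Promise<number | null>;

async function bindFirstFree(
  preferredPorts: number[],
  bind: BindAttempt,
): Promise<{ port: number; stable: boolean }> {
  for (const candidate of preferredPorts) {
    const bound = await bind(candidate);
    if (bound !== null) return { port: bound, stable: true };
  }
  // Every preferred port is busy: let the OS choose (port 0).
  const ephemeral = await bind(0);
  if (ephemeral === null) throw new Error("no loopback port available");
  return { port: ephemeral, stable: false };
}
```

In Node, the binder would typically wrap a `net` server's `listen(port, "127.0.0.1")` and map an EADDRINUSE error to null. The `stable` flag is an illustrative addition that lets a caller tell a cacheable port from an ephemeral one.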

startLoopbackListener: preferredPort:number -> preferredPorts:number[].
New tests: binds the next port when the first is busy; ephemeral fallback when
all preferred are busy. mcp 441 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…LED (PR review)

Two correctness gaps in the non-blocking OAuth bootstrap, from PR review:

P1 — early host shutdown killed OAuth mid-registration. shutdown() only waited
on browserOpenInFlight, which is null until onAuthorizeUrl runs. Claude Desktop's
probe→teardown (stdin close ~100ms) during region-probe/discovery/registration
therefore exited immediately, before any browser spawn or sign-in URL. Now
shutdown() waits (bounded ~4s) on the whole bootstrap task (bootstrapInFlight)
while still unauthenticated, then on the browser-open. Verified end-to-end:
stdin closed at 1s still reached "spawn OK" at ~2.7s (registration → URL →
detached xdg-open), where before it died at 1s.

P2 — non-browser failures stayed "pending" forever. If bootstrapOAuthIfMissing
returned false for a discovery/GeoIP/registration/token-exchange failure with no
URL minted, the gate kept returning AUTH_PENDING ("a browser window should have
opened") indefinitely. Now that path records bootstrapFailureMessage (and clears
any stale URL); the gate returns AUTH_FAILED with the real error + restart
guidance, taking priority over a stale sign-in link.
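
The priority order of the gate can be sketched like this. The status fields follow the shape named in this PR ({ done, signInUrl?, openFailed } plus failureMessage); gateEnvelope, the envelope shape, and the message wording other than the quoted sign-in text are assumptions, and handling of openFailed is omitted.

```typescript
interface BootstrapStatusSketch {
  done: boolean;
  signInUrl?: string;
  openFailed: boolean;
  failureMessage?: string;
}

type GateCode = "AUTH_FAILED" | "AUTH_REQUIRED" | "AUTH_PENDING";

function gateEnvelope(
  status: BootstrapStatusSketch,
  isAuthenticated: boolean,
): { code: GateCode; message: string } | null {
  if (isAuthenticated) return null; // token landed: run the tool normally

  // A recorded failure wins over everything, including a stale sign-in link.
  if (status.failureMessage !== undefined) {
    return {
      code: "AUTH_FAILED",
      message: `${status.failureMessage} Restart the extension to retry.`,
    };
  }
  if (status.signInUrl !== undefined) {
    return {
      code: "AUTH_REQUIRED",
      message: `Open this link to authorize Leadbay, then re-run this tool: ${status.signInUrl}`,
    };
  }
  return { code: "AUTH_PENDING", message: "Sign-in is starting; re-run this tool shortly." };
}
```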

New tests: AUTH_FAILED gate (with and without a stale signInUrl). Existing
bin.ts global wiring + server.ts bootstrapStatus type extended with
failureMessage. mcp 443 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… port alternation

PR review P2: the client-id cache stored a single {client_id, port} per auth
server, so when the bound stable port alternated (51789 busy → 51790, then
51789 free next launch) each write overwrote the other port's id — forcing a
fresh Dynamic Client Registration on every alternation and recreating the
~10/hr 429 risk the cache exists to prevent.

Restructure the on-disk shape to { byPort: { "<port>": "<client_id>" } } per
auth server and MERGE on write, so an id is retained for every port we've
registered on. Reads look up by exact port (unchanged contract). The
getCachedClientId/onClientRegistered call sites are untouched.
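
A sketch of the merge-on-write shape, assuming hypothetical pure helpers (mergeClientId, lookupClientId) that operate on an in-memory copy of the file; keying the top level by auth-server URL is an assumption about how "per auth server" is laid out on disk.

```typescript
// Assumed layout: { "<auth server URL>": { byPort: { "<port>": "<client_id>" } } }
type ClientCacheSketch = Record<string, { byPort: Record<string, string> }>;

// Returns a new cache with the id added; ids for other ports are retained.
function mergeClientId(
  cache: ClientCacheSketch,
  authServer: string,
  port: number,
  clientId: string,
): ClientCacheSketch {
  const existing = cache[authServer]?.byPort ?? {};
  return {
    ...cache,
    [authServer]: { byPort: { ...existing, [String(port)]: clientId } },
  };
}

// Exact-port lookup: an id registered for another port is never returned.
function lookupClientId(
  cache: ClientCacheSketch,
  authServer: string,
  port: number,
): string | undefined {
  return cache[authServer]?.byPort[String(port)];
}
```

With this shape, a launch that lands on 51790 and a later launch back on 51789 each find their own id, so neither triggers a new registration.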

New test file oauth-client-cache.test.ts (7 tests, isolated temp HOME): per-port
round-trip, exact-port match, RETAINS both ports across writes, alternating
ports never lose either id, auth-server keying, 0600 byPort file shape. Helpers
exported for testing. mcp 450 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ArtyETH06 ArtyETH06 marked this pull request as ready for review June 19, 2026 18:57
@ArtyETH06 ArtyETH06 requested a review from milstan June 19, 2026 18:57
@milstan milstan merged commit cf52c4c into main Jun 19, 2026
1 check passed
milstan added a commit that referenced this pull request Jun 19, 2026
main shipped/tagged 0.21.2 (dxt OAuth, #108) while this branch reused
0.21.2-0.21.5 for the 401 work. Resolved the 3 conflicts:

- package.json + server.json -> 0.21.3 (next clean number after main's 0.21.2)
- CHANGELOG: collapsed the four 401-fix dev entries into one 0.21.3
  release (4 bullets, the final code-gated behavior), kept main's
  0.21.2 dxt entry intact.

Auto-merges (server.ts triggered_by wiring, WORKFLOWS.md WF30/31)
verified: prompts:build no-drift, build, typecheck, and full test pass
(core 414, promptforge 16, mcp 455).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>