Pre-release hardening: playground TS bootstrap telemetry and worker exit on uncaught#646
Merged
Conversation
The runtime node_modules stub install + TS server restart used to fail silently into outputChannel.debug. On read-only extension installs (Snap, system package, locked enterprise, some container setups) the stub install throws EACCES/EROFS and playground IntelliSense breaks with no signal to operators and no possibility to recover during the session. Changes: - Wrap ensureTsRestart in callWithTelemetryAndErrorHandling so failures are reported via the standard playground.tsPluginBootstrap event with result, error name, and errorMessage captured automatically. - Add a stage property (stubInstall, tsActivate, tsWait, tsRestart, complete) so we can tell which phase failed without parsing error messages. - Track stubCreated and tsExtensionFound for additional context. - Reset tsRestarted on failure so opening another playground retries the bootstrap instead of being stuck for the rest of the session. No additional logic change to the 2-second wait before typescript.restartTsServer; that magic delay is tracked separately for a future revisit (see follow-up issue).
Previously the worker reported the error to the main thread and stayed alive. mongoClient and shellRuntime could be in a corrupted state, so a subsequent eval might silently misbehave on the same connection. Now the worker schedules process.exit(1) after a short delay (to allow the IPC error message to flush). WorkerSessionManager already handles unexpected worker exits: it rejects pending requests with a meaningful error and emits worker.unexpectedExit telemetry. The next eval call on the same cluster will respawn a fresh worker via ensureWorker.
…-01) stubCreated=false was ambiguous: the stub may have already existed, or the stub-install branch may never have been reached (for example if an error fired before that block). Add an explicit stubExisted property so the combinations are unambiguous: stubCreated=true, stubExisted=false -> first-time install on this profile stubCreated=false, stubExisted=true -> stub already present, no work stubCreated=false, stubExisted=false -> bootstrap never reached the stub step Pure telemetry refinement; no behavior change.
The uncaughtException handler was updated to schedule process.exit(1) so the supervisor can respawn a clean worker (F-04). The companion unhandledRejection handler was left as log-only, which means a late promise rejection that escapes all eval-scoped catch blocks can leave the worker running with potentially the same kind of corrupted state F-04 set out to prevent. Apply the same treatment symmetrically: - If an eval is in flight, forward the rejection as an evalError so the user sees a meaningful failure instead of 'Worker exited unexpectedly'. - Log the message and stack. - Schedule process.exit(1) with the same 50 ms IPC-flush delay used by uncaughtException. WorkerSessionManager already handles the resulting unexpected exit and emits worker.unexpectedExit telemetry. Pushes us closer to the 'crash early, respawn fresh' invariant for the playground worker.
Contributor
There was a problem hiding this comment.
Pull request overview
Hardens the Query Playground bootstrap path by adding telemetry/observability around TypeScript plugin initialization and ensuring the playground worker terminates after an uncaught exception so it can be respawned cleanly by the supervisor.
Changes:
- Wrap TS plugin bootstrap in
callWithTelemetryAndErrorHandling('playground.tsPluginBootstrap'), adding staged telemetry properties and resetting the retry flag on failure. - On
uncaughtExceptionin the playground worker, post the error back to the main thread and then schedule a non-graceful exit to force a clean respawn on the next eval.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/documentdb/playground/playgroundWorker.ts | Ensures the worker exits shortly after reporting an uncaught exception to avoid continuing in a potentially corrupted state. |
| src/documentdb/ClustersExtension.ts | Adds staged telemetry and retry behavior for TS plugin bootstrap, improving diagnosability and resilience on read-only installs. |
sajeetharan
approved these changes
May 13, 2026
Contributor
✅ Code Quality Checks
This comment is updated automatically on each push. |
Contributor
📦 Build Size Report
Download artifact · updated automatically on each push. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Three pre-release hardening fixes identified by the 0.8.0 release audit. All are scoped to the playground worker bootstrap path; nothing in this PR changes connectivity, data handling, or public APIs.
Changes
fix(playground): instrument and retry TS plugin bootstrap(F-01, F-02)ClustersExtension.ensureTsRestartpreviously swallowed all errors intooutputChannel.debugand settsRestarted = truebefore doing any async work, so a single failure on first playground open left IntelliSense broken for the rest of the session with no signal to operators.callWithTelemetryAndErrorHandling('playground.tsPluginBootstrap')so result, error name, and errorMessage are captured automatically.stageproperty (stubInstall,tsActivate,tsWait,tsRestart,complete) andstubCreated/tsExtensionFoundso we can tell what failed without parsing messages.tsRestartedon failure so the next playground open retries. This matters on read-only extension installs (Snap, system package, locked enterprise) where the FS write can fail transiently.fix(playground): exit worker after uncaughtException(F-04)playgroundWorker.tspreviously reported the exception to the main thread and stayed alive with a potentially corruptedmongoClient/shellRuntime. Now the worker schedulesprocess.exit(1)after a 50 ms delay to let the IPC error flush.WorkerSessionManageralready rejects pending requests, emitsworker.unexpectedExittelemetry, and respawns on the next eval, so the path is already well covered downstream.What this PR does not do
typescript.restartTsServer; the new telemetry will tell us if that timing is actually a problem in the field.ALLOWED_HOSTS(tracked separately).Follow-up issues
Filed against the remaining audit findings:
Test plan
.documentdb.jsplayground; verify the bootstrap telemetry event fires once withresult=Succeeded. Force a failure (chmod 0555 on the extensionnode_modules/) and verifyresult=Failedis emitted withstage=stubInstalland that opening a second playground retries.process.nextTick(() => { throw new Error('boom'); })); verify the worker exits, the error surfaces in the result panel, and the next Run respawns a fresh worker.