Pre-release hardening: playground TS bootstrap telemetry and worker exit on uncaught by tnaum-ms · Pull Request #646 · microsoft/vscode-documentdb

tnaum-ms · 2026-05-13T08:36:37Z

Three pre-release hardening fixes identified by the 0.8.0 release audit. All are scoped to the playground worker bootstrap path; nothing in this PR changes connectivity, data handling, or public APIs.

Changes

`fix(playground): instrument and retry TS plugin bootstrap` (F-01, F-02)

ClustersExtension.ensureTsRestart previously swallowed all errors into outputChannel.debug and set tsRestarted = true before doing any async work, so a single failure on first playground open left IntelliSense broken for the rest of the session with no signal to operators.

Wraps the bootstrap in callWithTelemetryAndErrorHandling('playground.tsPluginBootstrap') so result, error name, and errorMessage are captured automatically.
Adds a stage property (stubInstall, tsActivate, tsWait, tsRestart, complete) and stubCreated / tsExtensionFound so we can tell what failed without parsing messages.
Resets tsRestarted on failure so the next playground open retries. This matters on read-only extension installs (Snap, system package, locked enterprise) where the FS write can fail transiently.

`fix(playground): exit worker after uncaughtException` (F-04)

playgroundWorker.ts previously reported the exception to the main thread and stayed alive with a potentially corrupted mongoClient / shellRuntime. Now the worker schedules process.exit(1) after a 50 ms delay to let the IPC error flush. WorkerSessionManager already rejects pending requests, emits worker.unexpectedExit telemetry, and respawns on the next eval, so the path is already well covered downstream.

What this PR does not do

Does not change the long-term plan to publish the TS plugin as a standalone npm package (WIP: Publish TS server plugin as standalone npm package #548).
Does not change the 2 s magic sleep before typescript.restartTsServer; the new telemetry will tell us if that timing is actually a problem in the field.
Does not address sovereign-cloud OIDC ALLOWED_HOSTS (tracked separately).

Follow-up issues

Filed against the remaining audit findings:

Test plan

Lint and tests should pass unchanged; this is a pure logic refactor of two narrow code paths.
Manual: open a .documentdb.js playground; verify the bootstrap telemetry event fires once with result=Succeeded. Force a failure (chmod 0555 on the extension node_modules/) and verify result=Failed is emitted with stage=stubInstall and that opening a second playground retries.
Manual: throw inside a playground eval (for example process.nextTick(() => { throw new Error('boom'); })); verify the worker exits, the error surfaces in the result panel, and the next Run respawns a fresh worker.

The runtime node_modules stub install + TS server restart used to fail silently into outputChannel.debug. On read-only extension installs (Snap, system package, locked enterprise, some container setups) the stub install throws EACCES/EROFS and playground IntelliSense breaks with no signal to operators and no possibility to recover during the session. Changes: - Wrap ensureTsRestart in callWithTelemetryAndErrorHandling so failures are reported via the standard playground.tsPluginBootstrap event with result, error name, and errorMessage captured automatically. - Add a stage property (stubInstall, tsActivate, tsWait, tsRestart, complete) so we can tell which phase failed without parsing error messages. - Track stubCreated and tsExtensionFound for additional context. - Reset tsRestarted on failure so opening another playground retries the bootstrap instead of being stuck for the rest of the session. No additional logic change to the 2-second wait before typescript.restartTsServer; that magic delay is tracked separately for a future revisit (see follow-up issue).

Previously the worker reported the error to the main thread and stayed alive. mongoClient and shellRuntime could be in a corrupted state, so a subsequent eval might silently misbehave on the same connection. Now the worker schedules process.exit(1) after a short delay (to allow the IPC error message to flush). WorkerSessionManager already handles unexpected worker exits: it rejects pending requests with a meaningful error and emits worker.unexpectedExit telemetry. The next eval call on the same cluster will respawn a fresh worker via ensureWorker.

…-01) stubCreated=false was ambiguous: the stub may have already existed, or the stub-install branch may never have been reached (for example if an error fired before that block). Add an explicit stubExisted property so the combinations are unambiguous: stubCreated=true, stubExisted=false -> first-time install on this profile stubCreated=false, stubExisted=true -> stub already present, no work stubCreated=false, stubExisted=false -> bootstrap never reached the stub step Pure telemetry refinement; no behavior change.

The uncaughtException handler was updated to schedule process.exit(1) so the supervisor can respawn a clean worker (F-04). The companion unhandledRejection handler was left as log-only, which means a late promise rejection that escapes all eval-scoped catch blocks can leave the worker running with potentially the same kind of corrupted state F-04 set out to prevent. Apply the same treatment symmetrically: - If an eval is in flight, forward the rejection as an evalError so the user sees a meaningful failure instead of 'Worker exited unexpectedly'. - Log the message and stack. - Schedule process.exit(1) with the same 50 ms IPC-flush delay used by uncaughtException. WorkerSessionManager already handles the resulting unexpected exit and emits worker.unexpectedExit telemetry. Pushes us closer to the 'crash early, respawn fresh' invariant for the playground worker.

Copilot

Pull request overview

Hardens the Query Playground bootstrap path by adding telemetry/observability around TypeScript plugin initialization and ensuring the playground worker terminates after an uncaught exception so it can be respawned cleanly by the supervisor.

Changes:

Wrap TS plugin bootstrap in callWithTelemetryAndErrorHandling('playground.tsPluginBootstrap'), adding staged telemetry properties and resetting the retry flag on failure.
On uncaughtException in the playground worker, post the error back to the main thread and then schedule a non-graceful exit to force a clean respawn on the next eval.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
src/documentdb/playground/playgroundWorker.ts	Ensures the worker exits shortly after reporting an uncaught exception to avoid continuing in a potentially corrupted state.
src/documentdb/ClustersExtension.ts	Adds staged telemetry and retry behavior for TS plugin bootstrap, improving diagnosability and resilience on read-only installs.

github-actions · 2026-05-13T08:44:19Z

✅ Code Quality Checks

Check	Status	How to fix
Localization (`l10n`)	✅ Passed
ESLint	✅ Passed
Prettier formatting	✅ Passed

This comment is updated automatically on each push.

github-actions · 2026-05-13T08:47:52Z

📦 Build Size Report

Metric	Base (`next`)	PR	Delta
VSIX (`vscode-documentdb-0.8.0.vsix`)	7.51 MB	7.51 MB	⬆️ +0 KB (+0.0%)
Webview bundle (`views.js`)	5.79 MB	5.79 MB	✅ 0 KB (0.0%)

Download artifact · updated automatically on each push.

tnaum-ms added 2 commits May 13, 2026 10:28

tnaum-ms requested a review from a team as a code owner May 13, 2026 08:36

Copilot AI review requested due to automatic review settings May 13, 2026 08:36

Copilot started reviewing on behalf of tnaum-ms May 13, 2026 08:37 View session

tnaum-ms added 2 commits May 13, 2026 10:40

Copilot AI reviewed May 13, 2026

View reviewed changes

sajeetharan approved these changes May 13, 2026

View reviewed changes

tnaum-ms merged commit 4a4278a into next May 13, 2026
8 checks passed

tnaum-ms deleted the dev/tnaum/pre-release-fixes branch May 13, 2026 08:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pre-release hardening: playground TS bootstrap telemetry and worker exit on uncaught#646

Pre-release hardening: playground TS bootstrap telemetry and worker exit on uncaught#646
tnaum-ms merged 4 commits into
nextfrom
dev/tnaum/pre-release-fixes

tnaum-ms commented May 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tnaum-ms commented May 13, 2026

Changes

fix(playground): instrument and retry TS plugin bootstrap (F-01, F-02)

fix(playground): exit worker after uncaughtException (F-04)

What this PR does not do

Follow-up issues

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

github-actions Bot commented May 13, 2026

✅ Code Quality Checks

Uh oh!

github-actions Bot commented May 13, 2026

📦 Build Size Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`fix(playground): instrument and retry TS plugin bootstrap` (F-01, F-02)

`fix(playground): exit worker after uncaughtException` (F-04)