
fix(spanner): don't tear down host app's OTel ContextManager #8344

Open
jcalem-rogo wants to merge 2 commits into googleapis:main from jcalem-rogo:fix-spanner-context-manager-hijack

Conversation

@jcalem-rogo

Problem

ensureInitialContextManagerSet() in handwritten/spanner/src/instrument.ts (added to make async/await tracing work for apps that haven't configured OpenTelemetry) currently tears down the host application's ContextManager on most calls. The guard is:

if (!context['_contextManager'] || context.active() === ROOT_CONTEXT) {
  context.disable();  // disables any prior contextManager
  const contextManager = new AsyncHooksContextManager();
  contextManager.enable();
  context.setGlobalContextManager(contextManager);
}

Two issues:

  1. context['_contextManager'] is not a public field on the API singleton. setGlobalContextManager writes into the package-private global registry (registerGlobal in @opentelemetry/api/internal/global-utils.js), not to a _contextManager property on the API object. So the first leg of the OR is effectively always true and the function runs its install path on every call.
  2. context.active() === ROOT_CONTEXT is true on every gRPC call made outside an active span. That's exactly what the Spanner session pool does during background warmup (BatchCreateSessions, keep-alives, pool maintenance). On those calls, the function calls context.disable(), which removes the host app's already-installed ContextManager, and then registers a fresh AsyncHooksContextManager in its place.
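
The first issue can be illustrated with a small self-contained model. This is not the real `@opentelemetry/api`; `ContextApiModel`, `registry`, and `ManagerLike` are hypothetical names that only mimic the shape described above, where registration writes into a separate registry rather than onto the singleton:

```typescript
// Simplified stand-in for the OTel context API (assumption: NOT the real library).
interface ManagerLike {
  name: string;
}

// Models the package-private global registry that registration writes into.
const registry = new Map<string, ManagerLike>();

class ContextApiModel {
  // Returns false when a manager is already registered.
  setGlobalContextManager(manager: ManagerLike): boolean {
    if (registry.has('context')) return false;
    registry.set('context', manager);
    return true;
  }

  // Reads the registered manager back from the registry, not from `this`.
  registered(): ManagerLike | undefined {
    return registry.get('context');
  }
}

const contextModel = new ContextApiModel();
const hostRegistered = contextModel.setGlobalContextManager({name: 'host'});

// First leg of the old guard: a property lookup on the singleton itself.
const singletonFields = contextModel as unknown as Record<string, unknown>;
const firstLegOfGuard = !singletonFields['_contextManager'];
```

In this model `firstLegOfGuard` is `true` even though a host manager is registered, because nothing ever assigns a `_contextManager` property on the singleton.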

Impact

Any host app that:

  • installs its own ContextManager (via NodeSDK, manual AsyncHooksContextManager, etc.)
  • propagates baggage on the root context (user id, request id, tenant id, etc.)
  • uses Spanner in the same process

…will see its baggage and active-context state silently disappear shortly after new Spanner({...}) warms up the session pool. Span parents and BaggageSpanProcessor-style enrichment break for everything downstream.
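
A toy, synchronous model of why in-flight context disappears when the manager is swapped. Everything here is a hypothetical stand-in (`StackManager`, `apiModel`, `oldInstallPath`); the real `AsyncHooksContextManager` tracks context across async boundaries, which this sketch does not attempt:

```typescript
// Toy model (assumption: NOT the real OTel API or AsyncHooksContextManager).
type Ctx = ReadonlyMap<string, string>;
const ROOT: Ctx = new Map<string, string>();

class StackManager {
  private stack: Ctx[] = [];

  active(): Ctx {
    return this.stack[this.stack.length - 1] ?? ROOT;
  }

  with<T>(ctx: Ctx, fn: () => T): T {
    this.stack.push(ctx);
    try {
      return fn();
    } finally {
      this.stack.pop();
    }
  }
}

let globalManager: StackManager | undefined;

const apiModel = {
  setGlobalContextManager(manager: StackManager): boolean {
    if (globalManager) return false;
    globalManager = manager;
    return true;
  },
  disable(): void {
    globalManager = undefined;
  },
  active(): Ctx {
    return globalManager ? globalManager.active() : ROOT;
  },
  with<T>(ctx: Ctx, fn: () => T): T {
    return globalManager ? globalManager.with(ctx, fn) : fn();
  },
};

// Models the install path of the old guard: disable, then register a fresh manager.
function oldInstallPath(): void {
  apiModel.disable();
  apiModel.setGlobalContextManager(new StackManager());
}

// The host installs its manager and runs a request with an id in context.
apiModel.setGlobalContextManager(new StackManager());
const requestCtx: Ctx = new Map([['chat.id', 'c-123']]);

const observed = apiModel.with(requestCtx, () => {
  const before = apiModel.active().get('chat.id');
  oldInstallPath(); // e.g. triggered by a library call made mid-request
  const after = apiModel.active().get('chat.id');
  return {before, after};
});
```

Here `observed.before` is `'c-123'` and `observed.after` is `undefined`: the replacement manager knows nothing about the context the previous one was tracking.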

We hit this in a Node service that uses OTel + Spanner. Chats produced LLM observation log records with undefined chat ids because the host's baggage had been wiped by Spanner's pool-warmup startTrace call. Bisecting the regression landed exactly on adding @google-cloud/spanner to the process.

Affected versions confirmed: 8.7.1 (latest), 8.6.0, and the same body exists back through 7.21.0. No relevant change between 7.x and 8.x in this code path.

Fix

Use the public API surface: setGlobalContextManager() already returns false when a manager is already registered. Try to install; if something else owns the slot, back out without touching anything.

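function ensureInitialContextManagerSet() {
  // Prepare a candidate manager; it is only kept if the global slot is free.
  const contextManager = new AsyncHooksContextManager();
  contextManager.enable();
  // setGlobalContextManager() returns false when a manager is already registered.
  if (!context.setGlobalContextManager(contextManager)) {
    // Another manager owns the slot: release ours and leave the host's untouched.
    contextManager.disable();
  }
}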

This preserves the original intent — apps without OTel still get a working AsyncHooksContextManager — while ensuring Spanner cannot tear down a manager the host already configured.
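
The try-to-install, back-out-on-failure behavior can be checked in isolation with a small model. `ManagerModel`, `currentManager`, and `resetRegistryForDemo` are hypothetical helpers for this sketch, and the registration function only imitates the return-`false`-on-duplicate contract described above:

```typescript
// Toy model (assumption: hypothetical stand-ins, NOT the real OTel classes).
class ManagerModel {
  enabled = false;
  constructor(readonly owner: string) {}
  enable(): this {
    this.enabled = true;
    return this;
  }
  disable(): this {
    this.enabled = false;
    return this;
  }
}

let registeredManager: ManagerModel | undefined;

function setGlobalContextManager(manager: ManagerModel): boolean {
  if (registeredManager) return false; // slot already owned
  registeredManager = manager;
  return true;
}

function currentManager(): ManagerModel | undefined {
  return registeredManager;
}

function resetRegistryForDemo(): void {
  registeredManager = undefined;
}

// Same shape as the fix: install if possible, otherwise back out.
function ensureInitialContextManagerSet(): ManagerModel {
  const candidate = new ManagerModel('spanner');
  candidate.enable();
  if (!setGlobalContextManager(candidate)) {
    candidate.disable();
  }
  return candidate;
}

// Case 1: the host registered first, so the candidate is discarded.
const host = new ManagerModel('host').enable();
setGlobalContextManager(host);
const rejected = ensureInitialContextManagerSet();
const ownerWithHost = currentManager()?.owner;

// Case 2: nothing registered, so the candidate becomes the global manager.
resetRegistryForDemo();
const accepted = ensureInitialContextManagerSet();
const ownerWithoutHost = currentManager()?.owner;
```

In case 1 the host manager stays registered and enabled while the candidate ends up disabled; in case 2 the candidate is registered and stays enabled.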

Notes

  • Removed the now-unused ROOT_CONTEXT import from handwritten/spanner/src/instrument.ts.
  • Happy to add a unit test if the maintainers can point me at the preferred pattern; the existing observability-test/observability.ts blocks use a per-describe setGlobalContextManager for setup, which doesn't compose cleanly with assertions on this function's own registration behavior. The fix is verified end-to-end against a host service that previously showed the symptom.

`ensureInitialContextManagerSet()` (added to make async/await tracing work
for apps that haven't configured OpenTelemetry) currently tears down the
host application's `ContextManager` on most calls. The guard is OR-joined:

    if (!context['_contextManager'] || context.active() === ROOT_CONTEXT) {
      context.disable();
      // ...install fresh AsyncHooksContextManager
    }

Two problems:

1. `context['_contextManager']` is not a public field on the API singleton.
   `setGlobalContextManager` writes into the package-private global registry
   (`registerGlobal` in `@opentelemetry/api/internal/global-utils.js`), not
   to a `_contextManager` property. So the first leg of the OR is effectively
   always true and the function runs its install path on every call.

2. Even if the first leg were fixed, `context.active() === ROOT_CONTEXT` is
   true on every gRPC call made outside an active span, which is exactly what
   the Spanner session pool does during background warmup
   (BatchCreateSessions, pool maintenance). On those calls, the function
   calls `context.disable()`, removing the host app's already-installed
   `ContextManager`, and replaces it with a fresh one. Any baggage the host
   had set before that moment is silently lost, along with span parent
   linkage.

In practice this means: an app that uses OpenTelemetry, sets baggage on the
root context (e.g. user id, request id), and then issues a Spanner call,
will observe its baggage disappear on the next span emitted after Spanner's
pool warms up. We hit this in a Node service that uses OTel + Spanner —
chats emitted log records with `undefined` ids because host baggage had
been wiped by Spanner's pool-warmup `startTrace` call.

The fix uses the public API surface — `setGlobalContextManager()` already
returns `false` when a manager is already registered. Try to install; if
something else owns the slot, back out without touching anything.
@jcalem-rogo jcalem-rogo requested a review from a team as a code owner May 21, 2026 19:58
@product-auto-label product-auto-label Bot added the api: spanner Issues related to the Spanner API. label May 21, 2026
@google-cla

google-cla Bot commented May 21, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the OpenTelemetry context manager initialization in the Spanner instrumentation to prevent breaking existing context managers by avoiding unnecessary teardowns. Instead of force-disabling previous managers, it now attempts to set a global manager and only keeps it if one wasn't already registered. Feedback was provided to optimize this process by using a module-level flag to prevent redundant object creation and initialization on every call.

Comment thread handwritten/spanner/src/instrument.ts
Latch on a module-level flag so we don't allocate + enable + disable a fresh
AsyncHooksContextManager on every `new Spanner({...})` when the host app has
already registered its own ContextManager. Addresses review feedback.
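
The commit's code is not shown on this page, so the following is only one possible shape of such a latch. `contextManagerChecked`, `CandidateManager`, and `setGlobalContextManagerModel` are hypothetical names, and the registration stub always reports that the host already owns the slot:

```typescript
// Hypothetical sketch of a module-level latch; not the actual commit.
let allocations = 0; // counts how many candidate managers get created

class CandidateManager {
  constructor() {
    allocations += 1;
  }
  enable(): void {}
  disable(): void {}
}

// Stand-in for registration when the host already owns the global slot.
function setGlobalContextManagerModel(_manager: CandidateManager): boolean {
  return false;
}

let contextManagerChecked = false; // the latch

function ensureInitialContextManagerSet(): void {
  if (contextManagerChecked) return; // later calls skip allocate + enable + disable
  contextManagerChecked = true;
  const candidate = new CandidateManager();
  candidate.enable();
  if (!setGlobalContextManagerModel(candidate)) {
    candidate.disable();
  }
}

// Simulates several `new Spanner({...})` constructions in one process.
for (let i = 0; i < 3; i++) {
  ensureInitialContextManagerSet();
}

function allocationCount(): number {
  return allocations;
}
```

With the latch, three calls allocate a single candidate. The trade-off in this sketch is that the function never retries registration after the first call.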

Labels

api: spanner Issues related to the Spanner API.


1 participant