Skip to content

feat(spanner): optimize RequestId propagation and minimize OpenTelemetry active tracing overhead#8329

Open
surbhigarg92 wants to merge 3 commits into
mainfrom
performance_optimization
Open

feat(spanner): optimize RequestId propagation and minimize OpenTelemetry active tracing overhead#8329
surbhigarg92 wants to merge 3 commits into
mainfrom
performance_optimization

Conversation

@surbhigarg92
Copy link
Copy Markdown
Contributor

@surbhigarg92 surbhigarg92 commented May 20, 2026

Optimize RequestId propagation and minimize OpenTelemetry active tracing overhead.

@product-auto-label product-auto-label Bot added the api: spanner Issues related to the Spanner API. label May 20, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to check if tracing is enabled before performing instrumentation, improving performance by avoiding unnecessary work when tracing is inactive. It also optimizes project ID retrieval in prepareGapicRequest_ by caching the result and refactors request ID header injection. Feedback identifies a logic bug in the isTracingEnabled caching mechanism regarding the precedence of custom providers and global state pollution. Additionally, suggestions were made to correct a copy-pasted docstring and to utilize the existing craftRequestId utility to maintain consistency.

Comment thread handwritten/spanner/src/instrument.ts Outdated
Comment thread handwritten/spanner/src/instrument.ts
Comment thread handwritten/spanner/src/request_id_header.ts Outdated
@surbhigarg92 surbhigarg92 force-pushed the performance_optimization branch from 5ea84d5 to 652517b Compare May 20, 2026 16:58
@surbhigarg92
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces conditional tracing logic to the Spanner client, ensuring that observability features like context propagation and request ID attribution are only active when tracing is enabled. It also refactors project ID replacement and request ID header generation for improved efficiency. Feedback from the review highlights critical issues where project ID and resource header replacements are incorrectly skipped after the first request, which could lead to incorrect headers in subsequent calls. Additionally, the reviewer suggested refining the tracing enablement cache to avoid permanently disabling tracing if the client is initialized before the OpenTelemetry SDK is fully configured.

Comment thread handwritten/spanner/src/index.ts Outdated
Comment on lines +1718 to +1725
config.headers[CLOUD_RESOURCE_HEADER] = replaceProjectIdToken(
config.headers[CLOUD_RESOURCE_HEADER],
projectId!,
);
this.projectIdReplaced_ = true;
} else {
reqOpts = config.reqOpts;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The CLOUD_RESOURCE_HEADER replacement has been moved inside the if (!this.projectIdReplaced_) block, meaning it will be skipped for all requests after the first one. This header must be updated for every request to ensure the correct project ID is sent to the backend.

        this.projectIdReplaced_ = true;
      }
      config.headers[CLOUD_RESOURCE_HEADER] = replaceProjectIdToken(
        config.headers[CLOUD_RESOURCE_HEADER],
        projectId!,
      );

Copy link
Copy Markdown
Contributor Author

@surbhigarg92 surbhigarg92 May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the first request, the SDK permanently replaces the {{projectId}} token in-place inside all cached instance and database objects (instance.formattedName_, database.formattedName_). For all subsequent requests, the headers are constructed from these already-resolved properties, meaning config.headers[CLOUD_RESOURCE_HEADER] already contains the correct project ID. Running the replacement on every subsequent request is mathematically redundant and adds unnecessary regex overhead on Node's single-threaded event loop.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that this is an actual problem, and then specifically for non-cached resource names. See this test case:

it('real application flow: should replace project ID tokens on Backup objects after initial Spanner request', done => {
  const {Instance} = require('../src/instance');
  const {Backup} = require('../src/backup');
  // Stub setup...
  const appSpanner = new Spanner({projectId: '{{projectId}}'});
  asAny(appSpanner).auth.getProjectId = callback => {
    callback(null, PROJECT_ID);
  };
  const realInstance = new Instance(appSpanner, 'my-instance');
  appSpanner.instances_.set('my-instance', realInstance);
  // Backup created before project ID is replaced
  const realBackup = new Backup(realInstance, 'my-backup');
  // 1. Initial request triggers prepareGapicRequest_
  FAKE_GAPIC_CLIENT.getInstanceConfig = (reqOpts, gaxOpts, callback) => {
    callback(null, {});
  };
  appSpanner.getInstanceConfig('nam1', err => {
    if (err) return done(err);
    // At this point, appSpanner.projectIdReplaced_ is true.
    // 2. Application calls backup.getMetadata()
    FAKE_GAPIC_CLIENT.getBackup = reqOpts => {
      try {
        assert.strictEqual(
          reqOpts.name,
          `projects/${PROJECT_ID}/instances/my-instance/backups/my-backup`,
        );
        done();
      } catch (e) {
        done(e);
      }
      return Promise.resolve([{}]);
    };
    realBackup.getMetadata();
  });
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this case, this seems a problem only for classes like Backups.

Because any DataClient operations will be done via instance object which is created using Spanner instance. For these objects we are replacing the projectId.

this.instances_.forEach(instance => {
          instance.formattedName_ = replaceProjectIdToken(
            instance.formattedName_,
            projectId!,
          );
          instance.databases_.forEach(database => {
            database.formattedName_ = replaceProjectIdToken(
              database.formattedName_,
              projectId!,
            );
          });
        });

I think the same problem will come for instanceConfigs object also. Using Spanner client for AdminOperations are deprecated but it may still impact existing customers . Let me think of an alternate solution

Comment thread handwritten/spanner/src/instrument.ts Outdated
@surbhigarg92 surbhigarg92 force-pushed the performance_optimization branch from a983da5 to 571de00 Compare May 21, 2026 06:08
@surbhigarg92 surbhigarg92 force-pushed the performance_optimization branch from 571de00 to 85e4b52 Compare May 21, 2026 06:13
@surbhigarg92 surbhigarg92 marked this pull request as ready for review May 21, 2026 06:16
@surbhigarg92 surbhigarg92 requested a review from a team as a code owner May 21, 2026 06:16
@olavloite olavloite changed the title feat(spanner): optimize RequestId propagation and minimize OpenTeleme… feat(spanner): optimize RequestId propagation and minimize OpenTelemetry active tracing overhead May 21, 2026
const gaxClient = this.clients_.get(clientName)!;
let reqOpts = extend(true, {}, config.reqOpts);
reqOpts = replaceProjectIdToken(reqOpts, projectId!);
// It would have been preferable to replace the projectId already in the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really remove this comment? Isn't it still valid?

const probeSpan = globalProvider
.getTracer(TRACER_NAME, TRACER_VERSION)
.startSpan('probe');
const isRecording = probeSpan.isRecording();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a safe assumption. It assumes that probeSpan.isRecording() will return true if tracing is enabled. But if tracing is enabled with a 5% sample rate, then there is no guarantee that this will return true. Or am I misunderstanding what is going on here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for highlighting it. Completely missed this . Another option which I was trying was below which also didn't look like a strong approach. I am checking this with OpenTelemetry team . Unlike Java, Node does not expose an option to check if tracer is enabled https://github.com/open-telemetry/opentelemetry-java/blob/31b3cd5f561a7cf6278a255fad33d40887c1a48b/api/all/src/main/java/io/opentelemetry/api/trace/Tracer.java#L72

const globalProvider = trace.getTracerProvider();
  if (globalProvider) {
    let delegate = globalProvider;
    if (typeof (globalProvider as any).getDelegate === 'function') {
      delegate = (globalProvider as any).getDelegate();
    }
    if (delegate) {
      const name = delegate.constructor.name;
      // Exclude the dummy NoopTracerProvider and uninitialized ProxyTracerProvider
      if (name !== 'NoopTracerProvider' && name !== 'ProxyTracerProvider') {
        globalTracingEnabled = true;
        return true;
      }
    }
  }

*/
function isGlobalTracingEnabled(): boolean {
if (globalTracingEnabled !== undefined) {
return globalTracingEnabled;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caching means that the result that is returned the first time is valid for the lifetime of the application. This means that:

  1. If this function happens to be called before the application has configured OpenTelemetry, then the value that it calculates and caches can be wrong.
  2. It does not take into the (maybe theoretical) possibility that an application could change its configuration later.

Copy link
Copy Markdown
Contributor Author

@surbhigarg92 surbhigarg92 May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function happens to be called before the application has configured OpenTelemetry, then the value that it calculates and caches can be wrong.

Expectation is OpenTelemetry Global registration should be done before Spanner instance is created, even in Java we expect customers to do before SpannerInstance creation , if done later it will not be picked for adding traces.

It does not take into the (maybe theoretical) possibility that an application could change its configuration later.

If opentelemetry provider is passed while creating Spanner object that will be considered, but global configuration is not expected to be changed later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to accommodate the requirement of letting customer register OpenTelemetry later, we will not be able to avoid enabling registering of AsyncHooksContextManager() . Registering this adds a good load on the application.

X_GOOG_SPANNER_REQUEST_ID_SPAN_ATTR,
attributeXGoogSpannerRequestIdToActiveSpan,
craftRequestId,
PROCESS_PREFIX,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really used elsewhere?

InMemorySpanExporter,
} = require('@opentelemetry/sdk-trace-node');
const {SimpleSpanProcessor} = require('@opentelemetry/sdk-trace-base');
const {startTrace, ObservabilityOptions} = require('../src/instrument');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not related to this line, but it feels like the most logical place to add this comment)

We are not adding any new tests for this. Should we add tests that verify that:

  1. It does not matter when the OpenTelemetry configuration is done (before or after creating a Spanner instance).
  2. It does not matter what the trace sampling is. When tracing is enabled, even with a 1% sampling rate, then the request ID should be added to the traces.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter when the OpenTelemetry configuration is done (before or after creating a Spanner instance).

As mentioned in previous comment reply, we need to discuss if we want to allow this usecase.

It does not matter what the trace sampling is. When tracing is enabled, even with a 1% sampling rate, then the request ID should be added to the traces.

Sure will add it

Comment on lines +1718 to +1725
config.headers[CLOUD_RESOURCE_HEADER] = replaceProjectIdToken(
config.headers[CLOUD_RESOURCE_HEADER],
projectId!,
);
this.projectIdReplaced_ = true;
} else {
reqOpts = config.reqOpts;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that this is an actual problem, and then specifically for non-cached resource names. See this test case:

it('real application flow: should replace project ID tokens on Backup objects after initial Spanner request', done => {
  const {Instance} = require('../src/instance');
  const {Backup} = require('../src/backup');
  // Stub setup...
  const appSpanner = new Spanner({projectId: '{{projectId}}'});
  asAny(appSpanner).auth.getProjectId = callback => {
    callback(null, PROJECT_ID);
  };
  const realInstance = new Instance(appSpanner, 'my-instance');
  appSpanner.instances_.set('my-instance', realInstance);
  // Backup created before project ID is replaced
  const realBackup = new Backup(realInstance, 'my-backup');
  // 1. Initial request triggers prepareGapicRequest_
  FAKE_GAPIC_CLIENT.getInstanceConfig = (reqOpts, gaxOpts, callback) => {
    callback(null, {});
  };
  appSpanner.getInstanceConfig('nam1', err => {
    if (err) return done(err);
    // At this point, appSpanner.projectIdReplaced_ is true.
    // 2. Application calls backup.getMetadata()
    FAKE_GAPIC_CLIENT.getBackup = reqOpts => {
      try {
        assert.strictEqual(
          reqOpts.name,
          `projects/${PROJECT_ID}/instances/my-instance/backups/my-backup`,
        );
        done();
      } catch (e) {
        done(e);
      }
      return Promise.resolve([{}]);
    };
    realBackup.getMetadata();
  });
});

const span = trace.getActiveSpan();
if (span) {
return span;
if (isTracingEnabled()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this could just as well call isGlobalTracingEnabled directly, as it does not supply any options. So the more specific check whether a tracer has been set on any options is always skipped. Is that intentional?

Also, calling trace.getActiveSpan() should be an extremely cheap method to call when tracing is disabled, so I am not sure this entire method really optimizes anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: spanner Issues related to the Spanner API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants