[EXPORTER ETW] Generate a fresh trace id for every root span#4101
Open
lukeina2z wants to merge 1 commit into
Open
[EXPORTER ETW] Generate a fresh trace id for every root span#4101lukeina2z wants to merge 1 commit into
lukeina2z wants to merge 1 commit into
Conversation
575ff53 to
4004611
Compare
Fixes open-telemetry#3846. etw::Tracer was caching a single TraceId in a member (traceId_) generated once in its constructor, and using that same id as the fallback trace id for every root span it created. So if you did something as ordinary as: auto tracer = provider->GetTracer("Geneva-Tracer-Foo"); auto spanFoo = tracer->StartSpan("Span-Foo"); // root auto spanBar = tracer->StartSpan("Span-Bar"); // also root both Foo and Bar would come out the back end stamped with the same env_dt_traceId, and the Geneva/Jarvis side would collapse two unrelated requests into one logical trace. The issue has a screenshot of exactly that: Span-Foo and Span-Bar emitted with the same trace id of 8507dcdc651af34fb8f21152a98e2977. The OTel spec says each root span starts a new trace, and that is also what sdk::trace::Tracer does -- when there is no valid parent it calls generator.GenerateTraceId() on the spot, no cache anywhere (see sdk/src/trace/tracer.cc, the "if (parent_context.IsValid()) ... else trace_id = generator.GenerateTraceId()" block). The ETW exporter was the odd one out here. Fix is one line in StartSpan(): when parentContext is invalid, call GetIdGenerator(tracerProvider_).GenerateTraceId() instead of returning the cached traceId_. This is the same call shape already used on the next line to mint the SpanId, so the access path is proven and thread-safe. A nice side benefit: trace-id-ratio samplers were getting the same trace id for every root span on a Tracer and therefore making the same accept/reject decision for all of them, which defeated the point of probabilistic sampling. With this change the sampler sees a fresh id per root span and behaves the way the SDK does. I deliberately did not remove the now-unused traceId_ member, its init in the ctor, or the public trace_id() accessor on Tracer. The accessor is part of the exporter's public surface and removing it would be a source break for anyone calling tracer.trace_id(). Happy to follow up with a cleanup PR if maintainers prefer. Also updated ETWTracer.GlobalSingletonTracer in exporters/etw/test/etw_tracer_test.cc. That test was asserting EXPECT_EQ(traceId1, traceId3) -- i.e. two consecutive root spans on the same Tracer should share a trace id -- which was just encoding the bug. After this fix each root span gets its own fresh trace id, so the assertion is flipped to EXPECT_NE, plus checks that all three trace ids are valid and distinct, and that the C++11 magic-static GetGlobalTracer() really does return the same Tracer reference both times. Comments and the illustrative sample-event TraceId in the doc comment were updated to match the new behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4004611 to
f7d1649
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4101 +/- ##
==========================================
- Coverage 82.01% 82.00% -0.01%
==========================================
Files 385 385
Lines 16030 16030
==========================================
- Hits 13145 13143 -2
- Misses 2885 2887 +2 🚀 New features to boost your workflow:
|
Member
|
Thanks for the PR. The intention here is right: different root spans should not all get the same trace id. Just to clarify, the current ETW behavior was intentional for performance reasons. ETW is a high-throughput exporter and does not always follow the regular SDK behavior when that adds cost to the hot path. With this change, every root |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #3846.
etw::Tracer was caching a single TraceId in a member (traceId_) generated once in its constructor, and using that same id as the fallback trace id for every root span it created. So if you did something as ordinary as:
both Foo and Bar would come out the back end stamped with the same env_dt_traceId, and the Geneva/Jarvis side would collapse two unrelated requests into one logical trace. The issue has a screenshot of exactly that: Span-Foo and Span-Bar emitted with the same trace id of 8507dcdc651af34fb8f21152a98e2977.
The OTel spec says each root span starts a new trace, and that is also what sdk::trace::Tracer does -- when there is no valid parent it calls generator.GenerateTraceId() on the spot, no cache anywhere (see sdk/src/trace/tracer.cc, the "if (parent_context.IsValid()) ... else trace_id = generator.GenerateTraceId()" block). The ETW exporter was the odd one out here.
Fix is one line in StartSpan(): when parentContext is invalid, call GetIdGenerator(tracerProvider_).GenerateTraceId() instead of returning the cached traceId_. This is the same call shape already used on the next line to mint the SpanId, so the access path is proven and thread-safe.
A nice side benefit: trace-id-ratio samplers were getting the same trace id for every root span on a Tracer and therefore making the same accept/reject decision for all of them, which defeated the point of probabilistic sampling. With this change the sampler sees a fresh id per root span and behaves the way the SDK does.
I deliberately did not remove the now-unused traceId_ member, its init in the ctor, or the public trace_id() accessor on Tracer. The accessor is part of the exporter's public surface and removing it would be a source break for anyone calling tracer.trace_id(). Happy to follow up with a cleanup PR if maintainers prefer.
Fixes # (issue)
Changes
Please provide a brief description of the changes here.
For significant contributions please make sure you have completed the following items:
CHANGELOG.mdupdated for non-trivial changes