Skip to content

[EXPORTER ETW] Generate a fresh trace id for every root span#4101

Open
lukeina2z wants to merge 1 commit into
open-telemetry:mainfrom
lukeina2z:fix/etw-root-span-trace-id-3846
Open

[EXPORTER ETW] Generate a fresh trace id for every root span#4101
lukeina2z wants to merge 1 commit into
open-telemetry:mainfrom
lukeina2z:fix/etw-root-span-trace-id-3846

Conversation

@lukeina2z
Copy link
Copy Markdown

Fixes #3846.

etw::Tracer was caching a single TraceId in a member (traceId_) generated once in its constructor, and using that same id as the fallback trace id for every root span it created. So if you did something as ordinary as:

auto tracer  = provider->GetTracer("Geneva-Tracer-Foo");
auto spanFoo = tracer->StartSpan("Span-Foo");   // root
auto spanBar = tracer->StartSpan("Span-Bar");   // also root

both Foo and Bar would come out the back end stamped with the same env_dt_traceId, and the Geneva/Jarvis side would collapse two unrelated requests into one logical trace. The issue has a screenshot of exactly that: Span-Foo and Span-Bar emitted with the same trace id of 8507dcdc651af34fb8f21152a98e2977.

The OTel spec says each root span starts a new trace, and that is also what sdk::trace::Tracer does -- when there is no valid parent it calls generator.GenerateTraceId() on the spot, no cache anywhere (see sdk/src/trace/tracer.cc, the "if (parent_context.IsValid()) ... else trace_id = generator.GenerateTraceId()" block). The ETW exporter was the odd one out here.

Fix is one line in StartSpan(): when parentContext is invalid, call GetIdGenerator(tracerProvider_).GenerateTraceId() instead of returning the cached traceId_. This is the same call shape already used on the next line to mint the SpanId, so the access path is proven and thread-safe.

A nice side benefit: trace-id-ratio samplers were getting the same trace id for every root span on a Tracer and therefore making the same accept/reject decision for all of them, which defeated the point of probabilistic sampling. With this change the sampler sees a fresh id per root span and behaves the way the SDK does.

I deliberately did not remove the now-unused traceId_ member, its init in the ctor, or the public trace_id() accessor on Tracer. The accessor is part of the exporter's public surface and removing it would be a source break for anyone calling tracer.trace_id(). Happy to follow up with a cleanup PR if maintainers prefer.

Fixes # (issue)

Changes

Please provide a brief description of the changes here.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@lukeina2z lukeina2z requested a review from a team as a code owner May 24, 2026 21:37
@lukeina2z lukeina2z force-pushed the fix/etw-root-span-trace-id-3846 branch from 575ff53 to 4004611 Compare May 24, 2026 22:21
Fixes open-telemetry#3846.

etw::Tracer was caching a single TraceId in a member (traceId_)
generated once in its constructor, and using that same id as the
fallback trace id for every root span it created. So if you did
something as ordinary as:

    auto tracer  = provider->GetTracer("Geneva-Tracer-Foo");
    auto spanFoo = tracer->StartSpan("Span-Foo");   // root
    auto spanBar = tracer->StartSpan("Span-Bar");   // also root

both Foo and Bar would come out the back end stamped with the same
env_dt_traceId, and the Geneva/Jarvis side would collapse two
unrelated requests into one logical trace. The issue has a screenshot
of exactly that: Span-Foo and Span-Bar emitted with the same trace id
of 8507dcdc651af34fb8f21152a98e2977.

The OTel spec says each root span starts a new trace, and that is
also what sdk::trace::Tracer does -- when there is no valid parent it
calls generator.GenerateTraceId() on the spot, no cache anywhere (see
sdk/src/trace/tracer.cc, the "if (parent_context.IsValid()) ... else
trace_id = generator.GenerateTraceId()" block). The ETW exporter was
the odd one out here.

Fix is one line in StartSpan(): when parentContext is invalid, call
GetIdGenerator(tracerProvider_).GenerateTraceId() instead of returning
the cached traceId_. This is the same call shape already used on the
next line to mint the SpanId, so the access path is proven and
thread-safe.

A nice side benefit: trace-id-ratio samplers were getting the same
trace id for every root span on a Tracer and therefore making the
same accept/reject decision for all of them, which defeated the point
of probabilistic sampling. With this change the sampler sees a fresh
id per root span and behaves the way the SDK does.

I deliberately did not remove the now-unused traceId_ member, its
init in the ctor, or the public trace_id() accessor on Tracer. The
accessor is part of the exporter's public surface and removing it
would be a source break for anyone calling tracer.trace_id(). Happy
to follow up with a cleanup PR if maintainers prefer.

Also updated ETWTracer.GlobalSingletonTracer in
exporters/etw/test/etw_tracer_test.cc. That test was asserting
EXPECT_EQ(traceId1, traceId3) -- i.e. two consecutive root spans
on the same Tracer should share a trace id -- which was just
encoding the bug. After this fix each root span gets its own
fresh trace id, so the assertion is flipped to EXPECT_NE, plus
checks that all three trace ids are valid and distinct, and that
the C++11 magic-static GetGlobalTracer() really does return the
same Tracer reference both times. Comments and the illustrative
sample-event TraceId in the doc comment were updated to match
the new behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lukeina2z lukeina2z force-pushed the fix/etw-root-span-trace-id-3846 branch from 4004611 to f7d1649 Compare May 24, 2026 22:30
@codecov
Copy link
Copy Markdown

codecov Bot commented May 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.00%. Comparing base (72953ea) to head (f7d1649).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4101      +/-   ##
==========================================
- Coverage   82.01%   82.00%   -0.01%     
==========================================
  Files         385      385              
  Lines       16030    16030              
==========================================
- Hits        13145    13143       -2     
- Misses       2885     2887       +2     

see 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lalitb
Copy link
Copy Markdown
Member

lalitb commented May 25, 2026

Thanks for the PR. The intention here is right: different root spans should not all get the same trace id.

Just to clarify, the current ETW behavior was intentional for performance reasons. ETW is a high-throughput exporter and does not always follow the regular SDK behavior when that adds cost to the hot path.

With this change, every root StartSpan() will generate a new trace id, which means an extra GUID (or custom ID) generation during span creation. We need to confirm that cost is acceptable for ETW scenarios for existing users, or make this optional so existing users are not affected by default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue: ETW Exporter Tracer Uses a Single Trace ID for All Spans

2 participants