Fix: ETW span ownership lifetime by fmutlu68 · Pull Request #4070 · open-telemetry/opentelemetry-cpp

fmutlu68 · 2026-05-08T18:24:44Z

Changes

Hold shared ownership of the ETW tracer in Span to prevent dangling references
Update span creation to pass shared ownership
Testing: ./ci/do_ci.sh cmake.c++17.test

codecov · 2026-05-11T08:53:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.10%. Comparing base (3057616) to head (1f76b52).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4070   +/-   ##
=======================================
  Coverage   82.10%   82.10%           
=======================================
  Files         385      385           
  Lines       15989    15989           
=======================================
  Hits        13126    13126           
  Misses       2863     2863

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fmutlu68 · 2026-05-14T20:03:25Z

Could you please have a look here ? also @ThomsonTan It passes all tests

lalitb · 2026-05-21T06:40:48Z

ETW exporter is a bit different from other exporters: the tracer name maps to the ETW provider name/GUID, so creating multiple tracers for the same name is not the intended hot-path pattern. Applications should normally create one tracer per ETW provider name and reuse it for the process/component lifetime.

Because of that, I’d prefer not to fix this by adding shared_from_this()/shared_ptr ownership work to every span creation. StartSpan() is the high-throughput path for ETW. Could we instead document the intended ETW tracer reuse model and/or have TracerProvider retain created tracers so the lifetime protection is handled at setup/GetTracer() time rather than per span?

Also @ThomsonTan suggestions would be helpful, as he has more experience on this.

fmutlu68 · 2026-05-22T11:48:38Z

ETW exporter is a bit different from other exporters: the tracer name maps to the ETW provider name/GUID, so creating multiple tracers for the same name is not the intended hot-path pattern. Applications should normally create one tracer per ETW provider name and reuse it for the process/component lifetime.

Because of that, I’d prefer not to fix this by adding shared_from_this()/shared_ptr ownership work to every span creation. StartSpan() is the high-throughput path for ETW. Could we instead document the intended ETW tracer reuse model and/or have TracerProvider retain created tracers so the lifetime protection is handled at setup/GetTracer() time rather than per span?

Also @ThomsonTan suggestions would be helpful, as he has more experience on this.

Thanks for the thorough explanation, I completely agree regarding the hot-path overhead; atomic reference counting in StartSpan() is not ideal for high-throughput ETW logging. The alternative approach makes perfect sense. I will revert the shared_from_this()/shared_ptr changes on the Tracer/Span and instead have the TracerProvider retain ownership of the created Tracers (e.g., via an internal map) to ensure lifetime protection at setup/GetTracer() time. I'll push the updated implementation shortly.

lukeina2z · 2026-05-24T19:42:14Z

Can we cache these tracers in ETW TraceProvider. This would match the behavior of common TraceProvider in OTel. Here is the PR #4100. @fmutlu68 @lalitb @ThomsonTan

Fixes open-telemetry#3849. etw::TracerProvider::GetTracer() was newing up a fresh Tracer per call and returning the only shared_ptr to it. The catch is that etw::Span stores its parent as a raw reference (Tracer &owner_), so the moment a caller drops or reassigns that shared_ptr, every Span that came from that Tracer is left with a dangling owner_. Span::End() then crashes inside owner_.EndSpan(). The snippet from the issue makes it obvious: auto provider = std::make_unique<etw::TracerProvider>(); auto tracer = provider->GetTracer("Foo"); auto spanFoo = tracer->StartSpan("Span-Foo"); tracer = provider->GetTracer("Bar"); // Tracer "Foo" goes away auto spanBar = tracer->StartSpan("Span-Bar"); spanFoo->End(); // crash spanBar->End(); Under MSVC Debug + cdb this lands as an AV in Span::End dereferencing 0xFEEEFEEE -- the debug heap's freed-memory fill -- which is about as unambiguous as UAF gets. The fix is to let TracerProvider keep the Tracers it hands out, keyed by name and guarded by a mutex, and have GetTracer() return the cached entry on repeat calls. That makes the Tracer outlive any Span derived from it, which is the lifetime contract Span's raw owner_ already assumes. It is also the same pattern sdk::trace::TracerProvider has always used, which is why the issue reporter noted the SDK provider does not crash under the same usage. There was an earlier attempt in PR open-telemetry#4070 that switched Span to take a shared_ptr<Tracer> via shared_from_this(). That was turned down because StartSpan() is the ETW exporter's hot path and nobody wants an extra atomic refcount bump on every span just to keep the parent alive. Doing the lifetime work once in GetTracer() keeps owner_ as a plain reference and costs nothing per span. A few things worth calling out: - No API or ABI change. Only private members are added to TracerProvider, and <map>/<memory>/<mutex> are already included by etw_tracer.h. - The cache deliberately does not evict. Span holds its parent by raw reference, so dropping a Tracer that still has live Spans would put the original bug right back. In practice the ETW exporter is built around "one Tracer per provider name, reused for the lifetime of the process", and Windows itself caps registered ETW providers per process well below anything you would worry about for memory. - Keying on name only is consistent with the existing etw::Tracer constructor, which already declares args and schema_url UNREFERENCED_PARAMETER. - etw::LoggerProvider was checked for the same shape and does not have it -- Logger holds a back-reference to the Provider (long-lived), not to itself -- so no equivalent change is needed there. - Updated ETWTracer.GlobalSingletonTracer: it previously asserted that two GetTracer() calls with the same provider name produce different trace ids, which only held because of the bug being fixed here. GetTracer(name) is now idempotent per name, so the assertion is flipped to EXPECT_EQ and the stale sample-event comment updated to match. All other ETW tracer tests continue to pass. Repro before the change: AV in Span::End with the freed-fill operand. Same repro after: runs cleanly to completion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fixes open-telemetry#3849. etw::TracerProvider::GetTracer() was newing up a fresh Tracer per call and returning the only shared_ptr to it. The catch is that etw::Span stores its parent as a raw reference (Tracer &owner_), so the moment a caller drops or reassigns that shared_ptr, every Span that came from that Tracer is left with a dangling owner_. Span::End() then crashes inside owner_.EndSpan(). The snippet from the issue makes it obvious: auto provider = std::make_unique<etw::TracerProvider>(); auto tracer = provider->GetTracer("Foo"); auto spanFoo = tracer->StartSpan("Span-Foo"); tracer = provider->GetTracer("Bar"); // Tracer "Foo" goes away auto spanBar = tracer->StartSpan("Span-Bar"); spanFoo->End(); // crash spanBar->End(); Under MSVC Debug + cdb this lands as an AV in Span::End dereferencing 0xFEEEFEEE -- the debug heap's freed-memory fill -- which is about as unambiguous as UAF gets. The fix is to let TracerProvider keep the Tracers it hands out, keyed by name and guarded by a mutex, and have GetTracer() return the cached entry on repeat calls. That makes the Tracer outlive any Span derived from it, which is the lifetime contract Span's raw owner_ already assumes. It is also the same pattern sdk::trace::TracerProvider has always used, which is why the issue reporter noted the SDK provider does not crash under the same usage. There was an earlier attempt in PR open-telemetry#4070 that switched Span to take a shared_ptr<Tracer> via shared_from_this(). That was turned down because StartSpan() is the ETW exporter's hot path and nobody wants an extra atomic refcount bump on every span just to keep the parent alive. Doing the lifetime work once in GetTracer() keeps owner_ as a plain reference and costs nothing per span. While putting this together I tripped over a second bug that the cache exposes -- worth describing here because the fix lives in the same file. etw_tracer.h and etw_provider.h between them have three independent function-local statics that get tangled at process shutdown: 1. etw::TracerProvider itself (now owning the tracer cache). 2. Tracer::etwProvider() returns a `static ETWProvider instance`. 3. ETWProvider::providers() returns a `static std::map<...> providers` that lives inside that singleton's method, not inside the singleton object, and is lazily created on the first open()/is_registered() call. Before this change, all three were initialized in that order on the very first GetTracer() call: the TracerProvider was already constructed, then the Tracer ctor called etwProvider().open(...), which constructed open-telemetry#2 and open-telemetry#3. Reverse-order destruction therefore took out open-telemetry#3 first, while the TracerProvider (and its cached Tracers, which hold references into open-telemetry#3) was still alive. ~Tracer then ran etwProvider().close(provHandle) against a dangling reference and we got an AV at exit. With the old "fresh Tracer per call, caller drops it immediately" pattern this was hidden because the Tracer (and its provHandle reference) was almost always gone long before shutdown. The cache extends Tracer lifetime to match TracerProvider lifetime, which is exactly what tripped this over locally and on the Windows CI legs. The fix is small: both TracerProvider constructors now do (void)Tracer::etwProvider().is_registered(std::string{}); which forces both ETWProvider singletons to be constructed before the TracerProvider's body finishes. Reverse destruction then guarantees they outlive the cached Tracers. is_registered() is a public, side- effect-free read on ETWProvider that internally touches the providers map, which is exactly the static-init we need. It requires Tracer to befriend TracerProvider so the constructor can call the private etwProvider() accessor. A few things worth calling out: - No API or ABI change. Only private members are added to TracerProvider, and <map>/<memory>/<mutex> are already included by etw_tracer.h. The friend declaration is purely internal. - The cache deliberately does not evict. Span holds its parent by raw reference, so dropping a Tracer that still has live Spans would put the original bug right back. In practice the ETW exporter is built around "one Tracer per provider name, reused for the lifetime of the process", and Windows itself caps registered ETW providers per process well below anything you would worry about for memory. - Keying on name only is consistent with the existing etw::Tracer constructor, which already declares args and schema_url UNREFERENCED_PARAMETER. - etw::LoggerProvider was checked for the same shape and does not currently cache Loggers, so the destruction-order fix is not needed there today. If a similar cache is ever added to LoggerProvider it will want the same one-line force-init in its constructor. - Updated ETWTracer.GlobalSingletonTracer: it previously asserted that two GetTracer() calls with the same provider name produce different trace ids, which only held because of the bug being fixed here. GetTracer(name) is now idempotent per name, so the assertion is flipped to EXPECT_EQ and the stale sample-event comment updated to match. All other ETW tracer tests continue to pass. Repro before the change: AV in Span::End with the freed-fill operand, plus exit-time AV in ~Tracer on ctest runs that exercise a single test in isolation. Same repros after: both run cleanly to completion. Verified locally with the full ctest suite on a Windows abiv2 maintainer-mode build (1104/1104 passing) and on an abiv1 Debug build (43/43 ETW tests passing). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

lukeina2z · 2026-05-25T00:30:17Z

Applications should normally create one tracer per ETW provider name and reuse it for the process/component lifetime.

The problem is the single tracer will always use an identical trace ID. Even you are handling requests from multiple clients. #3846

lalitb · 2026-05-25T01:42:07Z

Can we cache these tracers in ETW TraceProvider. This would match the behavior of common TraceProvider in OTel. Here is the PR #4100. @fmutlu68 @lalitb @ThomsonTan

Yes, this is what I suggested here - Could we instead document the intended ETW tracer reuse model and/or have TracerProvider retain created tracers so the lifetime protection is handled at setup/GetTracer() time rather than per span?

lalitb · 2026-05-25T01:59:22Z

Applications should normally create one tracer per ETW provider name and reuse it for the process/component lifetime.

The problem is the single tracer will always use an identical trace ID. Even you are handling requests from multiple clients. #3846

Yes, Agree. I believe tracer caching will solve the lifetime issue.

My main concern is that applications should avoid calling GetTracer()->StartSpan() in the hot path. Once GetTracer() is backed by a tracer cache, it will likely involve a mutex/map lookup, so putting it on the span creation path would add unnecessary synchronization overhead. For ETW, the intended usage should be to acquire the tracer once per provider name and reuse it, then call StartSpan() on that cached tracer.

fmutlu68 added 2 commits May 8, 2026 20:47

fix: preserve delta start timestamp in fast path

c6738ce

fix: keep tracer alive for ETW spans

62446d8

fmutlu68 requested a review from a team as a code owner May 8, 2026 18:24

fmutlu68 mentioned this pull request May 8, 2026

ETW Exporter: Crash When Ending ETW Exporter Span #3849

Open

chore: remove unrelated metrics changes

005d87d

fmutlu68 changed the title ~~Fix ETW span ownership lifetime~~ Fix: ETW span ownership lifetime May 10, 2026

fmutlu68 and others added 4 commits May 11, 2026 12:47

chore: format etw tracer header

3bc2469

chore: fix format errors

add9b1e

Merge branch 'main' into fix-etw-span-ownership

c283be1

chore: fix format errors

1f76b52

lukeina2z mentioned this pull request May 24, 2026

[EXPORTER ETW] Fix Span::End() use-after-free when caller drops Tracer #4100

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: ETW span ownership lifetime#4070

Fix: ETW span ownership lifetime#4070
fmutlu68 wants to merge 7 commits into
open-telemetry:mainfrom
fmutlu68:fix-etw-span-ownership

fmutlu68 commented May 8, 2026

Uh oh!

codecov Bot commented May 11, 2026 •

edited

Loading

Uh oh!

fmutlu68 commented May 14, 2026

Uh oh!

lalitb commented May 21, 2026 •

edited

Loading

Uh oh!

fmutlu68 commented May 22, 2026

Uh oh!

lukeina2z commented May 24, 2026

Uh oh!

lukeina2z commented May 25, 2026

Uh oh!

lalitb commented May 25, 2026 •

edited

Loading

Uh oh!

lalitb commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

fmutlu68 commented May 8, 2026

Changes

Uh oh!

codecov Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fmutlu68 commented May 14, 2026

Uh oh!

lalitb commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fmutlu68 commented May 22, 2026

Uh oh!

lukeina2z commented May 24, 2026

Uh oh!

lukeina2z commented May 25, 2026

Uh oh!

lalitb commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lalitb commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented May 11, 2026 •

edited

Loading

lalitb commented May 21, 2026 •

edited

Loading

lalitb commented May 25, 2026 •

edited

Loading