Summary
When a tracing::Tracer (or just a telemetry::Telemetry) is held by an object with static-storage lifetime and the process exits, the CurlImpl libcurl event-loop thread can call back into Telemetry::increment_counter after Telemetry::~Telemetry has already returned and its members (counters_, rates_, …) are mid-destruction. The result is a SEGV inside std::unordered_map::operator[] reaching into a destroyed hash table.
Reproducible on v2.0.0 (the version we're pinning). I read the v2.1.0 sources too — Telemetry::increment_counter and Telemetry::~Telemetry are byte-identical between v2.0.0 and v2.1.0, so the race shape is unchanged.
Symptoms
- AddressSanitizer reports a SEGV on unknown address (typically a low, garbage-looking pointer) from inside Telemetry::increment_counter → std::unordered_map::operator[] → _M_bucket_index.
- Fires after Catch2 (or whatever test framework / main) has already printed its success summary, i.e. during exit() / static destruction, not during the actual work the program did.
- Happens even with tracing_enabled=false and report_traces=false in the TracerConfig: the telemetry curl thread still starts, still sends app-started, and still queues app-closing, which is what triggers the in-flight callback.
Stack trace (ASan, g++ 15.2.1, x86_64-linux, debug build with -fsanitize=address,undefined)
I've trimmed the noisy hash-table internals; the relevant frames are #6 onwards:
==N==ERROR: AddressSanitizer: SEGV on unknown address 0x... (T1)
==> READ memory access from a background thread
#0–#5 std::__detail::_Hash_code_base<MetricContext<Metric<Counter>>, …>::_M_bucket_index
std::unordered_map<…>::operator[]
#6 Telemetry::increment_counter(
Metric<Counter> const&,
std::vector<std::string> const&) telemetry_impl.cpp:799
#7 send_payload(…)::<lambda(Error)> telemetry_impl.cpp:399
#8–#11 std::function<void(Error)>::operator()
#12 CurlImpl::handle_message(CURLMsg const&) curl.cpp:591
#13 CurlImpl::run() curl.cpp:559
#14 CurlImpl::CurlImpl(…)::<lambda()> curl.cpp:297
… std::thread invoker / asan_thread_start / start_thread
Root cause (as I read the code)
Telemetry::~Telemetry() (telemetry_impl.cpp:228 in v2.0.0) does two things:
- cancel_tasks(tasks_): cancels the heartbeat scheduler.
- app_closing() → send_payload("app-closing", …): enqueues an HTTP POST onto http_client_, which is the Curl instance owned outside of Telemetry. The libcurl event-loop thread is still alive at this point; the send_payload lambda captures this (the Telemetry) and calls increment_counter inside the response/error callback.
The destructor then returns. Reverse-declaration-order member destruction begins: distributions_, rates_snapshot_, rates_, counters_snapshot_, counters_ (destroyed here), etc.
CurlImpl::~CurlImpl (curl.cpp:312) joins the event-loop thread, but it doesn't run until the parent Tracer / outer-owner destructor unwinds far enough to destroy the client, which happens after Telemetry's members have already been destroyed. In that window the event-loop thread can still be processing the app-closing response; when it invokes the callback, Telemetry::increment_counter touches the already-destroyed counters_ and the process SEGVs.
This is invisible in normal use because most test programs let the OS reclaim the process before the curl thread's response lands. With ASan / under a static-storage owner, the destruction order is fully forced and the race is reliably triggered.
Minimal repro shape
We don't have a self-contained mini-repro pre-baked, but the shape is:
// In some translation unit:
#include <datadog/tracer.h>
#include <datadog/tracer_config.h>

#include <memory>

static std::shared_ptr<datadog::tracing::Tracer> make_tracer() {
  datadog::tracing::TracerConfig cfg;
  cfg.service = "repro";
  cfg.tracing_enabled = false;
  cfg.report_traces = false;
  auto vc = datadog::tracing::finalize_config(cfg);
  return std::make_shared<datadog::tracing::Tracer>(*vc);
}

// And then somewhere with static lifetime (function-scope static or
// namespace-scope, doesn't matter):
static auto s_tracer = make_tracer();

int main() {
  // Trivially exit. The crash is in static destruction, not in main().
}
Build with -fsanitize=address,undefined and run. The ASan SEGV from the curl thread is the symptom.
Suggested fix directions
(I read v2.0.0 and v2.1.0 of telemetry_impl.cpp; the same options apply to both.)
1. Synchronously drain in-flight callbacks in Telemetry::~Telemetry before letting the destructor return, e.g. call http_client_->drain(deadline) (the API already exists on Curl, see curl.cpp:421). After drain, no callback referencing this can still be queued.
2. Capture by shared_ptr rather than by raw this in send_payload's response/error lambda, with the Telemetry either using shared_from_this() internally or wrapping its mutable state in a shared_ptr<State>. This keeps the closure self-contained and removes the cross-object ordering coupling.
3. Document the lifetime contract: currently nothing in the public API warns the host that a Tracer held in static storage may crash during process exit. If neither of the above fixes lands, at minimum a comment in tracer_config.h saying "do not hold Tracer in static storage" would save downstream debugging.
Option (1) feels like the smallest, most targeted change. (2) is more invasive but would also fix any future shape of this same bug class.
Context
Surfaced during a sanitizer sweep on a third-party C++ proxy. Both the Catch2 unit suite and the Jest integration suite under an ASan-instrumented proxy are otherwise clean (zero proxy-side findings); this is the only artifact, and it's 100% in dd-trace-cpp + libstdc++ frames.
Happy to provide more detail or test a proposed patch.