Description
Component(s)
receiver/datadog
Is your feature request related to a problem? Please describe.
Hi!
As far as I can see, restoring the full 128-bit trace ID currently requires caching root metadata (#39654).
My concern is that if we have a really long-running script, and while it runs we receive more than trace_id_cache_size unique trace IDs, the receiver will not be able to restore the full trace ID and we will miss the parent span.
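To make the eviction concern concrete, here is a minimal sketch in Go of a bounded LRU cache (a stdlib stand-in for illustration, not the receiver's actual golang-lru cache): once more unique trace IDs arrive than the cache can hold, the oldest entry is evicted and its high bits can no longer be restored.

```go
package main

import (
	"container/list"
	"fmt"
)

// boundedCache is a tiny LRU stand-in for the receiver's trace-ID cache.
type boundedCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[uint64]*list.Element // 64-bit trace ID -> element
}

type entry struct {
	key  uint64
	tidH string // high 64 bits of the 128-bit trace ID, as hex
}

func newBoundedCache(capacity int) *boundedCache {
	return &boundedCache{cap: capacity, order: list.New(), items: map[uint64]*list.Element{}}
}

func (c *boundedCache) Add(key uint64, tidHigh string) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).tidH = tidHigh
		return
	}
	if c.order.Len() >= c.cap { // evict the least recently used entry
		old := c.order.Back()
		c.order.Remove(old)
		delete(c.items, old.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, tidH: tidHigh})
}

func (c *boundedCache) Get(key uint64) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).tidH, true
}

func main() {
	c := newBoundedCache(2) // imagine trace_id_cache_size = 2
	c.Add(1, "8c10b1ae722c2746")
	c.Add(2, "aaaaaaaaaaaaaaaa")
	c.Add(3, "bbbbbbbbbbbbbbbb") // evicts trace 1
	_, ok := c.Get(1)
	fmt.Println(ok) // trace 1's high bits are gone; its spans lose the full 128-bit ID
}
```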
But here is the trick: at least in PHP it is possible to add the tag manually:

```php
$corellationId = \DDTrace\logs_correlation_trace_id();
$corellationIdHigh = substr($corellationId, 0, 16);
\DDTrace\add_global_tag('_dd.p.tid', $corellationIdHigh);
\DDTrace\add_global_tag('otel.trace_id', $corellationId);
\DDTrace\add_distributed_tag('otel.trace_id', $corellationId);
\DDTrace\add_distributed_tag('otel.tid.high', $corellationIdHigh);
```

From my observations, `add_global_tag` adds the tag to all child spans, but not to the first span, the one with `_dd.p.tid` set by the extension:
```php
\DDTrace\add_global_tag('_dd.p.tid', $corellationIdHigh);
\DDTrace\add_global_tag('otel.trace_id', $corellationId);
```

Debug output of a child span with `_dd.p.tid` set by `\DDTrace\add_global_tag('_dd.p.tid', $corellationIdHigh);`:
Here's the log
```text
otel-collector-1 | Span #36
otel-collector-1 |     Trace ID       : 8c10b1ae722c274628cf9fe2f4235627
otel-collector-1 |     Parent ID      : 18f5c3e88f648208
otel-collector-1 |     ID             : 788771bdf03c631e
otel-collector-1 |     Name           : Redis.connect
otel-collector-1 |     Kind           : Client
otel-collector-1 |     Start time     : 2025-07-06 10:18:25.187458707 +0000 UTC
otel-collector-1 |     End time       : 2025-07-06 10:18:25.187839441 +0000 UTC
otel-collector-1 |     Status code    : Ok
otel-collector-1 |     Status message :
otel-collector-1 | Attributes:
otel-collector-1 |      -> dd.span.Resource: Str(Redis.connect)
otel-collector-1 |      -> datadog.span.id: Str(8685035467000537886)
otel-collector-1 |      -> datadog.trace.id: Str(2940744878803605031)
otel-collector-1 |      -> component: Str(phpredis)
otel-collector-1 |      -> db.system: Str(redis)
otel-collector-1 |      -> _dd.base_service: Str(symfony)
otel-collector-1 |      -> _dd.p.tid: Str(8c10b1ae722c2746) # not added by the datadog extension; this is the one from `add_global_tag`
otel-collector-1 |      -> otel.trace_id: Str(8c10b1ae722c274628cf9fe2f4235627) # and this one is from `add_global_tag` too
otel-collector-1 |      -> out.host: Str(172.17.0.1)
otel-collector-1 |      -> out.port: Str(49028)
otel-collector-1 |      -> span.kind: Str(client)
```

And `add_distributed_tag` adds a tag with the `_dd.p.` prefix to the first span, the one with `_dd.p.tid`:
Here's the log
```text
otel-collector-1 | ScopeSpans #0
otel-collector-1 | ScopeSpans SchemaURL:
otel-collector-1 | InstrumentationScope Datadog 1.10.0
otel-collector-1 | Span #0
otel-collector-1 |     Trace ID       : 8c10b1ae722c274628cf9fe2f4235627
otel-collector-1 |     Parent ID      : e877e62adfe40b9b
otel-collector-1 |     ID             : 9a9d0e22fdc2bca8
otel-collector-1 |     Name           : symfony.request
otel-collector-1 |     Kind           : Server
otel-collector-1 |     Start time     : 2025-07-06 10:18:25.149190661 +0000 UTC
otel-collector-1 |     End time       : 2025-07-06 10:18:43.034958011 +0000 UTC
otel-collector-1 |     Status code    : Ok
otel-collector-1 |     Status message :
otel-collector-1 | Attributes:
otel-collector-1 |      -> dd.span.Resource: Str(event-stream/subscribe)
otel-collector-1 |      -> sampling.priority: Str(1.000000)
otel-collector-1 |      -> datadog.span.id: Str(11141076596633549992)
otel-collector-1 |      -> datadog.trace.id: Str(2940744878803605031)
otel-collector-1 |      -> _dd.p.otel.trace_id: Str(8c10b1ae722c274628cf9fe2f4235627) # this one comes from `add_distributed_tag`
otel-collector-1 |      -> span.kind: Str(server)
otel-collector-1 |      -> _dd.p.dm: Str(-0)
otel-collector-1 |      -> _dd.p.otel.tid.high: Str(8c10b1ae722c2746) # this one also comes from `add_distributed_tag`
otel-collector-1 |      -> _dd.p.tid: Str(8c10b1ae722c2746) # set by the application; cannot be overridden by `add_distributed_tag`
otel-collector-1 |      -> runtime-id: Str(92e3eb83-f08e-43f7-a24a-0a52d5124e7f)
otel-collector-1 |      -> user_agent.original: Str(Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:140.0) Gecko/20100101 Firefox/140.0)
otel-collector-1 |      -> component: Str(symfony)
otel-collector-1 |      -> symfony.route.action: Str(App\Controller\EventStreamController@subscribe)
otel-collector-1 |      -> http.response.status_code: Str(200)
otel-collector-1 |      -> _dd.parent_id: Str(0000000000000000)
otel-collector-1 |      -> http.request.method: Str(GET)
otel-collector-1 |      -> symfony.route.name: Str(event-stream/subscribe)
otel-collector-1 |      -> process.pid: Double(84)
otel-collector-1 |      -> _sampling_priority_v1: Double(1)
otel-collector-1 |      -> php.compilation.total_time_ms: Double(1673.82)
otel-collector-1 |      -> php.memory.peak_usage_bytes: Double(22235784)
otel-collector-1 |      -> php.memory.peak_real_usage_bytes: Double(23072768)
```

In other words, if I understand it correctly, with such an addition it should be possible to restore the full trace ID even when it is not present in the cache.
opentelemetry-collector-contrib/receiver/datadogreceiver/internal/translator/traces_translator.go
Lines 88 to 90 in f4d39f9

```go
	if val, ok := traceIDCache.Get(span.TraceID); ok {
		return val, nil
	} else if val, ok := span.Meta["_dd.p.tid"]; ok {
```
But there are two problems:
- There is a bug with concurrent access to the LRU cache (it appears to be the cause of "datadog receiver causing collector to panic/crash with invalid memory address or nil pointer dereference" #40557):

  See the log

  ```text
  otel-collector-1 | fatal error: concurrent map read and map write
  otel-collector-1 |
  otel-collector-1 | goroutine 18852 [running]:
  otel-collector-1 | internal/runtime/maps.fatal({0xc17b2da?, 0x79c87ac3b108?})
  otel-collector-1 | 	runtime/panic.go:1058 +0x18
  otel-collector-1 | github.com/hashicorp/golang-lru/v2/simplelru.(*LRU[...]).Get(0xd5660c0, 0xbe259c0)
  otel-collector-1 | 	github.com/hashicorp/golang-lru/v2@v2.0.7/simplelru/lru.go:72 +0x35
  otel-collector-1 | github.com/open-telemetry/opentelemetry-collector-contrib/receiver/datadogreceiver/internal/translator.traceID64to128(0xc0018602a0, 0xc000df3760)
  otel-collector-1 | 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/datadogreceiver@v0.129.0/internal/translator/traces_translator.go:88 +0x3e
  otel-collector-1 | github.com/open-telemetry/opentelemetry-collector-contrib/receiver/datadogreceiver/internal/translator.ToTraces(0xc000bca680, 0xc001c42f70, 0xc0013752c0, 0xc000df3760)
  otel-collector-1 | 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/datadogreceiver@v0.129.0/internal/translator/traces_translator.go:198 +0xfda
  otel-collector-1 | github.com/open-telemetry/opentelemetry-collector-contrib/receiver/datadogreceiver.(*datadogReceiver).handleTraces(0xc000c930e0, {0xd44e7e0, 0xc00152d980}, 0xc0013752c0)
  otel-collector-1 | 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/datadogreceiver@v0.129.0/receiver.go:285 +0x398
  otel-collector-1 | net/http.HandlerFunc.ServeHTTP(0xc000e75ec0?, {0xd44e7e0?, 0xc00152d980?}, 0x0?)
  otel-collector-1 | 	net/http/server.go:2294 +0x29
  ```

- The `_dd.p.tid` workaround looks like a bit of a dirty trick, and I cannot be sure the implementation will not change in the future in a way that breaks it.
Describe the solution you'd like
It would be nice to have an option to pick the trace ID from a custom tag, like:

```text
otel-collector-1 | Span #36
otel-collector-1 |     Trace ID  : 8c10b1ae722c274628cf9fe2f4235627
otel-collector-1 |     Parent ID : 18f5c3e88f648208
otel-collector-1 |     ID        : 788771bdf03c631e
otel-collector-1 |     Name      : Redis.connect
otel-collector-1 |     Kind      : Client
otel-collector-1 | Attributes:
otel-collector-1 |      -> otel.trace_id: Str(8c10b1ae722c274628cf9fe2f4235627) # set in the app by the user, e.g. with `add_global_tag` or any other way
```

Config to pick the trace ID from `otel.trace_id`:

```yaml
receivers:
  datadog:
    full_trace_id_tag: otel.trace_id
```

This would make it possible to pick the full 128-bit trace ID (or override it if we want) without using the cache.
Describe alternatives you've considered
No response
Additional context
No response