Skip to content

Listen on both TCP and named pipes when a pipe name is configured#94

Merged
Lewis-E merged 4 commits into
mainfrom
lewis/SVLS-8243/named-pipe-backwards-compatability
May 6, 2026
Merged

Listen on both TCP and named pipes when a pipe name is configured#94
Lewis-E merged 4 commits into
mainfrom
lewis/SVLS-8243/named-pipe-backwards-compatability

Conversation

@Lewis-E
Copy link
Copy Markdown
Contributor

@Lewis-E Lewis-E commented Mar 6, 2026

Tracers running on Azure Functions Premium (multifunction) plans need named-pipe transport so each function gets isolated trace/dogstatsd channels. Older tracers without pipe support still need to reach the agent over TCP. Today the binary picks one transport: configuring a pipe name disables TCP, which would strand any old-tracer + new-compat combination.

This PR makes the binary listen on both transports concurrently when DD_APM_WINDOWS_PIPE_NAME is set: TCP stays on DD_APM_RECEIVER_PORT (default 8126), named pipe is added alongside. When no pipe name is configured, behavior is unchanged (TCP-only, identical to before).

What changed

  • mini_agent.rs: spawn TCP and (optionally) named-pipe accept loops, supervised by a single serve() function. Any task dying unexpectedly is fatal — tracers pick one transport at startup and don't switch, so silently degrading a transport would strand tracers using it.
  • mini_agent.rs and main.rs: start_mini_agent now takes shutdown_rx: tokio::sync::watch::Receiver<bool> (was oneshot::Receiver<()>). Each accept loop subscribes via .clone(), so the shutdown signal fans out to both transports. Production caller in main.rs also updated.
  • mini_agent.rs: graceful shutdown is preserved across the dual-transport architecture — on shutdown_rx, both accept loops drain their in-flight HTTP handlers before the supervisor signals the stats flusher (flusher_shutdown_tx) to do its final emit. The trace-stats invariant ("no AddChunk/ClientStatsPayload messages lost on shutdown") stays intact.
  • config.rs: drop the override that forced dd_apm_receiver_port = 0 whenever a pipe was configured. That override was load-bearing only when pipe-mode meant "no TCP."
  • Pipe code remains fully cfg-gated. On non-Windows / no windows-pipes builds (Lambda), zero pipe symbols make it into the binary.

Test plan

  • Existing TCP-only and pipe-only integration tests pass (integration_test.rs)
  • Unit test for default-port behavior when a pipe is configured (test_apm_windows_pipe_name)
  • New test_mini_agent_dual_transport_with_real_flushers integration test (Windows + windows-pipes): boots one agent with both transports configured, sends a service-tagged trace through each, fires shutdown_tx, and asserts both traces and the final stats flush reach the mock backend. Exercises the watch fan-out and graceful drain on both accept loops in one go.
  • Cross-target compile-check: macOS host, x86_64-pc-windows-gnu --features windows-pipes, and the default Lambda build
  • E2e on a multifunction Premium plan
Screenshot 2026-04-30 at 4 33 04 PM

Related PRs and context

Part of the multifunction-named-pipes workstream — epic SVLS-8429, this PR's ticket SVLS-8243.

  • #58 — original Windows named pipes for the trace agent (the feature this PR makes backwards-compatible)
  • #51 — trace stats computation (this PR rebased on top; integrated stats_concentrator_service_handle and the graceful drain into the new serve() supervisor)
  • datadog-serverless-compat-dotnet#20 — compat layer wiring the per-function pipe coordination
  • dd-trace-dotnet#8164 — .NET tracer side that consumes the pipe contract

This PR's binary release must precede datadog-serverless-compat-dotnet v1.5; v1.4 stays TCP-only and is unaffected.

Notes

  • A symmetric port-0 override exists for dogstatsd (config.rs, dd_dogstatsd_port). Out of scope here; flagged for a follow-up.

@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch from 28db9c8 to 131147e Compare March 6, 2026 15:45
@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch from 2c71ea0 to 9d56bcc Compare April 30, 2026 18:58
@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch 2 times, most recently from 2f5764b to f9c920d Compare May 1, 2026 15:25
@Lewis-E Lewis-E marked this pull request as ready for review May 1, 2026 15:35
@Lewis-E Lewis-E requested a review from a team as a code owner May 1, 2026 15:35
@Lewis-E Lewis-E requested review from Chronobreak, Copilot and duncanpharvey and removed request for a team and Chronobreak May 1, 2026 15:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Windows named-pipe transport behavior in the trace mini-agent so that, when DD_APM_WINDOWS_PIPE_NAME is set, the agent listens on both TCP (DD_APM_RECEIVER_PORT, default 8126) and the configured named pipe concurrently—preserving connectivity for legacy tracers that only support TCP.

Changes:

  • Refactors MiniAgent startup to spawn TCP and (optionally) named-pipe accept loops and supervise them via a centralized serve() function.
  • Removes the config behavior that forced dd_apm_receiver_port = 0 when a Windows pipe name is configured.
  • Updates the related config unit test to assert the default TCP port is preserved when a pipe name is set.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
crates/datadog-trace-agent/src/mini_agent.rs Starts TCP unconditionally, optionally starts named-pipe accept loop, and centralizes task supervision.
crates/datadog-trace-agent/src/config.rs Keeps TCP port enabled even when DD_APM_WINDOWS_PIPE_NAME is configured; updates test expectation accordingly.
Comments suppressed due to low confidence (1)

crates/datadog-trace-agent/src/mini_agent.rs:93

  • Same as above: this mut is unused and suppressed via #[allow(unused_mut)]. Prefer removing the mut binding (and the allow) to avoid masking future cases where mutability is accidentally unused.
        #[allow(unused_mut)]
        let mut stats_flusher_handle = tokio::spawn(async move {
            stats_flusher
                .start_stats_flusher(stats_config, stats_rx)
                .await;
        });

Comment thread crates/datadog-trace-agent/src/mini_agent.rs Outdated
Comment thread crates/datadog-trace-agent/src/mini_agent.rs Outdated
Copy link
Copy Markdown
Collaborator

@duncanpharvey duncanpharvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What versions of the .NET, Java, and Node.js tracers was named pipe support added in? Can we validate that anyone is using one of these older version to justify the added complexity in listening over TCP and Named Pipes?

If we do move forward with adding this backwards compatibility for older tracers, do we have a plan for deprecation?

@Lewis-E
Copy link
Copy Markdown
Contributor Author

Lewis-E commented May 1, 2026

@duncanpharvey We need this for the .net tracer specifically, where this support does not yet exist: DataDog/dd-trace-dotnet#8164 -- see the table in the PR description.

I actually think this complexity is worth keeping-- it's under 100 lines, I wouldn't prioritize deprecating this against any of the other work we have. At best, I would throw it at the bottom of the backlog. Do you expect performance issues or similar that would change the calculus there?

@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch from 6751361 to 48b70aa Compare May 4, 2026 13:58
@Lewis-E Lewis-E marked this pull request as draft May 4, 2026 14:25
Previously, configuring DD_APM_WINDOWS_PIPE_NAME forced the TCP receiver
port to 0, leaving legacy TCP-only tracers stranded. The mini agent now
listens on both transports concurrently when a pipe name is set: TCP
stays on dd_apm_receiver_port (default 8126), and the named pipe is
added alongside. Behavior is unchanged when no pipe name is configured.

A single serve() function supervises the TCP and named-pipe accept
loops plus the trace flusher, stats flusher, and stats concentrator.
Any task dying unexpectedly is fatal — tracers pick one transport at
startup and stay on it, so silently degrading a transport would strand
the tracers using it. On shutdown_rx, accept loops drain in-flight
handlers via a watch fan-out before the supervisor signals the stats
flusher to do its final emit, preserving the trace-stats invariant
that no AddChunk/ClientStatsPayload messages are lost. Survivors are
aborted before serve() returns so they don't detach and hold sockets,
named pipes, or buffered channel state alive.

shutdown_rx now uses tokio::sync::watch::Receiver<bool> instead of
oneshot::Receiver<()> so accept loops can subscribe directly via
.clone(); no separate accept_shutdown channel is needed.

Pipe code stays fully cfg-gated. On non-Windows or no-windows-pipes
builds, no pipe symbols are emitted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch from 48b70aa to 6efdb89 Compare May 4, 2026 15:33
Boots one mini agent with both a non-zero dd_apm_receiver_port and a
named pipe configured, sends a service-tagged trace through each, and
asserts both service-name needles appear in the mock backend's request
bodies. Then fires shutdown_tx and asserts the final stats flush
arrives — exercising the watch-channel fan-out to both accept loops,
the drain of each transport's in-flight handlers, and the supervisor's
sequenced final flusher signal. Covers the dropped receiver_port=0
override in config.rs and the serve() supervisor's responsibility to
keep both accept loops alive.

create_test_trace_payload now takes Option<&str> so the dual-transport
test can distinguish payloads by service name without a second helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch from 6efdb89 to 1957ae3 Compare May 4, 2026 16:39
@Lewis-E Lewis-E marked this pull request as ready for review May 4, 2026 16:39
@Lewis-E Lewis-E requested a review from a team as a code owner May 4, 2026 16:39
@Lewis-E Lewis-E requested a review from shreyamalpani May 4, 2026 16:39
Lewis-E and others added 2 commits May 4, 2026 12:58
test_mini_agent_tcp_with_real_flushers flaked on Ubuntu 24.04 ARM CI
(and was reproducible occasionally on macOS too): the test would call
verify_stats_request immediately after agent_handle.await, but on slow
runners the mock server hadn't finished reading and storing the body
of the final stats POST in time. The fixed-duration FLUSH_WAIT_DURATION
sleep before each verify_* call had the same race in the other
direction — fast runners waited longer than necessary, slow runners
sometimes weren't enough.

Replace both timed sleeps with a wait_for_request_at_path helper that
polls the mock server every 50ms until at least one request arrives
at the expected path, with a 60-second hard ceiling that panics with a
clear message if exceeded. verify_trace_request and verify_stats_request
are now async and use the helper internally, so callers just .await
them — no separate sleep.

FLUSH_WAIT_DURATION stays only for verify_no_stats_request, where a
bounded wait is fundamental to the negative assertion (we can't poll
for the absence of an event).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Lewis-E Lewis-E force-pushed the lewis/SVLS-8243/named-pipe-backwards-compatability branch from 557b7f3 to f89f181 Compare May 4, 2026 18:47
@duncanpharvey
Copy link
Copy Markdown
Collaborator

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f89f181bbf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if let Some(h) = pipe_handle.as_ref() {
h.abort();
}
trace_flusher_handle.abort();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid aborting the trace flusher on graceful shutdown

When shutdown is requested after a trace handler has returned but before the next trace flush interval, this aborts trace_flusher_handle immediately after the accept loops drain. start_trace_flusher has no shutdown/final-flush path (it only flushes on its timer), so any SendData just queued by the drained handlers can remain in the receiver/aggregator and never be sent; the new real-flusher tests avoid this by waiting for verify_trace_request before shutdown, but production callers can lose traces during graceful shutdown.

Useful? React with 👍 / 👎.

@Lewis-E Lewis-E merged commit b7af312 into main May 6, 2026
27 checks passed
@Lewis-E Lewis-E deleted the lewis/SVLS-8243/named-pipe-backwards-compatability branch May 6, 2026 12:52
Lewis-E added a commit to DataDog/dd-trace-dotnet that referenced this pull request May 6, 2026
## Summary of changes
* Allow use of unique windows named pipes for IPC, coordinated with the
serverless compatibility layer's mini-agent.
* See:
DataDog/datadog-serverless-compat-dotnet#20 for
compat-side changes.
* See:
DataDog/serverless-compat-self-monitoring#54 for
integration testing in Azure.

## Reason for change
* Improve support for running multiple Azure Functions in the same
hosting plan. [[jira](https://datadoghq.atlassian.net/browse/SVLS-8244)]

## Implementation details
For [Azure
Functions](https://docs.datadoghq.com/serverless/azure_functions/?tab=dotnet#setup),
the tracer loads first, then the compat layer, and then the compat layer
launches a rust binary to act as the agent. Additionally, the client may
or may not be using the DogstatsD client, and it is also possible to
have the compat layer and DogstatsD in place without the tracer. We need
to coordinate named pipes across all three/four of these codebases,
noting that the tracer will use DogstatsD's pipe for runtime metrics if
available.

Given this order of operations, and that we want this to work with
immutable configuration, the tracer should set pipe names for the
compatibility layer via instrumentation if the tracer is present.

1. Tracer loads -- pipe name is generated
2. We check the presence of the compat layer + windows configs (using
undocumented envvar `DD_SERVERLESS_COMPAT_BINARY_PATH` if necessary to
adapt to Azure changes) to see if named pipes can be used or if we
should rely on TCP
4. Compat layer loads
5. Tracer uses instrumentation to force the compat layer to generate the
pipe names <pipe-a>, <pipe-b>
6. Compat layer passes the name to the mini-agent tracer <pipe-a> and
dogstatsd <pipe-b>.

| Scenario | Pipe generated? | Transport | Result |
  |---|---|---|---|                               
| New tracer + no compat at all | No (binary not found) | TCP | Works |
| New tracer + old compat (< 1.5) | No (version check fails) | TCP |
Works |
| New tracer + new compat (≥ 1.5) | Yes | Named pipe | Works
(integration coordinates) |
| New tracer + dev compat (0.0.0) | Yes | Named pipe | Works
(integration coordinates) |
  | Old tracer + no compat | No | TCP | Works |
| Old tracer + old compat (< 1.5) | No | TCP | Works (neither side knows
about pipes) |
| Old tracer + new compat (≥ 1.5) | No | TCP | **Broken** — tracer sends
via TCP, but compat binary listens on pipe only. Requires [more
serverless-components
changes](DataDog/serverless-components#94). |
  | Old tracer + dev compat (0.0.0) | No | TCP | Not supported |

## Test coverage
* Self Monitoring
* New Unit Tests

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Andrew Lock <andrewlock.net@gmail.com>
Lewis-E added a commit to DataDog/dd-trace-dotnet that referenced this pull request May 12, 2026
## Summary

- Bumps the `Datadog.Serverless.Compat` `PackageReference` in
`tracer/src/Datadog.AzureFunctions/Datadog.AzureFunctions.csproj` from
`1.4.0` → `1.5.0`.
- 1.5.0 ships multifunction named-pipe coordination for .NET on Azure
Functions Premium plans. It bundles the
[`datadog-serverless-compat/v0.24.0`](https://github.com/DataDog/serverless-components/releases/tag/datadog-serverless-compat%2Fv0.24.0)
binary, which contains the dual TCP+pipe listener
([serverless-components#94](DataDog/serverless-components#94))
and best-effort TCP-8126 bind
([serverless-components#129](DataDog/serverless-components#129))
— both required to land the named-pipe path merged in
[#8164](#8164).

## Test plan

- [ ] CI green (csproj-only change; should be a no-op outside of
`Datadog.AzureFunctions` packaging)
- [ ] Confirm a built `Datadog.AzureFunctions` package resolves
`Datadog.Serverless.Compat 1.5.0` from NuGet at restore time

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants