Skip to content

feat: implement stable configuration#294

Draft
bm1549 wants to merge 1 commit into
bm1549/config-providerfrom
bm1549/stable-configuration
Draft

feat: implement stable configuration#294
bm1549 wants to merge 1 commit into
bm1549/config-providerfrom
bm1549/stable-configuration

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented Mar 20, 2026

Description

Phase 1 Stable Configuration support for dd-trace-cpp, rebased onto the ConfigProvider refactor from #321. Scalar apm_configuration_default keys only; Linux-only.

At tracer init, the tracer reads two YAML files:

  • Local (user-managed): /etc/datadog-agent/application_monitoring.yaml
  • Fleet (Agent-managed): /etc/datadog-agent/managed/datadog-agent/stable/application_monitoring.yaml

Each file may contain apm_configuration_default, a flat map of DD_* environment-variable-style keys to scalar values. These participate in configuration precedence: fleet_stable > env > user/code > local_stable > default. Fleet files may include a config_id reported in telemetry against fleet-sourced entries.

How it's wired

This change plugs stable configuration into the ConfigProvider infrastructure landed in #321. Two ConfigSource implementations adapt parsed YAML into the provider's source list:

  • LocalStableConfigSource (origin: LOCAL_STABLE_CONFIG)
  • FleetStableConfigSource (origin: FLEET_STABLE_CONFIG, optional config_id)

finalize_config constructs both sources from the parsed YAML and passes them to ConfigProvider alongside EnvironmentSource. Every typed accessor on the provider (get_string, get_bool, get_uint64, get_tags, get_propagation_styles*) walks all three sources.

Keys covered: DD_SERVICE, DD_ENV, DD_VERSION, DD_TAGS, DD_TRACE_ENABLED, DD_TRACE_STARTUP_LOGS, DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED, DD_APM_TRACING_ENABLED, DD_TRACE_BAGGAGE_MAX_ITEMS, DD_TRACE_BAGGAGE_MAX_BYTES, DD_TRACE_PROPAGATION_STYLE*, DD_TRACE_RESOURCE_RENAMING_*.

Out of scope (deferred to follow-ups)

  • Nested finalize_config integration: trace_sampler_config, span_sampler_config, and datadog_agent_config finalize independently of the top-level ConfigProvider. Wiring stable config into them requires threading the sources through. Keys affected include DD_TRACE_SAMPLE_RATE, DD_TRACE_RATE_LIMIT, DD_TRACE_SAMPLING_RULES, DD_SPAN_SAMPLING_RULES, DD_TRACE_AGENT_URL. dd-trace-go's Phase 1 made the same choice; dd-trace-java covers them because its provider resolves universally.
  • Alias-precedence cleanup: get_propagation_styles_with_aliases picks the first alias any source has and resolves only that alias. A specific alias set in env can shadow a more general alias set by fleet. Tracked separately; predates this PR (came in via refactor: introduce ConfigProvider for unified env-var resolution #321).
  • Windows paths: no peer reference for the HKLM\SOFTWARE\Datadog\Datadog Agent registry lookup in other native-parsing tracers. Defer to a follow-up aligned with the Agent's registry layout.
  • apm_configuration_rules (targeting rules with selectors): not in any Phase 1 native implementation.

Key design choices

  • YAML parsing via yaml-cpp 0.8.0 (src/datadog/yaml_parser.{h,cpp}). Linked PRIVATE in CMake; in Bazel, lives in deps with a comment documenting the asymmetry.
  • 256KB file size limit on each YAML file. Larger files are skipped with an error log.
  • YAML alias-bomb mitigation: documents with more than 100 alias references are rejected before invoking YAML::Load. Heuristic guard, not a parser.
  • Non-scalar conversion safety: Node::as<std::string>() calls are gated on IsScalar() and wrapped in try/catch.
  • Test-accessible load_one in stable_config.h lets TempDir-based unit tests exercise file I/O paths directly.

Files

File Change
cmake/deps/yaml.cmake New: yaml-cpp 0.8.0 FetchContent dependency
BUILD.bazel, MODULE.bazel, WORKSPACE New: yaml-cpp Bazel dep
CMakeLists.txt, test/CMakeLists.txt yaml-cpp + new test files
src/datadog/yaml_parser.{h,cpp} New: yaml-cpp wrapper with alias-bomb and non-scalar guards
src/datadog/stable_config.{h,cpp} New: StableConfig, Linux file paths, 256KB file loader
src/datadog/stable_config_source.h New: LocalStableConfigSource and FleetStableConfigSource adapters
src/datadog/tracer_config.cpp Wire the two stable-config sources into ConfigProvider
src/datadog/telemetry/telemetry_impl.cpp Emit config_id per fleet-origin metadata entry
test/test_yaml_parser.cpp YAML parser tests including non-scalar and alias-bomb cases
test/test_stable_config.cpp StableConfig, file I/O via TempDir, source-adapter smoke tests
test/system-tests/request_handler.{cpp,h} /config endpoint reads from tracer_.config()

Rebased onto #321

The earlier 40-commit history collapsed to one feature commit on top of bm1549/config-provider. Behavior matches the original PR's intent for in-scope keys; resolution machinery now lives in ConfigProvider and ConfigSource rather than in a 5-parameter resolve_and_record_config template. The legacy template was retired in #321.

Testing

199,670 assertions / 172 test cases pass locally. Coverage includes:

  • YAML parser happy paths, non-scalar config_id / map key handling, alias-bomb rejection
  • StableConfig::lookup
  • File I/O: valid YAML, 256KB rejection, malformed YAML on disk, missing file, unreadable file (POSIX-guarded)
  • LocalStableConfigSource / FleetStableConfigSource adapter behavior including config_id forwarding
  • ConfigProvider precedence chain (covered in refactor: introduce ConfigProvider for unified env-var resolution #321's tests)

Related system-tests PR: DataDog/system-tests#6554.

Motivation

Enable fleet-managed and user-managed APM configuration for C++ tracers (dd-trace-cpp, nginx-datadog, httpd-datadog). C++ is the last major language without stable config support. This is Phase 1; extended configs (sampler integration), nested finalizer wiring, and Windows support follow.

Additional Notes

Jira ticket: [PROJ-IDENT]

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Mar 20, 2026

Benchmarks

Benchmark execution time: 2026-05-15 12:02:04

Comparing candidate commit c069f80 in PR branch bm1549/stable-configuration with baseline commit dafd944 in branch bm1549/config-provider.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 0 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:BM_TraceTinyCCSource

  • 🟥 execution_time [+2.925ms; +3.309ms] or [+3.839%; +4.343%]

@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 Bot commented Mar 20, 2026

🎯 Code Coverage (details)
Patch Coverage: 84.09%
Overall Coverage: 89.45% (-1.33%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: c069f80 | Docs | Datadog PR Page | Give us feedback!

@bm1549 bm1549 marked this pull request as ready for review March 20, 2026 20:13
@bm1549 bm1549 requested review from a team as code owners March 20, 2026 20:13
@bm1549 bm1549 requested review from Anilm3 and removed request for a team March 20, 2026 20:13
@bm1549 bm1549 changed the title feat: implement stable configuration (Phase 1) feat: implement stable configuration Mar 20, 2026
@bm1549 bm1549 marked this pull request as draft March 20, 2026 20:26
@bm1549 bm1549 force-pushed the bm1549/stable-configuration branch from b754683 to 62d83fb Compare March 21, 2026 02:18
Copy link
Copy Markdown
Contributor

@dmehala dmehala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look, and this PR is a hard no. There are many problems, with the most obvious being the custom YAML parser. I sincerely appreciate your effort to add feature to the tracer. Coding agents can be useful, but they need an experienced person to guide them properly. Without that, the result is just a pile of code that is hard to maintain and fix.

@bm1549
Copy link
Copy Markdown
Contributor Author

bm1549 commented Mar 21, 2026

I took a quick look, and this PR is a hard no. There are many problems, with the most obvious being the custom YAML parser. I sincerely appreciate your effort to add feature to the tracer. Coding agents can be useful, but they need an experienced person to guide them properly. Without that, the result is just a pile of code that is hard to maintain and fix.

@dmehala I was the one that nudged it towards hand-rolled yaml 😅 justification was because it was a 600KB dependency (8% of library size). After reviewing further you're right

Made a bunch of tweaks that improve it, but still happy to close if you think this is far from solving the problem correctly

Read configuration from two YAML files at tracer initialization:

  - /etc/datadog-agent/application_monitoring.yaml (local, user-managed)
  - /etc/datadog-agent/managed/datadog-agent/stable/application_monitoring.yaml (fleet)

Each file holds a flat map of DD_* env var names to scalar values under
apm_configuration_default. Fleet files may include a config_id for
telemetry attribution.

The values slot into ConfigProvider's precedence chain via two new
ConfigSource implementations, LocalStableConfigSource and
FleetStableConfigSource, so the precedence is:

  fleet_stable > env > user/code > local_stable > default

Build:
  - Add yaml-cpp 0.8.0 as a dependency (CMake + Bazel).

Telemetry:
  - Emit config_id per ConfigMetadata entry when the origin is
    FLEET_STABLE_CONFIG (read from FleetStableConfigSource::config_id()).

Scope (Phase 1):
  - Linux paths only; Windows path resolution is deferred.
  - Wired into the keys finalize_config currently resolves through
    ConfigProvider (DD_SERVICE, DD_ENV, DD_VERSION, DD_TAGS, DD_TRACE_*,
    DD_APM_*, baggage limits, propagation styles, resource renaming).
    Nested finalize_config calls (trace_sampler, span_sampler, agent)
    still resolve env-only; extending them is a follow-up.
  - get_propagation_styles_with_aliases retains the alias-first
    resolution introduced in the ConfigProvider refactor. A more
    specific alias set by a lower-precedence source can shadow a more
    general alias set by fleet. Tracked as a follow-up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants