Skip to content

refactor: introduce ConfigProvider for unified env-var resolution#321

Draft
bm1549 wants to merge 14 commits into
mainfrom
bm1549/config-provider
Draft

refactor: introduce ConfigProvider for unified env-var resolution#321
bm1549 wants to merge 14 commits into
mainfrom
bm1549/config-provider

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented May 15, 2026

Description

Introduces ConfigProvider, a source-list resolver that walks ordered ConfigSource implementations to resolve DD_* configuration values with telemetry origin attribution. This is a pure refactor — every existing test passes unchanged. The point is to lay an extensible foundation: PR follow-ups (notably the stable-configuration work that originally tried to plumb fleet/local YAML sources per call site) can add new sources by implementing ConfigSource once, and every consumer that goes through ConfigProvider automatically participates in the precedence chain.

Architecture

Type Role
ConfigSource (interface) One origin of DD_* values. Returns Optional<std::string> for a key plus a ConfigMetadata::Origin and optional config_id.
EnvironmentSource Concrete source backed by std::getenv.
ConfigProvider Owns up to three sources (fleet, env, local) plus a metadata sink. Templated resolve<T> walks the source list in precedence order, records each non-empty source into the metadata vector, returns the highest-precedence value.

Precedence: fleet_stable > env > user/code > local_stable > default. PR R registers only EnvironmentSource (fleet and local source pointers are nullptr and skipped); a follow-up PR will register FleetStableConfigSource and LocalStableConfigSource, at which point every existing provider.get_*(...) call site automatically gains stable-config support.

Accessors

Each accessor parses the source's raw string into the target type and writes telemetry origin into the metadata map:

  • get_string(name, env_key, user_value, default_value)
  • get_bool(name, env_key, user_value, default_value) — matches existing env-var convention (any non-falsy string is true)
  • get_uint64(name, env_key, user_value, default_value, logger) — logs and falls through on parse failure
  • get_double(name, env_key, user_value, default_value, logger) — same
  • get_tags(name, env_key, user_value, default_value, logger) — parses comma-separated key:value pairs
  • get_propagation_styles(name, env_key, user_value, default_value, logger) — parses style-name lists

Internally each accessor reuses a generic resolve<T> template that takes a ParseFn and a StringifyFn. Adding a new typed accessor is one method + one parse/stringify pair.

Migrations

Out of 17 call sites of resolve_and_record_config on main, 12 migrate cleanly to ConfigProvider:

File Sites migrated Sites kept on resolve_and_record_config
trace_sampler_config.cpp 2 (sample_rate, max_per_second) 0
tracer_config.cpp 10 (service, tags, baggage limits, all bool flags) 4 (DD_ENV, DD_VERSION, extraction_styles, injection_styles)
span_sampler_config.cpp 0 1 (sampling rules)

The 5 sites that stay use semantics the provider does not yet model:

  • DD_ENV / DD_VERSION — record no DEFAULT metadata entry when unset (the provider always records one).
  • DD_TRACE_PROPAGATION_STYLE_{EXTRACT,INJECT} — fall back across a specific → legacy → global env-key chain, which the env_config loader resolves before resolve_and_record_config sees a single value.
  • DD_SPAN_SAMPLING_RULES — operates on a pre-parsed Optional<vector<Rule>> rather than a string. The rule-parsing JSON validator is non-trivial and lives in load_span_sampler_env_config.

These can be migrated in follow-ups once ConfigProvider gains a "no-default" mode and a sampling-rules accessor. The 3-arg resolve_and_record_config overload stays in include/datadog/config.h for now.

Other changes

  • ConfigMetadata grows two new Origin enum values (LOCAL_STABLE_CONFIG, FLEET_STABLE_CONFIG) and an optional config_id field. No production code uses them yet. They exist so ConfigProvider's telemetry recording is well-typed when a follow-up adds the stable sources.
  • parse_propagation_styles moves from tracer_config.cpp's anonymous namespace to the public propagation_style.h so ConfigProvider::get_propagation_styles can reuse it.
  • telemetry_impl.cpp's origin-serialization switch handles the new enum values (emits local_stable_config / fleet_stable_config and forwards config_id for the fleet case).
  • env_config loaders still parse the keys the provider also reads. This preserves the existing parse-failure-as-Error behavior (the provider's "log and fall through" semantics are reached only in success cases).

Files

File Change
src/datadog/config_source.h New — interface
src/datadog/environment_source.{h,cpp} New — std::getenv source
src/datadog/config_provider.{h,cpp} New — source-list resolver
include/datadog/config.h Extends ConfigMetadata::Origin; adds config_id field
include/datadog/propagation_style.h Adds parse_propagation_styles declaration
src/datadog/propagation_style.cpp Adds parse_propagation_styles implementation
src/datadog/telemetry/telemetry_impl.cpp Handle the two new origin values
src/datadog/trace_sampler_config.cpp Migrate 2 sites
src/datadog/tracer_config.cpp Migrate 10 sites; drop the duplicate parse_propagation_styles
BUILD.bazel, CMakeLists.txt, test/CMakeLists.txt Wire the new files
test/test_config_provider.cpp New — 19 unit tests covering the source interface, env source, and every accessor

Testing

  • 199,545 assertions across 126 test cases pass — every pre-existing test passes unchanged.
  • 19 new [config_provider] tests cover: MapSource fixture, EnvironmentSource (present / missing / empty), get_string (env-wins / user-wins-over-default / env-beats-user / default-only / null-source-pointers-skipped / full-precedence-chain), get_bool (true / false), get_uint64 (valid / invalid-falls-through), get_double, get_tags (valid / malformed), get_propagation_styles (valid / invalid-falls-through).

Motivation

Lays the source-list-resolver foundation so a follow-up PR can add universal stable-configuration support. Once FleetStableConfigSource and LocalStableConfigSource are added to ConfigProvider's source list, every DD_* config key resolved through the provider automatically participates in fleet_stable > env > user/code > local_stable > default precedence, without per-call-site wiring.

Additional Notes

Jira ticket: [PROJ-IDENT]

bm1549 added 10 commits May 14, 2026 21:00
ConfigSource is the building block of the upcoming ConfigProvider
refactor.  Each implementation represents one origin from which DD_*
configuration values can be read; ConfigProvider walks them in fixed
precedence order.  This commit introduces the interface only; the env
var implementation follows.
First concrete ConfigSource implementation.  Reads from process
environment via std::getenv; treats empty values the same as missing
(consistent with existing dd-trace-cpp env-var convention).
ConfigProvider walks a source list (fleet, env, local) in precedence
order, records every non-empty source's value into the existing
ConfigMetadata sink, and returns the highest-precedence value.  User
values plumb in as an Optional<T> parameter on the accessor; defaults
plumb in as a value parameter.

Source pointers may be null (the source is skipped), so PR R registers
only an EnvironmentSource; later PRs add fleet and local sources without
changing call sites.

Also extends ConfigMetadata with two new origin enum values
(LOCAL_STABLE_CONFIG, FLEET_STABLE_CONFIG) and a config_id field, both
referenced by ConfigProvider but only populated when a stable source is
present.  telemetry_impl.cpp's serialization switch handles the new
origins.
Refactors get_string into a generic resolve<T> template that takes a
parse_fn (string -> Expected<T>) and a stringify_fn (T -> string).
Adds get_bool, get_uint64, get_double on top of the same template.
Bool follows the env-var convention (any non-falsy string is true,
never logs).  Uint64 and double log a parse error and fall through to
the next-lower-precedence source.
The list-form parser was file-local to tracer_config.cpp.  ConfigProvider
will need it for its get_propagation_styles accessor, so make it
publicly accessible alongside its singular cousin.  Behavior unchanged.
Replaces both resolve_and_record_config call sites with ConfigProvider
accessors.  sample_rate and max_per_second now flow through
provider.get_double; the "is_sample_rate_provided" flag is now derived
by inspecting the metadata vector size (DEFAULT-only means no source
contributed).

env_config still loads sample_rate / max_per_second so its error path
on parse failure remains.  The provider also parses the same env var,
which is harmless duplication kept for behavior preservation.
…nfig to ConfigProvider

Replaces the scalar resolve_and_record_config calls (DD_SERVICE, DD_TAGS,
DD_TRACE_STARTUP_LOGS, DD_TRACE_ENABLED, DD_TRACE_128_BIT_TRACEID_*,
DD_TRACE_BAGGAGE_MAX_*, DD_APM_TRACING_ENABLED, resource-renaming
booleans) with the corresponding ConfigProvider accessors.

Kept on resolve_and_record_config for now:
- DD_ENV / DD_VERSION: rely on the "no default recorded" semantic that
  the provider does not yet support.
- DD_TRACE_PROPAGATION_STYLE_{EXTRACT,INJECT}: rely on the
  specific>legacy>global env-key fallback chain that the env_config
  loader already resolves before reaching the resolver.

env_config still loads each of the migrated keys so its existing
error-on-bad-input behavior is preserved.  The provider parses the
same env vars again (harmless duplication).
- EnvironmentSource now preserves empty values (returns empty string)
  to match environment::lookup's contract.  Previously empty env vars
  were treated as unset, which silently broke DD_SERVICE= explicit-empty
  overrides for migrated tracer settings.
- ConfigProvider::resolve now includes the underlying Error::message in
  the parse-failure fallthrough log, so operators see what went wrong
  (matching how load_*_env_config formats errors).
- trace_sampler_config replaces the fragile size() > 1 sentinel for
  is_sample_rate_provided with an explicit check on the last entry's
  origin.  Same semantics, less likely to break under future changes.
@bm1549 bm1549 added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label May 15, 2026
@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 15, 2026

Benchmarks

Benchmark execution time: 2026-05-15 02:09:32

Comparing candidate commit dafd944 in PR branch bm1549/config-provider with baseline commit 34520f2 in branch main.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 0 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:BM_TraceTinyCCSource

  • 🟩 execution_time [-3.266ms; -2.825ms] or [-4.137%; -3.578%]

The CI 'verify' job runs bin/check-environment-variables which flags
direct std::getenv usage outside environment.cpp.  Add a name-keyed
lookup overload to environment::lookup and route EnvironmentSource
through it so the project's env-var allowlist convention stays intact.
@datadog-datadog-prod-us1
Copy link
Copy Markdown

datadog-datadog-prod-us1 Bot commented May 15, 2026

🎯 Code Coverage (details)
Patch Coverage: 93.66%
Overall Coverage: 90.78% (-0.07%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: dafd944 | Docs | Datadog PR Page | Give us feedback!

setenv / unsetenv are POSIX-only; MSVC fails to build them.  The
project already has an EnvGuard RAII helper that handles _putenv on
Windows; use it.
…et<T>

- Drop the "no-default" semantic for DD_ENV / DD_VERSION.  Java, Go, and
  .NET all record a DEFAULT entry; the cpp-specific omission was an
  artifact of resolve_and_record_config's template specialization.
  Now DD_ENV and DD_VERSION use provider.get_string with an empty-string
  default, so they participate in stable config automatically.  Updated
  two tracer_config test sections to expect the new DEFAULT entry.
- Add get_propagation_styles_with_aliases for the
  DD_TRACE_PROPAGATION_STYLE_EXTRACT > DD_PROPAGATION_STYLE_EXTRACT >
  DD_TRACE_PROPAGATION_STYLE chain (and symmetric injection chain).
  Migrate both call sites in tracer_config.cpp to it.
- Add a generic templated get<T> accessor on ConfigProvider that
  accepts a caller-supplied parse and stringify function.  Refactor
  the existing typed accessors (get_string / get_bool / etc.) to share
  the same machinery.  Future complex types (e.g., sampling rules)
  can use the generic accessor without growing ConfigProvider's API.

The span_sampler resolve_and_record_config site is not migrated:
DD_SPAN_SAMPLING_RULES_FILE fallback pre-populates env_config->rules
from a file, which doesn't fit the env-source-string model without
either layering manual file fallback after the provider call or
pre-populating the provider's metadata.  Deferred to a follow-up.

The EnvironmentSource empty-value test is gated #ifndef _MSC_VER:
EnvGuard's _putenv("NAME=") deletes the variable on Windows rather
than setting it to empty.  The behavior cpp implements (preserving
empty values from environment::lookup) is verified on Linux/macOS.
…onfig

The span sampling rules call site was the last consumer of the 3-arg
resolve_and_record_config overload.  Inline the env > user precedence
logic at the call site (same shape Java uses for SPAN_SAMPLING_RULES
plus SPAN_SAMPLING_RULES_FILE — read them as two separate knobs and
compose at the application layer).

env_config still pre-resolves DD_SPAN_SAMPLING_RULES_FILE into
env_config->rules; the call site treats env_config->rules as the
combined ENVIRONMENT_VARIABLE-origin value.  Behavior preserved.

With zero callers, delete the resolve_and_record_config template
overload entirely.  ConfigProvider is now the sole config-resolution
mechanism.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant