Add end-to-end Test_ExtendedHeartbeat for app-extended-heartbeat telemetry event#6338
Conversation
Add comprehensive parametric test suite for app-extended-heartbeat telemetry event across all SDK languages. Tests verify timing, payload structure, consistency, and compliance with API spec. Tests added: - test_extended_heartbeat_emission: Verifies interval timing - test_extended_heartbeat_sequence: Validates multiple events - test_extended_heartbeat_payload_content: Checks required fields - test_extended_heartbeat_matches_app_started: Ensures consistency - test_extended_heartbeat_excludes_products_and_install_signature - test_extended_heartbeat_default_interval: Validates 24h default All tests run parametrically across 9 SDK languages (Go, Java, .NET, C++, PHP, Ruby, Rust, Python, Node.js). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Register comprehensive parametric tests for app-extended-heartbeat event validation across all SDK languages. Tests verify: - Periodic emission timing (24h default, configurable) - Payload structure (config, deps, integrations) - Consistency with app-started event - Proper field exclusions (products, install_signature) Language version markers: - Go: v1.73.0-dev - Java: v1.40.0 - Node.js: v5.0.0 - Python: v2.0.0 - Ruby: v2.1.0-dev - C++: v0.2.0-dev - PHP: v1.0.0-dev - .NET: missing_feature (intentionally skipped - no forked runtime-id issues) Tests use fast intervals (0.3-0.5s) for CI efficiency. Test location: tests/parametric/test_telemetry.py::Test_ExtendedHeartbeat (lines 1256-1514) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed Test_ExtendedHeartbeat manifest entries from development version strings (e.g., v1.73.0-dev, v2.1.0-dev) to missing_feature to follow repository conventions. Development versions should not be specified in manifests. Tests for unreleased features should use missing_feature until the feature is actually released in a production version. Changes: - cpp.yml: v0.2.0-dev → missing_feature - golang.yml: v1.73.0-dev → missing_feature - nodejs.yml: '>=5.0.0' → missing_feature - php.yml: v1.0.0-dev → missing_feature - ruby.yml: v2.1.0-dev → missing_feature 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…tionality Replace 6 complex test methods with a single simplified test that focuses on the core requirement: comparing configurations across telemetry events. The new test: - Grabs app-started, app-extended-heartbeat, and app-client-configuration-change events - Extracts configuration data from each - Asserts configs match between app-started and app-extended-heartbeat - Optionally validates config-change event configs match as well This reduces test complexity from ~260 lines to ~40 lines while maintaining coverage of the critical functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Configure manifests to run app-extended-heartbeat tests only on Node.js for initial validation: - Enable Node.js: *ref_5_0_0 (>=5.0.0) - Disable Python: v2.0.0 -> missing_feature - Disable Java: v1.40.0 -> missing_feature - Keep all others disabled: missing_feature (cpp, dotnet, php, golang, ruby) This allows focused testing of Node.js implementation before enabling other languages. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add SLF001 exception for test_telemetry.py to allow _get_telemetry_event usage - Fix nodejs.yml manifest to use '>=5.0.0' instead of non-existent anchor - All linting checks now pass (mypy, ruff, yamlfmt, yamllint, shellcheck) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
Java tracer supports configurable extended heartbeat interval via DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL, so enable the parametric tests.
Resolve merge conflicts in manifests/java.yml, manifests/ruby.yml, and tests/parametric/test_telemetry.py. Keep Test_ExtendedHeartbeat additions while incorporating main's telemetry key normalization and stable configuration origin updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert pyproject.toml changes (not needed for this PR) and remove DD_TELEMETRY_HEARTBEAT_INTERVAL from Test_ExtendedHeartbeat since we only need to configure the extended heartbeat interval. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use integer interval (1s) for Java compatibility (getLong) - Assert configs as superset with value matching, order-agnostic - Build expected config from app-started + config-change overlay - Collect all config-change events, not just the last one Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eat payload (#17203) ## Summary Fix the extended heartbeat payload key from `"configurations"` (plural) to `"configuration"` (singular) to match the [telemetry v2 API spec](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/app_extended_heartbeat.md) and align with other SDKs (Java, .NET, Node.js). ## Changes - **`ddtrace/internal/telemetry/writer.py`**: `payload["configurations"]` → `payload["configuration"]` - **`tests/telemetry/test_telemetry.py`**: Updated test assertions to match ## Motivation Cross-SDK system tests validate that `app-extended-heartbeat` payloads use the same schema. The spec and all other SDKs use `"configuration"` (singular). This mismatch would cause system test & dropped telemetry payloads failures for Python. ## Related - System test PR: DataDog/system-tests#6338 - Original Python implementation: #16628 Co-authored-by: ayan.khan <ayan.khan@datadoghq.com>
Resolve merge conflict in manifests/ruby.yml (kept both Test_ExtendedHeartbeat and updated Test_Stable_Configuration_Origin). Fix ruff lint issues: D209 docstring closing quotes and SLF001 private member access noqa annotations in test_telemetry.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cbeauchesne
left a comment
There was a problem hiding this comment.
There is already test on heartbeats in DEFAULT scenario.
If you plan to test only one or two set of parameters, then the end-to-end DEFAULT scenario is a best fit from a quality POV as the coverage is by far more complete.
We can chat more about what are the good/bad use cases for parametric/end-to-end if you want, just ping me !
…cheduler (#1824) ## Summary Wire up the `DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL` config value to the telemetry scheduler. The env var was already parsed into `config.telemetry_extended_heartbeat_interval` but the scheduler hardcoded `Duration::from_secs(60 * 60 * 24)` instead of using it. Default remains 24h this only enables system tests to use a shorter interval to validate the `app-extended-heartbeat` event fires correctly. ## Changes - **`libdd-telemetry/src/worker/mod.rs`**: Replace hardcoded `60 * 60 * 24` with `config.telemetry_extended_heartbeat_interval` ## Motivation Cross-SDK system tests need to set a short extended heartbeat interval (e.g., 2s) to validate parity of the `app-extended-heartbeat` telemetry event across all SDKs. Without this fix, PHP and other libdatadog consumers cannot be system-tested for this feature. ## Related - System test PR: DataDog/system-tests#6338 Co-authored-by: edmund.kump <edmund.kump@datadoghq.com>
Add Test_ExtendedHeartbeat to tests/test_telemetry.py for the DEFAULT scenario, validating that extended heartbeat config is a superset of app-started plus any config changes. Set DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL env var in weblog containers. Add manifest entries for all libraries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL from base container env to the DEFAULT scenario weblog_env where it belongs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The extended heartbeat validation is better suited as an end-to-end test in the DEFAULT scenario where it exercises the full tracer stack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| doc="Test env var `DD_TELEMETRY_METRICS_ENABLED=false`", | ||
| scenario_groups=[scenario_groups.telemetry], | ||
| ) | ||
| telemetry_extended_heartbeat = EndToEndScenario( |
There was a problem hiding this comment.
you need to add this scenario in .github/workflows/run-end-to-end.yml
There was a problem hiding this comment.
Added in 4f7f76e — the TELEMETRY_EXTENDED_HEARTBEAT step is now registered between TELEMETRY_ENHANCED_CONFIG_REPORTING and TELEMETRY_LOG_GENERATION_DISABLED.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 49f6dc1a56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| scenario_groups=[scenario_groups.telemetry], | ||
| ) | ||
| telemetry_extended_heartbeat = EndToEndScenario( | ||
| "TELEMETRY_EXTENDED_HEARTBEAT", |
There was a problem hiding this comment.
Wire TELEMETRY_EXTENDED_HEARTBEAT into end-to-end CI
This adds a new TELEMETRY_EXTENDED_HEARTBEAT scenario and binds Test_ExtendedHeartbeat to it, but the end-to-end workflow is not updated to execute that scenario: in .github/workflows/run-end-to-end.yml the telemetry section only runs TELEMETRY_APP_STARTED_PRODUCTS_DISABLED, TELEMETRY_ENHANCED_CONFIG_REPORTING, TELEMETRY_LOG_GENERATION_DISABLED, TELEMETRY_METRIC_GENERATION_DISABLED, and TELEMETRY_DEPENDENCY_LOADED_TEST_FOR_DEPENDENCY_COLLECTION_DISABLED. That means the new test path is effectively untested in CI, so regressions in this feature can merge without detection.
Useful? React with 👍 / 👎.
Addresses review feedback — the new scenario needs a step in .github/workflows/run-end-to-end.yml so CI can actually run it.
| tests/test_standard_tags.py: irrelevant | ||
| tests/test_telemetry.py::Test_APMOnboardingInstallID: '>=1.12.0' # Modified by easy win activation script | ||
| tests/test_telemetry.py::Test_DependencyEnable: '>=1.12.0' # Modified by easy win activation script | ||
| tests/test_telemetry.py::Test_ExtendedHeartbeat: ">1.14.0" |
There was a problem hiding this comment.
I don't expect this to work, as nginx-datadog has not received an update of its dd-trace-cpp since the app-extended-heartbeat telemetry event was implemented in DataDog/dd-trace-cpp@1603dce
There was a problem hiding this comment.
Good catch — addressed in c78fad7. Marked Test_ExtendedHeartbeat as missing_feature for both cpp_nginx and cpp_httpd until those modules pick up a dd-trace-cpp release that includes the implementation. CI was indeed failing with app-extended-heartbeat event not found on cpp_nginx as you predicted.
Address CI failures across cpp_nginx, cpp_httpd, dotnet, golang, java (spring-boot-3-native), and php for the new TELEMETRY_EXTENDED_HEARTBEAT scenario: - cpp_nginx / cpp_httpd: nginx-datadog and httpd-datadog have not yet shipped a release that includes the dd-trace-cpp commit implementing app-extended-heartbeat (per zacharycmontoya feedback) - dotnet: extended-heartbeat payload contains configs missing the required `value` field, failing schema validation - golang: extended-heartbeat only includes app-started configs, not app-client-configuration-change configs - java spring-boot-3-native: native image doesn't capture app-started telemetry event - php: telemetry runs via a sidecar process that does not emit app-extended-heartbeat events yet
## Bug `ExtendedHeartbeat` is in the scheduler's `delays` catalog but never gets a deadline, so its handler never fires and `app-extended-heartbeat` is never emitted by any libdatadog-based tracer. `Scheduler::new` initializes `deadlines: Vec::new()`. `next_deadline()` only reads `deadlines`. Events must be moved from `delays` → `deadlines` via `schedule_event(event)`. `Lifecycle(Start)` only scheduled `FlushMetricAggr` + `FlushData`; the `ExtendedHeartbeat` handler self-reschedules but needs a first fire to do so — chicken-and-egg. Hidden until now because the default interval is 24h, and prior to #1824 it was hardcoded to 24h regardless of config. Surfaced by [system-tests PR 6338](DataDog/system-tests#6338) (`TELEMETRY_EXTENDED_HEARTBEAT` scenario, 2s interval). ## Fix Schedule `ExtendedHeartbeat` alongside the others in `dispatch_action`'s `Lifecycle(Start)`. ## Tests - `full_flavor_start_schedules_every_periodic_action` — invariant: walks `delays`, asserts each is in `deadlines` after Start. Catches the whole class of bug, not just this variant. Verified to fail without the fix and pass with it. - `metrics_logs_flavor_start_does_not_schedule_extended_heartbeat` — negative guard locking in the `MetricsLogs` flavor's intentional exclusion of lifecycle events. Both are `#[cfg_attr(miri, ignore)]` since they exercise reqwest. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: bob.weinand <bob.weinand@datadoghq.com>
… dotnet Aligns the v2 telemetry schema with the just-merged authoritative spec in DataDog/instrumentation-telemetry-api-docs#124, which dropped `value` from the `required` list because backend ingest (UKV, EVP) silently accepts a missing `value` and stores it as an empty string. - tests/schemas/utils/miscs/telemetry/v2/objects/configuration.json: remove `value` from `required`. - tests/schemas/test_schemas.py: remove the two golang SchemaBug entries at `$.payload.configuration[].value` (ticket APMS-12697) — these worked around the now-relaxed schema and are no longer needed. - manifests/dotnet.yml: flip Test_ExtendedHeartbeat from missing_feature to v3.39.0 — the dotnet implementation has shipped (dd-trace-dotnet#8227, in v3.39.0+) and the schema-mismatch blocker is gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three upstream blockers landed: - golang: dd-trace-go#4633 (fix(telemetry): app-extended-heartbeat event not correctly reporting entire config state) shipped in v2.8.0 (2026-05-04). Configs from app-client-configuration-change now flow into the extended heartbeat. - cpp_nginx: nginx-datadog v1.16.1 (2026-05-07) bumps to dd-trace-cpp v2.1.0, which includes dd-trace-cpp#301 implementing app-extended-heartbeat. - cpp_httpd: httpd-datadog v1.0.5 (2026-05-07) bumps to dd-trace-cpp v2.1.0, same as above. CI install scripts already pull latest releases for both cpp modules, so no Dockerfile or pinning changes are needed. php and java spring-boot-3-native remain missing_feature: php awaits libdatadog#1910 in a release (not in v32.0.0); spring-boot-3-native has a separate native-image limitation around app-started capture. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The manifest validator requires NPM-style semver ranges without the 'v' prefix. '>=v2.8.0' was rejected as 'Invalid NPM block'. Use '>=2.8.0' instead — matches the convention used in adjacent entries (e.g. cpp_nginx '>=1.16.1', cpp_httpd '>1.0.4').
Both upstream blockers landed: - libdatadog v33.0.0 (2026-05-07) includes #1910 "schedule ExtendedHeartbeat on worker start" — the sidecar now actually emits the event. - dd-trace-php#3865 (merged 2026-05-11) bumped the libdatadog submodule past v33.0.0 to also include #1962 "include dependencies and integrations in app-extended-heartbeat". #3865 is post-1.19.2, so gate at >=1.20.0 (the next release). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Invert the weblog matrix from "enable all, except spring-boot-3-native" to "disable all, except spring-boot". The test exercises a single canonical weblog rather than implicitly running against every spring-boot-* and unrelated variants (vertx3, play, resteasy-netty3, akka-http, openliberty, payara, jetty, undertow). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PHP v1.20.0 emits null for dependencies[].version in the
app-extended-heartbeat payload, failing schema validation in
Test_DdtraceSchemas:
None is not of type 'string' on instance
$.payload.dependencies[12].version
The dependencies are populated in the extended heartbeat (per
libdatadog#1962), but the version field is null for at least some
entries. This is a real PHP tracer bug — disable the test until
dd-trace-php emits valid versions.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4aabacfb75
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The step was added with always(), but always() evaluates true even when the workflow is cancelled — keeping the step running and burning runner time after a cancel. All surrounding scenario steps use !cancelled() for exactly this reason. Match the convention.
Summary
Adds a new
TELEMETRY_EXTENDED_HEARTBEATend-to-end scenario andTest_ExtendedHeartbeattest that validates theapp-extended-heartbeattelemetry event includes every config previously reported inapp-startedorapp-client-configuration-change.Also relaxes the v2 telemetry schema (
valueis no longer required onconf_key_valueentries) to match the just-merged authoritative spec change in instrumentation-telemetry-api-docs#124 — backend ingest (UKV, EVP) already accepts a missingvalueand stores it as an empty string, so the strict requirement was inconsistent with reality and a source of false-positive schema-validation failures for some tracers.Changes
New scenario
utils/_context/_scenarios/__init__.py—TELEMETRY_EXTENDED_HEARTBEATscenario with shortened intervals (DD_TELEMETRY_HEARTBEAT_INTERVAL=1,DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL=2, plus_DD_*alias) so the extended-heartbeat event fires within the test window..github/workflows/run-end-to-end.yml— registers the new scenario in the end-to-end CI matrix.New test
tests/test_telemetry.py—Test_ExtendedHeartbeat, gated by the existing@features.app_extended_heartbeat_eventdecorator. Collects config names from allapp-startedandapp-client-configuration-changeevents, then asserts every one appears in at least oneapp-extended-heartbeatevent across the stream (order-agnostic, to handle forked workers and remote-config re-reports).Schema relaxation
tests/schemas/utils/miscs/telemetry/v2/objects/configuration.json— dropsvaluefromrequired. Mirrors instrumentation-telemetry-api-docs#124.tests/schemas/test_schemas.py— removes both obsolete golangSchemaBugentries at$.payload.configuration[].value(ticket APMS-12697). These worked around the now-relaxed schema and are no longer needed.Manifest version gates
>1.0.4>=1.16.1v3.39.0>=2.8.0app-client-configuration-changenow flow into the extended heartbeat.v1.23.0missing_featuredue to a separate native-image limitation aroundapp-startedcapture.)>=5.97.0ref_5_97_0anchor added.>=1.20.0v4.6.5>2.30.0Related PRs
conf_key_value.valuefrom required to optional in the v2 spec. Merged.render.ymlworkflow so the spec docs can regenerate.