[receiver/prometheusremotewritereceiver] Fix silent data loss on consumer failure #45151
Merged
songy23 merged 2 commits into open-telemetry:main on Jan 7, 2026
…umer failure

The receiver was sending HTTP 204 No Content before calling ConsumeMetrics(), so if the consumer failed, clients incorrectly thought data was delivered. This violates the Prometheus Remote Write spec, which states receivers MUST NOT return 2xx if data was not successfully written.

Changes:
- Move WriteHeader(204) to after ConsumeMetrics() succeeds
- Return 400 Bad Request for permanent consumer errors
- Return 500 Internal Server Error for retryable errors
- Add tests for consumer error handling

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
dashpole approved these changes on Jan 6, 2026
…ote-write-silent-loss Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
rashmichandrashekar added a commit to Azure/prometheus-collector that referenced this pull request on Feb 18, 2026
This PR upgrades the otelcollector to the latest version available for the opentelemetry-collector and opentelemetry-operator. It was automatically generated by the GitHub Actions workflow. The summary of the OSS changelog is below:

# Prometheusreceiver Changes

## v0.142.0 to v0.144.0

Generated on: 2026-01-27 07:11:01

---

### v0.144.0

- [**FEATURE**] `receiver/prometheus`: receiver/prometheus now associates scraped _created text lines as the created timestamp of its metric family rather than its own metric series, as defined by the OpenMetricsText spec ([#45291](open-telemetry/opentelemetry-collector-contrib#45291))
- [**FEATURE**] `receiver/prometheus`: Add comprehensive troubleshooting and best practices guide to Prometheus receiver README ([#44925](open-telemetry/opentelemetry-collector-contrib#44925))
  The guide includes common issues and solutions, performance optimization strategies, production deployment best practices, monitoring recommendations, and debugging tips.
- [**FEATURE**] `receiver/prometheusremotewrite`: Replace labels.Map() iteration with direct label traversal to eliminate intermediate map allocations. ([#45166](open-telemetry/opentelemetry-collector-contrib#45166))
- [**BUG FIX**] `receiver/prometheusremotewrite`: Fix silent data loss when consumer fails by returning appropriate HTTP error codes instead of 204 No Content. ([#45151](open-telemetry/opentelemetry-collector-contrib#45151))
  The receiver was sending HTTP 204 No Content before calling ConsumeMetrics(), causing clients to believe data was successfully delivered even when the consumer failed. Now returns 400 Bad Request for permanent errors and 500 Internal Server Error for retryable errors, as per the Prometheus Remote Write 2.0 specification.

### v0.143.0

- [**BREAKING**] `receiver/prometheus`: Remove deprecated `use_start_time_metric` and `start_time_metric_regex` configuration options. ([#44180](open-telemetry/opentelemetry-collector-contrib#44180))
  The `use_start_time_metric` and `start_time_metric_regex` configuration options have been removed after being deprecated in v0.142.0. Users who have these options set in their configuration will experience collector startup failures after upgrading. To migrate, remove these configuration options and use the `metricstarttime` processor instead for equivalent functionality.
- [**FEATURE**] `receiver/prometheus`: Add `receiver.prometheusreceiver.RemoveReportExtraScrapeMetricsConfig` feature gate to disable the `report_extra_scrape_metrics` config option. ([#44181](open-telemetry/opentelemetry-collector-contrib#44181))
  When enabled, the `report_extra_scrape_metrics` configuration option is ignored, and extra scrape metrics are controlled solely by the `receiver.prometheusreceiver.EnableReportExtraScrapeMetrics` feature gate. This mimics Prometheus behavior where extra scrape metrics are controlled by a feature flag.

## Summary

| Category | Count |
|----------|-------|
| Breaking Changes | 1 |
| Features | 4 |
| Bug Fixes | 1 |
| Other Changes | 0 |
| **Total** | **6** |

# Target-allocator Changes

## v0.142.0 to v0.144.0

Generated on: 2026-01-27 07:11:16

---

No changes found for target-allocator between v0.142.0 and v0.144.0

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rashmi Chandrashekar <rashmy@microsoft.com>
Co-authored-by: Grace Wehner <grace.wehner@microsoft.com>
Summary
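The receiver was sending HTTP 204 No Content before calling ConsumeMetrics(), so if the consumer failed, clients incorrectly thought data was delivered. This violates the Prometheus Remote Write 2.0 specification, which states that receivers MUST NOT return 2xx if data was not successfully written.
Changes
- Move WriteHeader(204) to after ConsumeMetrics() succeeds
- Return 400 Bad Request for permanent consumer errors
- Return 500 Internal Server Error for retryable errors
Impact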
Without this fix, when a downstream consumer fails (e.g., backend unavailable, memory limiter rejecting batches, exporter failures), Prometheus clients receive a success response and won't retry, leading to silent data loss.
Testing
Added TestHandlePRWConsumerResponse with sub-tests:
- success returns 204 - verifies normal operation
- retryable error returns 500 - verifies temporary failures return 500
- permanent error returns 400 - verifies permanent failures return 400