fix: no longer drop request if stream is dropped in loki.source.api #4834

Merged
kalleep merged 14 commits into main from kalleep/loki-source-api-fixes on Nov 17, 2025
Conversation

@kalleep
Contributor

@kalleep kalleep commented Nov 13, 2025

PR Description

This PR fixes several issues in loki.source.api:

  • No longer cancel the whole request if a stream is dropped by relabel rules; this was clearly a return-instead-of-continue bug.

  • Add a ForceShutdown function that cancels all requests before shutting down the server.

    • When the component is shutting down, we do not drain like loki.source.file does, and IMO we should not do that here. It is better to just cancel all in-flight requests so that the caller can retry.
  • Send batches instead of individual entries from the request handler to the component. This makes it a bit more transaction-safe: we no longer partially ingest entries if the request is canceled or a shutdown happens.
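For illustration, here is a minimal Go sketch of the continue-vs-return fix described in the first bullet. The stream type and forwardStreams function are hypothetical stand-ins, not the actual loki.source.api code:

```go
package main

import "fmt"

// stream is a hypothetical stand-in for a decoded push-request stream;
// the real types in loki.source.api differ.
type stream struct {
	labels  string
	entries int
}

// forwardStreams sketches the fixed loop: when relabel rules drop a
// stream (modeled here as an empty label set), we skip that stream with
// `continue` instead of aborting the whole request with `return`.
func forwardStreams(streams []stream) (forwarded int) {
	for _, s := range streams {
		if s.labels == "" { // stream dropped by relabel rules
			continue // the bug was a `return` here, which dropped the rest of the request
		}
		forwarded += s.entries
	}
	return forwarded
}

func main() {
	streams := []stream{
		{labels: `{job="a"}`, entries: 2},
		{labels: "", entries: 5}, // dropped by relabeling
		{labels: `{job="b"}`, entries: 3},
	}
	fmt.Println(forwardStreams(streams)) // 5: both surviving streams are forwarded
}
```

With the `return` bug, the third stream above would never have been forwarded; with `continue`, only the relabel-dropped stream is skipped.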

Which issue(s) this PR fixes

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a critical bug in loki.source.api where requests were incorrectly cancelled when relabel rules dropped a specific stream (instead of continuing to process remaining streams). Additionally, it refactors the component to send batches of entries rather than individual entries, improving transactional safety during shutdown scenarios.

Key changes:

  • Fixed bug where return should have been continue when a stream is dropped by relabel rules
  • Added ForceShutdown() method to cancel all in-flight requests before server shutdown
  • Refactored to send batches of entries instead of individual entries for better atomicity

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
internal/component/loki/source/api/internal/lokipush/push_api_server.go Fixed the relabel rules bug (continue instead of return), added ForceShutdown(), and changed to batch sending
internal/component/loki/source/api/internal/lokipush/push_api_server_test.go Added fakeBatchReceiver test helper to support batch-based testing
internal/component/loki/source/api/api.go Changed handler from LogsReceiver to LogsBatchReceiver and updated Run() to process batches
internal/component/loki/source/api/api_test.go Updated tests to remove manual cleanup now handled by context cancellation
internal/component/common/loki/receiver.go New file containing receiver interfaces, extracted from entry_handler.go
internal/component/common/loki/entry.go New file containing Entry type definition, extracted from entry_handler.go
internal/component/common/loki/entry_handler.go Removed receiver and entry code that was moved to separate files
internal/component/common/loki/entry_handler_test.go Added test coverage for label middleware functionality
CHANGELOG.md Added entry documenting the bug fix

}

if lastErr != nil {
level.Warn(s.logger).Log("msg", "at least one entry in the push request failed to process", "err", lastErr.Error())
Contributor

I think it would be cheap and easy to add a counter of how many entries failed out of how many in the batch. And that would be useful.

Contributor Author
@kalleep kalleep Nov 14, 2025


Hmm, after checking what kinds of metrics other components expose, we could add two metrics:

  • loki_source_api_entries_processed - the total number of entries received in requests
  • loki_source_api_entries_written - the total number of entries that we have forwarded

With these you could derive how many entries are dropped, either by relabeling or by invalid labels. WDYT?
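The accounting behind the two proposed counters can be sketched as follows; the real component would register Prometheus counters via client_golang, but plain atomic counters (a deliberate simplification here) are enough to show how the dropped count is derived:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metrics sketches the two counters discussed above.
type metrics struct {
	entriesProcessed atomic.Int64 // total entries seen in requests
	entriesWritten   atomic.Int64 // total entries forwarded downstream
}

// observe records one batch: every entry counts as processed, but only
// entries that survive relabeling and label validation count as written.
func (m *metrics) observe(total, written int) {
	m.entriesProcessed.Add(int64(total))
	m.entriesWritten.Add(int64(written))
}

// dropped derives the number of dropped entries, as suggested above:
// processed minus written.
func (m *metrics) dropped() int64 {
	return m.entriesProcessed.Load() - m.entriesWritten.Load()
}

func main() {
	var m metrics
	m.observe(10, 8) // 2 entries dropped by relabeling or invalid labels
	m.observe(5, 5)
	fmt.Println(m.dropped()) // 2
}
```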

Contributor Author

I went with my suggestion. Let me know what you think :)

w.WriteHeader(http.StatusServiceUnavailable)
return
}
entries = append(entries, e)
Contributor

Might want to reuse the slices in a pool if this comes up in allocations.

Contributor Author

Agree, do you think we should do that now or wait and see if it shows up?

Contributor

If we go down the pooling path could we try to avoid sending it through a channel? It's fine today because we know the implementation is an unbuffered channel but it feels risky to depend upon that given the server can't enforce it.

Contributor

Let's wait to see if this comes up in profiles.
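If allocations do show up in profiles, the slice-reuse idea could look roughly like this; the entry type, pool size, and handleRequest function are illustrative assumptions:

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical placeholder for the component's entry type.
type entry struct{ line string }

// entryPool reuses the backing arrays for per-request entry slices so
// each request does not allocate a fresh slice.
var entryPool = sync.Pool{
	New: func() any { return make([]entry, 0, 64) },
}

// handleRequest borrows a slice from the pool for the lifetime of the
// request and returns it (emptied, capacity kept) when done.
func handleRequest(lines []string) int {
	entries := entryPool.Get().([]entry)
	defer func() {
		entries = entries[:0] // drop contents, keep capacity for reuse
		entryPool.Put(entries)
	}()
	for _, l := range lines {
		entries = append(entries, entry{line: l})
	}
	return len(entries)
}

func main() {
	fmt.Println(handleRequest([]string{"a", "b", "c"})) // 3
}
```

As the thread notes, a pooled slice must not outlive its handler, so sending it through a channel would require explicitly handing ownership back to the pool on the receiving side.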

github-actions bot commented Nov 14, 2025

💻 Deploy preview deleted (fix: no longer drop request if stream is dropped in loki.source.api).

@kalleep kalleep requested a review from thampiotr November 14, 2025 10:20
@kalleep kalleep force-pushed the kalleep/loki-source-api-fixes branch 2 times, most recently from 2b5b401 to 1dfccfc Compare November 17, 2025 08:02
Comment on lines +108 to +109
* `loki_source_api_entries_processed` (counter): Total number of log entries processed.
* `loki_source_api_entries_written` (counter): Total number of log entries forwarded.
Contributor

I think we do want some metrics, but we also don't want too many. Should we settle on only one of them? I think loki_source_api_entries_written would be the better one to keep. The idea is to be able to spot major issues and then continue debugging using other means like logging, profiles, or a local repro.

Contributor Author

Sure, I only kept loki_source_api_entries_written.

Contributor

@thampiotr thampiotr left a comment

LGTM % adding only one debug metric instead of two

@kalleep kalleep force-pushed the kalleep/loki-source-api-fixes branch from a66e3fc to 73e08d5 Compare November 17, 2025 12:02
@kalleep kalleep merged commit 74653ac into main Nov 17, 2025
43 of 46 checks passed
@kalleep kalleep deleted the kalleep/loki-source-api-fixes branch November 17, 2025 13:38
jharvey10 pushed a commit that referenced this pull request Nov 18, 2025
…4834)

* fix: add ForceShutdown that will cancel in-flight requests before
stopping server

* Split into multiple files and add LogsBatchReceiver

* don't drop request when relabel rules drops a specific stream

* fix: use loki.LogsBatchReceiver to ensure all entries in a request is sent down the
pipeline

* add changelog

* add checks for entries and use sync once to close channel
@jharvey10 jharvey10 mentioned this pull request Nov 18, 2025
jharvey10 added a commit that referenced this pull request Nov 19, 2025
* fix: no longer drop request if stream is dropped in loki.source.api (#4834)

* fix: add ForceShutdown that will cancel in-flight requests before
stopping server

* Split into multiple files and add LogsBatchReceiver

* don't drop request when relabel rules drops a specific stream

* fix: use loki.LogsBatchReceiver to ensure all entries in a request is sent down the
pipeline

* add changelog

* add checks for entries and use sync once to close channel

* update changelog for next rc

* Fix flaky tests: port in for loki source api tests and logs integration test (#4875)

* Fix port in use flakyness for loki source api tests

* Pin loki container version for integration tests

* Add a new mimir.alerts.kubernetes component (#3448)

* Add a new mimir.alerts.kubernetes component

* Sync Mimir periodically, test the case of a CRD deletion

* Add TODOs

* Longer test timeout

* Check if pods are running

* Apply suggestions from code review

Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>

* Fix metric doc

* Fix changelog

---------

Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>

* Remove experimental flag from stage.windowsevent (#4879)

---------

Co-authored-by: Karl Persson <23356117+kalleep@users.noreply.github.com>
Co-authored-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
Co-authored-by: Paulin Todev <paulin.todev@gmail.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
dehaansa pushed a commit to madhub/alloy that referenced this pull request Dec 10, 2025
…rafana#4834)

* fix: add ForceShutdown that will cancel in-flight requests before
stopping server

* Split into multiple files and add LogsBatchReceiver

* don't drop request when relabel rules drops a specific stream

* fix: use loki.LogsBatchReceiver to ensure all entries in a request is sent down the
pipeline

* add changelog

* add checks for entries and use sync once to close channel
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2025