fix: deadlock in loki.source.file when target is removed #3488

Merged
kalleep merged 9 commits into main from fix-loki-file-deadlock
May 2, 2025
Conversation

@kalleep (Contributor) commented May 2, 2025

PR Description

#3456 reports that some Alloy nodes stopped sending logs. Alloy itself still produced logs like this:

ts=2025-04-26T14:25:41.171363493Z level=info msg="skipping update of position for a file which does not currently exist" component_path=/ component_id=loki.source.file.pods component=tailer

Trying to access /api/v0/web/components hangs, and the profile shows that most goroutines are parked. These are most likely symptoms of a deadlock.

While going through the changes we have made to this component I noticed this.

So what happens is that when we stop tasks we wait for them to finish, which includes flushing any logs that have not yet been sent over the handler channel. Because nothing is reading from that channel while we perform this update, we get stuck waiting there indefinitely.

To fix this, we need to keep reading from the channel while we perform the update, and stop reading once the update is done.

Which issue(s) this PR fixes

fixes: #3456
fixes: #3472

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@kalleep kalleep requested a review from a team as a code owner May 2, 2025 11:58
```go
opts    component.Options
metrics *metrics

updateMut sync.Mutex
```

This was only used in Update, so we don't need it anymore.

@kalleep kalleep force-pushed the fix-loki-file-deadlock branch from 36e110b to 6b44e0e Compare May 2, 2025 13:24
@kalleep kalleep merged commit 4b78873 into main May 2, 2025
39 checks passed
@kalleep kalleep deleted the fix-loki-file-deadlock branch May 2, 2025 14:14
kalleep added a commit that referenced this pull request May 5, 2025
* Fix deadlock that can happen when stopping reader tasks

Co-authored-by: William Dumont <william.dumont@grafana.com>
kalleep added a commit that referenced this pull request May 5, 2025
* fix: panic that happens when a target gets deleted when using decompression (#3475)

* Fix panic that can happen when a file is removed for the decompressor

* Change readLine to start and stop updatePositions

* Add changelog

* Fix mimir.rules.kubernetes panic on non-leader debug info retrieval (#3451)

* Fix mimir.rules.kubernetes to only return eventProcessor state if it exists

* fix: deadlock in loki.source.file when target is removed (#3488)

* Fix deadlock that can happen when stopping reader tasks

Co-authored-by: William Dumont <william.dumont@grafana.com>

* fix: emit valid logfmt key (#3495)

* Fix log keys to be valid for logfmt

* Add changelog

* Fix streams limit error check so that metrics are correctly labeled as `ReasonStreamLimited` (#3466)

* fix: replace direct error string compare with isErrMaxStreamsLimitExceeded helper

* update CHANGELOG

* Make errMaxStreamsLimitExceeded an error type

---------

Co-authored-by: Théo Brigitte <theo.brigitte@gmail.com>
Co-authored-by: William Dumont <william.dumont@grafana.com>
Co-authored-by: Marat Khvostov <marathvostov@gmail.com>
marctc pushed a commit that referenced this pull request May 7, 2025
* Fix deadlock that can happen when stopping reader tasks

Co-authored-by: William Dumont <william.dumont@grafana.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 2, 2025


Development

Successfully merging this pull request may close these issues.

  • Missing Pod Logs Not Ingested by Alloy
  • Log collection stuck, http request to /api/v0/web/components blocked, no logs ingested

2 participants