fix: Issues when reading files using non UTF-8 encoding in loki.source.file #5259
Merged
consume BOM to determine encoding
ptodev
reviewed
Jan 14, 2026
Contributor
ptodev
reviewed
Jan 14, 2026
Co-authored-by: Paulin Todev <paulin.todev@gmail.com>
also we don't have to reseek in drain because we buffer non terminated lines returned after EOF.
it works and we can continue from a recorded offset
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
ptodev
approved these changes
Jan 14, 2026
Contributor
ptodev
left a comment
LGTM, thanks! There's just a print statement we need to remove.
3 tasks
Brief description of Pull Request
Previously we wrapped our file with the provided decoder and then consumed it using bufio.Reader. This had several problems:
Pull Request Details
To correctly handle file positions we need to operate on the raw bytes in the reader and (optionally) perform the decoding after. This makes sure we always record the correct offset.
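To illustrate why the offset has to be counted on raw bytes, here is a minimal sketch. It is not the code from this PR: it assumes ISO-8859-1 input (chosen only because it can be decoded with the standard library), and `lineAt`, `decodeLatin1`, and `readLatin1Lines` are hypothetical names. The decoded UTF-8 text of a line can be longer than the bytes on disk, so the offset is advanced by the raw length and decoding happens afterwards.

```go
package main

import (
	"bufio"
	"bytes"
	"io"
)

// lineAt is a hypothetical record: the decoded text of a line and the raw
// file offset of the first byte after that line's terminator.
type lineAt struct {
	Text   string
	Offset int64
}

// decodeLatin1 converts ISO-8859-1 bytes to a UTF-8 string. Every Latin-1
// byte maps to the Unicode code point with the same numeric value.
func decodeLatin1(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

// readLatin1Lines splits data into lines on the raw bytes and decodes each
// line only after its raw length has been added to the offset. A trailing
// line without a terminator is not emitted and does not advance the offset.
func readLatin1Lines(data []byte, start int64) ([]lineAt, error) {
	br := bufio.NewReader(bytes.NewReader(data))
	offset := start
	var out []lineAt
	for {
		raw, err := br.ReadBytes('\n')
		if err == io.EOF {
			// Partial line (or nothing left): stop without moving the offset.
			return out, nil
		}
		if err != nil {
			return out, err
		}
		offset += int64(len(raw)) // count bytes as stored on disk
		text := decodeLatin1(bytes.TrimRight(raw, "\r\n"))
		out = append(out, lineAt{Text: text, Offset: offset})
	}
}
```

For the input `"caf\xe9\n"`, the raw line is 5 bytes while the decoded `"café\n"` is 6 bytes in UTF-8, so an offset derived from decoded output would point past the real position in the file.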
Because we now operate on raw bytes, we needed to change how we detect line terminators (newline and carriage return). The decoder can translate them for us, so we know which byte sequences to look for. We also now buffer non-terminated lines, so we no longer have to re-seek after a modified event when we have consumed a partial line.
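The terminator search and the partial-line buffering can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `partialLineBuffer` and `feed` are hypothetical names, the terminator bytes are supplied by the caller (for UTF-16LE, newline is `0A 00` and carriage return is `0D 00`), and the encoding is assumed to be fixed-width so the search can step in units of the terminator length.

```go
package main

import "bytes"

// partialLineBuffer is a hypothetical helper that splits raw, still-encoded
// bytes into lines and keeps any non-terminated tail for the next read.
type partialLineBuffer struct {
	terminator     []byte // encoded newline, e.g. {0x0A, 0x00} for UTF-16LE
	carriageReturn []byte // encoded carriage return, e.g. {0x0D, 0x00}
	pending        []byte // bytes of a line that has not been terminated yet
}

// feed appends chunk to the pending bytes and returns every complete line
// (still encoded, without terminator or trailing carriage return).
func (b *partialLineBuffer) feed(chunk []byte) [][]byte {
	step := len(b.terminator)
	if step == 0 {
		return nil
	}
	b.pending = append(b.pending, chunk...)

	var lines [][]byte
	start := 0
	// Step in whole code units so a terminator is only matched on an aligned
	// boundary; in UTF-16 the bytes 0A 00 can also straddle two characters.
	for i := 0; i+step <= len(b.pending); i += step {
		if !bytes.Equal(b.pending[i:i+step], b.terminator) {
			continue
		}
		line := append([]byte(nil), b.pending[start:i]...)
		line = bytes.TrimSuffix(line, b.carriageReturn)
		lines = append(lines, line)
		start = i + step
	}
	// Keep the unterminated remainder so no re-seek is needed later.
	b.pending = append([]byte(nil), b.pending[start:]...)
	return lines
}
```

Because the unterminated tail stays in `pending`, a read that hits EOF in the middle of a line (or even in the middle of a terminator) can simply wait for the next write event and continue from where it stopped.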
We can now also automatically detect UTF-8, UTF-16LE, and UTF-16BE encodings and decode the file correctly when a BOM is present.
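BOM detection for these three encodings comes down to a prefix check on the first bytes of the file. The sketch below is a standalone illustration, not the function added in this PR; `detectBOM` and the returned encoding labels are hypothetical. The BOM byte sequences themselves are the standard ones.

```go
package main

import "bytes"

// Standard byte order marks for the three encodings mentioned above.
var (
	bomUTF8    = []byte{0xEF, 0xBB, 0xBF}
	bomUTF16LE = []byte{0xFF, 0xFE}
	bomUTF16BE = []byte{0xFE, 0xFF}
)

// detectBOM inspects the first bytes of a file. It returns a label for the
// detected encoding and the number of BOM bytes to skip, or ("", 0) when no
// known BOM is present.
func detectBOM(head []byte) (encoding string, bomLen int) {
	switch {
	case bytes.HasPrefix(head, bomUTF8):
		return "UTF-8", len(bomUTF8)
	case bytes.HasPrefix(head, bomUTF16LE):
		return "UTF-16LE", len(bomUTF16LE)
	case bytes.HasPrefix(head, bomUTF16BE):
		return "UTF-16BE", len(bomUTF16BE)
	default:
		return "", 0
	}
}
```

The returned length matters for position tracking: the BOM bytes are consumed from the file, so they count toward the raw offset even though they are not part of any log line.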
Issue(s) fixed by this Pull Request
Notes to the Reviewer
PR Checklist