fix: Issues when reading files using non UTF-8 encoding in loki.source.file #5259

Merged
kalleep merged 21 commits into main from kalleep/loki-source-file-decoding
Jan 15, 2026

@kalleep kalleep commented Jan 14, 2026

Brief description of Pull Request

Previously, we wrapped the file in the provided decoder and then consumed the decoded stream using bufio.Reader. This caused two problems:

  1. We did not report the correct offset into the file.
  2. If the file was UTF-16 and we had a stored offset, we could not determine the correct byte order (UTF-16LE or UTF-16BE).
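
To illustrate the first problem, here is a minimal sketch (not code from this PR) showing that a line's size on disk and its size after decoding can differ, so a position counted on the decoded stream does not match the raw file offset. The helpers `encodeUTF16LE`, `decodeUTF16LE` and `lengths` are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16LE encodes s as UTF-16LE bytes without a BOM.
// Hypothetical helper for illustration only.
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2*len(units))
	for _, u := range units {
		out = append(out, byte(u), byte(u>>8)) // low byte first
	}
	return out
}

// decodeUTF16LE decodes UTF-16LE bytes into a (UTF-8) Go string.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])|uint16(b[i+1])<<8)
	}
	return string(utf16.Decode(units))
}

// lengths returns the on-disk size of a line and its size after decoding.
func lengths(line string) (rawLen, decodedLen int) {
	raw := encodeUTF16LE(line)
	return len(raw), len(decodeUTF16LE(raw))
}

func main() {
	rawLen, decodedLen := lengths("h\u00e9llo\n")
	// The file advanced by rawLen bytes, but a reader that sits behind
	// the decoder only ever sees decodedLen bytes.
	fmt.Println(rawLen, decodedLen) // prints: 12 7
}
```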

Pull Request Details

To handle file positions correctly, we need to operate on the raw bytes from the reader and (optionally) perform the decoding afterwards. This ensures that we always record the correct offset.

Because we now operate on raw bytes, we needed to change how we detect line terminators (newline and carriage return). The decoder can translate these characters into the file's encoding for us, so we know which byte sequences to look for. We now also buffer non-terminated lines, so we no longer have to re-seek after a modified event when we have consumed a partial line.
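
The approach described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `lineSplitter`, `feed`, `indexAligned` and `decodeUTF16LE` are hypothetical names, the UTF-16LE terminator bytes are hard-coded rather than obtained from an encoder, and the terminator search assumes a fixed-width code unit.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf16"
)

// lineSplitter splits raw (still encoded) bytes into lines.
// Hypothetical type for illustration only.
type lineSplitter struct {
	nl      []byte // encoded newline, e.g. {0x0A, 0x00} for UTF-16LE
	cr      []byte // encoded carriage return, e.g. {0x0D, 0x00}
	pending []byte // raw bytes of a line that is not terminated yet
	offset  int64  // raw file offset just past the last complete line
}

// indexAligned searches for sep only at multiples of len(sep), so a match
// cannot straddle two code units.
func indexAligned(b, sep []byte) int {
	for i := 0; i+len(sep) <= len(b); i += len(sep) {
		if bytes.Equal(b[i:i+len(sep)], sep) {
			return i
		}
	}
	return -1
}

// feed appends a chunk of raw bytes and returns every complete line,
// without its terminator. Partial lines stay buffered in s.pending.
func (s *lineSplitter) feed(chunk []byte) [][]byte {
	s.pending = append(s.pending, chunk...)
	var lines [][]byte
	for {
		i := indexAligned(s.pending, s.nl)
		if i < 0 {
			return lines
		}
		line := bytes.TrimSuffix(s.pending[:i], s.cr)
		lines = append(lines, append([]byte(nil), line...)) // copy out
		s.offset += int64(i + len(s.nl))                    // counted in raw bytes
		s.pending = s.pending[i+len(s.nl):]
	}
}

// decodeUTF16LE decodes a complete line only after it has been split.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])|uint16(b[i+1])<<8)
	}
	return string(utf16.Decode(units))
}

func main() {
	s := &lineSplitter{nl: []byte{0x0A, 0x00}, cr: []byte{0x0D, 0x00}}
	// "ab\r\ncd" in UTF-16LE, delivered in two chunks that split a code unit.
	raw := []byte{0x61, 0, 0x62, 0, 0x0D, 0, 0x0A, 0, 0x63, 0, 0x64, 0}
	s.feed(raw[:7])
	for _, l := range s.feed(raw[7:]) {
		fmt.Println(decodeUTF16LE(l), s.offset) // prints: ab 8
	}
}
```

Because the unterminated remainder ("cd" here) is kept in `pending`, a later write that completes the line can be handled without seeking back, and `offset` only ever advances past complete lines.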

We can now also automatically detect UTF-8, UTF-16LE, and UTF-16BE encodings and decode the file correctly if a BOM is present.
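
As a rough sketch of what BOM-based detection can look like (again, not the code from this PR), the function below checks the first bytes of a file against the three byte order marks mentioned above. `detectBOM` is a hypothetical name, and the sketch deliberately ignores other encodings such as UTF-32.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectBOM inspects the first bytes of a file and returns the encoding
// name and the BOM length. It returns ("", 0) when no known BOM is found.
// Hypothetical helper for illustration only; UTF-32 is not handled, so a
// UTF-32LE BOM would be reported as UTF-16LE.
func detectBOM(head []byte) (name string, bomLen int) {
	switch {
	case bytes.HasPrefix(head, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8", 3
	case bytes.HasPrefix(head, []byte{0xFF, 0xFE}):
		return "UTF-16LE", 2
	case bytes.HasPrefix(head, []byte{0xFE, 0xFF}):
		return "UTF-16BE", 2
	}
	return "", 0
}

func main() {
	// "a" in UTF-16LE, preceded by its BOM.
	name, n := detectBOM([]byte{0xFF, 0xFE, 0x61, 0x00})
	fmt.Println(name, n) // prints: UTF-16LE 2
}
```

Since the BOM sits at the very start of the file, a reader that resumes from a stored offset would have to look at the head of the file first to learn the byte order before continuing from that offset.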

Issue(s) fixed by this Pull Request

Notes to the Reviewer

PR Checklist

  • Documentation added
  • Tests updated
  • Config converters updated

@kalleep kalleep requested a review from a team as a code owner January 14, 2026 10:58
github-actions bot commented Jan 14, 2026

💻 Deploy preview deleted (fix: Issues when reading files using non UTF-8 encoding in loki.source.file ).

@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Jan 14, 2026
kalleep and others added 3 commits January 14, 2026 17:16
it works and we can continue from a recorded offset
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
@ptodev ptodev left a comment

LGTM, thanks! There's just a print statement we need to remove.

@kalleep kalleep merged commit 4740276 into main Jan 15, 2026
48 checks passed
@kalleep kalleep deleted the kalleep/loki-source-file-decoding branch January 15, 2026 08:51
@grafana-alloybot grafana-alloybot bot mentioned this pull request Jan 14, 2026
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 30, 2026

Labels

frozen-due-to-age type/docs Docs Squad label across all Grafana Labs repos

3 participants