Fix handling of multiplexed long lines in Docker logs #4713
dehaansa merged 4 commits into grafana:main from
Conversation
I will check why tests fail and fix it later.
Force-pushed da09bfa to 516f9de
Force-pushed 516f9de to dda088c
Please let me know if this PR needs to be rebased to keep up with the main branch. I don't want to run your CI runners unnecessarily.
dehaansa left a comment
Apologies that the review here has been slow! Thanks for this contribution; I have a few comments to resolve.
buf := &bytes.Buffer{}
buf.Grow(dockerMaxChunkSize * 2)
return buf
May as well initialize with that size instead of Growing.

Suggested change:
return bytes.NewBuffer(make([]byte, 0, dockerMaxChunkSize*2))
}
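The suggested initialization can be sketched as follows; the constant value mirrors the `dockerMaxChunkSize` discussed later in this review, and the helper name `newChunkBuffer` is illustrative, not the PR's actual code:

```go
package main

import (
	"bytes"
	"fmt"
)

// dockerMaxChunkSize mirrors Docker's 16KB frame limit discussed below.
const dockerMaxChunkSize = 16384

// newChunkBuffer allocates the buffer with its full capacity up front,
// instead of creating an empty buffer and calling Grow afterwards.
func newChunkBuffer() *bytes.Buffer {
	return bytes.NewBuffer(make([]byte, 0, dockerMaxChunkSize*2))
}

func main() {
	buf := newChunkBuffer()
	fmt.Println(buf.Len(), buf.Cap()) // prints: 0 32768
}
```

Both versions end up with the same capacity; preallocating via `make` just avoids the intermediate zero-capacity buffer.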
- return string(ln), err
+ return ts, line[spaceIdx+1:], nil
This seems more fragile than the previous behavior using strings.SplitN. If, for example, the whole log is "TS " then spaceIdx is the last valid index, and this can panic. I understand that shouldn't happen, but I would rather have the more flexible code that is safer. bytes.SplitN exists, so I think we should just use it.
I think it's better to add an if statement: spaceIdx == -1 || spaceIdx >= len(line)-1. This is because bytes.SplitN will allocate a slice and, compared to the unsafe.String vs string case, this would lead to a significant performance regression:
│ with_unsafe.txt │ bytes_split.txt │
│ sec/op │ sec/op vs base │
ExtractTsFromBytes/ShortLines_10k-24 329.6µ ± 0% 536.7µ ± 2% +62.85% (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24 32.78µ ± 0% 51.73µ ± 2% +57.80% (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24 3.286µ ± 0% 5.189µ ± 2% +57.94% (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24 1.646µ ± 1% 2.629µ ± 1% +59.74% (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24 368.1µ ± 0% 598.2µ ± 4% +62.51% (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24 32.01µ ± 0% 86.20µ ± 2% +169.32% (p=0.000 n=10)
geomean 29.71µ 51.89µ +74.65%
│ with_unsafe.txt │ bytes_split.txt │
│ B/op │ B/op vs base │
ExtractTsFromBytes/ShortLines_10k-24 0.0Ki ± 0% 468.8Ki ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24 0.00Ki ± 0% 46.88Ki ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24 0.000Ki ± 0% 4.688Ki ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24 0.000Ki ± 0% 2.344Ki ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24 0.0Ki ± 0% 522.7Ki ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24 0.0Ki ± 0% 522.7Ki ± 0% ? (p=0.000 n=10)
geomean ¹ 63.56Ki ?
¹ summaries must be >0 to compute geomean
│ with_unsafe.txt │ bytes_split.txt │
│ allocs/op │ allocs/op vs base │
ExtractTsFromBytes/ShortLines_10k-24 0.00k ± 0% 10.00k ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24 0.000k ± 0% 1.000k ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24 0.0 ± 0% 100.0 ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24 0.00 ± 0% 50.00 ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24 0.00k ± 0% 11.15k ± 0% ? (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24 0.00k ± 0% 11.15k ± 0% ? (p=0.000 n=10)
geomean ¹ 1.356k ?
¹ summaries must be >0 to compute geomean
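The guarded split discussed above can be sketched roughly as follows. The helper name `extractTs` and the error value are illustrative, and a plain `string` conversion is used here instead of the PR's `unsafe.String` for simplicity:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"time"
)

// timestampLayout matches the RFC 3339 nano layout Docker uses.
const timestampLayout = "2006-01-02T15:04:05.999999999Z07:00"

var errInvalidTimestamp = errors.New("docker log line has no timestamp prefix")

// extractTs splits "TS payload" on the first space without the slice
// allocation bytes.SplitN would incur. The guard covers both a missing
// space and a trailing space with no payload (the "TS " case), so
// line[spaceIdx+1:] can never panic.
func extractTs(line []byte) (time.Time, []byte, error) {
	spaceIdx := bytes.IndexByte(line, ' ')
	if spaceIdx == -1 || spaceIdx >= len(line)-1 {
		return time.Time{}, nil, errInvalidTimestamp
	}
	ts, err := time.Parse(timestampLayout, string(line[:spaceIdx]))
	if err != nil {
		return time.Time{}, nil, err
	}
	return ts, line[spaceIdx+1:], nil
}

func main() {
	ts, payload, err := extractTs([]byte("2025-10-17T22:40:09.574025386Z hello"))
	fmt.Println(ts.UTC(), string(payload), err)

	// A log that is only a timestamp and a trailing space returns an
	// error instead of panicking.
	_, _, err = extractTs([]byte("2025-10-17T22:40:09.574025386Z "))
	fmt.Println(err)
}
```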
dockerLabel = model.MetaLabelPrefix + "docker_"
dockerLabelContainerPrefix = dockerLabel + "container_"
dockerLabelLogStream = dockerLabelContainerPrefix + "log_stream"
dockerMaxChunkSize = 16384
Is there any need to have this configurable?
I think it's better to have a constant with a self-describing name than a magic number in the code
Sorry if I phrased that poorly - I was actually trying to understand if you think this should be user overrideable for any reason.
Well, I don't think Docker will ever change this. When I searched for the original solution, I found that their answer to one of the issues was for the consumer libraries to adapt to this instead of changing the limit on Docker's side. So, yeah, I don't see the need for a configuration here
}
ts, err := time.Parse("2006-01-02T15:04:05.999999999Z07:00", pair[0])
// We know that the bytes won't be modified during the time.Parse call, so it's safe to use unsafe.String here
ts, err := time.Parse(timestampLayout, unsafe.String(&line[0], spaceIdx))
I don't see a single other use of unsafe.String in the codebase. If the performance benefit was significant I'm open to it, otherwise I'd rather leave it out.
These are the results I got when I benchmarked string(line[:spaceIdx]) against the current solution:
cpu: AMD Ryzen 9 7900X 12-Core Processor
│ without_unsafe.txt │ with_unsafe.txt │
│ sec/op │ sec/op vs base │
ExtractTsFromBytes/ShortLines_10k-24 351.8µ ± 7% 331.7µ ± 1% -5.69% (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24 38.29µ ± 3% 32.81µ ± 0% -14.32% (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24 4.016µ ± 8% 3.280µ ± 0% -18.33% (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24 2.036µ ± 10% 1.649µ ± 1% -19.03% (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24 441.7µ ± 1% 369.0µ ± 0% -16.45% (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24 34.44µ ± 2% 32.14µ ± 2% -6.66% (p=0.000 n=10)
geomean 34.46µ 29.78µ -13.58%
│ without_unsafe.txt │ with_unsafe.txt │
│ B/op │ B/op vs base │
ExtractTsFromBytes/ShortLines_10k-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MediumLines_1k-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/LongLines_100-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/VeryLongLines_50-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MixedWorkload-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/Parallel_MixedWorkload-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
│ without_unsafe.txt │ with_unsafe.txt │
│ allocs/op │ allocs/op vs base │
ExtractTsFromBytes/ShortLines_10k-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MediumLines_1k-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/LongLines_100-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/VeryLongLines_50-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MixedWorkload-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/Parallel_MixedWorkload-24 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
Benchmark        lines   line size    total data
ShortLines       10000   ~100 bytes   ~1 MB
MediumLines      1000    ~2 KB        ~2 MB
LongLines        100     ~50 KB       ~5 MB
VeryLongLines    50      ~500 KB      ~25 MB
MixedWorkload    11150   mixed        ~33 MB
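The zero-copy conversion being benchmarked can be sketched as below. This is only safe under the assumption stated in the diff comment, namely that the line buffer is not retained or mutated while `time.Parse` runs; the helper name `parseTsNoCopy` is illustrative:

```go
package main

import (
	"fmt"
	"time"
	"unsafe"
)

// timestampLayout matches the RFC 3339 nano layout Docker uses.
const timestampLayout = "2006-01-02T15:04:05.999999999Z07:00"

// parseTsNoCopy views the first spaceIdx bytes of line as a string
// without copying (unsafe.String, Go 1.20+). A plain string(line[:spaceIdx])
// conversion would copy those bytes on every call.
func parseTsNoCopy(line []byte, spaceIdx int) (time.Time, error) {
	return time.Parse(timestampLayout, unsafe.String(&line[0], spaceIdx))
}

func main() {
	line := []byte("2025-10-17T22:40:09.574025386Z payload")
	// The timestamp prefix is 30 bytes; index 30 is the separating space.
	ts, err := parseTsNoCopy(line, 30)
	fmt.Println(ts, err)
}
```

The copying version is the semantically identical fallback: the unsafe variant only removes the temporary string allocation, which is what the zero B/op columns above reflect.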
Sure, that seems worthwhile. 👍
CHANGELOG.md (Outdated)

### Bugfixes

- Fix Docker log corruption for multiplexed long lines. (@axd1x8a)
Changelog likely needs to be manually updated from main to hit the unreleased section.
I can rebase to main and also move this line up to the unreleased section if needed
dehaansa left a comment
Once you rebase to main and CI is happy we can get this merged. Thanks for your patience.
Force-pushed 71f15cf to bab70e7
* Improve docker's multiplexed long lines handling
* Fix `dockerChunkWriter` buffer flushing and add test for it
* Update `CHANGELOG.md`
* Fix potential panic in `extractTsFromBytes` and `dockerChunkWriter` buffer allocation
PR Description
By default, Docker splits lines over 16KB into multiple frames.
`stdcopy.StdCopy` handles the multiplexing itself, but not the inserted timestamps, which corrupts logs:

{ "a": 1033461950, "b": true },
2025-10-17T22: 40: 09.574025386Z { "a": 1033471950, "b": true },
{ "a": 1034411950, "b": true },

This PR adds a wrapper for `stdcopy.StdCopy` streams that checks whether incoming frames are part of a larger log line and concatenates them into one, removing the timestamp in the process.
Additionally, the functions for reading streams and parsing timestamps were changed to operate on bytes to avoid allocations.
Which issue(s) this PR fixes
#4625
#1881
Notes to the Reviewer
Not sure about the policy on unsafe code here, but the usage of `unsafe.String` in `extractTsFromBytes` makes sense because we know that the bytes won't be modified, and it gives just a little bit more performance from my testing.

PR Checklist