
Fix handling of multiplexed long lines in Docker logs #4713

Merged
dehaansa merged 4 commits into grafana:main from axd1x8a:fix/docker-long-lines
Dec 9, 2025

Conversation

Contributor

@axd1x8a axd1x8a commented Oct 29, 2025

PR Description

By default, Docker splits log lines longer than 16KB into multiple frames. stdcopy.StdCopy handles the multiplexing itself, but not the timestamps Docker inserts into each frame, which corrupts logs:

{
    "a": 1033461950,
    "b": true
},
2025-10-17T22:40:09.574025386Z {
    "a": 1033471950,
    "b": true
},
{
    "a": 1034411950,
    "b": true
},

This PR adds a wrapper for stdcopy.StdCopy streams that checks whether incoming frames are part of a larger log line and concatenates them into one, removing the inserted timestamps in the process.

Additionally, functions for reading streams and parsing timestamps were changed to operate on bytes to avoid allocations.

Which issue(s) this PR fixes

#4625
#1881

Notes to the Reviewer

I'm not sure about the policy on unsafe code here, but the use of unsafe.String in extractTsFromBytes makes sense: we know the bytes won't be modified, and in my testing it gives a small but consistent performance improvement.

PR Checklist

  • CHANGELOG.md updated
  • Tests updated

@axd1x8a axd1x8a requested a review from a team as a code owner October 29, 2025 18:01

CLAassistant commented Oct 29, 2025

CLA assistant check
All committers have signed the CLA.

@axd1x8a axd1x8a changed the title Improve handling of multiplexed long lines in Docker logs Fix handling of multiplexed long lines in Docker logs Oct 29, 2025
Contributor Author

axd1x8a commented Nov 20, 2025

I will check why the tests fail and fix them later.

@axd1x8a axd1x8a force-pushed the fix/docker-long-lines branch from da09bfa to 516f9de Compare November 23, 2025 00:06
@axd1x8a axd1x8a requested review from a team and clayton-cornell as code owners November 23, 2025 00:06
@axd1x8a axd1x8a force-pushed the fix/docker-long-lines branch from 516f9de to dda088c Compare November 23, 2025 00:06
@axd1x8a
Copy link
Contributor Author

axd1x8a commented Nov 23, 2025

Please let me know if the PR needs to be rebased to keep up with the main branch; I don't want to run your CI runners unnecessarily.

Contributor

@dehaansa dehaansa left a comment

Apologies that the review here has been slow! Thanks for this contribution, have a few comments to resolve.

Comment on lines +327 to +329
buf := &bytes.Buffer{}
buf.Grow(dockerMaxChunkSize * 2)
return buf
Contributor

May as well initialize with that size instead of Growing.

Suggested change
- buf := &bytes.Buffer{}
- buf.Grow(dockerMaxChunkSize * 2)
- return buf
+ return bytes.NewBuffer(make([]byte, 0, dockerMaxChunkSize*2))

}

- return string(ln), err
+ return ts, line[spaceIdx+1:], nil
Contributor

This seems more fragile than the previous behavior using strings.SplitN. If, for example, the whole log is "TS " then spaceIdx is the last valid index, and this can panic. I understand that shouldn't happen, but I would rather have the more flexible code that is safer. bytes.SplitN exists, so I think we should just use it.

Contributor Author

I think it's better to add an if statement: spaceIdx == -1 || spaceIdx >= len(line)-1. bytes.SplitN allocates a slice, and compared to the unsafe.String vs string case this leads to a significant performance regression:

                                             │ with_unsafe.txt │           bytes_split.txt            │
                                             │     sec/op      │   sec/op     vs base                 │
ExtractTsFromBytes/ShortLines_10k-24               329.6µ ± 0%   536.7µ ± 2%   +62.85% (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24               32.78µ ± 0%   51.73µ ± 2%   +57.80% (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24                3.286µ ± 0%   5.189µ ± 2%   +57.94% (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24             1.646µ ± 1%   2.629µ ± 1%   +59.74% (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24                368.1µ ± 0%   598.2µ ± 4%   +62.51% (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24       32.01µ ± 0%   86.20µ ± 2%  +169.32% (p=0.000 n=10)
geomean                                            29.71µ        51.89µ        +74.65%

                                             │ with_unsafe.txt │        bytes_split.txt         │
                                             │      B/op       │     B/op      vs base          │
ExtractTsFromBytes/ShortLines_10k-24              0.0Ki ± 0%     468.8Ki ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24             0.00Ki ± 0%     46.88Ki ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24             0.000Ki ± 0%     4.688Ki ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24          0.000Ki ± 0%     2.344Ki ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24               0.0Ki ± 0%     522.7Ki ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24      0.0Ki ± 0%     522.7Ki ± 0%  ? (p=0.000 n=10)
geomean                                                      ¹   63.56Ki       ?
¹ summaries must be >0 to compute geomean

                                             │ with_unsafe.txt │        bytes_split.txt        │
                                             │    allocs/op    │  allocs/op   vs base          │
ExtractTsFromBytes/ShortLines_10k-24              0.00k ± 0%     10.00k ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24             0.000k ± 0%     1.000k ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24                 0.0 ± 0%      100.0 ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24             0.00 ± 0%      50.00 ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24               0.00k ± 0%     11.15k ± 0%  ? (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24      0.00k ± 0%     11.15k ± 0%  ? (p=0.000 n=10)
geomean                                                      ¹   1.356k       ?
¹ summaries must be >0 to compute geomean

Contributor
Sounds good. 👍

dockerLabel = model.MetaLabelPrefix + "docker_"
dockerLabelContainerPrefix = dockerLabel + "container_"
dockerLabelLogStream = dockerLabelContainerPrefix + "log_stream"
dockerMaxChunkSize = 16384
Contributor

Is there any need to have this configurable?

Contributor Author

I think it's better to have a constant with a self-describing name than a magic number in the code

Contributor

Sorry if I phrased that poorly - I was actually trying to understand if you think this should be user overrideable for any reason.

Contributor Author

Well, I don't think Docker will ever change this. When I searched for the original solution, I found that their answer to one of the issues was for consumer libraries to adapt to this rather than changing the limit on Docker's side. So I don't see the need for configuration here.

}
- ts, err := time.Parse("2006-01-02T15:04:05.999999999Z07:00", pair[0])
+ // We know that the bytes won't be modified during the time.Parse call, so it's safe to use unsafe.String here
+ ts, err := time.Parse(timestampLayout, unsafe.String(&line[0], spaceIdx))
Contributor

I don't see a single other use of unsafe.String in the codebase. If the performance benefit was significant I'm open to it, otherwise I'd rather leave it out.

Contributor Author

@axd1x8a axd1x8a Dec 9, 2025

These are the results I got when I benchmarked string(line[:spaceIdx]) against the current solution:

cpu: AMD Ryzen 9 7900X 12-Core Processor
                                                     │ without_unsafe.txt │           with_unsafe.txt           │
                                                     │       sec/op       │   sec/op     vs base                │
ExtractTsFromBytes/ShortLines_10k-24                 351.8µ ±  7%   331.7µ ± 1%   -5.69% (p=0.000 n=10)
ExtractTsFromBytes/MediumLines_1k-24                 38.29µ ±  3%   32.81µ ± 0%  -14.32% (p=0.000 n=10)
ExtractTsFromBytes/LongLines_100-24                  4.016µ ±  8%   3.280µ ± 0%  -18.33% (p=0.000 n=10)
ExtractTsFromBytes/VeryLongLines_50-24               2.036µ ± 10%   1.649µ ± 1%  -19.03% (p=0.000 n=10)
ExtractTsFromBytes/MixedWorkload-24                  441.7µ ±  1%   369.0µ ± 0%  -16.45% (p=0.000 n=10)
ExtractTsFromBytes/Parallel_MixedWorkload-24         34.44µ ±  2%   32.14µ ± 2%   -6.66% (p=0.000 n=10)
geomean                                                      34.46µ         29.78µ       -13.58%

                                                     │ without_unsafe.txt │           with_unsafe.txt           │
                                                     │        B/op        │    B/op     vs base                 │
ExtractTsFromBytes/ShortLines_10k-24                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MediumLines_1k-24                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/LongLines_100-24                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/VeryLongLines_50-24               0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MixedWorkload-24                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/Parallel_MixedWorkload-24         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                 ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                     │ without_unsafe.txt │           with_unsafe.txt           │
                                                     │     allocs/op      │ allocs/op   vs base                 │
ExtractTsFromBytes/ShortLines_10k-24                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MediumLines_1k-24                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/LongLines_100-24                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/VeryLongLines_50-24               0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/MixedWorkload-24                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ExtractTsFromBytes/Parallel_MixedWorkload-24         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                 ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

              lines  line size   total data
ShortLines    10000  ~100 bytes  ~1 MB
MediumLines   1000   ~2 KB       ~2 MB
LongLines     100    ~50 KB      ~5 MB
VeryLongLines 50     ~500 KB     ~25 MB
MixedWorkload 11150  mixed       ~33 MB

Contributor

Sure, that seems worthwhile. 👍

CHANGELOG.md Outdated

### Bugfixes

- Fix Docker log corruption for multiplexed long lines. (@axd1x8a)
Contributor

Changelog likely needs to be manually updated from main to hit the unreleased section.

Contributor Author

I can rebase to main and also move this line up to the unreleased section if needed

Contributor

@dehaansa dehaansa left a comment

Once you rebase to main and CI is happy we can get this merged. Thanks for your patience.

@axd1x8a axd1x8a force-pushed the fix/docker-long-lines branch 2 times, most recently from 71f15cf to bab70e7 Compare December 9, 2025 20:37
@dehaansa dehaansa enabled auto-merge (squash) December 9, 2025 20:41
@dehaansa dehaansa merged commit cccbda7 into grafana:main Dec 9, 2025
39 checks passed
@axd1x8a axd1x8a deleted the fix/docker-long-lines branch December 9, 2025 21:00
dehaansa pushed a commit to iamrajiv/alloy that referenced this pull request Dec 10, 2025
* Improve docker's multiplexed long lines handling

* Fix `dockerChunkWriter` buffer flushing and add test for it

* Update `CHANGELOG.md`

* Fix potential panic in `extractTsFromBytes` and `dockerChunkWriter` buffer allocation
dehaansa pushed a commit to madhub/alloy that referenced this pull request Dec 10, 2025
* Improve docker's multiplexed long lines handling

* Fix `dockerChunkWriter` buffer flushing and add test for it

* Update `CHANGELOG.md`

* Fix potential panic in `extractTsFromBytes` and `dockerChunkWriter` buffer allocation
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2025
