Note that merging this won't fix the test immediately because we are still using vmexec from an older release to run the tests.
When a container image has too many layers (>21), the
overlayfs mount entry in the /etc/mtab file can get very long. When
we use gosigar to list the mounts inside the microVM, it tries
to parse the mtab file and crashes:
```
panic: runtime error: index out of range [3] with length 3
goroutine 376 [running]:
github.com/elastic/gosigar.(*FileSystemList).Get.func1({0x2b245cf15300?, 0x2b245d207000?})
external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:113 +0x1ac
github.com/elastic/gosigar.readFile({0x16a5c3e?, 0x2b245d08cd18?}, 0x2b245d08cd08)
external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:386 +0x1db
github.com/elastic/gosigar.(*FileSystemList).Get(0x2b245d08ce40)
external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:106 +0x7e
github.com/buildbuddy-io/buildbuddy/enterprise/server/vmexec.getFileSystemUsage()
external/+git_repository+com_github_buildbuddy_io_buildbuddy/enterprise/server/vmexec/vmexec.go:672 +0x3f
github.com/buildbuddy-io/buildbuddy/enterprise/server/vmexec.(*command).Run.func3()
```
The issue is particularly tricky to detect because Bazel 9
automatically retries failures multiple times, which means
an execution can fail invisibly while the invocation still ends up green.
However, it seems to be pretty easy to reproduce with the following:
```
bazel test \
--config=remote-minimal \
--runs_per_test=20 \
--test_sharding_strategy=disabled \
--test_filter=TestHighLayerCount \
--remote_retries=0 \
enterprise/server/remote_execution/containers/ociruntime/...
```
with `--remote_retries=0` forcing Bazel to accept the first failure.
This is a bug in gosigar: the old parser used bufio.Reader.ReadLine,
which comes with a size limit:
> ReadLine tries to return a single line, not including the end-of-line bytes.
> If the line was too long for the buffer then isPrefix is set and the
> beginning of the line is returned. The rest of the line will be returned
> from future calls. isPrefix will be false when returning the last fragment
> of the line. The returned buffer is only valid until the next call to
> ReadLine. ReadLine either returns a non-nil line or it returns an error,
> never both.
The old gosigar code did not deal with this limit properly. It discards isPrefix, so each fragment reaches the handler as if it were a complete line:
```
func readFile(file string, handler func(string) bool) error {
	contents, err := ioutil.ReadFile(file)
	if err != nil {
		return err
	}
	reader := bufio.NewReader(bytes.NewBuffer(contents))
	for {
		line, _, err := reader.ReadLine()
		if err == io.EOF {
			break
		}
		if !handler(string(line)) {
			break
		}
	}
	return nil
}
```
This issue was fixed on the upstream master branch, but there has been
no release since the fix was merged.
elastic/gosigar@d69e91c
Upgrade gosigar to the latest commit on the 'master' branch, which includes
the fix above.
708155d to 9e55f12
fmeum
approved these changes
Mar 2, 2026