Skip to content

gosigar: upgrade to latest version.#11456

Merged
sluongng merged 1 commit intomasterfrom
sluongng/upgrade-gosigar
Mar 2, 2026
Merged

gosigar: upgrade to latest version.#11456
sluongng merged 1 commit intomasterfrom
sluongng/upgrade-gosigar

Conversation

@sluongng
Copy link
Contributor

@sluongng sluongng commented Mar 2, 2026

In cases where the container image has too many layers (>21), the
overlayfs mount command in /etc/mtab file could get very long. When
we use gosigar to list the mounts inside the microvm, it would attempt
to parse the mtab file and crashed

panic: runtime error: index out of range [3] with length 3

goroutine 376 [running]:
github.com/elastic/gosigar.(*FileSystemList).Get.func1({0x2b245cf15300?, 0x2b245d207000?})
        external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:113 +0x1ac
github.com/elastic/gosigar.readFile({0x16a5c3e?, 0x2b245d08cd18?}, 0x2b245d08cd08)
        external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:386 +0x1db
github.com/elastic/gosigar.(*FileSystemList).Get(0x2b245d08ce40)
        external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:106 +0x7e
github.com/buildbuddy-io/buildbuddy/enterprise/server/vmexec.getFileSystemUsage()
        external/+git_repository+com_github_buildbuddy_io_buildbuddy/enterprise/server/vmexec/vmexec.go:672 +0x3f
github.com/buildbuddy-io/buildbuddy/enterprise/server/vmexec.(*command).Run.func3()

The issue is particularly tricky to detect because failures
are automatically retried by Bazel 9 multiple times, which means
sometimes the executions may failed invisibly with a green invocation.
However it seems to be pretty easy to reproduce with the following

bazel test \
      --config=remote-minimal \
      --runs_per_test=20 \
      --test_sharding_strategy=disabled \
      --test_filter=TestHighLayerCount \
      --remote_retries=0 \
      enterprise/server/remote_execution/containers/ociruntime/...

with --remote_retries=0 forcing Bazel to accept the first failure.

This is a bug from gosigar because the old parser used bufio.ReadLine
which comes with a size limit

ReadLine tries to return a single line, not including the end-of-line bytes.
If the line was too long for the buffer then isPrefix is set and the
beginning of the line is returned. The rest of the line will be returned
from future calls. isPrefix will be false when returning the last fragment
of the line. The returned buffer is only valid until the next call to
ReadLine. ReadLine either returns a non-nil line or it returns an error,
never both.

And gosigar old code was not properly dealt with this limit.

func readFile(file string, handler func(string) bool) error {
        contents, err := ioutil.ReadFile(file)
        if err != nil {
                return err
        }

        reader := bufio.NewReader(bytes.NewBuffer(contents))

        for {
                line, _, err := reader.ReadLine()
                if err == io.EOF {
                        break
                }
                if !handler(string(line)) {
                        break
                }
        }

        return nil
}

This issue was fixed in upstream master branch, but there has been
no release since the fix was merged.

elastic/gosigar@d69e91c

Upgrade gosigar to the latest commit on 'master' branch, which includes
the fix above.

@sluongng sluongng requested review from bduffany and fmeum March 2, 2026 12:32
@sluongng
Copy link
Contributor Author

sluongng commented Mar 2, 2026

Note that merging this won't fix the test immediately because we are still using vmexec from an older release to run the tests.

In cases where the container image has too many layers (>21), the
overlayfs mount command in /etc/mtab file could get very long. When
we use gosigar to list the mounts inside the microvm, it would attempt
to parse the mtab file and crashed

  ```
  panic: runtime error: index out of range [3] with length 3

  goroutine 376 [running]:
  github.com/elastic/gosigar.(*FileSystemList).Get.func1({0x2b245cf15300?, 0x2b245d207000?})
          external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:113 +0x1ac
  github.com/elastic/gosigar.readFile({0x16a5c3e?, 0x2b245d08cd18?}, 0x2b245d08cd08)
          external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:386 +0x1db
  github.com/elastic/gosigar.(*FileSystemList).Get(0x2b245d08ce40)
          external/gazelle++go_deps+com_github_elastic_gosigar/sigar_linux_common.go:106 +0x7e
  github.com/buildbuddy-io/buildbuddy/enterprise/server/vmexec.getFileSystemUsage()
          external/+git_repository+com_github_buildbuddy_io_buildbuddy/enterprise/server/vmexec/vmexec.go:672 +0x3f
  github.com/buildbuddy-io/buildbuddy/enterprise/server/vmexec.(*command).Run.func3()
  ```

The issue is particularly tricky to detect because failures
are automatically retried by Bazel 9 multiple times, which means
sometimes the executions may failed invisibly with a green invocation.
However it seems to be pretty easy to reproduce with the following

```
bazel test \
      --config=remote-minimal \
      --runs_per_test=20 \
      --test_sharding_strategy=disabled \
      --test_filter=TestHighLayerCount \
      --remote_retries=0 \
      enterprise/server/remote_execution/containers/ociruntime/...
```

with `--remote_retries=0` forcing Bazel to accept the first failure.

This is a bug from gosigar because the old parser used bufio.ReadLine
which comes with a size limit

> ReadLine tries to return a single line, not including the end-of-line bytes.
> If the line was too long for the buffer then isPrefix is set and the
> beginning of the line is returned. The rest of the line will be returned
> from future calls. isPrefix will be false when returning the last fragment
> of the line. The returned buffer is only valid until the next call to
> ReadLine. ReadLine either returns a non-nil line or it returns an error,
> never both.

And gosigar old code was not properly dealt with this limit.

  ```
  func readFile(file string, handler func(string) bool) error {
          contents, err := ioutil.ReadFile(file)
          if err != nil {
                  return err
          }

          reader := bufio.NewReader(bytes.NewBuffer(contents))

          for {
                  line, _, err := reader.ReadLine()
                  if err == io.EOF {
                          break
                  }
                  if !handler(string(line)) {
                          break
                  }
          }

          return nil
  }
  ```

This issue was fixed in upstream master branch, but there has been
no release since the fix was merged.

elastic/gosigar@d69e91c

Upgrade gosigar to the latest commit on 'master' branch, which includes
the fix above.
@sluongng sluongng force-pushed the sluongng/upgrade-gosigar branch from 708155d to 9e55f12 Compare March 2, 2026 12:33
@sluongng sluongng merged commit 89159b4 into master Mar 2, 2026
13 checks passed
@sluongng sluongng deleted the sluongng/upgrade-gosigar branch March 2, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants