
Experimental support for cgroups v2#172

Merged
jpalermo merged 1 commit into master from cgroup-v2-support on Aug 10, 2024

Conversation

@aramprice
Member

@aramprice aramprice commented Jun 28, 2024

These changes do not appear to impact the behavior of bpm when running on an ubuntu-jammy based stemcell (cgroups v1). It should be safe to merge this as the behavior of the code handling cgroup-v1 has not changed.


Previous context left for posterity:

Currently one test in the integration specs is failing. Unclear if this is the fault of my docker setup or if this represents an actual issue with how `runc` is being set up.

Tests can be run as follows:

```
# from the repo root
cd src/bpm/
./scripts/test-unit --keep-going
```

Example of the failure I'm seeing when running these tests from within the container created using `./scripts/start-docker`:

```
------------------------------
• [FAILED] [0.167 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116

  Timeline >>
  If this test fails, then make sure you have enabled swap accounting! Details are in the README.
  Error: failed to start job-process: exit status 1
  [FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
  BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
  time="2024-06-28T22:07:30Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/memory.events: no such file or directory"
  time="2024-06-28T22:07:30Z" level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 \"/sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2\" with domain controllers -- it is in an invalid state"
  END   '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
  BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
  END   '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
  << Timeline

  [FAILED] Expected
      <int>: 1
  to match exit code:
      <int>: 0
  In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
------------------------------
••••••••••••••••••••••••••
------------------------------
• [FAILED] [0.364 seconds]
start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:329

  Timeline >>
  Error: failed to start job-process: exit status 1
  [FAILED] in [It] - /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
  BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
  en_US.UTF-8
  Logging to STDOUT
  Received a TERM signal
  END   '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
  BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
  Logging to STDERR
  [WARN  tini (1)] Reaped zombie process with pid=8
  time="2024-06-28T22:07:31Z" level=error msg="runc run failed: unable to get cgroup PIDs: read /sys/fs/cgroup/bpm-e599a26c-5d89-421d-a740-04dd490c314b/cgroup.procs: operation not supported"
  END   '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
  << Timeline

  [FAILED] Expected
      <int>: 1
  to match exit code:
      <int>: 0
  In [It] at: /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
------------------------------
•••••••••••••••••••••••••••••

Summarizing 2 Failures:
  [FAIL] resource limits memory [It] gets OOMed when it exceeds its memory limit
  /bpm/src/bpm/integration/resource_limits_test.go:122
  [FAIL] start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
  /bpm/src/bpm/integration/start_test.go:337

Ran 69 of 69 Specs in 27.622 seconds
FAIL! -- 67 Passed | 2 Failed | 0 Pending | 0 Skipped
```

@aramprice aramprice force-pushed the cgroup-v2-support branch from 98e12f7 to 1d1fcee on June 28, 2024 22:35
@aramprice aramprice self-assigned this Jul 2, 2024
@aramprice aramprice marked this pull request as ready for review July 2, 2024 17:41
@rkoster rkoster requested review from a team, klakin-pivotal and ystros and removed request for a team July 4, 2024 14:50
@rkoster
Contributor

rkoster commented Jul 4, 2024

When testing these changes feel free to grab a stemcell from here:

  • storage.googleapis.com/bosh-core-stemcells-candidate/google/bosh-stemcell-0.59-google-kvm-ubuntu-noble-go_agent.tgz
  • storage.googleapis.com/bosh-core-stemcells-candidate/aws/bosh-stemcell-0.59-aws-xen-hvm-ubuntu-noble-go_agent.tgz
  • storage.googleapis.com/bosh-core-stemcells-candidate/azure/bosh-stemcell-0.59-azure-hyperv-ubuntu-noble-go_agent.tgz

Source: https://bosh.ci.cloudfoundry.org/teams/stemcell/pipelines/stemcells-ubuntu-noble/

@jpalermo

Hey @ystros and @klakin-pivotal. Just bumping this up in case you forgot.

Contributor

@ystros ystros left a comment


Of note, to get the above errors, you either need to set a `GOFLAGS=-buildvcs=false` environment variable or run `git config --global --add safe.directory /bpm` in the docker container. Otherwise, you get different errors due to permission issues from running as root.
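As a single snippet, either of these (run before invoking the test suite) avoids those root-permission errors; values are taken from the comment above:

```shell
# Workarounds for running the suite as root against a bind-mounted repo.

# Option 1: tell the Go toolchain not to stamp VCS metadata into the build.
export GOFLAGS=-buildvcs=false

# Option 2: mark the mounted repo as a safe git directory for root.
git config --global --add safe.directory /bpm
```

Running both is harmless; either one alone is sufficient.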

Running the tests on a Mac gets further if you specify the `--cgroupns=host` flag when running Docker. When doing this, you get a slightly different error (still about the "exceeds memory" test, but no cgroup warnings / errors):

```
• [FAILED] [0.326 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116

  Timeline >>
  If this test fails, then make sure you have enabled swap accounting! Details are in the README.
  [FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 22:44:09.029
  BEGIN '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stderr.log'
  END   '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stderr.log'
  BEGIN '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stdout.log'
  END   '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stdout.log'
  << Timeline

  [FAILED] No future change is possible.  Bailing out early after 0.171s.
  Expected
      <chan integration_test.event | len:0, cap:0>: 0xc00013ede0
  to receive something. The channel is closed.
  In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 22:44:09.029
------------------------------
```

It also fails running on a Noble-deployed VM (timing out there instead of prematurely closing):

```
resource limits memory [It] gets OOMed when it exceeds its memory limit
/var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release/src/bpm/integration/resource_limits_test.go:116

  Timeline >>
  If this test fails, then make sure you have enabled swap accounting! Details are in the README.
  [FAILED] in [It] - /var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 23:35:26.327
  BEGIN '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stderr.log'
  END   '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stderr.log'
  BEGIN '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stdout.log'
  END   '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stdout.log'
  << Timeline

  [FAILED] Timed out after 20.002s.
  Expected
      <chan integration_test.event | len:0, cap:0>: 0xc00008d440
  to receive something.
  In [It] at: /var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 23:35:26.327
```

I do note that in both cases, neither `memory.swap.max` nor `memory.memsw.limit_in_bytes` (the files checked in the new code) is present:

```
test-bu-1/1022ff32-abba-44e9-8a64-bb6287bfdb06:/var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release# ll /sys/fs/cgroup/memory*
-r--r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.numa_stat
-rw-r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.pressure
--w------- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.reclaim
-r--r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.stat
-rw-r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.zswap.writeback
```

`memory.memsw.limit_in_bytes` IS present on the Jammy stemcell. So perhaps additional settings need to be enabled for Noble?
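For quickly eyeballing a Jammy (cgroup v1) versus Noble (cgroup v2) mount, a small helper like the following (hypothetical name, not part of bpm) prints which of the two swap-limit control files exist under a given cgroup directory:

```shell
# Hypothetical helper: list which swap-limit control files are present
# under a cgroup directory. cgroup v2 exposes memory.swap.max; cgroup v1
# exposes memory.memsw.limit_in_bytes.
swap_limit_files() {
  dir="$1"
  for f in memory.swap.max memory.memsw.limit_in_bytes; do
    if [ -e "$dir/$f" ]; then
      echo "$f"
    fi
  done
}
```

For example, `swap_limit_files /sys/fs/cgroup` would print nothing on the Noble VM above, matching the `ll` output.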

@beyhan
Member

beyhan commented Jul 26, 2024

@ramonskie do you have any idea about the above finding?

@ramonskie

I have not touched anything related to memory limits. So perhaps the defaults changed.

@klakin-pivotal
Contributor

> I do note that in both cases, neither memory.swap.max nor memory.memsw.limit_in_bytes being checked in the new code is there:

`memory.memsw.*` are cgroup v1 control files, and won't be present when we're using only cgroup v2 (see the table at https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#benefits-and-purpose-of-the-memory-controller).

`memory.swap.*` is documented to only exist in non-root cgroups, so I expect that if you were to descend into most any subdirectory of /sys/fs/cgroup you would find those files. (The relevant section of the docs starts here: https://docs.kernel.org/admin-guide/cgroup-v2.html#memory)
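A related quick check, sketched as a hypothetical helper (the function name is mine, not from bpm): the unified v2 hierarchy is the only one that exposes a `cgroup.controllers` file at its root, so its presence distinguishes the two:

```shell
# Hypothetical helper: classify a cgroup mount as v2 (unified) or v1 by
# the presence of cgroup.controllers, which only cgroup v2 exposes.
cgroup_version() {
  root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cgroup.controllers" ]; then
    echo "v2"
  else
    echo "v1"
  fi
}
```

e.g. `cgroup_version` with no argument inspects /sys/fs/cgroup on the current host.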

@aramprice aramprice force-pushed the cgroup-v2-support branch 2 times, most recently from 53566d9 to df6e552 on July 26, 2024 22:30
@aramprice
Member Author

aramprice commented Jul 26, 2024

The following results in passing tests on the latest GCP ubuntu-jammy, and ubuntu-noble VMs:

```
sudo su -
apt update && apt install --yes docker.io

git clone https://github.com/cloudfoundry/bpm-release.git
cd bpm-release
git checkout cgroup-v2-support

docker run --privileged --cgroupns host -v ${PWD}:/bpm -it cfbpm/bpm-ci:latest

./scripts/test-unit --keep-going
```

NOTE: the `docker run` command differs from `scripts/start-docker` in that it adds `--cgroupns host`

This seems to indicate that if the specs are failing on the Noble stemcell, we will need to figure out how the cgroup setup differs between standard ubuntu-noble and the new stemcell, and then decide where changes should be made: either accommodate the differences, or make the stemcell more like standard ubuntu-noble.
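One possible starting point for that comparison, run on both a stock ubuntu-noble VM and the stemcell (standard Linux commands, nothing bpm-specific; the reads are guarded since some files only exist on v2 hosts):

```shell
# Capture the cgroup setup on a host so two hosts can be diffed.
stat -fc %T /sys/fs/cgroup 2>/dev/null || true             # "cgroup2fs" on a unified v2 mount
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null || true  # controllers at the root (v2 only)
cat /proc/cgroups 2>/dev/null || true                      # per-controller view (v1-era, still informative)
grep cgroup /proc/filesystems                              # kernel support for cgroup and cgroup2
```

Saving the output of this on each host and diffing the two files should surface any controller or mount differences.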

@aramprice aramprice force-pushed the cgroup-v2-support branch from df6e552 to cb059fa on July 27, 2024 00:03
Currently one test in the integration specs is failing. Unclear if this
is the fault of my docker setup or if this represents an actual issue
with how `runc` is being set up.

Failure:
```
------------------------------
• [FAILED] [0.167 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116

  Timeline >>
  If this test fails, then make sure you have enabled swap accounting! Details are in the README.
  Error: failed to start job-process: exit status 1
  [FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
  BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
  time="2024-06-28T22:07:30Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/memory.events: no such file or directory"
  time="2024-06-28T22:07:30Z" level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 \"/sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2\" with domain controllers -- it is in an invalid state"
  END   '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
  BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
  END   '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
  << Timeline

  [FAILED] Expected
      <int>: 1
  to match exit code:
      <int>: 0
  In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
------------------------------
••••••••••••••••••••••••••
------------------------------
• [FAILED] [0.364 seconds]
start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:329

  Timeline >>
  Error: failed to start job-process: exit status 1
  [FAILED] in [It] - /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
  BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
  en_US.UTF-8
  Logging to STDOUT
  Received a TERM signal
  END   '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
  BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
  Logging to STDERR
  [WARN  tini (1)] Reaped zombie process with pid=8
  time="2024-06-28T22:07:31Z" level=error msg="runc run failed: unable to get cgroup PIDs: read /sys/fs/cgroup/bpm-e599a26c-5d89-421d-a740-04dd490c314b/cgroup.procs: operation not supported"
  END   '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
  << Timeline

  [FAILED] Expected
      <int>: 1
  to match exit code:
      <int>: 0
  In [It] at: /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
------------------------------
•••••••••••••••••••••••••••••

Summarizing 2 Failures:
  [FAIL] resource limits memory [It] gets OOMed when it exceeds its memory limit
  /bpm/src/bpm/integration/resource_limits_test.go:122
  [FAIL] start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
  /bpm/src/bpm/integration/start_test.go:337

Ran 69 of 69 Specs in 27.622 seconds
FAIL! -- 67 Passed | 2 Failed | 0 Pending | 0 Skipped
```
@aramprice
Member Author

@jpalermo I've verified that `scripts/test-unit` works fine on a Jammy stemcell.


@jpalermo jpalermo left a comment


The code seems non-breaking as far as Jammy is concerned and I verified a dev release deploys with CF Deployment just fine. Going to merge to try and move Noble progress along.

@jpalermo jpalermo merged commit 508af33 into master Aug 10, 2024
@jpalermo jpalermo deleted the cgroup-v2-support branch August 10, 2024 23:01


7 participants