Conversation

Force-pushed 98e12f7 to 1d1fcee (Compare)
When testing these changes, feel free to grab a stemcell from here:
Source: https://bosh.ci.cloudfoundry.org/teams/stemcell/pipelines/stemcells-ubuntu-noble/
Hey @ystros and @klakin-pivotal. Just bumping this up in case you forgot.
ystros left a comment:
Of note, to get the above errors, you either need to set a `GOFLAGS=-buildvcs=false` environment variable or run `git config --global --add safe.directory /bpm` in the Docker container. Otherwise, you get different errors due to permission issues from running as root.
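For reference, a Docker invocation combining these workarounds might look like the sketch below. The image name (`bpm-ci`), the `/bpm` mount path, and the test entrypoint are placeholders for illustration; the repository's own scripts may differ.

```shell
# Sketch only: image name, mount path, and test entrypoint are assumptions,
# not the repo's actual tooling.
docker run -it --rm \
  --privileged \
  --cgroupns=host \
  -e GOFLAGS=-buildvcs=false \
  -v "$PWD":/bpm \
  bpm-ci \
  bash -c 'git config --global --add safe.directory /bpm && cd /bpm && go test ./src/bpm/integration/...'
```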
Running the tests on a Mac gets further if you specify `--cgroupns=host` when running Docker. When doing this, you get a slightly different error (still about the "exceeds memory" test, but no cgroup warnings / errors):
```
• [FAILED] [0.326 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116
Timeline >>
If this test fails, then make sure you have enabled swap accounting! Details are in the README.
[FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 22:44:09.029
BEGIN '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stderr.log'
END '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stderr.log'
BEGIN '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stdout.log'
END '/bpmtmp/resource-limits-test1271800040/sys/log/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351/e4f2b6d1-2f8a-4160-ab0d-d465e8d7a351.stdout.log'
<< Timeline
[FAILED] No future change is possible. Bailing out early after 0.171s.
Expected
<chan integration_test.event | len:0, cap:0>: 0xc00013ede0
to receive something. The channel is closed.
In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 22:44:09.029
------------------------------
```
It also fails running on a Noble-deployed VM (timing out there instead of prematurely closing):
```
resource limits memory [It] gets OOMed when it exceeds its memory limit
/var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release/src/bpm/integration/resource_limits_test.go:116
Timeline >>
If this test fails, then make sure you have enabled swap accounting! Details are in the README.
[FAILED] in [It] - /var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 23:35:26.327
BEGIN '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stderr.log'
END '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stderr.log'
BEGIN '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stdout.log'
END '/bpmtmp/resource-limits-test2024427667/sys/log/997a8a24-fe82-43e4-baff-0e7c1a4e25ea/997a8a24-fe82-43e4-baff-0e7c1a4e25ea.stdout.log'
<< Timeline
[FAILED] Timed out after 20.002s.
Expected
<chan integration_test.event | len:0, cap:0>: 0xc00008d440
to receive something.
In [It] at: /var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release/src/bpm/integration/resource_limits_test.go:133 @ 07/25/24 23:35:26.327
```
I do note that in both cases, neither `memory.swap.max` nor `memory.memsw.limit_in_bytes` (the files checked in the new code) is present:
```
test-bu-1/1022ff32-abba-44e9-8a64-bb6287bfdb06:/var/vcap/bosh_ssh/bosh_c2ef69c73e50468/bpm-release# ll /sys/fs/cgroup/memory*
-r--r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.numa_stat
-rw-r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.pressure
--w------- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.reclaim
-r--r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.stat
-rw-r--r-- 1 root root 0 Jul 25 23:41 /sys/fs/cgroup/memory.zswap.writeback
```
`memory.memsw.limit_in_bytes` IS present on the Jammy stemcell. So perhaps additional settings need to be enabled for Noble?
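For what it's worth, a quick way to see which swap-accounting interface a host exposes. This is my own helper, not part of bpm; it assumes the standard cgroup mount points.

```shell
# Report which cgroup swap-accounting interface exists on this host.
# Standard /sys/fs/cgroup mount points assumed; not part of bpm itself.
swap_accounting_mode() {
  if [ -f /sys/fs/cgroup/memory.swap.max ]; then
    echo "v2"    # unified hierarchy (e.g. Noble)
  elif [ -f /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]; then
    echo "v1"    # legacy hierarchy with swap accounting (e.g. Jammy)
  else
    echo "none"  # no swap-accounting files found
  fi
}
swap_accounting_mode
```

On a Noble VM like the one above, this would print `none`, matching the listing: the v2 `memory.swap.max` file is absent from the root cgroup shown there.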
@ramonskie do you have any idea about the above finding?
I have not touched anything related to memory limits. So perhaps the defaults changed.
Force-pushed 53566d9 to df6e552 (Compare)
The following results in passing tests on the latest GCP […]. NOTE: this seems to indicate that if the specs are failing on the Noble stemcell, we will need to figure out what changes exist in cgroups between standard […]
Force-pushed df6e552 to cb059fa (Compare)
Currently one test in the integration specs is failing. Unclear if this is the fault of my docker setup or if this represents an actual issue with how `runc` is being set up.
Failure:
```
------------------------------
• [FAILED] [0.167 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116
Timeline >>
If this test fails, then make sure you have enabled swap accounting! Details are in the README.
Error: failed to start job-process: exit status 1
[FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
time="2024-06-28T22:07:30Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/memory.events: no such file or directory"
time="2024-06-28T22:07:30Z" level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 \"/sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2\" with domain controllers -- it is in an invalid state"
END '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
END '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
<< Timeline
[FAILED] Expected
<int>: 1
to match exit code:
<int>: 0
In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
------------------------------
••••••••••••••••••••••••••
------------------------------
• [FAILED] [0.364 seconds]
start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:329
Timeline >>
Error: failed to start job-process: exit status 1
[FAILED] in [It] - /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
en_US.UTF-8
Logging to STDOUT
Received a TERM signal
END '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
Logging to STDERR
[WARN tini (1)] Reaped zombie process with pid=8
time="2024-06-28T22:07:31Z" level=error msg="runc run failed: unable to get cgroup PIDs: read /sys/fs/cgroup/bpm-e599a26c-5d89-421d-a740-04dd490c314b/cgroup.procs: operation not supported"
END '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
<< Timeline
[FAILED] Expected
<int>: 1
to match exit code:
<int>: 0
In [It] at: /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
------------------------------
•••••••••••••••••••••••••••••
Summarizing 2 Failures:
[FAIL] resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:122
[FAIL] start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:337
Ran 69 of 69 Specs in 27.622 seconds
FAIL! -- 67 Passed | 2 Failed | 0 Pending | 0 Skipped
```
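The `cannot enter cgroupv2 ... with domain controllers -- it is in an invalid state` error in the log above typically points at controller delegation: the memory controller must be listed in the root's `cgroup.subtree_control` before runc can attach a child cgroup with memory limits. A rough diagnostic sketch (standard cgroup v2 mount assumed; this is not from the bpm codebase):

```shell
# Diagnostic sketch: is the memory controller available and delegated
# at the cgroup v2 root? Standard /sys/fs/cgroup mount assumed.
check_memory_delegation() {
  root=/sys/fs/cgroup
  if ! grep -qw memory "$root/cgroup.controllers" 2>/dev/null; then
    echo "memory controller not available (cgroup v1 host?)"
  elif grep -qw memory "$root/cgroup.subtree_control" 2>/dev/null; then
    echo "memory controller delegated to child cgroups"
  else
    # As root, delegation could be enabled with:
    #   echo +memory > /sys/fs/cgroup/cgroup.subtree_control
    echo "memory controller available but not delegated"
  fi
}
check_memory_delegation
```

The "not delegated" case would also explain the missing `memory.events` file from the first failure, since per-cgroup memory interface files only appear once the controller is enabled for the subtree.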
Force-pushed cb059fa to 8e66f70 (Compare)
@jpalermo I've verified that […]
jpalermo left a comment:
The code seems non-breaking as far as Jammy is concerned and I verified a dev release deploys with CF Deployment just fine. Going to merge to try and move Noble progress along.
These changes do not appear to impact the behavior of `bpm` when running on an `ubuntu-jammy` based stemcell (cgroups v1). It should be safe to merge this as the behavior of the code handling cgroup v1 has not changed.

Previous context left for posterity:

Currently one test in the integration specs is failing. Unclear if this is the fault of my docker setup or if this represents an actual issue with how `runc` is being set up.

Tests can be run as follows: […]

Example of the failure I'm seeing when running these tests from within the container created using `./scripts/start-docker`