
Reduce lock contention in manager package #3756

Merged
dims merged 5 commits into google:master from dgrisonnet:manager-reduce-lock-contention
Dec 4, 2025

Conversation

@dgrisonnet
Contributor

Summary

This PR reduces lock contention in the cAdvisor manager package by replacing the global containersLock RWMutex with a sync.Map, yielding roughly 60% faster mixed read/write operations and a 318x reduction in lock contention under concurrent workloads.

Changes

Use atomic.Int64 for container timestamps

Replace mutex-protected time.Time fields with atomic.Int64 for infoLastUpdatedTime and statsLastUpdatedTime to eliminate lock acquisition on timestamp updates.

Replace containersLock with sync.Map

Convert containers from map[namespacedContainerName]*containerData + sync.RWMutex to sync.Map for lock-free reads.

Benchmark Results

Raw Benchmark Output

goos: linux
goarch: amd64
pkg: github.com/google/cadvisor/manager
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSyncMapConcurrentReads          3788887               308.0 ns/op           21 B/op           1 allocs/op
BenchmarkSyncMapConcurrentReads-4        9459172               106.3 ns/op           21 B/op           1 allocs/op
BenchmarkSyncMapConcurrentReads-8       11848674                96.62 ns/op          21 B/op           1 allocs/op
BenchmarkRWMutexMapConcurrentReads       4317105               255.9 ns/op           21 B/op           1 allocs/op
BenchmarkRWMutexMapConcurrentReads-4    12015366               114.6 ns/op           21 B/op           1 allocs/op
BenchmarkRWMutexMapConcurrentReads-8    12923415                99.69 ns/op          21 B/op           1 allocs/op
BenchmarkSyncMapIteration                  78121             18491 ns/op              0 B/op           0 allocs/op
BenchmarkSyncMapIteration-4                59899             19836 ns/op              0 B/op           0 allocs/op
BenchmarkSyncMapIteration-8                70450             18755 ns/op              0 B/op           0 allocs/op
BenchmarkRWMutexMapIteration               80077             14979 ns/op              0 B/op           0 allocs/op
BenchmarkRWMutexMapIteration-4             71200             15488 ns/op              0 B/op           0 allocs/op
BenchmarkRWMutexMapIteration-8             71007             15382 ns/op              0 B/op           0 allocs/op
BenchmarkSyncMapMixedReadWrite           3245372               364.6 ns/op           29 B/op           1 allocs/op
BenchmarkSyncMapMixedReadWrite-4         9687744               128.6 ns/op           28 B/op           1 allocs/op
BenchmarkSyncMapMixedReadWrite-8        11052004               106.3 ns/op           28 B/op           1 allocs/op
BenchmarkRWMutexMapMixedReadWrite        3675916               292.5 ns/op           29 B/op           1 allocs/op
BenchmarkRWMutexMapMixedReadWrite-4      5093344               241.3 ns/op           28 B/op           1 allocs/op
BenchmarkRWMutexMapMixedReadWrite-8      4681834               254.0 ns/op           27 B/op           1 allocs/op
PASS
ok      github.com/google/cadvisor/manager      26.053s

Benchstat Comparison (8 CPUs, n=10)

                     │ RWMutex (old) │         sync.Map (new)          │
                     │    sec/op     │   sec/op     vs base            │
MapConcurrentReads-8       112.3n ± 4%   105.3n ± 11%   -6.15% (p=0.034)
MapIteration-8             16.81µ ± 9%   21.16µ ± 18%  +25.86% (p=0.000)
MapMixedReadWrite-8        279.4n ± 5%   112.2n ±  7%  -59.85% (p=0.000)
geomean                    807.9n        630.1n        -22.02%

Key results:

  • Concurrent reads: 6% faster
  • Iteration: 26% slower (but doesn't block writers)
  • Mixed read/write: 60% faster

Scaling Behavior

CPUs    sync.Map    RWMutex    Advantage
1       365 ns      293 ns     RWMutex 1.2x faster
4       129 ns      241 ns     sync.Map 1.9x faster
8       106 ns      254 ns     sync.Map 2.4x faster

sync.Map scales better with CPU count. RWMutex gets slower from 4→8 CPUs due to increased contention.

Mutex Contention Profiling

RWMutex Pattern

$ go tool pprof -top mutex_rwmutex.out
Type: delay
Showing nodes accounting for 6.70s, 100% of 6.70s total
      flat  flat%   sum%        cum   cum%
     5.53s 82.49% 82.49%      6.27s 93.58%  sync.(*RWMutex).Unlock
     0.72s 10.81% 93.30%      0.72s 10.81%  sync.(*Mutex).Unlock
     0.30s  4.44% 97.74%      0.32s  4.75%  sync.(*RWMutex).RUnlock
     0.15s  2.26%   100%      0.15s  2.26%  runtime.unlock

Total contention: 6.70 seconds (writers blocked by readers)

sync.Map Pattern

$ go tool pprof -top mutex_syncmap.out
Type: delay
Showing nodes accounting for 21.12ms, 100% of 21.12ms total
      flat  flat%   sum%        cum   cum%
   18.43ms 87.26% 87.26%    18.43ms 87.26%  runtime.unlock
    1.45ms  6.87% 94.13%     1.45ms  6.87%  runtime._LostContendedRuntimeLock
    1.24ms  5.87%   100%     1.33ms  6.29%  sync.(*HashTrieMap).Swap

Total contention: 21 milliseconds (runtime overhead only, no lock contention)

Result: Lock contention reduced by 318x (6.70s → 21ms).

How to Reproduce

# Run benchmarks
go test -bench=. -benchmem -cpu=1,4,8 ./manager/ -run=^$

# Run with mutex profiling
go test -bench=BenchmarkRWMutexMapMixedReadWrite -cpu=8 \
  -mutexprofile=mutex.out -mutexprofilefraction=1 ./manager/ -run=^$
go tool pprof -top mutex.out
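For reference, a mixed read/write benchmark in the spirit of BenchmarkSyncMapMixedReadWrite can be driven outside go test via testing.Benchmark; the 9:1 read/write ratio below is an assumption for illustration, not taken from the PR's actual benchmark code.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

func main() {
	var m sync.Map
	m.Store("key", 0)

	// RunParallel spreads iterations across GOMAXPROCS goroutines,
	// matching the -cpu=8 style runs shown in the raw output above.
	res := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			i := 0
			for pb.Next() {
				if i%10 == 0 {
					m.Store("key", i) // occasional write
				} else {
					m.Load("key") // mostly reads
				}
				i++
			}
		})
	})
	fmt.Println(res.N > 0)
}
```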

Replace mutex-protected infoLastUpdatedTime and statsLastUpdatedTime
with atomic.Int64 to eliminate lock acquisition on timestamp updates.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

Move container filtering logic outside the lock to minimize
containersLock hold time during iteration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

Convert containers from map + RWMutex to sync.Map for lock-free
reads. Eliminates contention between Prometheus scrapes and
container lifecycle events.

Benchmarks show 2.7x improvement in mixed read/write workloads.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

Add benchmarks comparing concurrent read, iteration, and mixed
read/write performance between sync.Map and RWMutex patterns.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@google-cla

google-cla bot commented Dec 3, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@dgrisonnet force-pushed the manager-reduce-lock-contention branch 2 times, most recently from 78e73da to 357c216 on December 3, 2025 16:26
@dims
Collaborator

dims commented Dec 3, 2025

@dgrisonnet please sign the CLA

@dgrisonnet
Contributor Author

it should be good now

Add atomicTime (wrapping atomic.Int64) and containerMap (wrapping
sync.Map) to provide type-safe APIs. This removes boilerplate type
assertions at call sites and improves code readability.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@haircommander
Contributor

LGTM, this looks great thanks @dgrisonnet

@dims dims merged commit fd4eb37 into google:master Dec 4, 2025
7 checks passed


3 participants