feat: replace cgroups-ebpf SHM transport with netipc IPC #22221
ktsaou wants to merge 8 commits into netdata:master
@thiagoftsm review this please.
1 issue found across 106 files
Confidence score: 4/5
- This PR is likely safe to merge, with a focused test-quality risk rather than a direct runtime breakage.
- In `src/crates/netipc/src/transport/shm_tests.rs`, using `path.exists()` can miss dangling symlinks, so cleanup regressions in shared-memory transport tests may go undetected.
- Because the issue is severity 5/10 and confined to test assertions, the main impact is reduced regression detection confidence rather than immediate user-facing failure.
- Pay close attention to `src/crates/netipc/src/transport/shm_tests.rs`: update the assertion to check symlink metadata so dangling-link cleanup failures are caught.
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. cubic prioritises the most important files to review.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/crates/netipc/src/transport/shm_tests.rs">
<violation number="1" location="src/crates/netipc/src/transport/shm_tests.rs:1334">
P2: This assertion is ineffective for dangling symlinks and can let cleanup regressions pass undetected. Check symlink metadata instead of `path.exists()`.</violation>
</file>
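The violation above can be reproduced in isolation. The following is a minimal sketch, not the code from `shm_tests.rs`: the helper names `entry_exists` and `make_dangling_link` and the temporary directory layout are hypothetical. It relies only on the documented behavior that `Path::exists()` follows symlinks while `std::fs::symlink_metadata` inspects the link itself.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if anything occupies `path`, including a dangling symlink.
/// (Hypothetical helper, named here for illustration only.)
fn entry_exists(path: &Path) -> io::Result<bool> {
    match fs::symlink_metadata(path) {
        Ok(_) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

/// Creates a symlink inside `dir` whose target does not exist.
#[cfg(unix)]
fn make_dangling_link(dir: &Path) -> io::Result<PathBuf> {
    let link = dir.join("stale.link");
    std::os::unix::fs::symlink(dir.join("missing-target"), &link)?;
    Ok(link)
}

#[cfg(unix)]
fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("dangling-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let link = make_dangling_link(&dir)?;

    // `exists()` follows the link, finds no target, and reports false,
    // so an assertion like `assert!(!path.exists())` passes despite the leftover.
    assert!(!link.exists());
    // `symlink_metadata` looks at the link itself, so the leftover is visible.
    assert!(entry_exists(&link)?);

    // After real cleanup both checks agree.
    fs::remove_file(&link)?;
    assert!(!entry_exists(&link)?);
    fs::remove_dir_all(&dir)?;
    Ok(())
}

#[cfg(not(unix))]
fn main() {}
```

A cleanup assertion written as `assert!(!entry_exists(&path)?)` fails when a dangling link is left behind, which is the regression the review comment wants the test to catch.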
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
On it.
thiagoftsm left a comment
PR is working as expected with cgroup versions 1 and 2. LGTM!
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR rebases and re-vendors the upstream netipc IPC library across C/Rust/Go, then updates Netdata’s cgroups/eBPF integration to use the new handshake/transport contract (session-locked transport, reconnect-on-SHM-attach-failure), adding transport observability logs.
Changes:
- Vendored and wired `netipc` implementations (Rust crate here) including POSIX UDS transport and Linux SHM transport + extensive tests.
- Replaced legacy cgroups↔eBPF shared-memory integration with a typed `cgroups-snapshot` `netipc` server (in `cgroups.plugin`) and an eBPF-side `netipc` cache client, plus transport state logging.
- Updated build system wiring (CMake + Cargo workspace) to build/link the new C `netipc` runtime and the Rust crate.
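The overview mentions a session-locked transport with reconnect on SHM attach failure. The sketch below is one plausible reading of that behavior, not the `netipc` API: the `Peer` trait, `Session`, `Profile`, `connect`, and `FlakyPeer` are all hypothetical names, and withholding SHM from the next handshake after a failed attach is an assumption made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Profile {
    Uds,
    ShmFutex,
}

/// A session carries exactly one profile for its whole lifetime.
#[derive(Debug, PartialEq, Eq)]
struct Session {
    id: u64,
    profile: Profile,
}

/// Hypothetical stand-in for the remote side of the handshake.
trait Peer {
    fn handshake(&mut self, offered: &[Profile]) -> Option<Session>;
    fn attach_shm(&mut self, session: &Session) -> bool;
}

/// Never switches transport inside a session: a failed SHM attach
/// discards the session and starts over with a new handshake.
fn connect<P: Peer>(peer: &mut P, max_attempts: u32) -> Option<Session> {
    let mut offered = vec![Profile::ShmFutex, Profile::Uds];
    for _ in 0..max_attempts {
        let session = peer.handshake(&offered)?;
        if session.profile != Profile::ShmFutex || peer.attach_shm(&session) {
            return Some(session);
        }
        // Assumption: the retry stops offering SHM after a failed attach.
        offered.retain(|p| *p != Profile::ShmFutex);
    }
    None
}

/// Test double: picks the first offered profile; SHM attach may always fail.
struct FlakyPeer {
    next_id: u64,
    shm_works: bool,
}

impl Peer for FlakyPeer {
    fn handshake(&mut self, offered: &[Profile]) -> Option<Session> {
        let profile = *offered.first()?;
        self.next_id += 1;
        Some(Session { id: self.next_id, profile })
    }

    fn attach_shm(&mut self, _session: &Session) -> bool {
        self.shm_works
    }
}

fn main() {
    let mut peer = FlakyPeer { next_id: 0, shm_works: false };
    let session = connect(&mut peer, 3).expect("a session should be established");
    // The first session (SHM) was dropped; the second one runs over UDS.
    assert_eq!(session, Session { id: 2, profile: Profile::Uds });
    println!("{:?}", session);
}
```

The point of the design is that the selected profile is fixed at handshake time, so a data-plane problem surfaces as a new session rather than a silent mid-session downgrade.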
Reviewed changes
Copilot reviewed 37 out of 106 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/crates/netipc/src/transport/shm_tests.rs | Adds Linux SHM transport conformance and chaos tests. |
| src/crates/netipc/src/transport/shm.rs | Implements Linux SHM transport (futex) and stale cleanup logic. |
| src/crates/netipc/src/transport/posix.rs | Implements POSIX UDS SEQPACKET transport with handshake and chunking. |
| src/crates/netipc/src/transport/mod.rs | Exposes transport backends via platform cfg flags. |
| src/crates/netipc/src/service/raw_windows_tests.rs | Adds Windows raw service tests for handshake/session behavior. |
| src/crates/netipc/src/service/mod.rs | Defines L2 service module exports. |
| src/crates/netipc/src/service/cgroups_windows_tests.rs | Adds typed cgroups service + cache tests on Windows. |
| src/crates/netipc/src/service/cgroups_unix_tests.rs | Adds typed cgroups service + cache tests on Unix/Linux SHM profile. |
| src/crates/netipc/src/service/cgroups.rs | Adds typed cgroups client/server/cache facade over raw transport. |
| src/crates/netipc/src/protocol/string_reverse.rs | Adds STRING_REVERSE codec + dispatch and tests. |
| src/crates/netipc/src/protocol/increment.rs | Adds INCREMENT codec + dispatch and tests. |
| src/crates/netipc/src/protocol/cgroups.rs | Adds cgroups snapshot codec/builder/dispatch and tests. |
| src/crates/netipc/src/lib.rs | Exposes protocol/service/transport modules in Rust crate. |
| src/crates/netipc/Cargo.toml | Introduces netipc Rust crate manifest. |
| src/crates/Cargo.toml | Adds netipc crate to the Rust workspace members. |
| src/collectors/ebpf.plugin/ebpf_vfs.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_swap.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_socket.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_shm.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_process.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_oomkill.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_fd.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_dcstat.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_cachestat.c | Switches cgroup gating/systemd flag checks to netipc-derived globals. |
| src/collectors/ebpf.plugin/ebpf_cgroup.h | Removes legacy SHM API declarations; adds netipc cache cleanup API. |
| src/collectors/ebpf.plugin/ebpf_cgroup.c | Implements eBPF-side netipc cache client + transport state logging. |
| src/collectors/ebpf.plugin/ebpf.h | Replaces legacy SHM globals with netipc-derived integration flags. |
| src/collectors/ebpf.plugin/ebpf.c | Defines netipc-derived global integration flags and cleanup hook. |
| src/collectors/cgroups.plugin/sys_fs_cgroup.h | Removes legacy SHM structs in favor of netipc metadata sharing. |
| src/collectors/cgroups.plugin/cgroup-netipc.h | Adds cgroups.plugin netipc server init/cleanup API. |
| src/collectors/cgroups.plugin/cgroup-netipc.c | Implements typed cgroups-snapshot netipc server for Linux. |
| src/collectors/cgroups.plugin/cgroup-discovery.c | Starts/stops the netipc server instead of legacy SHM sharing. |
| CMakeLists.txt | Builds/links standalone C netipc library and Netdata shim wrapper. |
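Several `ebpf.plugin` rows above describe collectors switching their cgroup gating checks to netipc-derived globals, which the cubic summary describes as atomic flags. The actual integration is written in C; the following Rust sketch only illustrates the gating pattern. The flag names are borrowed from the fields in the PR's log output (`integration_active`, `send_cgroup_chart`, `systemd_enabled`), while `SnapshotSummary`, `publish`, `should_emit_cgroup_charts`, and the rule that charts are sent when at least one cgroup is enabled are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Flags written by the cache-refresh thread and read by collector threads.
static INTEGRATION_ACTIVE: AtomicBool = AtomicBool::new(false);
static SEND_CGROUP_CHART: AtomicBool = AtomicBool::new(false);
static SYSTEMD_ENABLED: AtomicBool = AtomicBool::new(false);

/// Hypothetical digest of one refreshed snapshot.
struct SnapshotSummary {
    enabled: u32,
    systemd_enabled: bool,
}

/// Publishes the outcome of a refresh; `None` means the refresh failed.
fn publish(summary: Option<&SnapshotSummary>) {
    match summary {
        Some(s) => {
            SYSTEMD_ENABLED.store(s.systemd_enabled, Ordering::Relaxed);
            // Assumption: charts are sent when at least one cgroup is enabled.
            SEND_CGROUP_CHART.store(s.enabled > 0, Ordering::Relaxed);
            // Release pairs with the Acquire load in the reader, so a reader
            // that sees `true` also sees the two stores above.
            INTEGRATION_ACTIVE.store(true, Ordering::Release);
        }
        None => {
            // Close the gate first, then clear the dependent flags.
            INTEGRATION_ACTIVE.store(false, Ordering::Release);
            SEND_CGROUP_CHART.store(false, Ordering::Relaxed);
            SYSTEMD_ENABLED.store(false, Ordering::Relaxed);
        }
    }
}

/// Gate a collector checks before emitting cgroup charts.
fn should_emit_cgroup_charts() -> bool {
    INTEGRATION_ACTIVE.load(Ordering::Acquire) && SEND_CGROUP_CHART.load(Ordering::Relaxed)
}

fn main() {
    publish(None);
    assert!(!should_emit_cgroup_charts());

    publish(Some(&SnapshotSummary { enabled: 58, systemd_enabled: true }));
    assert!(should_emit_cgroup_charts());
    assert!(SYSTEMD_ENABLED.load(Ordering::Relaxed));

    publish(Some(&SnapshotSummary { enabled: 0, systemd_enabled: false }));
    assert!(!should_emit_cgroup_charts());
}
```

Collectors only read the flags, so they never block on the transport; a failed refresh simply closes the gate until the next good snapshot.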
thiagoftsm left a comment
After addressing all requests, the PR works with all cgroup versions. LGTM!
Supersedes #22143.
This rebases the `plugin-ipc` integration on top of current `master`, re-vendors the latest `netipc` source of truth, and keeps the Netdata-side cgroups/eBPF integration on the current upstream handshake and fallback contract.

What changed
- `netipc` into:
  - `src/libnetdata/netipc`
  - `src/go/pkg/netipc`
  - `src/crates/netipc`

Why
The old PR was based on an older `master` and an older vendored `plugin-ipc` tree. Since then, the upstream `netipc` contract changed materially:

This PR updates the Netdata integration to match that contract instead of carrying stale fallback semantics.
Validation
In-tree validation:
- `cd src/go && go test ./pkg/netipc/...`
- `cd src/crates && cargo test -p netipc --no-run`
- `./netdata-installer.sh --enable-plugin-go --enable-plugin-nfacct --enable-plugin-systemd-journal --enable-plugin-otel --enable-plugin-otel-signal-viewer --enable-plugin-ibm --enable-lto --enable-ml --use-system-protobuf --zlib-is-really-here --dont-wait`

Live runtime validation on the installed daemon from this branch:
- `v2.10.0-46-gfd027a98f`
- `CGROUP: netipc server started on '/run/netdata/cgroups-snapshot.sock'`
- `EBPF CGROUP: netipc transport state=ready session_valid=1 session=0000000000000001 selected_profile=shm-futex data_plane=shm generation=1 items=0 enabled=0`
- `EBPF CGROUP: netipc snapshot generation=2 items=265 enabled=58 imported_targets=58 total_pids=115 send_cgroup_chart=1 integration_active=1 systemd_enabled=1 refresh_failures=0`
- `cgroup_uptime-kuma.cpu`
- `systemd_docker.cpu`
- `systemd_libvirtd.cpu`
- `systemd_systemd-journald.cpu`

Scope note
The live end-to-end runtime proof here is the C integration path used by `cgroups.plugin` and `ebpf.plugin`.

The Go and Rust `netipc` trees are also vendored into Netdata and validated in-tree by build/test, so the library is available consistently in all three language trees.

Summary by cubic
Replaces the cgroups/eBPF SHM path with `netipc` IPC, adds a Linux-only `cgroups-snapshot` server, and moves the eBPF client to a managed cache with atomic gating for more reliable cgroups charts. Vendors the latest `netipc` across C, Rust, and Go, aligns with the upstream handshake, and adds profile/state/data-plane logs.

New Features
- `cgroups-snapshot` over `netipc` (UDS/SHM); Linux server via new `cgroup-netipc.*`. eBPF consumes snapshots through a managed cache with atomic flags and transport logging.
- `netipc` in C (`src/libnetdata/netipc`), Rust crate (`src/crates/netipc`), and Go (`src/go/pkg/netipc`); workspace wiring and tests included. Added `netipc_netdata` C helpers and build hooks for Linux/Windows.

Migration
- `/run/netdata/cgroups-snapshot.sock`; no user-facing config changes expected.

Written for commit d97e8fa. Summary will update on new commits.
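The transport state line quoted under Validation is a flat run of `key=value` tokens, which makes it easy to check from a script or test harness. A minimal sketch, assuming the format stays whitespace-separated with no quoted values; `parse_fields` is a hypothetical helper, not part of Netdata or `netipc`.

```rust
use std::collections::HashMap;

/// Collects every `key=value` token on the line; tokens without `=`
/// (such as the "EBPF CGROUP:" prefix) are skipped.
fn parse_fields(line: &str) -> HashMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .collect()
}

fn main() {
    // Log line copied from the PR description.
    let line = "EBPF CGROUP: netipc transport state=ready session_valid=1 \
                session=0000000000000001 selected_profile=shm-futex \
                data_plane=shm generation=1 items=0 enabled=0";
    let fields = parse_fields(line);

    assert_eq!(fields["state"], "ready");
    assert_eq!(fields["selected_profile"], "shm-futex");
    assert_eq!(fields["data_plane"], "shm");

    // Numeric fields parse with the standard `str::parse`.
    let generation: u64 = fields["generation"].parse().expect("generation is numeric");
    assert_eq!(generation, 1);
    assert!(!fields.contains_key("EBPF"));
}
```

Checking `selected_profile` and `data_plane` this way confirms which transport a session actually negotiated.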