
feat: replace cgroups-ebpf SHM transport with netipc IPC #22221

Open
ktsaou wants to merge 8 commits into netdata:master from ktsaou:plugin-ipc-integration-v2

ktsaou (Member) commented Apr 17, 2026

Supersedes #22143.

This rebases the plugin-ipc integration onto the current master, re-vendors the latest netipc sources (the upstream source of truth), and aligns the Netdata-side cgroups/eBPF integration with the current upstream handshake and fallback contract.

What changed

  • vendor current netipc into:
    • src/libnetdata/netipc
    • src/go/pkg/netipc
    • src/crates/netipc
  • wire the C runtime into Netdata for the cgroups snapshot server/client path
  • keep the Netdata-specific cgroups/eBPF integration changes on top of the new vendored tree
  • add transport observability so the eBPF client logs the negotiated profile and actual data plane in use

Why

The old PR was based on an older master and an older vendored plugin-ipc tree. Since then, the upstream netipc contract has changed materially:

  • handshake negotiation is field-specific but server-decided
  • a successful handshake locks the negotiated transport for that session
  • post-handshake same-session SHM-to-baseline fallback is not allowed
  • client-side SHM attach failure recovers by reconnecting with a new handshake
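The following minimal Rust sketch makes the contract above concrete from the client's side. Everything in it is hypothetical: the `Profile` bit values, `server_select`, `Session`, and `connect` are illustrative names, not netipc's API, and each loop iteration stands in for a full handshake over the socket. It demonstrates three rules from the list:

- The server picks from the intersection of offered and supported profiles, using its own preference order.
- A `Session` cannot change its profile after the handshake.
- An SHM attach failure discards the session and starts a new handshake.

The sketch repeats the same offer on every retry. This PR does not say whether netipc narrows the offer after an attach failure.

```rust
/// Transport profiles a peer can offer. The bit values are hypothetical,
/// not netipc's wire encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Baseline,
    ShmFutex,
}

impl Profile {
    pub fn bit(self) -> u32 {
        match self {
            Profile::Baseline => 1 << 0,
            Profile::ShmFutex => 1 << 1,
        }
    }
}

/// Server-decided negotiation: only profiles both sides support are eligible,
/// and the server's own preference order chooses among them.
pub fn server_select(client_offer: u32, server_supported: u32, server_preference: &[Profile]) -> Option<Profile> {
    let common = client_offer & server_supported;
    server_preference.iter().copied().find(|p| common & p.bit() != 0)
}

/// A session whose profile is fixed at handshake time. There is no setter,
/// so a same-session fallback from SHM to baseline cannot be expressed.
#[derive(Debug)]
pub struct Session {
    id: u64,
    profile: Profile,
}

impl Session {
    pub fn id(&self) -> u64 {
        self.id
    }
    pub fn profile(&self) -> Profile {
        self.profile
    }
}

/// Client connect loop. `attach_shm` stands in for mapping the SHM region
/// after a handshake that selected SHM. On failure, the session is dropped
/// and a brand-new handshake (new session id) is performed instead.
pub fn connect<F>(
    client_offer: u32,
    server_supported: u32,
    server_preference: &[Profile],
    max_attempts: u64,
    mut attach_shm: F,
) -> Result<Session, String>
where
    F: FnMut(u64) -> bool,
{
    for id in 1..=max_attempts {
        // One iteration models one complete handshake.
        let profile = server_select(client_offer, server_supported, server_preference)
            .ok_or_else(|| "no common transport profile".to_string())?;
        if profile == Profile::ShmFutex && !attach_shm(id) {
            continue; // discard this session; reconnect with a fresh handshake
        }
        return Ok(Session { id, profile });
    }
    Err(format!("gave up after {} handshakes", max_attempts))
}
```

`Session` exposes no way to change its profile. Any move from SHM to baseline therefore has to go through a new session with a new id, which matches the "no same-session fallback" rule.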

This PR updates the Netdata integration to match that contract instead of carrying stale fallback semantics.

Validation

In-tree validation:

  • cd src/go && go test ./pkg/netipc/...
  • cd src/crates && cargo test -p netipc --no-run
  • full build/install from this branch with ./netdata-installer.sh --enable-plugin-go --enable-plugin-nfacct --enable-plugin-systemd-journal --enable-plugin-otel --enable-plugin-otel-signal-viewer --enable-plugin-ibm --enable-lto --enable-ml --use-system-protobuf --zlib-is-really-here --dont-wait

Live runtime validation on the installed daemon from this branch:

  • installed version: v2.10.0-46-gfd027a98f
  • cgroups server started:
    • CGROUP: netipc server started on '/run/netdata/cgroups-snapshot.sock'
  • eBPF client negotiated and actually used SHM:
    • EBPF CGROUP: netipc transport state=ready session_valid=1 session=0000000000000001 selected_profile=shm-futex data_plane=shm generation=1 items=0 enabled=0
  • snapshot import succeeded after connect:
    • EBPF CGROUP: netipc snapshot generation=2 items=265 enabled=58 imported_targets=58 total_pids=115 send_cgroup_chart=1 integration_active=1 systemd_enabled=1 refresh_failures=0
  • live charts were present and updating, including:
    • cgroup_uptime-kuma.cpu
    • systemd_docker.cpu
    • systemd_libvirtd.cpu
    • systemd_systemd-journald.cpu
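The transport and snapshot log lines above are whitespace-separated `key=value` fields, so the "negotiated and actually used SHM" check can be scripted. Below is a small Rust sketch. It assumes the log format stays as shown in the sample lines, and the helper names `log_fields` and `is_using_shm` are hypothetical.

```rust
use std::collections::HashMap;

/// Collect the whitespace-separated `key=value` tokens of a log line.
/// Tokens without '=' (such as the "EBPF CGROUP:" prefix) are skipped.
pub fn log_fields(line: &str) -> HashMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .collect()
}

/// True when the transport line reports a valid, ready session that both
/// negotiated the SHM profile and is actually moving data over SHM.
pub fn is_using_shm(line: &str) -> bool {
    let f = log_fields(line);
    f.get("state").copied() == Some("ready")
        && f.get("session_valid").copied() == Some("1")
        && f.get("selected_profile").copied() == Some("shm-futex")
        && f.get("data_plane").copied() == Some("shm")
}
```

The function checks both `selected_profile` and `data_plane` because the first only records the negotiation result, while the second reports the data plane actually in use.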

Scope note

The live end-to-end runtime validation here covers the C integration path used by cgroups.plugin and ebpf.plugin.

The Go and Rust netipc trees are also vendored into Netdata and validated in-tree (the Go tests are run; the Rust tests are compiled with --no-run but not executed), so the library is available consistently in all three language trees.


Summary by cubic

Replaces the cgroups/eBPF SHM path with netipc IPC, adds a Linux-only cgroups-snapshot server, and moves the eBPF client to a managed cache with atomic gating for more reliable cgroups charts. Vendors the latest netipc across C, Rust, and Go, aligns with the upstream handshake, and adds profile/state/data-plane logs.
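One plausible reading of "managed cache" is sketched below in Rust. The client keeps the last successfully fetched snapshot. When a refresh fails, it keeps serving that snapshot and counts the failure. (The eBPF log line does print a `refresh_failures` field.) The `Snapshot`, `CgroupItem`, and `SnapshotCache` types are hypothetical stand-ins, not the structures in `ebpf_cgroup.c` or in netipc's cgroups codec.

```rust
/// Minimal stand-in for one cgroup entry of a snapshot (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct CgroupItem {
    pub name: String,
    pub enabled: bool,
}

/// Minimal stand-in for a cgroups snapshot (hypothetical fields).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Snapshot {
    pub generation: u64,
    pub items: Vec<CgroupItem>,
}

/// Cache that always holds the last successfully fetched snapshot.
/// A failed refresh keeps the previous data and increments a counter.
#[derive(Debug, Default)]
pub struct SnapshotCache {
    current: Snapshot,
    refresh_failures: u64,
}

impl SnapshotCache {
    /// `fetch` stands in for one snapshot request over the IPC client.
    /// Returns true if the cached snapshot was replaced.
    pub fn refresh<F>(&mut self, fetch: F) -> bool
    where
        F: FnOnce() -> Result<Snapshot, String>,
    {
        match fetch() {
            Ok(snapshot) => {
                self.current = snapshot;
                true
            }
            Err(_) => {
                self.refresh_failures += 1;
                false
            }
        }
    }

    pub fn generation(&self) -> u64 {
        self.current.generation
    }

    pub fn item_count(&self) -> usize {
        self.current.items.len()
    }

    pub fn enabled_count(&self) -> usize {
        self.current.items.iter().filter(|item| item.enabled).count()
    }

    pub fn refresh_failures(&self) -> u64 {
        self.refresh_failures
    }
}
```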

  • New Features

    • cgroups-snapshot over netipc (UDS/SHM); Linux server via new cgroup-netipc.*. eBPF consumes snapshots through a managed cache with atomic flags and transport logging.
    • Vendored netipc in C (src/libnetdata/netipc), Rust crate (src/crates/netipc), and Go (src/go/pkg/netipc); workspace wiring and tests included. Added netipc_netdata C helpers and build hooks for Linux/Windows.
  • Migration

    • Handshake is server-decided and locks the transport per session; no same-session SHM→baseline fallback.
    • On SHM attach failure the client reconnects to re-handshake.
    • Removed legacy cgroups SHM structs/paths and semaphore usage; eBPF now gates charts via atomics with lightweight getters; discovery no longer shares/locks SHM.
    • Runtime socket: /run/netdata/cgroups-snapshot.sock; no user-facing config changes expected.
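The atomic gating mentioned above replaces semaphore-protected shared state with flags that chart code can read without locking. Below is a hedged Rust sketch of that pattern. The flag names mirror fields in the eBPF log line (`integration_active`, `send_cgroup_chart`, `systemd_enabled`). The `IntegrationFlags` type, its methods, and the chosen memory orderings are assumptions; the actual change is in C (`ebpf.c`/`ebpf.h`).

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Flags shared between the snapshot-import thread and per-module chart code.
pub struct IntegrationFlags {
    integration_active: AtomicBool,
    send_cgroup_chart: AtomicBool,
    systemd_enabled: AtomicBool,
}

impl IntegrationFlags {
    pub const fn new() -> Self {
        IntegrationFlags {
            integration_active: AtomicBool::new(false),
            send_cgroup_chart: AtomicBool::new(false),
            systemd_enabled: AtomicBool::new(false),
        }
    }

    /// Writer side, called after each snapshot import. The Release store of
    /// `integration_active` comes last, so a reader whose Acquire load observes
    /// it also sees the flag values stored before it (or newer ones).
    pub fn publish(&self, active: bool, send_cgroup_chart: bool, systemd_enabled: bool) {
        self.send_cgroup_chart.store(send_cgroup_chart, Ordering::Relaxed);
        self.systemd_enabled.store(systemd_enabled, Ordering::Relaxed);
        self.integration_active.store(active, Ordering::Release);
    }

    /// Reader side: a lock-free getter used to gate cgroup chart creation.
    pub fn should_send_cgroup_charts(&self) -> bool {
        self.integration_active.load(Ordering::Acquire)
            && self.send_cgroup_chart.load(Ordering::Relaxed)
    }

    /// Reader side: whether systemd services should be charted.
    pub fn systemd_enabled(&self) -> bool {
        self.integration_active.load(Ordering::Acquire)
            && self.systemd_enabled.load(Ordering::Relaxed)
    }
}

/// One process-wide instance, standing in for the C globals.
pub static INTEGRATION: IntegrationFlags = IntegrationFlags::new();
```

With this scheme, a reader can observe flag values from two consecutive imports at once. If the flags must always be read as one consistent set, packing them into a single atomic word or using a lock would be the safer choice.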

Written for commit d97e8fa. Summary will update on new commits.

ktsaou (Member, Author) commented Apr 17, 2026

@thiagoftsm review this please.

cubic-dev-ai (bot, Contributor) left a comment

1 issue found across 106 files

Confidence score: 4/5

  • This PR is likely safe to merge, with a focused test-quality risk rather than a direct runtime breakage.
  • In src/crates/netipc/src/transport/shm_tests.rs, using path.exists() can miss dangling symlinks, so cleanup regressions in shared-memory transport tests may go undetected.
  • Because the issue is severity 5/10 and confined to test assertions, the main impact is reduced regression detection confidence rather than immediate user-facing failure.
  • Pay close attention to src/crates/netipc/src/transport/shm_tests.rs - update the assertion to check symlink metadata so dangling-link cleanup failures are caught.
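In Rust, `Path::exists()` follows symlinks. It therefore returns `false` for a dangling symlink even though the link itself is still on disk. A sketch of the suggested fix uses `std::fs::symlink_metadata`, which inspects the link rather than its target. The helper names `entry_absent` and `assert_cleaned_up` are hypothetical, and the exact assertion at `shm_tests.rs:1334` is not shown in this thread.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) only if no directory entry exists at `path`.
/// `symlink_metadata` does not follow the final symlink, so a leftover
/// dangling link is reported as still present (Ok(false)).
pub fn entry_absent(path: &Path) -> io::Result<bool> {
    match fs::symlink_metadata(path) {
        Ok(_) => Ok(false),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e), // e.g. permission denied: surface it, don't guess
    }
}

/// Cleanup assertion that also catches dangling symlinks.
pub fn assert_cleaned_up(path: &Path) {
    assert!(
        entry_absent(path).expect("symlink_metadata failed"),
        "stale entry still present at {}",
        path.display()
    );
}
```

If the test uses an assertion of the form `assert!(!path.exists())`, it passes silently when a dangling link is left behind. `assert_cleaned_up(&path)` fails in that case.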

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. cubic prioritises the most important files to review.

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/crates/netipc/src/transport/shm_tests.rs">

<violation number="1" location="src/crates/netipc/src/transport/shm_tests.rs:1334">
P2: This assertion is ineffective for dangling symlinks and can let cleanup regressions pass undetected. Check symlink metadata instead of `path.exists()`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/crates/netipc/src/transport/shm_tests.rs Outdated
thiagoftsm (Contributor) commented:

@thiagoftsm review this please.

On it.

thiagoftsm (Contributor) previously approved these changes on Apr 17, 2026, commenting:

PR is working as expected with cgroup versions 1 and 2. LGTM!

ktsaou requested a review from Copilot on April 17, 2026 14:47
Copilot AI (Contributor) left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR rebases and re-vendors the upstream netipc IPC library across C/Rust/Go, then updates Netdata’s cgroups/eBPF integration to use the new handshake/transport contract (session-locked transport, reconnect-on-SHM-attach-failure), adding transport observability logs.

Changes:

  • Vendored and wired netipc implementations (Rust crate here) including POSIX UDS transport and Linux SHM transport + extensive tests.
  • Replaced legacy cgroups↔eBPF shared-memory integration with a typed cgroups-snapshot netipc server (in cgroups.plugin) and an eBPF-side netipc cache client, plus transport state logging.
  • Updated build system wiring (CMake + Cargo workspace) to build/link the new C netipc runtime and the Rust crate.

Reviewed changes

Copilot reviewed 37 out of 106 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/crates/netipc/src/transport/shm_tests.rs: Adds Linux SHM transport conformance and chaos tests.
  • src/crates/netipc/src/transport/shm.rs: Implements Linux SHM transport (futex) and stale cleanup logic.
  • src/crates/netipc/src/transport/posix.rs: Implements POSIX UDS SEQPACKET transport with handshake and chunking.
  • src/crates/netipc/src/transport/mod.rs: Exposes transport backends via platform cfg flags.
  • src/crates/netipc/src/service/raw_windows_tests.rs: Adds Windows raw service tests for handshake/session behavior.
  • src/crates/netipc/src/service/mod.rs: Defines L2 service module exports.
  • src/crates/netipc/src/service/cgroups_windows_tests.rs: Adds typed cgroups service + cache tests on Windows.
  • src/crates/netipc/src/service/cgroups_unix_tests.rs: Adds typed cgroups service + cache tests on Unix/Linux SHM profile.
  • src/crates/netipc/src/service/cgroups.rs: Adds typed cgroups client/server/cache facade over raw transport.
  • src/crates/netipc/src/protocol/string_reverse.rs: Adds STRING_REVERSE codec + dispatch and tests.
  • src/crates/netipc/src/protocol/increment.rs: Adds INCREMENT codec + dispatch and tests.
  • src/crates/netipc/src/protocol/cgroups.rs: Adds cgroups snapshot codec/builder/dispatch and tests.
  • src/crates/netipc/src/lib.rs: Exposes protocol/service/transport modules in Rust crate.
  • src/crates/netipc/Cargo.toml: Introduces netipc Rust crate manifest.
  • src/crates/Cargo.toml: Adds netipc crate to the Rust workspace members.
  • src/collectors/ebpf.plugin/ebpf_vfs.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_swap.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_socket.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_shm.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_process.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_oomkill.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_fd.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_dcstat.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_cachestat.c: Switches cgroup gating/systemd flag checks to netipc-derived globals.
  • src/collectors/ebpf.plugin/ebpf_cgroup.h: Removes legacy SHM API declarations; adds netipc cache cleanup API.
  • src/collectors/ebpf.plugin/ebpf_cgroup.c: Implements eBPF-side netipc cache client + transport state logging.
  • src/collectors/ebpf.plugin/ebpf.h: Replaces legacy SHM globals with netipc-derived integration flags.
  • src/collectors/ebpf.plugin/ebpf.c: Defines netipc-derived global integration flags and cleanup hook.
  • src/collectors/cgroups.plugin/sys_fs_cgroup.h: Removes legacy SHM structs in favor of netipc metadata sharing.
  • src/collectors/cgroups.plugin/cgroup-netipc.h: Adds cgroups.plugin netipc server init/cleanup API.
  • src/collectors/cgroups.plugin/cgroup-netipc.c: Implements typed cgroups-snapshot netipc server for Linux.
  • src/collectors/cgroups.plugin/cgroup-discovery.c: Starts/stops the netipc server instead of legacy SHM sharing.
  • CMakeLists.txt: Builds/links standalone C netipc library and Netdata shim wrapper.
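The POSIX transport is described as UDS SEQPACKET with chunking. SEQPACKET preserves message boundaries but has a maximum message size, so a large logical message has to be split across several packets. Below is a generic Rust sketch of that idea on in-memory buffers rather than a socket. The 8-byte header (total length plus chunk index) is a hypothetical framing chosen for illustration, not netipc's wire format.

```rust
/// Hypothetical per-packet header: u32 total message length + u32 chunk
/// index, both little-endian. This is NOT netipc's real wire format.
const HEADER_LEN: usize = 8;

/// Split one logical message into packets of at most `max_packet` bytes.
/// The sketch assumes messages smaller than 4 GiB (length fits in u32).
pub fn encode_chunks(payload: &[u8], max_packet: usize) -> Vec<Vec<u8>> {
    assert!(max_packet > HEADER_LEN, "packet must fit the header and one byte");
    let body = max_packet - HEADER_LEN;
    let total = payload.len() as u32;
    // An empty message still needs one (header-only) packet.
    let pieces: Vec<&[u8]> = if payload.is_empty() {
        vec![payload]
    } else {
        payload.chunks(body).collect()
    };
    pieces
        .iter()
        .enumerate()
        .map(|(index, piece)| {
            let mut packet = Vec::with_capacity(HEADER_LEN + piece.len());
            packet.extend_from_slice(&total.to_le_bytes());
            packet.extend_from_slice(&(index as u32).to_le_bytes());
            packet.extend_from_slice(piece);
            packet
        })
        .collect()
}

/// Reassemble packets received in order. Returns None if a header is
/// truncated, indexes are out of sequence, the declared total changes
/// between packets, or the reassembled length does not match it.
pub fn decode_chunks(packets: &[Vec<u8>]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut declared_total: Option<u32> = None;
    for (expected_index, packet) in packets.iter().enumerate() {
        if packet.len() < HEADER_LEN {
            return None;
        }
        let total = u32::from_le_bytes([packet[0], packet[1], packet[2], packet[3]]);
        let index = u32::from_le_bytes([packet[4], packet[5], packet[6], packet[7]]);
        if index as usize != expected_index || *declared_total.get_or_insert(total) != total {
            return None;
        }
        out.extend_from_slice(&packet[HEADER_LEN..]);
    }
    if declared_total == Some(out.len() as u32) {
        Some(out)
    } else {
        None
    }
}
```

Carrying the total length in every packet lets the receiver detect a missing or reordered chunk without extra state beyond the running buffer.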


Comment thread src/collectors/ebpf.plugin/ebpf.c Outdated
Comment thread src/collectors/cgroups.plugin/cgroup-netipc.c
Comment thread src/crates/netipc/src/transport/shm_tests.rs Outdated
Comment thread src/crates/netipc/src/transport/shm.rs
Comment thread src/crates/netipc/src/protocol/cgroups.rs Outdated
thiagoftsm self-requested a review on April 17, 2026 19:35
Comment thread src/crates/netipc/src/transport/shm_tests.rs Fixed
thiagoftsm (Contributor) left a comment

After addressing all requests, the PR is working with all cgroup versions. LGTM!

thiagoftsm requested a review from stelfrag on April 17, 2026 23:00

Labels

  • area/build: Build system (autotools and cmake)
  • area/collectors: Everything related to data collection
  • area/go
  • collectors/cgroups
  • collectors/ebpf
  • collectors/go.d


4 participants