Multi-threaded Layer-2 Stress & Hardware Evaluation Tool for GPU Cluster Fabrics
Modern AI training and inference infrastructure depends on large GPU clusters interconnected by high-speed fabrics — typically RoCE over 100/400/800 GbE, or InfiniBand. Collective communication libraries such as NCCL make heavy, continuous use of these fabrics: allreduce, allgather, broadcast, and related operations generate substantial traffic across multiple switch hops.
In this environment, a single misbehaving switch, a misconfigured NIC, or a fabric policy error does not necessarily cause an obvious outage. Instead, it can manifest as silent performance degradation — NCCL throughput drops, step times increase, and GPU utilization falls. These symptoms are often subtle and slow to develop, making root cause identification genuinely difficult.
This problem is not limited to initial bring-up. GPU cluster fabrics are complex systems whose behavior can drift over time: firmware updates, configuration changes, physical layer degradation, and incremental topology changes can all introduce regressions that were not present at initial qualification. Periodic re-validation is a practical necessity, not a one-time exercise.
Standard network qualification (e.g. RFC 2544) does not exercise the fabric at the intensity needed for dense GPU clusters. Basidium provides the high-fidelity stress required to surface the failure modes that cause multi-million-dollar idle time in AI training.
A basidium (from the Latin, meaning "little pedestal") is the structural foundation that supports and launches spores into the environment. While the mushroom's cap gets the attention, the basidium is the microscopic machinery that ensures the next generation actually takes flight.
In a GPU cluster, the network fabric is the basidium. The models get the headlines, but they cannot exist without a stable pedestal. If the fabric is cracked, jittery, or misconfigured, the entire computational process fails. Basidium ensures the pedestal doesn't buckle under the weight of line-rate traffic before you risk your training budget.
- --diff companion — compare two reports step-by-step. Pair with --seed for bit-reproducible regression hunts.
- --stop-on-degradation N / --stop-on-failopen — halt sweeps and scenarios the moment a regression or fail-open is detected. Exits 2, scriptable for fail-fast gates.
- --validate scenario.tco, --print-config, --list-modes, --list-profiles — quality-of-life diagnostics for shell scripting and debugging silently-merged profile loads.
- --seed N, --ndjson, --csv, --report-compact — deterministic runs and machine-readable output for downstream tooling.
- Hardening — atomic conf.mode/conf.pps, packed wire-format structs with _Static_assert, validated strtol everywhere (-V 5000, -T 10.0.0.0/40, -S 00:11, --duration 5x now error out instead of corrupting state), sigaction + SIGPIPE-ignore, OS-entropy RNG seeding, proper clock_gettime-based per-packet rate limiter.
- make test — exhaustive offline test runner (~125 assertions); make asan / make tsan for sanitizer rebuilds; make check validates every shipped scenario; the man page lints clean with mandoc -Tlint.
- Bash completion in contrib/basidium.bash.
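The clock_gettime-based per-packet limiter mentioned in the hardening list can be sketched as absolute-deadline pacing, where each packet's send time is the previous deadline plus 1/pps seconds so jitter does not accumulate. This is an illustrative sketch, not the project's actual source:

```c
#include <assert.h>
#include <time.h>

/* Advance the send deadline by one packet interval (1/pps seconds),
 * handling nanosecond wraparound. */
static struct timespec next_deadline(struct timespec t, long pps) {
    t.tv_nsec += 1000000000L / pps;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/* A worker loop would then sleep until the absolute deadline, e.g.:
 *   clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
 */
```

Pacing against absolute deadlines (rather than sleeping a fixed interval after each send) keeps the long-run rate exact even when individual sleeps overshoot.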
graph LR
subgraph "Layer-2 Stress"
PFC["PFC PAUSE Flood<br/>RoCE/RDMA deadlock testing"]
CAM["CAM Table Exhaustion<br/>fail-open detection"]
ARP["ARP / ND / IGMP<br/>table resource limits"]
STP["STP TCN Flood<br/>forced MAC flush"]
end
subgraph "Measurement"
SWEEP["Rate Sweep<br/>PPS ramp + JSON report"]
NCCL["NCCL Correlation<br/>per-step busbw + degradation%"]
end
subgraph "Orchestration"
TCO["TCO Scenarios<br/>multi-mode congestion patterns<br/>.tco scenario files"]
end
subgraph "Regression Detection"
STOP["--stop-on-degradation<br/>--stop-on-failopen<br/>halt + exit 2"]
DIFF["--diff baseline.json today.json<br/>step-by-step pps + busbw delta"]
end
PFC & CAM & ARP & STP --> SWEEP
SWEEP --> NCCL
TCO --> |"mode + PPS<br/>per step"| PFC & CAM & ARP & STP
TCO --> NCCL
NCCL --> STOP
SWEEP & TCO --> |"JSON report"| DIFF
- PFC / RDMA Stress: Flood PFC PAUSE frames to confirm RoCE/RDMA priority flow control is correctly configured and does not deadlock under congestion — a known failure mode in lossless Ethernet fabrics.
- L2 Table Exhaustion: Saturate CAM tables to verify switch fail-open behavior and VLAN isolation under load. Exhaust IGMP snooping and ARP tables to find resource limits before they surface in production.
- Rate Sweeps: Generate precise rate sweeps with JSON reporting to establish forwarding capacity baselines and detect regressions over time.
- NCCL Co-Validation: Run injection patterns simultaneously with NCCL collective tests to observe how Layer-2 stress conditions measurably affect application-layer throughput. This side-by-side view helps isolate whether a performance problem originates in the fabric or the software stack.
- Targeted Congestion Orchestration (TCO): Define multi-step, multi-mode congestion scenarios (.tco files) that switch between flood modes at runtime while measuring NCCL degradation at each step.
- Regression Detection: Identify performance drift caused by firmware updates, configuration changes, physical layer degradation, or incremental topology changes. --seed makes runs bit-reproducible; --diff compares two reports step-by-step and exits 2 on threshold breach; --stop-on-degradation and --stop-on-failopen halt mid-run for fail-fast scripting.
Authorization required. Use only on airgapped hardware you own or have explicit written permission to test. Never run against production infrastructure or equipment belonging to others.
Total Cluster Outage (TCO) Warning: Many modules in this tool — specifically PFC flooding and TCO orchestration — are designed to halt traffic flow. Using these on a live environment will likely trigger a Total Cluster Outage. Periodic re-validation is a practical necessity, but it must be conducted in a controlled, isolated environment.
Author: Matthew Stits <stits@stits.org>
Repository: https://github.com/mstits/Basidium
graph TD
CLI["basidium.c<br/>CLI / main loop<br/>sigaction + SIGPIPE ignored"] --> FLOOD["flood.c<br/>packet builders<br/>worker threads<br/>clock_gettime token-bucket"]
CLI --> TUI["tui.c<br/>ncurses TUI"]
CLI --> SWEEP["sweep_thread<br/>rate ramp<br/>--stop-on-degradation"]
CLI --> TCO["tco.c<br/>scenario orchestrator"]
CLI --> SNIFF["sniffer_thread<br/>learning / adaptive<br/>fail-open detection<br/>--stop-on-failopen"]
CLI --> DIFF["diff.c<br/>--diff a.json b.json<br/>regression detector"]
FLOOD --> PCAP["libpcap<br/>pcap_inject<br/>pcap_next_ex"]
TCO --> |"_Atomic conf.mode<br/>+ _Atomic conf.pps"| FLOOD
SWEEP --> |"_Atomic conf.pps<br/>+ launches NCCL"| NCCL
TCO --> |"launches NCCL<br/>per step"| NCCL["nccl.c<br/>NCCL subprocess"]
TUI --> NCCL
TUI --> NIC["nic_stats.c<br/>Linux: /sys/class/net<br/>macOS: getifaddrs"]
CLI --> REPORT["report.c<br/>JSON / CSV / compact"]
CLI --> PROFILES["profiles.c<br/>XDG_CONFIG_HOME<br/>+ legacy ~/.basidium/"]
SWEEP -.-> |"--stop-on-degradation"| CLI
SNIFF -.-> |"--stop-on-failopen"| CLI
style DIFF fill:#369,stroke:#000,color:#fff
graph LR
subgraph "Flood Modes"
M0[mac<br/>CAM flood]
M1[arp<br/>ARP storm]
M2[dhcp<br/>starvation]
M3[pfc<br/>RoCE/RDMA]
M4[nd<br/>IPv6 ND]
M5[lldp<br/>CPU path]
M6[stp<br/>TCN flush]
M7[igmp<br/>multicast snoop]
end
subgraph "Worker Thread"
FP[Fast path<br/>Xorshift128+<br/>direct MAC write]
SP[Slow path<br/>per-thread RNG<br/>full packet rebuild]
end
M0 -->|no stealth/VLAN-range| FP
M0 -->|stealth/learning/VLAN| SP
M1 & M2 & M3 & M4 & M5 & M6 & M7 --> SP
graph TD
SRC{"--seed N<br/>set?"} -->|"no"| ENTROPY["entropy_seed<br/>getrandom() / urandom"]
SRC -->|"yes"| FIXED["rng_base_seed = N<br/>(deterministic)"]
ENTROPY & FIXED --> MAIN[main thread<br/>derives probe_signature]
MAIN --> W0[Worker 0<br/>rng_init_seed offset=0]
MAIN --> W1[Worker 1<br/>rng_init_seed offset=1]
MAIN --> WN[Worker N<br/>rng_init_seed offset=N]
W0 --> MIX0["SplitMix64<br/>finalizer (decorrelate<br/>adjacent offsets)"]
W1 --> MIX1["SplitMix64<br/>finalizer"]
WN --> MIXN["SplitMix64<br/>finalizer"]
MIX0 --> RNG0[xorshift128+<br/>thread-local state]
MIX1 --> RNG1[xorshift128+<br/>thread-local state]
MIXN --> RNGN[xorshift128+<br/>thread-local state]
RNG0 --> B0[build_packet_*<br/>rng_rand for MACs,IPs]
RNG1 --> B1[build_packet_*<br/>rng_rand for MACs,IPs]
RNGN --> BN[build_packet_*<br/>rng_rand for MACs,IPs]
sequenceDiagram
participant User
participant TUI
participant Workers
participant Sniffer
participant Switch
User->>TUI: launch --tui
TUI->>Workers: spawn (standby)
TUI->>Sniffer: spawn (if learning/detect)
Sniffer->>Sniffer: install BPF filter
User->>TUI: press s (start)
TUI->>Workers: set is_started=1
loop inject
Workers->>Switch: inject frames
Switch-->>Sniffer: echo (if fail-open)
Sniffer-->>TUI: fail_open_detected alert
end
User->>TUI: press q
TUI->>Workers: set is_running=0
graph TB
subgraph "GPU Training Cluster"
GPU1["GPU Server 1<br/>ConnectX-7 100GbE"]
GPU2["GPU Server 2<br/>ConnectX-7 100GbE"]
GPU3["GPU Server 3<br/>ConnectX-7 100GbE"]
GPU4["GPU Server 4<br/>ConnectX-7 100GbE"]
end
subgraph "Fabric"
TOR1["ToR Switch 1<br/>Lossless Ethernet<br/>PFC + ECN"]
TOR2["ToR Switch 2<br/>Lossless Ethernet<br/>PFC + ECN"]
SPINE["Spine Switch"]
end
GPU1 & GPU2 --> TOR1
GPU3 & GPU4 --> TOR2
TOR1 & TOR2 --> SPINE
subgraph "Basidium Host"
B["Basidium"]
TCO_F[".tco scenario"]
SEED["--seed N<br/>(deterministic)"]
NCCL_T["NCCL test"]
REPORT_F["JSON / CSV report"]
end
subgraph "Regression Detection"
BASE["baseline.json<br/>(captured pre-change)"]
DIFF["basidium --diff<br/>baseline.json today.json"]
VERDICT{"exit code"}
end
TCO_F & SEED --> |"defines steps + seed"| B
B --> |"inject PFC/MAC/ARP/STP"| TOR1
B --> |"launches per step"| NCCL_T
NCCL_T --> |"allreduce via fabric"| TOR1
B --> REPORT_F
BASE --> DIFF
REPORT_F --> |"today.json"| DIFF
DIFF --> VERDICT
VERDICT --> |"0 = OK"| OK["promote build"]
VERDICT --> |"2 = regression"| FAIL["fail the gate / alert"]
style B fill:#2d6,stroke:#000,color:#fff
style TOR1 fill:#f96,stroke:#000
style TCO_F fill:#369,stroke:#000,color:#fff
style DIFF fill:#369,stroke:#000,color:#fff
style FAIL fill:#c33,stroke:#000,color:#fff
Dependencies: libpcap-dev, libncurses-dev (TUI only), gcc, make,
python3 + bash (for make test only), mandoc (optional, for man-page lint)
# CLI only
make
# With ncurses TUI
make TUI=1
# Debug build (no fortify, no opt)
make debug
# AddressSanitizer + UndefinedBehaviorSanitizer
make asan
# ThreadSanitizer
make tsan
# Install to /usr/local (includes man page, examples, bash completion)
sudo make install
# Custom prefix
sudo make install PREFIX=/opt/local
# 14-test self-test suite (packet builders + TCO/NCCL parsers)
sudo make selftest
# Exhaustive offline test suite (~125 assertions: every flag, error path,
# packet-builder content via pcap-out, RNG determinism, profile loader,
# diff regression detection, NDJSON/CSV/compact reports, signal handling,
# sanitizer build). Does not need sudo or a NIC.
make test

Platform notes:
- Linux: fully supported; NIC TX/RX statistics read from /sys/class/net/
- macOS: fully supported; NIC statistics read via getifaddrs() + AF_LINK if_data
- FreeBSD / OpenBSD / NetBSD: NIC statistics supported via the same BSD getifaddrs() path
- Raw packet injection requires root (sudo) on all platforms
# ---- Build ----
make TUI=1 # compile with ncurses TUI
# ---- Basic Stress ----
sudo ./basidium -i eth0 -t 4 # MAC CAM flood, 4 threads
sudo ./basidium -i eth0 -M arp -r 5000 --tui # ARP storm at 5000 pps with TUI
sudo ./basidium -i eth0 -M pfc # PFC PAUSE flood on RDMA priority 3
sudo ./basidium -i eth0 -M igmp -t 4 # IGMP snooping exhaustion
# ---- Rate Sweep + NCCL Correlation ----
sudo ./basidium -i eth0 --sweep 1000:50000:5000:30 --nccl --report
# ---- TCO Scenario ----
sudo ./basidium -i eth0 --scenario pfc-ramp.tco --nccl --report
# ---- Fail-Open Detection ----
sudo ./basidium -i eth0 --detect -A --tui
# ---- Dry Run (no sudo, no NIC) ----
./basidium --dry-run -M pfc -n 1000
# ---- Regression Detection ----
sudo ./basidium -i eth0 --sweep 1000:50000:5000:30 --nccl --report=baseline.json --seed 42
# ... after a firmware change, repeat the run, then compare:
sudo ./basidium -i eth0 --sweep 1000:50000:5000:30 --nccl --report=today.json --seed 42
basidium --diff baseline.json today.json --diff-threshold-busbw -10
# ---- Fail-fast scripted gate ----
sudo ./basidium -i eth0 --scenario examples/pfc-recovery.tco --nccl \
--stop-on-degradation 30 --stop-on-failopen --report=ci.json
# Exit 2 means regression — fail the job.

Build your models on a solid pedestal. Build on Basidium.
| Mode | dst MAC | EtherType | Effect |
|---|---|---|---|
| mac | random | 0x0800 | Exhausts CAM table; switch degrades to hub |
| arp | ff:ff:ff:ff:ff:ff | 0x0806 | Floods ARP table |
| dhcp | ff:ff:ff:ff:ff:ff | 0x0800 | Starves DHCP address pool |
| pfc | 01:80:C2:00:00:01 | 0x8808 | Freezes RoCE/RDMA priority queues |
| nd | 33:33:ff:xx:xx:xx | 0x86DD | Exhausts IPv6 ND/NDP table |
| lldp | 01:80:C2:00:00:0E | 0x88CC | Stresses switch CPU / LLDP daemon |
| stp | 01:80:C2:00:00:00 | LLC | Triggers repeated MAC table flushes |
| igmp | 01:00:5E:xx:xx:xx | 0x0800 | Exhausts IGMP snooping table |
| Flag | Default | Description |
|---|---|---|
| -i <iface> | required | Network interface |
| -t <n> | 1 | Worker threads (max 16) |
| -r <pps> | 0 (unlimited) | Rate limit packets/sec |
| -J <bytes> | 60 | Frame size (60-9216) |
| -n <count> | 0 (unlimited) | Stop after N frames |
| Flag | Default | Description |
|---|---|---|
| -V <id> | 0 (untagged) | 802.1Q VLAN ID (1-4094) |
| --vlan-pcp <0-7> | 0 | 802.1p priority bits |
| --vlan-range <end> | — | Random VID per frame from -V to end |
| --qinq <outer-vid> | — | 802.1ad outer tag (combine with -V for double-tag) |
| --pfc-priority <0-7> | 3 | PFC priority class (3 = RDMA on Mellanox/NVIDIA) |
| --pfc-quanta <val> | 65535 | PFC pause duration (0-65535) |
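For a sense of scale when choosing --pfc-quanta: one PFC quantum is 512 bit times, so the same quanta value pauses for less wall-clock time on a faster link. A hypothetical helper (not part of the tool) makes the arithmetic concrete:

```c
#include <assert.h>
#include <math.h>

/* One PFC quantum = 512 bit times. Returns the pause duration in
 * microseconds for a given quanta value and link speed in Gb/s. */
static double pfc_pause_us(unsigned quanta, double link_gbps) {
    return (double)quanta * 512.0 / (link_gbps * 1e9) * 1e6;
}
```

At 100 GbE the maximum quanta value of 65535 pauses the priority queue for roughly 335 µs per frame, which is why even a modest PPS rate can keep a lossless class frozen continuously.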
| Flag | Description |
|---|---|
| -S <OUI> | Restrict source MAC OUI (e.g. 00:11:22) |
| -T <CIDR> | Embed IPs from subnet; repeatable up to 64 |
| -L | Learning mode — skip observed MACs |
| -A | Adaptive mode — throttle on broadcast storm |
| -U | Allow multicast source MACs |
| -R | Randomize DHCP client MAC independently |
| Flag | Description |
|---|---|
| --burst <count:gap_ms> | Send count frames at wire speed, pause gap_ms ms |
| --detect | Fail-open detection via embedded probe signature |
| --payload <pattern> | MAC flood payload: zeros ff dead incr |
| --sweep start:end:step[:hold_s] | Ramp injection rate from start to end PPS in step increments, holding each for hold_s seconds (default 10); exits on completion and writes a JSON report |
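The --sweep specification parses with a single sscanf call; a sketch under the grammar above, with hold_s defaulting to 10 as documented (hypothetical code, not the shipped parser):

```c
#include <assert.h>
#include <stdio.h>

/* Parse "start:end:step[:hold_s]". Returns 0 on success, -1 on a
 * malformed or nonsensical specification. */
static int parse_sweep(const char *spec, long *start, long *end,
                       long *step, long *hold_s) {
    *hold_s = 10;  /* documented default hold per step */
    int n = sscanf(spec, "%ld:%ld:%ld:%ld", start, end, step, hold_s);
    if (n < 3 || *start <= 0 || *end < *start || *step <= 0 || *hold_s <= 0)
        return -1;
    return 0;
}
```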
| Flag | Description |
|---|---|
| -v | Verbose per-thread and live PPS |
| -l <file> | JSON event log |
| --tui | ncurses TUI (requires make TUI=1) |
| --report [file] | JSON session report on exit |
| --pcap-out <file> | Write frames to .pcap |
| --pcap-replay <file> | Replay .pcap onto interface |
| Flag | Description |
|---|---|
| --nccl | NCCL busbw correlation panel in TUI; per-step measurement during --sweep and --scenario |
| --nccl-binary <path> | Path to nccl-tests binary (implies --nccl) |
| Flag | Description |
|---|---|
| --scenario <file> | Run a multi-step congestion scenario from a .tco file (mutually exclusive with --sweep) |
| Flag | Description |
|---|---|
| --profile <name> | Load ~/.basidium/<name>.conf (also honors $XDG_CONFIG_HOME and $BASIDIUM_PROFILE_DIR) |
| --duration <time> | Auto-stop: 30, 5m, 2h, 1d |
| Flag | Description |
|---|---|
| --stop-on-failopen | Halt run on first fail-open detection (exit 2) |
| --stop-on-degradation N | Halt sweep/scenario when NCCL busbw drops past -N% (sign-tolerant; exit 2) |
| Flag | Description |
|---|---|
| --ndjson | One JSON status object per second on stdout (replaces in-place spinner) |
| --csv <file> | Emit sweep/scenario steps as CSV alongside the JSON report |
| --report-compact | Single-line JSON report (post-processed; quoted strings preserved) |
| Flag | Description |
|---|---|
| --diff <a.json> <b.json> | Compare two reports; exit 2 on threshold breach (subcommand-style: skips other flags) |
| --diff-threshold-pps N | PPS regression threshold for --diff (default -10) |
| --diff-threshold-busbw N | NCCL busbw regression threshold for --diff (default -10) |
| --seed N | Seed RNG and probe signature deterministically (default: OS entropy via getrandom() / /dev/urandom) |
| Flag | Description |
|---|---|
| --selftest | Run 14 built-in validation tests |
| --validate <file.tco> | Parse and validate a scenario file (exit 0/1; line-numbered diagnostics) |
| --print-config | Dump merged effective config (defaults + profile + flags) and exit |
| --list-modes | Print supported flood modes and exit (one per line) |
| --list-profiles | Print saved profile names and exit (one per line) |
| --version | Print version and exit (--version --json for machine-parsable form) |
| --dry-run | Build & count packets without injecting (no sudo needed) |
Launch with --tui (requires make TUI=1). Starts in STANDBY — no injection until you press s or Enter.
| Key | Action |
|---|---|
| s / Enter | Start injecting |
| Space | Pause / Resume |
| q | Quit |
| ? | Help overlay |
| p | Profile menu |
| + / = | Rate +1000 pps |
| - | Rate -1000 pps |
| o | Set OUI prefix |
| v | Set VLAN ID |
| n | Toggle NCCL panel |
| b | Record NCCL baseline |
| l | Load .pcap for replay |
- Header — mode, interface, [STANDBY]/[RUNNING]/[PAUSED], blinking [!FAIL-OPEN DETECTED!] when triggered
- Live Stats — PPS, total frames, uptime, session countdown, sparkline, per-thread PPS, NIC tx/rx/drop/error
- Config — mode, rate or sweep progress, threads, OUI, VLAN/PFC settings
- NCCL — busbw, baseline, degradation% (with --nccl)
- Log — scrolling event log
Each flood mode targets a specific failure mode. The table below maps each mode to the switch counters and behavior you should observe during testing.
| Mode | Target Failure | Expected Switch Behavior | Key Counters / Logs |
|---|---|---|---|
| mac | CAM table overflow, fail-open | dot1dTpLearnedEntryDiscards climbs; port may flood all frames. Use --detect to confirm. | dot1dTpLearnedEntryDiscards (1.3.6.1.2.1.17.4.3.1.3), ifInDiscards |
| pfc | PFC deadlock, watchdog trigger | Target priority queue pauses; watch for PFC watchdog syslog events. Lossless traffic on that priority should halt. | Memory buffer utilization, PFC watchdog syslog, cbQosPoliceCfgRate |
| arp | ARP table exhaustion | ARP table fills; new entries fail to resolve. Watch for ARP timeouts in switch logs. | ipNetToMediaTable (1.3.6.1.2.1.4.22), ARP cache size |
| dhcp | DHCP pool starvation | DHCP server exhausts address pool. Useful for testing relay agent behavior. | DHCP server pool utilization, relay counters |
| stp | Spanning-tree instability | TCN triggers MAC table flush followed by brief flood mode per flush. dot1dStpTopChanges should increment rapidly. | dot1dStpTopChanges (1.3.6.1.2.1.17.2.4), dot1dStpRootPort |
| igmp | IGMP snooping table exhaustion | Snooping table fills; switch falls back to flooding multicast. | igmpCacheTable (1.3.6.1.2.1.85.1.2), multicast group count |
| lldp | Control-plane CPU stress | LLDP neighbor count climbs; switch CPU may spike. Watch for LLDP flap warnings. | lldpRemTable, CPU utilization MIB |
| nd | IPv6 ND table exhaustion | ND cache fills; neighbor resolution fails for legitimate hosts. | IPv6 neighbor cache size, ICMPv6 error counters |
Basidium's JSON event log and session reports pair directly with SNMP polling to correlate injection activity with live switch MIB counters.
| Metric | MIB Object | OID |
|---|---|---|
| CAM discard events | dot1dTpLearnedEntryDiscards | 1.3.6.1.2.1.17.4.3.1.3 |
| CAM aging time | dot1dTpAgingTime | 1.3.6.1.2.1.17.4.2 |
| STP topology changes | dot1dStpTopChanges | 1.3.6.1.2.1.17.2.4 |
| Interface errors | ifInErrors | IF-MIB::ifInErrors |
| Interface discards | ifInDiscards | IF-MIB::ifInDiscards |
| IGMP group table | igmpCacheTable | 1.3.6.1.2.1.85.1.2 |
#!/bin/bash
# poll-snmp.sh — record CAM, STP, and interface counters during injection
SWITCH=192.168.1.1
COMMUNITY=public
while true; do
TS=$(date +%s)
CAM=$(snmpget -v2c -c $COMMUNITY $SWITCH \
1.3.6.1.2.1.17.4.3.1.3.0 2>/dev/null | awk '{print $NF}')
ERR=$(snmpget -v2c -c $COMMUNITY $SWITCH \
IF-MIB::ifInErrors.1 2>/dev/null | awk '{print $NF}')
STP=$(snmpget -v2c -c $COMMUNITY $SWITCH \
1.3.6.1.2.1.17.2.4.0 2>/dev/null | awk '{print $NF}')
echo "$TS cam_discards=$CAM if_errors=$ERR stp_topo_changes=$STP"
sleep 1
done

# Terminal 1: start SNMP polling
./poll-snmp.sh | tee snmp-log.txt &
# Terminal 2: run Basidium sweep
sudo ./basidium -i eth0 --sweep 1000:100000:10000:5 --report sweep.json

#!/usr/bin/env python3
"""
Correlate basidium sweep JSON with live SNMP counters.
Requires: pip install pysnmp
"""
import json
from pysnmp.hlapi import *
SWITCH = "192.168.1.1"
COMMUNITY = "public"
REPORT = "sweep.json"
def snmp_get(oid):
it = getCmd(
SnmpEngine(),
CommunityData(COMMUNITY, mpModel=1),
UdpTransportTarget((SWITCH, 161)),
ContextData(),
ObjectType(ObjectIdentity(oid))
)
errorIndication, errorStatus, _, varBinds = next(it)
if errorIndication or errorStatus:
return None
return int(varBinds[0][1])
with open(REPORT) as f:
report = json.load(f)
print(f"Interface: {report['interface']}")
print(f"Duration: {report['duration_s']}s")
print(f"Total sent: {report['total_packets']:,}")
print(f"Peak PPS: {report['peak_pps']:,}")
print()
cam = snmp_get("1.3.6.1.2.1.17.4.3.1.3.0")
stp = snmp_get("1.3.6.1.2.1.17.2.4.0")
errs = snmp_get("1.3.6.1.2.1.2.2.1.14.1")
print(f"CAM discards: {cam}")
print(f"STP topo changes: {stp}")
print(f"Interface errors: {errs}")
print()
if report.get("sweep"):
print("Sweep results:")
for s in report["sweep"]:
eff = s['achieved_pps'] / s['target_pps'] * 100
print(f" step {s['step']:>2}: {s['target_pps']:>8} pps target "
              f"→ {s['achieved_pps']:>8} achieved ({eff:.1f}%)")

# Capture SNMP traps from the switch
snmptrapd -f -Lo -c /etc/snmp/snmptrapd.conf &
# Run STP TCN flood
sudo ./basidium -i eth0 -M stp -r 100 -l events.json
# Count topology change traps received
grep -c "topologyChange" /var/log/snmptrapd.log

Basidium embeds a random 16-bit probe signature in the IP ID field of every MAC-flood frame. The sniffer thread watches the interface; if a frame with that signature is received back, the switch has entered hub mode.
sequenceDiagram
participant B as Basidium
participant SW as Switch (healthy)
participant SW2 as Switch (fail-open)
B->>SW: inject frame [ip_id=probe]
SW->>SW: CAM lookup → unicast forward
note over SW: not echoed back
B->>SW2: inject frame [ip_id=probe]
SW2->>SW2: CAM full → flood all ports
SW2-->>B: echo [ip_id=probe]
B->>B: fail_open_detected = 1
B-->>B: TUI alert + FAIL_OPEN log event
note over B: --stop-on-failopen:<br/>halt run, exit 2
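The per-frame check the sniffer performs is conceptually tiny; a sketch with hypothetical names, assuming an untagged Ethernet II + IPv4 MAC-flood frame:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A captured frame whose IPv4 Identification field matches our probe
 * signature must be one of our own frames echoed back — i.e. the switch
 * flooded it out every port (fail-open / hub mode). */
static int is_echoed_probe(const uint8_t *frame, size_t len,
                           uint16_t probe_signature) {
    if (len < 14 + 20) return 0;               /* Ethernet + IPv4 headers */
    uint16_t ethertype, ip_id;
    memcpy(&ethertype, frame + 12, 2);         /* memcpy: alignment-safe */
    if (ntohs(ethertype) != 0x0800) return 0;  /* IPv4 only */
    memcpy(&ip_id, frame + 14 + 4, 2);         /* IP ID at IPv4 offset 4 */
    return ntohs(ip_id) == probe_signature;
}
```

A healthy switch unicasts the frame toward the (nonexistent) destination MAC, so the probe never arrives back on the injection port and the check stays false.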
sudo ./basidium -i eth0 --detect -A --tui

flowchart LR
A[sweep_start PPS] -->|+sweep_step| B[inject for sweep_hold s]
B --> N{--nccl?}
N -->|yes| F[launch NCCL test]
F --> G[wait for NCCL completion]
G --> H[record PPS + busbw]
H --> SOD{"--stop-on-degradation<br/>threshold breached?"}
SOD -->|yes| HALT["halt: write report<br/>exit 2"]
SOD -->|no| C
N -->|no| E[record achieved PPS]
E --> C{reached sweep_end?}
C -->|no| A
C -->|yes| D["write JSON / CSV report<br/>exit 0"]
# Standard sweep (no NCCL)
sudo ./basidium -i eth0 --sweep 1000:100000:10000:5 --report /tmp/report.json
# NCCL-correlated sweep — measures busbw at each congestion level
sudo ./basidium -i eth0 -M pfc --sweep 1000:50000:5000:30 --nccl --report

When --sweep and --nccl are both active, Basidium launches an NCCL test at each sweep step and waits for it to complete before advancing to the next PPS level. The first step's busbw becomes the baseline; subsequent steps report degradation relative to that baseline. This produces per-step correlation showing exactly how congestion affects application-layer throughput.
Note: The NCCL test runs concurrently with injection, so it measures busbw under active congestion. The sweep hold time should be at least as long as the NCCL test duration (typically 30-120 s, depending on --nccl-binary args). If the NCCL test takes longer than the hold period, the sweep waits for completion before moving on.
Example report (with NCCL correlation):
{
"generated": "2026-04-10T22:00:00Z",
"interface": "eth0",
"mode": "pfc",
"threads": 1,
"duration_s": 180,
"total_packets": 4823000,
"peak_pps": 48200,
"sweep": {
"start": 1000,
"end": 50000,
"step": 5000,
"hold_s": 30,
"nccl_baseline_busbw": 76.50,
"steps": [
{"pps_target": 1000, "pps_achieved": 999, "nccl_busbw": 76.50, "nccl_degradation_pct": 0.0},
{"pps_target": 6000, "pps_achieved": 5998, "nccl_busbw": 74.20, "nccl_degradation_pct": -3.0},
{"pps_target": 11000, "pps_achieved": 10995, "nccl_busbw": 68.10, "nccl_degradation_pct": -11.0},
{"pps_target": 16000, "pps_achieved": 15990, "nccl_busbw": 52.30, "nccl_degradation_pct": -31.6}
]
}
}

Scenario files (.tco) define multi-step, multi-mode congestion patterns for automated fabric qualification. The orchestrator thread steps through each configuration, dynamically switching worker threads between flood modes at runtime. With --nccl, each step measures application-layer throughput under the current congestion conditions.
flowchart TD
LOAD["Load + validate .tco scenario<br/>(or basidium --validate file)"] --> STEP["Apply step: _Atomic conf.mode + conf.pps<br/>workers detect change, memset buffer, rebuild template"]
STEP --> INJECT["Workers inject at target rate"]
INJECT --> NCCL_Q{"--nccl?"}
NCCL_Q --> |yes| NCCL_RUN["Launch NCCL test<br/>measure busbw under congestion"]
NCCL_Q --> |no| HOLD["Hold for duration_s"]
NCCL_RUN --> HOLD
HOLD --> RECORD["Record achieved PPS + busbw + NIC delta"]
RECORD --> SOD{"--stop-on-degradation<br/>threshold breached?"}
SOD --> |yes| HALT["halt: write report<br/>exit 2"]
SOD --> |no| MORE{"More steps?"}
MORE --> |yes| STEP
MORE --> |no| REPORT["Write JSON / CSV report + exit 0"]
# Each line: mode pps duration_s [nccl]
# Comments start with #. Blank lines ignored.
mac 1000 30 nccl # baseline: light MAC flood + NCCL measurement
pfc 5000 60 nccl # light PFC stress
pfc 20000 60 nccl # moderate PFC stress
pfc 50000 60 nccl # heavy PFC stress
arp 10000 30 # ARP storm (no NCCL measurement this step)
mac 1000 30 nccl # recovery baseline
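A step line of this grammar — mode, pps, duration, optional nccl flag, with # comments and blank lines ignored — can be parsed in a few lines. A sketch only, not the shipped parser (which also emits line-numbered diagnostics):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct tco_step { char mode[16]; long pps; long duration_s; int nccl; };

/* Parse one scenario line: "mode pps duration_s [nccl]".
 * Returns 1 on a step, 0 on comment/blank, -1 on malformed input. */
static int parse_tco_line(const char *line, struct tco_step *st) {
    char flag[16] = "";
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;
    int n = sscanf(line, "%15s %ld %ld %15s", st->mode, &st->pps,
                   &st->duration_s, flag);
    if (n < 3 || st->pps < 0 || st->duration_s <= 0)
        return -1;
    /* a trailing "# comment" token is not the nccl flag */
    st->nccl = (n == 4 && strncmp(flag, "nccl", 4) == 0);
    return 1;
}
```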
# Run scenario with NCCL correlation
sudo ./basidium -i eth0 --scenario /path/to/scenario.tco --nccl --report
# Run scenario without NCCL
sudo ./basidium -i eth0 --scenario examples/pfc-stress-ramp.tco --report

{
"scenario": {
"name": "pfc-stress-ramp",
"file": "examples/pfc-stress-ramp.tco",
"nccl_baseline_busbw": 76.50,
"steps": [
{"mode": "mac", "pps_target": 1000, "duration_s": 30, "pps_achieved": 999, "nccl_busbw": 76.50, "nccl_degradation_pct": 0.0},
{"mode": "pfc", "pps_target": 5000, "duration_s": 60, "pps_achieved": 4998, "nccl_busbw": 74.20, "nccl_degradation_pct": -3.0},
{"mode": "pfc", "pps_target": 20000, "duration_s": 60, "pps_achieved": 19995, "nccl_busbw": 68.10, "nccl_degradation_pct": -11.0},
{"mode": "pfc", "pps_target": 50000, "duration_s": 60, "pps_achieved": 49800, "nccl_busbw": 52.30, "nccl_degradation_pct": -31.6},
{"mode": "arp", "pps_target": 10000, "duration_s": 30, "pps_achieved": 9998},
{"mode": "mac", "pps_target": 1000, "duration_s": 30, "pps_achieved": 999, "nccl_busbw": 75.80, "nccl_degradation_pct": -0.9}
]
}
}

Basidium ships a built-in regression detector that closes the loop on the --report design. The typical workflow:
flowchart LR
subgraph "Capture baseline (one-time)"
B1["sudo basidium -i eth0<br/>--scenario qual.tco --nccl<br/>--seed 42<br/>--report=baseline.json"]
end
subgraph "After firmware / config / topology change"
B2["sudo basidium -i eth0<br/>--scenario qual.tco --nccl<br/>--seed 42<br/>--report=today.json"]
end
subgraph "Compare"
D["basidium --diff baseline.json today.json<br/>--diff-threshold-busbw -10<br/>--diff-threshold-pps -5"]
end
B1 -->|"baseline.json"| D
B2 -->|"today.json"| D
D --> R{"step delta<br/>vs threshold"}
R -->|"all within threshold"| OK["exit 0<br/>(merge / promote)"]
R -->|"any breach"| FAIL["exit 2<br/>(fail the gate)"]
style D fill:#369,stroke:#000,color:#fff
style FAIL fill:#c33,stroke:#000,color:#fff
--seed N makes the worker RNG and probe signature deterministic, so the
packet content of two runs at the same seed is bit-identical — the only
legitimate source of variance between baseline and today is the fabric
itself.
--diff outputs a step-by-step table:
step mode old_pps new_pps Δpps% old_busbw new_busbw Δbusbw%
---- ---- ------- ------- ----- --------- --------- -------
1 mac 1000 1000 +0.0% 76.50 76.40 -0.1%
2 pfc 4998 4995 -0.1% 74.20 74.10 -0.1%
3 pfc 19995 19890 -0.5% 68.10 68.00 -0.1%
4 pfc 49800 12300 -75.3% 52.30 18.40 -64.8%
REGRESSION: at least one step exceeded threshold (pps<=-5.0%, busbw<=-10.0%)
For mid-run halts (instead of running the full sweep / scenario and then
diffing), pair --stop-on-degradation N with --stop-on-failopen. The
sweep / orchestrator threads watch each step's NCCL measurement; if the
busbw drops past -N% of the baseline (sign-tolerant — 30 and -30
both mean "stop at 30% drop"), the run halts and exits 2 immediately.
--stop-on-failopen does the same on the first echoed probe frame.
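The degradation arithmetic and the sign-tolerant threshold check reduce to two one-liners; a sketch with hypothetical helper names:

```c
#include <assert.h>
#include <math.h>

/* Degradation of a step's busbw relative to the baseline, in percent
 * (negative = slower), matching the nccl_degradation_pct report field. */
static double degradation_pct(double baseline_busbw, double busbw) {
    return (busbw - baseline_busbw) / baseline_busbw * 100.0;
}

/* Sign-tolerant: --stop-on-degradation 30 and -30 both mean
 * "halt once busbw has dropped by 30% or more". */
static int threshold_breached(double deg_pct, double threshold) {
    return deg_pct <= -fabs(threshold);
}
```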
# Fail-fast gate: halt at 30% NCCL drop OR first fail-open detection
sudo ./basidium -i eth0 --scenario qual.tco --nccl --seed 42 \
--stop-on-degradation 30 --stop-on-failopen --report=ci.json
# Exit 2 means regression — fail the job.

IEEE 802.3 MAC Control frame layout for PFC mode:
[dst: 01:80:C2:00:00:01 (6B)][src: random (6B)]
[EtherType: 0x8808 (2B)][Opcode: 0x0101 (2B)]
[Priority Enable Vector (2B)][quanta[0..7]: 16B]
[pad to 60B]
Default priority 3 is the standard lossless class on Mellanox/NVIDIA ConnectX and BlueField. Only the target priority bit is set in the PEV; all other quanta are zero.
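The layout above maps naturally onto a packed struct with a _Static_assert size guard, in the spirit of the project's wire-format structs. A sketch, not the project's actual source:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IEEE 802.1Qbb PFC frame, per the layout above. */
struct __attribute__((packed)) pfc_frame {
    uint8_t  dst[6];        /* 01:80:C2:00:00:01 */
    uint8_t  src[6];
    uint16_t ethertype;     /* 0x8808, network order */
    uint16_t opcode;        /* 0x0101 = PFC */
    uint16_t pev;           /* priority enable vector */
    uint16_t quanta[8];     /* pause quanta per priority class */
    uint8_t  pad[26];       /* pad to 60-byte minimum frame */
};
_Static_assert(sizeof(struct pfc_frame) == 60, "PFC frame must be 60 bytes");

static void build_pfc(struct pfc_frame *f, int prio, uint16_t q) {
    static const uint8_t dst[6] = {0x01, 0x80, 0xC2, 0x00, 0x00, 0x01};
    memset(f, 0, sizeof *f);
    memcpy(f->dst, dst, 6);
    f->ethertype    = htons(0x8808);
    f->opcode       = htons(0x0101);
    f->pev          = htons(1u << prio);  /* only the target priority bit */
    f->quanta[prio] = htons(q);           /* all other quanta stay zero */
}
```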
sudo ./basidium -i eth0 -M pfc -V 100 --pfc-priority 3 --pfc-quanta 65535

macOS 26.2 introduced native RDMA over Thunderbolt 5 with an ibverbs-compatible API (infiniband/verbs.h, librdma.tbd). This is a fundamentally different transport from the RoCE/InfiniBand-over-Ethernet fabrics that Basidium targets:
| | GPU Cluster Fabric (Basidium) | Apple TB5 RDMA |
|---|---|---|
| Transport | Ethernet (RoCEv2) | Thunderbolt 5 protocol |
| Flow control | PFC PAUSE (IEEE 802.1Qbb) | Credit-based (TB controller HW) |
| Topology | Multi-hop switched fabric | Point-to-point, no switch |
Basidium's -M pfc mode generates IEEE 802.1Qbb MAC Control frames (EtherType 0x8808) that stress-test Ethernet switch priority queues. A Thunderbolt 5 controller does not process these frames — it uses credit-based flow control at the hardware level. The failure modes that matter for TB5 RDMA (QP exhaustion, PD leaks, credit stalls) are verbs-level application concerns, not Layer-2 fabric issues.
With -V 100 --qinq 200 the wire format is:
[dst][src][0x88A8][outer TCI VID=200][0x8100][inner TCI VID=100][EtherType][payload]
Useful for provider bridges (802.1ad), metro Ethernet, and L2VPN stitching.
sudo ./basidium -i eth0 -V 100 --qinq 200 -t 4

Profiles are stored as key=value files. The lookup order is:
1. $BASIDIUM_PROFILE_DIR (explicit override)
2. $XDG_CONFIG_HOME/basidium/ (XDG basedir spec) — used only when the legacy directory does not exist
3. ~/.basidium/ (legacy, kept for compat)
All fields — VLAN, PFC, sweep, burst, detect, QinQ, payload, threads, rate — are persisted. Profile names are restricted to alphanumeric characters, dashes, and underscores to prevent path traversal. Numeric fields are validated against their accepted ranges on load (so e.g. threads=99 or mode=bogus is rejected with a field-named diagnostic instead of silently falling back to defaults). CRLF line endings are tolerated.
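The range-checked numeric loading described above amounts to a strict strtol wrapper that rejects junk instead of falling back to defaults. A hypothetical sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a profile field value into [lo, hi]. Returns 0 on success,
 * -1 on non-numeric input, trailing junk, or out-of-range values.
 * A trailing CR/LF is tolerated, mirroring the CRLF handling above. */
static int load_long_field(const char *val, long lo, long hi, long *out) {
    char *end;
    errno = 0;
    long v = strtol(val, &end, 10);
    if (errno || end == val ||
        (*end != '\0' && *end != '\r' && *end != '\n'))
        return -1;                 /* not a clean number */
    if (v < lo || v > hi)
        return -1;                 /* outside the field's accepted range */
    *out = v;
    return 0;
}
```

For example, threads=99 fails the range check for a max-16-thread field and can be reported with the field name rather than silently clamped.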
# Save from TUI: press p → s → type name → Enter
# Load from CLI
sudo ./basidium --profile rdma-stress
sudo ./basidium --profile stp-flood
# List saved profiles
basidium --list-profiles
# Inspect what a profile resolves to
basidium --profile rdma-stress --print-config

basidium.c        main(), CLI parsing, thread orchestration,
                  sigaction(SIGINT/SIGTERM/SIGPIPE), validated strtol parsers
flood.c packet builders, worker threads, sniffer, RNG (xorshift128+
seeded via getrandom()/urandom + SplitMix64), token-bucket
rate limiter (clock_gettime + nanosleep), selftest
flood.h shared types, flood_mode_t enum, config struct, prototypes;
wire-format structs are __attribute__((packed)) with
_Static_assert size guards
tco.c/.h TCO scenario parser + orchestrator thread
tui.c ncurses TUI (make TUI=1)
nccl.c/.h NCCL subprocess orchestration
profiles.c/.h named profile save/load with XDG_CONFIG_HOME support, CRLF
tolerance, range-checked strtol fields, name sanitization
nic_stats.c/.h NIC statistics (Linux: sysfs, macOS/BSD: getifaddrs)
report.c/.h JSON / CSV / compact session report writer
diff.c/.h --diff regression detection: parse two reports, compare
pps_achieved + nccl_busbw step-by-step, exit 2 on breach
contrib/
basidium.bash bash completion (modes, flags, scenario files, profiles)
examples/*.tco shipped scenarios (validated by `make check` via --validate)
tests/run-all.sh exhaustive offline test suite (~125 assertions)
basidium.8 man page (lints clean with `mandoc -Tlint`)
In MAC flood mode without stealth, learning, or VLAN-range active, workers use Xorshift128+ to overwrite only the 12 MAC bytes of a pre-built frame template — no packet-builder overhead, near wire-rate throughput. The fast path uses memcpy for alignment safety on strict-alignment platforms; mode switches under TCO trigger a buffer wipe + template rebuild before the next iteration so stale bytes from the previous mode never leak forward.
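The fast path amounts to two RNG draws and two 6-byte copies per frame. A sketch of the idea (a simpler xorshift64 stands in for the tool's xorshift128+; names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* xorshift64 stand-in for the tool's xorshift128+ (sketch only). */
static uint64_t xs64(uint64_t *s) {
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

/* Fast path: overwrite only the 12 MAC bytes of a pre-built template.
 * memcpy keeps strict-alignment platforms safe; the rest of the frame
 * (EtherType, IP header, payload) is untouched. */
static void fast_path_macs(uint8_t *frame, uint64_t *rng) {
    uint64_t a = xs64(rng), b = xs64(rng);
    memcpy(frame, &a, 6);       /* random dst MAC */
    memcpy(frame + 6, &b, 6);   /* random src MAC */
}
```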
All packet builders accept a per-thread struct rng_state * parameter. No global rand() calls occur in worker threads. Each thread initializes its own xorshift128+ state from a base seed (entropy or --seed N) mixed through SplitMix64 with a per-thread offset, so adjacent threads do not produce correlated streams. conf.mode and conf.pps are _Atomic-qualified, giving sweep/TCO-to-worker writes seq_cst semantics without changing call-site syntax. make asan + make tsan rebuilds run the selftest cleanly on every release.
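The seeding pipeline described above — one base seed plus a per-thread offset, mixed through a SplitMix64 finalizer into per-thread xorshift128+ state — can be sketched as follows (illustrative only; function names echo the diagram but are not the project's source):

```c
#include <assert.h>
#include <stdint.h>

/* SplitMix64 finalizer: decorrelates adjacent seeds (base + offset). */
static uint64_t splitmix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

struct rng_state { uint64_t s[2]; };

/* Derive a thread-local state from one base seed + the worker's offset. */
static void rng_init_seed(struct rng_state *r, uint64_t base, uint64_t off) {
    r->s[0] = splitmix64(base + off);
    r->s[1] = splitmix64(r->s[0]);
}

/* xorshift128+ step: fast, fine for randomized MACs and IPs. */
static uint64_t rng_rand(struct rng_state *r) {
    uint64_t x = r->s[0], y = r->s[1];
    r->s[0] = y;
    x ^= x << 23;
    r->s[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
    return r->s[1] + y;
}
```

Because the whole pipeline is a pure function of (base seed, offset), --seed N makes every worker's packet stream reproducible, while distinct offsets still yield decorrelated streams.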
| Library | Debian/Ubuntu | RHEL/Fedora | macOS (Homebrew) |
|---|---|---|---|
| libpcap | libpcap-dev | libpcap-devel | brew install libpcap (usually preinstalled) |
| libpthread | standard | standard | standard |
| libncurses | libncurses-dev | ncurses-devel (TUI only) | preinstalled |
| python3 | python3 | python3 (for make test only) | preinstalled |
| mandoc | mandoc | mandoc (optional, man-page lint) | brew install mandoc |
For authorized laboratory use.
© Matthew Stits — https://github.com/mstits/Basidium
Build your models on a solid pedestal.