Motivation
While debugging an encoder fps gap between OpenIPC and vendor firmware running on identical gk7205v300 silicon, I needed a quick way to verify that DDR was actually performing at the same speed on both boards. Building, deploying, and timing a one-off C benchmark for each test ate a lot of session time. A bundled tool would have made this a one-liner.
Memory throughput is a useful baseline check whenever an SoC misbehaves: encoder/ISP fps caps, frame drops, slow dd to flash, etc. It separates "the CPU pipeline is the bottleneck" from "the DDR pipeline is the bottleneck" in a few seconds.
Concrete example from the field
Two boards, same goke,gk7205v300 per device-tree, same OTP at 0x12020084, same chip ID registers, same DDR-clock select bit (CRG 0x12010080 = 0x549 → 450 MHz on both):
| Op (16 MB buffer, both streamers idle) | OpenIPC | XM vendor | Ratio |
|---|---|---|---|
| memset (write) | 1622 MB/s | 2264 MB/s | +40% |
| read scan (volatile uint32_t sum) | 496 MB/s | 558 MB/s | +12% |
| memcpy (R+W) | 1547 MB/s | 2109 MB/s | +36% |
That ratio matched the wire-fps gap (~20%) we were seeing, which immediately pointed the investigation at the CPU/DDR PLL multiplier — set by mask ROM based on per-die HPM characterization (HPM_CHECK_REG at 0x1202015c differed: 0xF3 vs 0xF6 → mask ROM picks CPU PLL FBDIV=0x77 vs 0x8F → 952 MHz vs 1144 MHz).
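The arithmetic behind those figures can be checked with a few lines of C. This is only a sketch: `pct_gain`, `fbdiv_to_mhz` and `print_gaps` are hypothetical helper names, and the 8 MHz-per-FBDIV step is an assumption inferred from the two data points above (0x77 → 952 MHz, 0x8F → 1144 MHz), not taken from a datasheet.

```c
#include <stdio.h>

/* Percentage by which b exceeds a. */
static double pct_gain(double a, double b) { return (b / a - 1.0) * 100.0; }

/* Assumption: CPU clock = FBDIV * 8 MHz. The step size is inferred only from
 * the two observed pairs (0x77 -> 952 MHz, 0x8F -> 1144 MHz). */
static unsigned fbdiv_to_mhz(unsigned fbdiv) { return fbdiv * 8u; }

/* Print the CPU clock gap next to the bandwidth gaps from the table. */
static void print_gaps(void) {
    unsigned slow = fbdiv_to_mhz(0x77), fast = fbdiv_to_mhz(0x8F);
    printf("CPU PLL: %u vs %u MHz (%+.1f%%)\n", slow, fast, pct_gain(slow, fast));
    printf("memset:  %+.1f%%\n", pct_gain(1622, 2264));
    printf("read:    %+.1f%%\n", pct_gain(496, 558));
    printf("memcpy:  %+.1f%%\n", pct_gain(1547, 2109));
}
```

Calling `print_gaps()` shows the clock gap at roughly 20%, in line with the wire-fps gap, while the three bandwidth gaps spread around it.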
Without a bundled bandwidth test, this took hours of looping through cross-compile / scp / measure.
Proposed subcommand
ipctool membw [options]
  --size MB      buffer size per pass (default: 16; must exceed L2 cache)
  --iters N      number of passes per op (default: 16)
  --ops set,...  comma list of ops to run: write,read,copy (default: all)
  --json         machine-readable output
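One possible way to handle the `--ops` list is to fold it into a bitmask. The sketch below is an assumption about how that could look, not existing ipctool code; `parse_ops` and the `OP_*` names are hypothetical.

```c
#include <string.h>

/* Hypothetical op flags for the proposed --ops option. */
enum {
    OP_WRITE = 1 << 0,
    OP_READ  = 1 << 1,
    OP_COPY  = 1 << 2,
    OP_ALL   = OP_WRITE | OP_READ | OP_COPY
};

/* Parse a comma list such as "write,read,copy" into a bitmask.
 * Returns -1 for an empty list, an empty item, or an unknown name.
 * A single trailing comma is tolerated. */
static int parse_ops(const char *list) {
    int mask = 0;
    while (*list) {
        size_t len = strcspn(list, ",");  /* length of the current item */
        if (len == 5 && !strncmp(list, "write", 5)) mask |= OP_WRITE;
        else if (len == 4 && !strncmp(list, "read", 4)) mask |= OP_READ;
        else if (len == 4 && !strncmp(list, "copy", 4)) mask |= OP_COPY;
        else return -1;
        list += len;
        if (*list == ',') list++;  /* skip the separator */
    }
    return mask ? mask : -1;
}
```

When `--ops` is not given, the caller would simply use `OP_ALL`, matching the "default: all" behaviour described above.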
Sample run:
$ ipctool membw
Chip: gk7205v300 DDR clock: 450 MHz
Buffer size: 16 MB, iters: 16
memset (write): 1622 MB/s (0.090 s)
read scan: 496 MB/s (0.288 s)
memcpy (R+W): 1547 MB/s (0.220 s)
Reference implementation
Self-contained, static, libc-portable. Uses mmap of /dev/zero rather than malloc so the buffers come from anonymous DDR pages rather than tmpfs/page cache. The read scan loop uses volatile uint32_t sum to force actual loads (compiler can't elide).
// see attached membw.c in the issue (~70 lines)
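For readers without the attachment, here is a minimal sketch of the approach described above. It is not the attached membw.c: `membw_run`, `map_zero`, `struct membw_result` and `BARRIER` are hypothetical names, the barrier relies on GCC/Clang inline asm, and MB/s is computed by counting the buffer size once per pass (an assumption, including for memcpy).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type: throughput per op in MB/s (1 MB = 2^20 bytes). */
struct membw_result { double write_mbs, read_mbs, copy_mbs; };

/* Private mapping of /dev/zero: page-aligned, zero-filled memory. */
static void *map_zero(size_t bytes) {
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0) return NULL;
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* libc-independent read pass; the volatile accumulator keeps every load. */
static uint32_t read_scan(const uint32_t *buf, size_t bytes) {
    volatile uint32_t sum = 0;
    for (size_t i = 0; i < bytes / sizeof *buf; i++) sum += buf[i];
    return sum;
}

/* GCC/Clang compiler barrier so repeated passes are not merged or dropped. */
#define BARRIER() __asm__ volatile("" ::: "memory")

/* Runs write, read and copy; returns 0 on success, -1 if a mapping fails. */
static int membw_run(size_t mb, int iters, struct membw_result *r) {
    size_t bytes = mb << 20;
    uint8_t *src = map_zero(bytes), *dst = map_zero(bytes);
    if (!src || !dst) {
        if (src) munmap(src, bytes);
        if (dst) munmap(dst, bytes);
        return -1;
    }
    memset(src, 1, bytes);  /* fault all pages in before timing */
    memset(dst, 2, bytes);
    double total_mb = (double)mb * iters, t0;

    t0 = now_sec();
    for (int i = 0; i < iters; i++) { memset(dst, i, bytes); BARRIER(); }
    r->write_mbs = total_mb / (now_sec() - t0);

    t0 = now_sec();
    for (int i = 0; i < iters; i++) { read_scan((const uint32_t *)src, bytes); BARRIER(); }
    r->read_mbs = total_mb / (now_sec() - t0);

    t0 = now_sec();
    for (int i = 0; i < iters; i++) { memcpy(dst, src, bytes); BARRIER(); }
    r->copy_mbs = total_mb / (now_sec() - t0);

    munmap(src, bytes);
    munmap(dst, bytes);
    return 0;
}
```

A caller would run `membw_run(16, 16, &r)` for the proposed defaults and print the three fields of `r`. Touching both buffers before the timed loops keeps page-fault cost out of the measurement.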
Notes / caveats worth documenting in the tool:
- Buffer size must exceed L2 (typically 256K-1MB on V4 family) or you measure L2, not DDR.
- Streamer activity loads DDR via DMA. Stop majestic / vendor App before measuring DDR config; run with streamer when comparing real workload bandwidth.
- libc memset/memcpy implementations differ (musl vs uClibc vs glibc); the read scan loop is libc-independent and is the most trustworthy across builds.
- On very dark scenes, AE-extended exposure dominates the frame rate; measure under known lighting if you correlate with stream fps.
Where this fits in ipctool
Currently ipctool is "the read-only hardware probe" — chip ID, sensor, MMZ layout, etc. membw fits the same shape: read-only, fast, prints a one-shot diagnostic. Doesn't write anything, doesn't need privileges beyond root (which ipctool already has).
The measurement size should be small enough that running it on a live camera with majestic up doesn't cause frame drops (8 MB × 8 iters takes <2 s and consumes ~1 GB-second of bandwidth).
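As a rough sanity check on the run-time bound, the duration can be estimated from per-op throughput. This sketch uses the OpenIPC idle-board figures from the table earlier in this issue; `est_seconds` and `openipc_idle_mbs` are hypothetical names, and it assumes each rate counts the buffer size once per pass. Rates with the streamer running will be lower, so the result is a floor, not a prediction.

```c
#include <stddef.h>

/* Estimated wall time in seconds when each op processes size_mb * iters
 * megabytes at the given per-op throughput (MB/s). */
static double est_seconds(double size_mb, int iters,
                          const double *rate_mbs, size_t n_ops) {
    double total = 0.0;
    for (size_t i = 0; i < n_ops; i++)
        total += size_mb * iters / rate_mbs[i];
    return total;
}

/* OpenIPC idle-board figures from the table: write, read, copy. */
static const double openipc_idle_mbs[] = { 1622.0, 496.0, 1547.0 };
```

With these idle rates, `est_seconds(8, 8, openipc_idle_mbs, 3)` comes to roughly 0.2 s, which leaves about an order of magnitude of headroom under the 2 s bound for a loaded board.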
Related work
Companion request being filed in OpenIPC/defib to run the same test in U-Boot/bare-metal context — useful for proving "DDR is fine on bare metal, Linux is the problem" or vice versa.
Self-contained C source attached.