
Add DDR bandwidth test subcommand (memset/memcpy/read scan) #160

@widgetii

Motivation

While debugging an encoder fps gap between OpenIPC and vendor firmware running on identical gk7205v300 silicon, I needed a quick way to verify that DDR was actually performing at the same speed on both boards. Building, deploying, and timing a one-off C benchmark for each test ate a lot of session time. A bundled tool would have made this a one-liner.

Memory throughput is a useful baseline check whenever an SoC misbehaves: encoder/ISP fps caps, frame drops, slow dd to flash, etc. It separates "the CPU pipeline is the bottleneck" from "the DDR pipeline is the bottleneck" in a few seconds.

Concrete example from the field

Two boards, same goke,gk7205v300 per device-tree, same OTP at 0x12020084, same chip ID registers, same DDR-clock select bit (CRG 0x12010080 = 0x549 → 450 MHz on both):

Op (16 MB buffer, both streamers idle)   OpenIPC     XM vendor   Ratio (vendor vs OpenIPC)
memset (write)                           1622 MB/s   2264 MB/s   +40%
read scan (volatile uint32_t sum)        496 MB/s    558 MB/s    +12%
memcpy (R+W)                             1547 MB/s   2109 MB/s   +36%

That gap lined up with the wire-fps gap (~20%) we were seeing, which immediately pointed the investigation at the CPU/DDR PLL multiplier. The multiplier is set by mask ROM based on per-die HPM characterization (HPM_CHECK_REG at 0x1202015c differed: 0xF3 vs 0xF6 → mask ROM picks CPU PLL FBDIV=0x77 vs 0x8F → 952 MHz vs 1144 MHz, a ~20% clock difference).

Without a bundled bandwidth test, this took hours of cross-compile / scp / measure loops.

Proposed subcommand

ipctool membw [options]
  --size MB         buffer size per pass (default: 16; must exceed L2 cache)
  --iters N         number of passes per op (default: 16)
  --ops LIST        comma-separated list of ops to run: write,read,copy (default: all)
  --json            machine-readable output

Sample run:

$ ipctool membw
Chip: gk7205v300  DDR clock: 450 MHz
Buffer size: 16 MB, iters: 16
memset (write):  1622 MB/s  (0.090 s)
read scan:       496 MB/s   (0.288 s)
memcpy (R+W):    1547 MB/s  (0.220 s)

Reference implementation

Self-contained, static, libc-portable. Uses mmap of /dev/zero instead of malloc so the buffers come from anonymous DDR pages rather than tmpfs/page cache. The read scan loop accumulates into a volatile uint32_t sum to force actual loads (the compiler can't elide them).

// see attached membw.c in the issue (~70 lines)

Notes / caveats worth documenting in the tool:

  • Buffer size must exceed L2 (typically 256 KB to 1 MB on V4 family) or you measure L2, not DDR.
  • Streamer activity loads DDR via DMA. Stop majestic / the vendor app before measuring the DDR configuration itself; leave the streamer running when comparing real-workload bandwidth.
  • libc memset/memcpy implementations differ (musl vs uClibc vs glibc); the read scan loop is libc-independent and is the most trustworthy across builds.
  • On very dark scenes, AE-extended exposure dominates the frame rate; measure under known lighting if you correlate with stream fps.

Where this fits in ipctool

Currently ipctool is "the read-only hardware probe": chip ID, sensor, MMZ layout, etc. membw fits the same shape: read-only, fast, prints a one-shot diagnostic. It doesn't write anything outside its own buffers and doesn't need privileges beyond root (which ipctool already has).

The measurement size should be small enough that running it on a live camera with majestic up doesn't cause frame drops (8 MB × 8 iters takes <2 s and consumes ~1 GB-second of bandwidth).

Related work

Companion request being filed in OpenIPC/defib to run the same test in U-Boot/bare-metal context — useful for proving "DDR is fine on bare metal, Linux is the problem" or vice versa.

Self-contained C source attached.
