Motivation
When two boards of the same chip variant behave differently (encoder fps, sensor capture rate, throughput), the first useful diagnostic question is "are they running at the same CPU / DDR clock?". Today the answer requires manual `devmem` reads plus per-platform PLL formula decoding. A bundled tool would surface this in one command.
Importantly: on V4 family Hisilicon/Goke (gk7205v200/v300, hi3516ev200, etc.) the mask ROM sets PLL multipliers at boot based on per-die HPM (Hardware Performance Monitor) characterization — different physical chips, even from the same batch, can end up at different runtime clocks. Today there's no easy way to spot this.
Concrete case from the field
Two `goke,gk7205v300` boards with the same OTP value (`0x12020084 = 0x02115888`), the same chip ID at `0x12020088/8c/90`, and the same hardware, but:

| Register | Board A (OpenIPC) | Board B (vendor) | Decoded |
|---|---|---|---|
| `0x12010080` (CRG32 / DDR_CKSEL) | `0x00000549` | `0x00000549` | DDR @ 450 MHz (both) |
| `0x12010014` (CPU PLL FBDIV) | `0x01770000` | `0x018F0000` | CPU = 952 vs 1144 MHz |
| `0x1201000c` (peripheral PLL) | `0x018F0000` | `0x01970000` | peri = 1144 vs 1208 MHz |
| `0x1202015c` (HPM_CHECK_REG) | `0x00F30000` | `0x00F60000` | HPM = 243 vs 246 |
| `0x120280d8` (HPM_CORE_REG0) | `0x81080102` | `0x80AF00AE` | per-die monitor reading |

Same batch, different per-die HPM → mask ROM picks a different PLL multiplier → 20% CPU clock gap → 36% memcpy throughput gap → wire fps difference from the encoder.
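As a worked check, plugging the FBDIV fields from the table into the V4 PLL formula reproduces both the decoded column and the 20% figure. This assumes a 24 MHz input, a total divider (REFDIV × POSTDIV1 × POSTDIV2) of 3, and FBDIV in bits [23:16] of the register; the bit position is inferred from the dumps above rather than taken from a datasheet.

```math
\begin{aligned}
f_{\mathrm{CPU},A} &= \frac{24\,\mathrm{MHz} \times \mathtt{0x77}}{3} = \frac{24 \times 119}{3}\,\mathrm{MHz} = 952\,\mathrm{MHz} \\
f_{\mathrm{CPU},B} &= \frac{24\,\mathrm{MHz} \times \mathtt{0x8F}}{3} = \frac{24 \times 143}{3}\,\mathrm{MHz} = 1144\,\mathrm{MHz} \\
\frac{f_{\mathrm{CPU},B}}{f_{\mathrm{CPU},A}} - 1 &= \frac{1144}{952} - 1 \approx 0.20
\end{aligned}
```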
Until I dumped these registers and decoded them manually, the issue looked like a software / configuration problem. It wasn't: it was silicon binning, invisible without this view.
Proposed subcommand
`ipctool clocks` (or: `ipctool freq`)
Output (V4 / gk7205v300 example):
```
Chip:         gk7205v300
OTP_CPU_CLK:  0x02115888 (mux_chn=0)
HPM:          sys_hpm_core=243 (range 190-310, [bin: low])
              core0=0x102 core1=0x102

PLL frequencies (derived from CRG):
  CPU PLL          952 MHz  (CRG[0x14]: FBDIV=0x77)
  Peripheral PLL  1144 MHz  (CRG[0x0c]: FBDIV=0x8F)
  DDR cksel        450 MHz  (CRG[0x80] bits[3:5]=001)

CPU running:  952 MHz (cpufreq governor 'performance')

DDR controller @ 0x120d0000:
  PLL frequency  450 MHz
  Data rate      1800 Mbps (DDR3-1800 equiv)
```
The bin level (low/high/...) should reflect the documented HPM thresholds the mask ROM uses to select the PLL multiplier; those are platform-specific. For the V4 family the thresholds are visible in the u-boot source (`u-boot-gk7205v200/arch/arm/cpu/armv7/gk7205v300/lowlevel_init_v300.c`: `HPM_CORE_VALUE_MIN/MAX`, `HPM_CORE_MIN/MAX`).
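A sketch of the HPM side, under two loud assumptions: the value field of HPM_CHECK_REG starts at bit 16 (consistent with `0x00F30000` → 243 above), and it is 10 bits wide (a guess sized to cover the 190-310 range in the sample output). The cut points are left as parameters because the real ones live in the vendor u-boot; `hpm_value`, `hpm_classify`, `hpm_bin_name` and `struct hpm_thresholds` are hypothetical names, not existing ipctool code.

```c
#include <stdint.h>

/* Assumed layout: HPM value at bit 16 of HPM_CHECK_REG (0x1202015c).
 * The 10-bit width is a guess, not taken from a datasheet. */
static unsigned hpm_value(uint32_t hpm_check_reg)
{
    return (hpm_check_reg >> 16) & 0x3FF;
}

enum hpm_bin { HPM_BIN_LOW, HPM_BIN_MEDIUM, HPM_BIN_HIGH };

/* Hypothetical per-platform cut points; real values must come from the
 * vendor u-boot constants (HPM_CORE_VALUE_MIN/MAX, HPM_CORE_MIN/MAX). */
struct hpm_thresholds {
    unsigned low_max;   /* values <= low_max are reported as "low" */
    unsigned high_min;  /* values >= high_min are reported as "high" */
};

static enum hpm_bin hpm_classify(unsigned hpm, const struct hpm_thresholds *t)
{
    if (hpm <= t->low_max)
        return HPM_BIN_LOW;
    if (hpm >= t->high_min)
        return HPM_BIN_HIGH;
    return HPM_BIN_MEDIUM;
}

static const char *hpm_bin_name(enum hpm_bin b)
{
    static const char *const names[] = { "low", "medium", "high" };
    return names[b];
}
```

With this layout, `hpm_value(0x00F30000)` returns 243 and `hpm_value(0x00F60000)` returns 246, matching the table. The threshold struct would be filled per platform from the u-boot constants rather than guessed.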
A `--json` variant should be added so this is easy to consume from scripts (e.g., comparing a fleet of boards).
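One low-dependency way to produce the `--json` form is a single `snprintf` into a caller-supplied buffer. The key names below are illustrative only and not an existing ipctool output schema; `clocks_to_json` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical --json emitter: one flat object per board. */
static int clocks_to_json(char *buf, size_t len, const char *chip,
                          unsigned hpm, unsigned cpu_mhz,
                          unsigned peri_mhz, unsigned ddr_mhz)
{
    /* Chip names are plain identifiers; refuse quotes or backslashes
     * (which would need JSON escaping) rather than emit malformed output. */
    if (strpbrk(chip, "\"\\") != NULL)
        return -1;
    /* snprintf never overruns buf; a return value >= len means truncation */
    return snprintf(buf, len,
                    "{\"chip\":\"%s\",\"hpm\":%u,\"cpu_pll_mhz\":%u,"
                    "\"peri_pll_mhz\":%u,\"ddr_cksel_mhz\":%u}",
                    chip, hpm, cpu_mhz, peri_mhz, ddr_mhz);
}
```

Keeping the object flat means a fleet comparison reduces to diffing one line per unit.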
Implementation notes
The register addresses and bit decoding are well-defined per SoC family; the V4 family alone is ~5 chips with the same CRG layout. Suggested structure:
```c
#include <stdint.h>

struct clock_info {
    const char *name;
    uint32_t reg;      // physical address
    int fbdiv_shift;   // bit offset of FBDIV in the reg
    int fbdiv_width;   // FBDIV field width in bits
    int refdiv;        // typically 1
    int postdiv;       // typically 3 on V4 family
    int input_mhz;     // 24 MHz crystal
};
```
Then a per-platform table maps each PLL/clock domain.
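A minimal sketch of such a table for the two V4 PLLs, assuming FBDIV sits in bits [23:16] and the total divider is 3 (REFDIV = 1, POSTDIV = 3, as in the struct comments). The bit position is inferred from the register dumps and decoded values in this issue, not from a datasheet. The struct is repeated so the block compiles on its own; `v4_clocks` and `pll_mhz` are hypothetical names.

```c
#include <stdint.h>

struct clock_info {
    const char *name;
    uint32_t reg;      /* physical address */
    int fbdiv_shift;   /* bit offset of FBDIV in the register */
    int fbdiv_width;   /* FBDIV field width in bits (must be < 32) */
    int refdiv;
    int postdiv;
    int input_mhz;
};

/* Assumed V4 (gk7205v300) layout, inferred from the dumps above */
static const struct clock_info v4_clocks[] = {
    { "CPU PLL",        0x12010014, 16, 8, 1, 3, 24 },
    { "Peripheral PLL", 0x1201000c, 16, 8, 1, 3, 24 },
};

/* f_pll = INPUT x FBDIV / (REFDIV x POSTDIV), in MHz, integer math */
static unsigned pll_mhz(const struct clock_info *ci, uint32_t regval)
{
    uint32_t mask = (1u << ci->fbdiv_width) - 1;
    uint32_t fbdiv = (regval >> ci->fbdiv_shift) & mask;
    return (unsigned)ci->input_mhz * fbdiv /
           (unsigned)(ci->refdiv * ci->postdiv);
}
```

Fed the raw values from the table, `pll_mhz(&v4_clocks[0], 0x01770000)` gives 952 and `pll_mhz(&v4_clocks[0], 0x018F0000)` gives 1144, matching Board A and Board B; other families would add their own table rather than branch inside the decoder.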
Where helpful, also report:
- HPM bin classification (low/medium/high) per documented thresholds
- Voltage / SVB state (`SYS_CTRL_VOLT_REG` etc.)
- A short note when the chip is in the low bin, so users understand why their identical-spec board underperforms.
Why this matters operationally
- Diagnosing fps gaps: confirms hardware-vs-software root cause in seconds.
- Fleet uniformity check: identify which units in a batch landed in the lower bin (potentially useful for QA / RMA decisions on dev boards).
- Cross-chip comparison: lets people quickly answer "is the cv500 actually running at 700 MHz like the datasheet says, or is mask ROM downclocking it?"
- Bootloader sanity: confirms u-boot / mask ROM brought everything up at the expected speeds; a stuck PLL / failed training shows up here.
Related
Companion request: OpenIPC/ipctool#160 — DDR bandwidth benchmarking subcommand. The bandwidth tool measures the result; this tool explains the cause when results differ.
For V4 family decode, the formulas are:
- PLL output: `f_pll = INPUT_HZ × FBDIV / (REFDIV × POSTDIV1 × POSTDIV2)`
  - on gk7205v300: INPUT = 24 MHz, and REFDIV × POSTDIV1 × POSTDIV2 = 3 typically
- DDR cksel field (CRG `0x80`, bits [3:5]):
  - `0b000` → 24 MHz, `0b001` → 450 MHz, `0b011` → 300 MHz, `0b100` → 297 MHz
Per-family table needed for other Hisilicon V1-V5 / Goke variants; the V4 case is the well-documented one.
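The cksel mapping translates directly into a switch. This sketch assumes the field is bits 3 through 5 of the register at CRG offset `0x80` (`0x12010080`), read MSB first as in `001`. Unlisted encodings return 0 so the caller can print "unknown" instead of guessing; `ddr_cksel_mhz` is a hypothetical name.

```c
#include <stdint.h>

/* DDR clock select: CRG offset 0x80, field bits [5:3] (MSB first) */
static unsigned ddr_cksel_mhz(uint32_t crg80)
{
    switch ((crg80 >> 3) & 0x7) {
    case 0x0: return 24;
    case 0x1: return 450;
    case 0x3: return 300;
    case 0x4: return 297;
    default:  return 0; /* 0b010 and 0b101..0b111 are not listed above */
    }
}
```

For the `0x00000549` value both boards report, bits [5:3] are `001`, so the function returns 450.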