Build CUDA benchmarks once, but run in parallel #8489
Merged
CodSpeed HQ / CodSpeed Performance Analysis
failed
Jun 18, 2026 in 0s
Performance Regression: -14.25%
⚠️ Unknown Walltime execution environment detected
Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.
For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.
⚠️ Different runtime environments detected
Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.
⚡ 1 improved benchmark
❌ 4 regressed benchmarks
✅ 1576 untouched benchmarks
Warning
Please fix the performance issues or acknowledge them on CodSpeed.
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | take_10k_random | 197.9 µs | 255.8 µs | -22.63% |
| ❌ | Simulation | take_10k_contiguous | 218.5 µs | 276.4 µs | -20.94% |
| ❌ | Simulation | patched_take_10k_contiguous_patches | 232.2 µs | 291 µs | -20.18% |
| ❌ | Simulation | patched_take_10k_random | 244.2 µs | 303 µs | -19.41% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352.6 µs | 299.3 µs | +17.8% |
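
The Efficiency column appears to be consistent with `(BASE / HEAD - 1) * 100`, that is, the relative change in throughput rather than the relative change in time. That formula is an assumption inferred from the numbers in the table, not taken from CodSpeed documentation, and the helper name `efficiency_change` is hypothetical. A minimal sketch that recomputes each row from the displayed timings:

```python
# Rows copied from the Performance Changes table:
# (benchmark, BASE in µs, HEAD in µs, reported Efficiency in %).
ROWS = [
    ("take_10k_random", 197.9, 255.8, -22.63),
    ("take_10k_contiguous", 218.5, 276.4, -20.94),
    ("patched_take_10k_contiguous_patches", 232.2, 291.0, -20.18),
    ("patched_take_10k_random", 244.2, 303.0, -19.41),
    ("cuda/bitpacked_u8/unpack/3bw[100M]", 352.6, 299.3, 17.8),
]


def efficiency_change(base_us: float, head_us: float) -> float:
    """Assumed formula (inferred from the table): (base / head - 1) * 100."""
    return (base_us / head_us - 1.0) * 100.0


for name, base, head, reported in ROWS:
    computed = efficiency_change(base, head)
    # The displayed timings are rounded, so small deviations are expected.
    print(f"{name}: computed {computed:+.2f}%, reported {reported:+.2f}%")
```

Under this assumption every row matches the reported value to within a few hundredths of a percentage point. The remaining gaps are plausibly due to the timings being rounded for display.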
Tip
Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or directly use the CodSpeed MCP with your agent.
Comparing adamg/unifrom-codspeed-gpu-build (9ed22a8) with develop (d020924)