Add JMH benchmarks along with corresponding Rust benchmarks #8597
robert3005 wants to merge 4 commits
New `vortex-jni-bench` module (JMH) that stresses the vortex-jni read boundary (JNI plus the Arrow C Data Interface), which is the path an Iceberg FormatModel takes to read Vortex from the JVM.

Three query shapes (full scan, projection, selective filter) over a synthetic six-column table, consumed column-at-a-time so the numbers reflect format/boundary cost rather than per-row JVM allocation.

Includes a batch-granularity diagnostic (Vortex coalesces to ~64K-row read batches regardless of write chunk) and a README with run instructions. Must run against a --release native lib (VORTEX_SKIP_MAKE_TEST_FILES=true to preserve it).

v2 TODO: a native Rust criterion read of the same file as a floor, to quote boundary overhead vs native.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
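To make "consumed column-at-a-time" concrete, here is a minimal sketch in plain Java. It is not the benchmark code from this PR: primitive arrays stand in for Arrow vectors, and the class and method names (`ColumnAtATime`, `consumeLongColumn`, `countNonNull`) are hypothetical. The point is only that each column is folded into a scalar in a tight loop, with no per-row object allocated.

```java
// Illustrative only: arrays stand in for the Arrow column buffers that the
// real benchmark would read through vortex-jni. All names are hypothetical.
final class ColumnAtATime {

    // Fold a whole numeric column into one value so the JIT cannot discard the read.
    static long consumeLongColumn(long[] values) {
        long acc = 0;
        for (long v : values) {
            acc += v;
        }
        return acc;
    }

    // Count non-null entries of a nullable string column without building row objects.
    static int countNonNull(String[] values) {
        int n = 0;
        for (String s : values) {
            if (s != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long[] id = {1, 2, 3, 4};
        String[] tag = {"a", null, "c", "d"};
        // Each column is consumed independently; no (id, tag) row is ever materialized.
        System.out.println(consumeLongColumn(id) + " " + countNonNull(tag));
    }
}
```

In a real JMH benchmark the returned values would typically be handed to a `Blackhole` or returned from the benchmark method so the work is not optimized away.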
…ix units, add guards

A codex-run gauntlet (fresh/correctness/maint) flagged the first cut as overclaiming relative to what it measured. This commit fixes that:

- Isolate native pushdown from JVM-side work: add projectionControl (full scan, consume id,y in Java) and filterControl (full scan, filter cat='alpha' in Java). The pushdown speedup is now projection-vs-projectionControl (~4.1x) and selectiveFilter-vs-filterControl (~4.6x), not the confounded ~6x-vs-fullScan. (M2)
- fullScan now consumes all six columns at the buffer level (z, cat, tag added), so the "all-six-column scan" number is honest (~40M rows/s). (M1)
- @OperationsPerInvocation(ROWS) so JMH reports input rows/s directly, not scans/s. (M3)
- @Setup validates the file before measuring: exact row count, cat='alpha' returns ROWS/|CATS|, projection schema is exactly [id,y], so fast garbage can't be cited. (M5)
- Gradle guard fails the jmh task unless VORTEX_SKIP_MAKE_TEST_FILES=true, so a plain run can't silently rebuild + measure the debug lib. (M6)
- @Threads(1); tag carries a 10% null rate; README documents the synthetic-data caveats.

Read path returns string columns as VarCharVector (Utf8), not ViewVarCharVector, which matches the existing TestMinimal read path. Native floor for a boundary-overhead % remains the v2 TODO (M4).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
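The pre-measurement validation (M5) and the rows/s normalization (M3) from this commit can be sketched in plain Java as follows. This is an illustration under assumptions, not the PR's code: the `ROWS` value and every `CATS` entry other than "alpha" are made up, the scan results are passed in as plain numbers instead of being read through vortex-jni, and `SetupGuards` and its methods are hypothetical names.

```java
import java.util.List;

// Illustrative only: mirrors the checks described for @Setup, with assumed sizes.
final class SetupGuards {
    // Assumption: the PR does not state ROWS or the CATS values other than "alpha".
    static final long ROWS = 1_000_000L;
    static final List<String> CATS = List.of("alpha", "beta", "gamma", "delta");

    // Fail fast if the file does not look like the expected synthetic table.
    static void validate(long scannedRows, long alphaRows, List<String> projectedColumns) {
        if (scannedRows != ROWS) {
            throw new IllegalStateException("expected " + ROWS + " rows, got " + scannedRows);
        }
        long expectedAlpha = ROWS / CATS.size(); // cat='alpha' should match ROWS/|CATS| rows
        if (alphaRows != expectedAlpha) {
            throw new IllegalStateException("expected " + expectedAlpha + " alpha rows, got " + alphaRows);
        }
        if (!projectedColumns.equals(List.of("id", "y"))) {
            throw new IllegalStateException("projection schema was " + projectedColumns);
        }
    }

    // What @OperationsPerInvocation(ROWS) achieves: one scan counts as ROWS operations,
    // so the reported throughput is rows/s rather than scans/s.
    static double rowsPerSecond(double scansPerSecond, long rowsPerScan) {
        return scansPerSecond * rowsPerScan;
    }

    public static void main(String[] args) {
        validate(ROWS, ROWS / CATS.size(), List.of("id", "y"));
        System.out.println(rowsPerSecond(40.0, ROWS));
    }
}
```

With the assumed one-million-row table, 40 scans/s corresponds to the ~40M rows/s figure quoted for fullScan; the actual table size in the benchmark may differ.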
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 15.9 µs | 26.7 µs | -40.29% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 169.1 µs | 205.8 µs | -17.83% |
| ❌ | Simulation | slice_empty_vortex | 310 ns | 368.3 ns | -15.84% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 215.3 ns | +27.1% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 333.9 ns | 275.6 ns | +21.17% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[2048] | 427.8 ns | 369.4 ns | +15.79% |
| ⚡ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 259.6 µs | 224.5 µs | +15.65% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] | 306.8 µs | 271.9 µs | +12.84% |
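The Efficiency column in the report appears consistent with BASE/HEAD − 1, so a negative value means HEAD is slower. That formula is inferred from the rows themselves, not taken from CodSpeed's documentation, and `EfficiencyCheck` is a hypothetical helper. A small sketch for rechecking a row:

```java
// Illustrative only: recomputes the Efficiency column under the assumed
// formula BASE/HEAD - 1. Inputs are the rounded timings shown in the table,
// so results can differ from the reported values in the last digits.
final class EfficiencyCheck {

    // Relative change in percent; BASE and HEAD must use the same time unit.
    static double efficiencyPercent(double base, double head) {
        return (base / head - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // chunked_varbinview_into_canonical[(1000, 10)]: 169.1 µs -> 205.8 µs
        System.out.println(efficiencyPercent(169.1, 205.8));
        // bitwise_not_vortex_buffer_mut[128]: 273.6 ns -> 215.3 ns
        System.out.println(efficiencyPercent(273.6, 215.3));
    }
}
```

Because the table shows rounded timings, some rows only match approximately: 15.9 µs against 26.7 µs gives roughly -40.4% here versus the reported -40.29%.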
Comparing mp/jni-bench (60deb99) with develop (bdbf6c4)
Footnotes
1. Four benchmarks were skipped, so the baseline results were used instead.
Add JMH benchmarks for the Java bindings, along with corresponding Rust versions of those benchmarks.