Add Jmh benchmarks along with corresponding rust benchmarks by robert3005 · Pull Request #8597 · vortex-data/vortex

robert3005 · 2026-06-25T18:21:25Z

Add JMH benchmarks for the Java bindings, along with corresponding Rust versions of those
benchmarks.

New `vortex-jni-bench` module (JMH) that stresses the vortex-jni read boundary (JNI plus the Arrow C Data Interface), which is the path an Iceberg FormatModel takes to read Vortex from the JVM.

Three query shapes (full scan, projection, selective filter) over a synthetic six-column table, consumed column-at-a-time so the numbers reflect format/boundary cost rather than per-row JVM allocation. Includes a batch-granularity diagnostic (Vortex coalesces to ~64K-row read batches regardless of write chunk) and a README with run instructions.

Must run against a `--release` native lib (`VORTEX_SKIP_MAKE_TEST_FILES=true` to preserve it).

v2 TODO: a native Rust criterion read of the same file as a floor, to quote boundary overhead vs native.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
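To illustrate the "column-at-a-time" consumption mentioned in the commit message, here is a minimal stdlib-only sketch. It is not the benchmark code from this PR: `Batch`, `Row`, `consumeColumns`, and `consumeRows` are hypothetical names, and primitive arrays stand in for the Arrow vectors the real read path returns.

```java
public class ColumnAtATimeSketch {
    /** Hypothetical stand-in for one read batch: two primitive column buffers of equal length. */
    static final class Batch {
        final long[] id;
        final double[] y;
        Batch(long[] id, double[] y) { this.id = id; this.y = y; }
    }

    /** Hypothetical per-row object, used only by the row-at-a-time variant. */
    static final class Row {
        final long id;
        final double y;
        Row(long id, double y) { this.id = id; this.y = y; }
    }

    /** Column-at-a-time: one tight loop per column, no per-row objects. */
    static double consumeColumns(Batch b) {
        long idSum = 0;
        for (long v : b.id) idSum += v;
        double ySum = 0;
        for (double v : b.y) ySum += v;
        return idSum + ySum;
    }

    /** Row-at-a-time: builds an object per row, which may add JVM allocation cost to the measurement. */
    static double consumeRows(Batch b) {
        double sum = 0;
        for (int i = 0; i < b.id.length; i++) {
            Row r = new Row(b.id[i], b.y[i]);
            sum += r.id + r.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        Batch b = new Batch(new long[] {1, 2, 3}, new double[] {0.5, 1.5, 2.5});
        System.out.println(consumeColumns(b));
        System.out.println(consumeRows(b));
    }
}
```

Both variants compute the same value; the difference is only in how much JVM-side work sits between the native read and the result.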

@setup

…ix units, add guards

A codex-run gauntlet (fresh/correctness/maint) flagged the first cut as overclaiming relative to what it measured. This commit fixes that:

- Isolate native pushdown from JVM-side work: add projectionControl (full scan, consume id,y in Java) and filterControl (full scan, filter cat='alpha' in Java). The pushdown speedup is now projection-vs-projectionControl (~4.1x) and selectiveFilter-vs-filterControl (~4.6x), not the confounded ~6x-vs-fullScan. (M2)
- fullScan now consumes all six columns at the buffer level (z, cat, tag added), so the "all-six-column scan" number is honest (~40M rows/s). (M1)
- `@OperationsPerInvocation(ROWS)` so JMH reports input rows/s directly, not scans/s. (M3)
- `@Setup` validates the file before measuring: exact row count, cat='alpha' returns ROWS/|CATS|, projection schema is exactly [id,y], so fast garbage can't be cited. (M5)
- Gradle guard fails the jmh task unless `VORTEX_SKIP_MAKE_TEST_FILES=true`, so a plain run can't silently rebuild + measure the debug lib. (M6)
- `@Threads(1)`; tag carries a 10% null rate; README documents the synthetic-data caveats.

Read path returns string columns as VarCharVector (Utf8), not ViewVarCharVector, which matches the existing TestMinimal read path.

Native floor for a boundary-overhead % remains the v2 TODO (M4).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
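The setup validation (M5) and the rows/s unit fix (M3) described above can be sketched in plain Java. This is an illustration under stated assumptions, not the PR's code: the values of `ROWS` and `CATS` are hypothetical (the commit message does not give them), and the sketch assumes categories cycle evenly across rows, which is one way to get the ROWS/|CATS| selectivity the commit message checks for.

```java
import java.util.List;

public class SetupGuardSketch {
    // Hypothetical sizes; the PR does not state ROWS or the category list.
    static final int ROWS = 1_000_000;
    static final List<String> CATS = List.of("alpha", "beta", "gamma", "delta");

    /** Assumed synthetic layout: categories cycle, so each value is equally frequent. */
    static String catFor(int row) {
        return CATS.get(row % CATS.size());
    }

    /** Counts matching rows in Java; stands in for a full scan followed by a JVM-side filter. */
    static long countMatching(String target) {
        long n = 0;
        for (int i = 0; i < ROWS; i++) {
            if (catFor(i).equals(target)) n++;
        }
        return n;
    }

    /** Fails fast when the filter result does not match the expected selectivity. */
    static void validate() {
        long expected = ROWS / CATS.size();
        long actual = countMatching("alpha");
        if (actual != expected) {
            throw new IllegalStateException("expected " + expected + " rows, got " + actual);
        }
    }

    /** Converts scans/s to rows/s, the adjustment an operations-per-invocation count of ROWS implies. */
    static double rowsPerSecond(double scansPerSecond) {
        return scansPerSecond * ROWS;
    }

    public static void main(String[] args) {
        validate();
        System.out.println(rowsPerSecond(40.0));
    }
}
```

Running the validation before any measurement means a benchmark that reads the wrong file, or an empty one, fails instead of reporting a fast but meaningless number.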

Signed-off-by: Robert Kruszewski <github@robertk.io>

codspeed-hq · 2026-06-25T18:32:13Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 3 regressed benchmarks
✅ 1581 untouched benchmarks
⏩ 4 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 15.9 µs | 26.7 µs | -40.29% |
| ❌ | Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 169.1 µs | 205.8 µs | -17.83% |
| ❌ | Simulation | `slice_empty_vortex` | 310 ns | 368.3 ns | -15.84% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 273.6 ns | 215.3 ns | +27.1% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[1024]` | 333.9 ns | 275.6 ns | +21.17% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[2048]` | 427.8 ns | 369.4 ns | +15.79% |
| ⚡ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 259.6 µs | 224.5 µs | +15.65% |
| ⚡ | Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 306.8 µs | 271.9 µs | +12.84% |
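For readers checking the table, the Efficiency column appears consistent with `BASE / HEAD - 1`, up to rounding of the displayed times. That formula is inferred from the numbers above, not taken from CodSpeed documentation, and `efficiencyPercent` is a hypothetical helper name. A minimal sketch:

```java
public class EfficiencySketch {
    /** Relative change as BASE/HEAD - 1, in percent; negative means HEAD is slower than BASE. */
    static double efficiencyPercent(double base, double head) {
        return (base / head - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Rows from the table above; the units cancel, so µs and ns rows work the same way.
        System.out.printf("%.2f%%%n", efficiencyPercent(169.1, 205.8));
        System.out.printf("%.2f%%%n", efficiencyPercent(306.8, 271.9));
    }
}
```

Because the table shows rounded times, recomputed values can differ from the reported ones in the first decimal place for some rows.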

Tip

Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or directly use the CodSpeed MCP with your agent.

Comparing `mp/jni-bench` (60deb99) with `develop` (bdbf6c4)

¹ 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

mprammer and others added 3 commits June 25, 2026 17:31

more

98615fa

Signed-off-by: Robert Kruszewski <github@robertk.io>

robert3005 requested a review from a team June 25, 2026 18:21

robert3005 added the `changelog/chore` (A trivial change) label Jun 25, 2026

less

60deb99

robert3005 wants to merge 4 commits into `develop` from `mp/jni-bench`
