
perf: branchless boolean zip kernel #8275

Merged
joseph-isaacs merged 3 commits into develop from claude/bool-branchless-zip on Jun 9, 2026
joseph-isaacs commented Jun 5, 2026

Summary

Adds a dedicated, branchless `ZipKernel` for `Bool`.

It blends the two value bitmaps with the mask in a single bitwise pass, `(if_true & mask) | (if_false & !mask)`, instead of going through the generic per-run builder. Boolean zips are therefore branch-free, and their cost does not depend on the shape of the mask.
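The blend formula above can be sketched over packed bitmaps as follows. This is a minimal sketch that assumes LSB-first `u64` words of equal length. `zip_bool_words` is a hypothetical name, not the crate's `ZipKernel` API.

```rust
/// Branchless blend of packed boolean bitmaps (LSB-first `u64` words).
///
/// Bit `i` of the result comes from `if_true` where bit `i` of `mask` is set,
/// and from `if_false` otherwise. Each 64-row word costs one AND, one AND-NOT
/// and one OR, and nothing branches on the mask bits.
fn zip_bool_words(if_true: &[u64], if_false: &[u64], mask: &[u64]) -> Vec<u64> {
    assert_eq!(if_true.len(), if_false.len(), "value bitmaps must match");
    assert_eq!(if_true.len(), mask.len(), "mask must cover every word");
    if_true
        .iter()
        .zip(if_false)
        .zip(mask)
        // Take bits from `t` under the mask and from `f` outside it.
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect()
}
```

The same three word operations run for every word. A mask that alternates on every row therefore costs as much as one that is all ones, which is what makes the kernel independent of mask shape.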

Also adds a shared `zip_validity` helper to the zip module. It builds the result validity as a (lazy) zip over the two boolean validity bitmaps by reusing this kernel, which gives the per-encoding zip kernels one shared validity-selection path. The recursion terminates immediately because validity bitmaps are themselves non-nullable.
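The following minimal sketch shows how `zip_validity` can reuse the boolean zip and why the recursion stops after one level. The `Validity` and `BoolArray` types are hypothetical stand-ins. This sketch also computes the validity eagerly, whereas the helper described above builds it as a lazy zip.

```rust
/// Hypothetical model: validity is either "cannot be null" or a boolean
/// array (bit set = valid) that is itself non-nullable.
#[derive(Clone, Debug, PartialEq)]
enum Validity {
    NonNullable,
    Array(Box<BoolArray>),
}

/// Hypothetical packed boolean array: LSB-first `u64` words plus validity.
#[derive(Clone, Debug, PartialEq)]
struct BoolArray {
    bits: Vec<u64>,
    validity: Validity,
}

/// Boolean zip: branchless bitmap blend for the values, `zip_validity` for nulls.
fn zip_bool(if_true: &BoolArray, if_false: &BoolArray, mask: &[u64]) -> BoolArray {
    let bits = if_true
        .bits
        .iter()
        .zip(&if_false.bits)
        .zip(mask)
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect();
    let validity = zip_validity(&if_true.validity, &if_false.validity, mask);
    BoolArray { bits, validity }
}

/// Views a validity as a non-nullable bitmap; `NonNullable` means all valid.
fn validity_bits(v: &Validity, words: usize) -> BoolArray {
    match v {
        Validity::NonNullable => BoolArray {
            bits: vec![u64::MAX; words],
            validity: Validity::NonNullable,
        },
        Validity::Array(bits) => (**bits).clone(),
    }
}

/// Hypothetical `zip_validity`: the result validity is a zip of the two
/// validity bitmaps under the same mask, computed by `zip_bool` itself.
fn zip_validity(if_true: &Validity, if_false: &Validity, mask: &[u64]) -> Validity {
    match (if_true, if_false) {
        // Base case. The bitmaps passed to the nested `zip_bool` call below
        // are non-nullable, so that call lands here and the recursion stops.
        (Validity::NonNullable, Validity::NonNullable) => Validity::NonNullable,
        _ => {
            let t = validity_bits(if_true, mask.len());
            let f = validity_bits(if_false, mask.len());
            Validity::Array(Box::new(zip_bool(&t, &f, mask)))
        }
    }
}
```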

Adds a small `bool_zip` divan benchmark (nonnull ~7 µs, nullable ~14 µs).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs requested a review from a team on June 5, 2026 16:37
joseph-isaacs added the changelog/performance (A performance improvement) label on Jun 5, 2026, with Claude
The doc comment on the public `ZipKernel` impl linked to the `pub(crate)`
`zip_validity`, which `-D rustdoc::private-intra-doc-links` rejects. Use a plain
code span instead.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
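The following minimal sketch illustrates the doc-comment fix this commit describes. `zip_bool_word` and the stand-in `zip_validity` are hypothetical. Only the lint name and its effect come from the commit message: the lint rejects an intra-doc link from a public item's docs to a crate-private item.

```rust
// Hypothetical crate-private helper standing in for the real `zip_validity`.
pub(crate) fn zip_validity(true_valid: u64, false_valid: u64, mask: u64) -> u64 {
    (true_valid & mask) | (false_valid & !mask)
}

/// Zips one 64-row word of values and validity under `mask`.
///
/// The result validity comes from the crate-private `zip_validity` helper.
/// It is named with a plain code span because an intra-doc link to it would
/// point from a public item to a private one, which building the docs with
/// `-D rustdoc::private-intra-doc-links` rejects.
pub fn zip_bool_word(
    if_true: u64,
    if_false: u64,
    true_valid: u64,
    false_valid: u64,
    mask: u64,
) -> (u64, u64) {
    let values = (if_true & mask) | (if_false & !mask);
    (values, zip_validity(true_valid, false_valid, mask))
}
```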
codspeed-hq (bot) commented Jun 5, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 1 regressed benchmark
✅ 1512 untouched benchmarks
🆕 2 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 274.5 µs | 309.3 µs | -11.27% |
| Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 198.1 µs | 161.6 µs | +22.57% |
| Simulation | encode_varbin[(1000, 8)] | 165.2 µs | 145.3 µs | +13.69% |
| Simulation | encode_varbin[(1000, 4)] | 164.1 µs | 144.6 µs | +13.48% |
| Simulation | encode_varbin[(1000, 32)] | 170.4 µs | 150.6 µs | +13.16% |
| 🆕 Simulation | nonnull | N/A | 54.5 µs | N/A |
| 🆕 Simulation | nullable | N/A | 93.7 µs | N/A |



Comparing claude/bool-branchless-zip (86f2413) with develop (42640f5)


Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs enabled auto-merge (squash) on June 9, 2026 13:14
joseph-isaacs merged commit 9e3ae2f into develop on Jun 9, 2026
62 of 64 checks passed
joseph-isaacs deleted the claude/bool-branchless-zip branch on June 9, 2026 13:14
joseph-isaacs added a commit that referenced this pull request Jun 12, 2026
## Summary

Adds a dedicated primitive zip kernel that selects values branchlessly
per row.

The generic zip path copies runs of `if_true`/`if_false` between mask
boundaries. That is fast for clustered masks but degrades to per-element work
on fragmented masks. This kernel walks the mask as 64-bit chunks and
blends both sides per row with no data-dependent branch, so the inner
loop stays branch-free and auto-vectorizable regardless of mask shape.
Result validity reuses the shared `zip_validity` helper, which expresses
validity selection as a (lazy) zip over the two validity bitmaps.
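The following minimal sketch shows a per-row branchless select for one primitive type. It assumes an LSB-first `u64` mask and `i64` values. `zip_i64` is a hypothetical name and not the kernel added in this commit. Each mask bit becomes an all-zeros or all-ones selector, so the loop body contains no branch that depends on the data.

```rust
/// Branchless per-row select for `i64` values under a packed LSB-first mask.
fn zip_i64(if_true: &[i64], if_false: &[i64], mask: &[u64]) -> Vec<i64> {
    let len = if_true.len();
    assert_eq!(len, if_false.len(), "value slices must match");
    assert!(mask.len() * 64 >= len, "mask too short");
    let mut out = Vec::with_capacity(len);
    // Walk the mask one 64-row chunk at a time.
    for (chunk, &word) in mask.iter().enumerate() {
        let start = chunk * 64;
        if start >= len {
            break;
        }
        let end = (start + 64).min(len);
        let rows = if_true[start..end].iter().zip(&if_false[start..end]);
        for (offset, (&t, &f)) in rows.enumerate() {
            // 0 or -1 (all ones) derived arithmetically from the mask bit.
            let sel = (((word >> offset) & 1) as i64).wrapping_neg();
            // Yields `t` when `sel` is all ones and `f` when it is zero.
            out.push(f ^ ((t ^ f) & sel));
        }
    }
    out
}
```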

> The branchless boolean zip kernel (#8275) this builds on has now
merged into `develop`; this branch has been rebased on top of it, so the
diff here is primitive-only.
