faster ALP encode #924
Merged
AdamGS reviewed Sep 25, 2024
a10y reviewed Sep 25, 2024
lwwmanning commented Sep 25, 2024
    }

    // if there are no patches, we are done
    if chunk_patch_count == 0 {
Need to handle the edge case of 2 chunks where chunk 0 is all patches and chunk 1 has 0 patches. In that case this early exit is taken for chunk 1, so the patched slots from chunk 0 never get filled.
lwwmanning added a commit that referenced this pull request Sep 26, 2024
Realized that there's an unhandled edge case in #924 ([commented here](https://github.com/spiraldb/vortex/pull/924/files#r1776099681)). Essentially, on develop, if we have two chunks and the first chunk is all patches and the second chunk has 0 patches, then the patched values won't get filled in the encoded array. Not the end of the world (they're presumably full of integer approximations that don't round-trip), but if it's a case of outlier large values that are getting patched, then the encoded values will end up bitpacking poorly. This PR fixes that.
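To make the edge case concrete, here is a minimal sketch of chunk-wise patch filling that defers the fill when a chunk has no usable value of its own. It is not the code from #924 or the follow-up commit: the function name `fill_patched_slots`, the `is_patch` mask, and the fallback of `0` when every value is patched are all assumptions made for illustration.

```rust
/// Hypothetical sketch (not Vortex's actual implementation).
/// `encoded` holds the integer encodings; `is_patch[i]` marks slots whose
/// value did not round-trip and is stored as a patch instead.
fn fill_patched_slots(encoded: &mut [i64], is_patch: &[bool], chunk_size: usize) {
    assert_eq!(encoded.len(), is_patch.len());
    assert!(chunk_size > 0);

    // Most recent usable (non-patched) encoded value seen so far.
    let mut fill: Option<i64> = None;
    // Patched slots seen before any fill value was available.
    let mut pending: Vec<usize> = Vec::new();

    for start in (0..encoded.len()).step_by(chunk_size) {
        let end = (start + chunk_size).min(encoded.len());

        // Prefer a fill value taken from the current chunk.
        if let Some(i) = (start..end).find(|&i| !is_patch[i]) {
            fill = Some(encoded[i]);
        }

        match fill {
            Some(v) => {
                // Flush slots deferred from earlier all-patch chunks, even if
                // this chunk has no patches of its own (the edge case above).
                for i in pending.drain(..) {
                    encoded[i] = v;
                }
                for i in start..end {
                    if is_patch[i] {
                        encoded[i] = v;
                    }
                }
            }
            // No usable value yet: remember the slots and fill them later.
            None => pending.extend((start..end).filter(|&i| is_patch[i])),
        }
    }

    // Assumption: if every value was patched, fall back to 0.
    for i in pending {
        encoded[i] = 0;
    }
}
```

With two chunks of four values where the first chunk is all patches, the outlier encodings in chunk 0 are overwritten with the first value of chunk 1, so the encoded array stays within a narrow range for bitpacking.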
danking added a commit that referenced this pull request Feb 3, 2025
This PR trims invalid values from the patches and makes the patches validity either AllValid (for nullable arrays) or NonNullable.

This microbenchmark doesn't reveal any clear improvements or degradations. It seems to me mostly noise. In theory, this change should make decompression a bit faster because validity is in one place, but my primary goal here is to make ALP array simpler: validity is in one place, the encoded array.

### Benchmarks on latest commit:

- PR: 7fb595b
- develop: 0a18498

The parameter is: (number of elements, fraction patched, fraction valid). Any ratio greater than 1.1 or less than 0.9 has a ` ***`

```
alp_compress                    │ PR median │ develop median │ ratio
├─ compress_alp                 │           │                │
│  ├─ f32                       │           │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 160.4 µs  │ 159.6 µs       │ 1.0050
│  │  ├─ (100000, 0.0, 0.95)    │ 145.9 µs  │ 143.8 µs       │ 1.0146
│  │  ├─ (100000, 0.0, 1.0)     │ 137.0 µs  │ 135.5 µs       │ 1.0110
│  │  ├─ (100000, 0.01, 0.25)   │ 227.7 µs  │ 230.7 µs       │ 0.9869
│  │  ├─ (100000, 0.01, 0.95)   │ 227.9 µs  │ 227.2 µs       │ 1.0030
│  │  ├─ (100000, 0.01, 1.0)    │ 226.6 µs  │ 227.5 µs       │ 0.9960
│  │  ├─ (100000, 0.1, 0.25)    │ 238.3 µs  │ 248.9 µs       │ 0.9574
│  │  ├─ (100000, 0.1, 0.95)    │ 238.2 µs  │ 269.8 µs       │ 0.8828 ***
│  │  ├─ (100000, 0.1, 1.0)     │ 230.6 µs  │ 231.9 µs       │ 0.9943
│  │  ├─ (10000000, 0.0, 0.25)  │ 14.17 ms  │ 13.77 ms       │ 1.0290
│  │  ├─ (10000000, 0.0, 0.95)  │ 14.16 ms  │ 13.8 ms        │ 1.0260
│  │  ├─ (10000000, 0.0, 1.0)   │ 14.0 ms   │ 12.47 ms       │ 1.1226 ***
│  │  ├─ (10000000, 0.01, 0.25) │ 22.29 ms  │ 23.13 ms       │ 0.9636
│  │  ├─ (10000000, 0.01, 0.95) │ 22.26 ms  │ 23.78 ms       │ 0.9360
│  │  ├─ (10000000, 0.01, 1.0)  │ 22.19 ms  │ 21.79 ms       │ 1.0183
│  │  ├─ (10000000, 0.1, 0.25)  │ 23.31 ms  │ 27.72 ms       │ 0.8409 ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 23.4 ms   │ 27.47 ms       │ 0.8518 ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 22.99 ms  │ 22.31 ms       │ 1.0304
│  ╰─ f64                       │           │                │
│     ├─ (100000, 0.0, 0.25)    │ 165.2 µs  │ 165.4 µs       │ 0.9987
│     ├─ (100000, 0.0, 0.95)    │ 166.1 µs  │ 163.4 µs       │ 1.0165
│     ├─ (100000, 0.0, 1.0)     │ 164.7 µs  │ 179.9 µs       │ 0.9155
│     ├─ (100000, 0.01, 0.25)   │ 269.7 µs  │ 259.1 µs       │ 1.0409
│     ├─ (100000, 0.01, 0.95)   │ 270.5 µs  │ 259.6 µs       │ 1.0419
│     ├─ (100000, 0.01, 1.0)    │ 268.9 µs  │ 270.6 µs       │ 0.9937
│     ├─ (100000, 0.1, 0.25)    │ 281.7 µs  │ 281.3 µs       │ 1.0014
│     ├─ (100000, 0.1, 0.95)    │ 279.1 µs  │ 315.3 µs       │ 0.8851 ***
│     ├─ (100000, 0.1, 1.0)     │ 273.0 µs  │ 275.7 µs       │ 0.9902
│     ├─ (10000000, 0.0, 0.25)  │ 16.16 ms  │ 15.86 ms       │ 1.0189
│     ├─ (10000000, 0.0, 0.95)  │ 16.19 ms  │ 15.75 ms       │ 1.0279
│     ├─ (10000000, 0.0, 1.0)   │ 16.2 ms   │ 15.83 ms       │ 1.0233
│     ├─ (10000000, 0.01, 0.25) │ 25.29 ms  │ 25.77 ms       │ 0.9813
│     ├─ (10000000, 0.01, 0.95) │ 25.74 ms  │ 25.94 ms       │ 0.9922
│     ├─ (10000000, 0.01, 1.0)  │ 25.54 ms  │ 25.32 ms       │ 1.0086
│     ├─ (10000000, 0.1, 0.25)  │ 26.89 ms  │ 30.73 ms       │ 0.8750 ***
│     ├─ (10000000, 0.1, 0.95)  │ 27.05 ms  │ 30.53 ms       │ 0.8860 ***
│     ╰─ (10000000, 0.1, 1.0)   │ 26.22 ms  │ 25.98 ms       │ 1.0092
├─ decompress_alp               │           │                │
│  ├─ f32                       │           │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 12.24 µs  │ 12.33 µs       │ 0.9927
│  │  ├─ (100000, 0.0, 0.95)    │ 12.24 µs  │ 12.16 µs       │ 1.0065
│  │  ├─ (100000, 0.0, 1.0)     │ 12.2 µs   │ 12.16 µs       │ 1.0032
│  │  ├─ (100000, 0.01, 0.25)   │ 15.12 µs  │ 14.04 µs       │ 1.0769
│  │  ├─ (100000, 0.01, 0.95)   │ 14.95 µs  │ 14.81 µs       │ 1.0094
│  │  ├─ (100000, 0.01, 1.0)    │ 13.43 µs  │ 13.24 µs       │ 1.0143
│  │  ├─ (100000, 0.1, 0.25)    │ 26.08 µs  │ 17.41 µs       │ 1.4979 ***
│  │  ├─ (100000, 0.1, 0.95)    │ 25.87 µs  │ 25.04 µs       │ 1.0331
│  │  ├─ (100000, 0.1, 1.0)     │ 19.33 µs  │ 21.08 µs       │ 0.9169
│  │  ├─ (10000000, 0.0, 0.25)  │ 2.067 ms  │ 2.057 ms       │ 1.0048
│  │  ├─ (10000000, 0.0, 0.95)  │ 2.068 ms  │ 2.055 ms       │ 1.0063
│  │  ├─ (10000000, 0.0, 1.0)   │ 2.07 ms   │ 1.261 ms       │ 1.6415 ***
│  │  ├─ (10000000, 0.01, 0.25) │ 1.51 ms   │ 2.113 ms       │ 0.7146 ***
│  │  ├─ (10000000, 0.01, 0.95) │ 1.477 ms  │ 2.621 ms       │ 0.5635 ***
│  │  ├─ (10000000, 0.01, 1.0)  │ 1.35 ms   │ 1.346 ms       │ 1.0029
│  │  ├─ (10000000, 0.1, 0.25)  │ 3.765 ms  │ 2.58 ms        │ 1.4593 ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 2.784 ms  │ 3.28 ms        │ 0.8487 ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 1.764 ms  │ 1.754 ms       │ 1.0057
│  ╰─ f64                       │           │                │
│     ├─ (100000, 0.0, 0.25)    │ 23.33 µs  │ 23.45 µs       │ 0.9948
│     ├─ (100000, 0.0, 0.95)    │ 23.41 µs  │ 23.33 µs       │ 1.0034
│     ├─ (100000, 0.0, 1.0)     │ 23.33 µs  │ 23.49 µs       │ 0.9931
│     ├─ (100000, 0.01, 0.25)   │ 25.58 µs  │ 24.66 µs       │ 1.0373
│     ├─ (100000, 0.01, 0.95)   │ 25.58 µs  │ 25.79 µs       │ 0.9918
│     ├─ (100000, 0.01, 1.0)    │ 24.2 µs   │ 24.62 µs       │ 0.9829
│     ├─ (100000, 0.1, 0.25)    │ 39.83 µs  │ 27.87 µs       │ 1.4291 ***
│     ├─ (100000, 0.1, 0.95)    │ 39.7 µs   │ 39.56 µs       │ 1.0035
│     ├─ (100000, 0.1, 1.0)     │ 34.43 µs  │ 31.66 µs       │ 1.0874
│     ├─ (10000000, 0.0, 0.25)  │ 4.246 ms  │ 4.239 ms       │ 1.0016
│     ├─ (10000000, 0.0, 0.95)  │ 4.227 ms  │ 4.292 ms       │ 0.9848
│     ├─ (10000000, 0.0, 1.0)   │ 4.227 ms  │ 4.246 ms       │ 0.9955
│     ├─ (10000000, 0.01, 0.25) │ 4.696 ms  │ 4.356 ms       │ 1.0780
│     ├─ (10000000, 0.01, 0.95) │ 4.933 ms  │ 4.637 ms       │ 1.0638
│     ├─ (10000000, 0.01, 1.0)  │ 4.538 ms  │ 4.545 ms       │ 0.9984
│     ├─ (10000000, 0.1, 0.25)  │ 7.23 ms   │ 5.304 ms       │ 1.3631 ***
│     ├─ (10000000, 0.1, 0.95)  │ 6.227 ms  │ 5.913 ms       │ 1.0531
│     ╰─ (10000000, 0.1, 1.0)   │ 5.207 ms  │ 5.29 ms        │ 0.9843
```

### Benchmarks before reverting to develop's chunking code

<details>

[1] Seems like this PR is about the same except for compressing really large f64 arrays. The PR that introduced chunking, #924, reported substantially larger reductions (~5ms of 29ms) in time than this increase of ~1ms (of 17ms).

```
alp_compress              │ PR median │ PR mean  │ develop median │ develop mean │
├─ compress_alp           │           │          │                │              │
│  ├─ f32                 │           │          │                │              │
│  │  ├─ (100000, 0.25)   │ 136.4 µs  │ 137.9 µs │ 143 µs         │ 145.9 µs     │
│  │  ├─ (100000, 0.95)   │ 136.3 µs  │ 137.1 µs │ 133.1 µs       │ 134.3 µs     │
│  │  ├─ (100000, 1.0)    │ 136 µs    │ 137.3 µs │ 133.6 µs       │ 134.6 µs     │
│  │  ├─ (10000000, 0.25) │ 13.54 ms  │ 13.67 ms │ 13.74 ms       │ 13.84 ms     │
│  │  ├─ (10000000, 0.95) │ 13.54 ms  │ 13.64 ms │ 13.49 ms       │ 13.59 ms     │
│  │  ╰─ (10000000, 1.0)  │ 13.47 ms  │ 13.57 ms │ 13.58 ms       │ 13.73 ms     │
│  ╰─ f64                 │           │          │                │              │
│     ├─ (100000, 0.25)   │ 152.5 µs  │ 153.9 µs │ 166.1 µs       │ 167.2 µs     │
│     ├─ (100000, 0.95)   │ 152.5 µs  │ 154.3 µs │ 166.4 µs       │ 167 µs       │
│     ├─ (100000, 1.0)    │ 151.5 µs  │ 153 µs   │ 166.2 µs       │ 166.9 µs     │
│     ├─ (10000000, 0.25) │ 16.89 ms  │ 17 ms    │ 15.87 ms       │ 15.91 ms     │
│     ├─ (10000000, 0.95) │ 16.96 ms  │ 17.19 ms │ 16.14 ms       │ 16.12 ms     │
│     ╰─ (10000000, 1.0)  │ 16.93 ms  │ 16.99 ms │ 16.15 ms       │ 16.18 ms     │
╰─ decompress_alp         │           │          │                │              │
   ├─ f32                 │           │          │                │              │
   │  ├─ (100000, 0.25)   │ 12.33 µs  │ 12.4 µs  │ 12.37 µs       │ 12.55 µs     │
   │  ├─ (100000, 0.95)   │ 11.99 µs  │ 12.01 µs │ 12.45 µs       │ 12.58 µs     │
   │  ├─ (100000, 1.0)    │ 11.95 µs  │ 11.98 µs │ 11.91 µs       │ 11.96 µs     │
   │  ├─ (10000000, 0.25) │ 1.233 ms  │ 1.24 ms  │ 2.064 ms       │ 2.088 ms     │
   │  ├─ (10000000, 0.95) │ 1.232 ms  │ 1.235 ms │ 2.063 ms       │ 2.094 ms     │
   │  ╰─ (10000000, 1.0)  │ 1.233 ms  │ 1.236 ms │ 2.061 ms       │ 2.088 ms     │
   ╰─ f64                 │           │          │                │              │
      ├─ (100000, 0.25)   │ 23.29 µs  │ 23.46 µs │ 23.33 µs       │ 23.4 µs      │
      ├─ (100000, 0.95)   │ 22.87 µs  │ 22.92 µs │ 22.99 µs       │ 23.06 µs     │
      ├─ (100000, 1.0)    │ 22.87 µs  │ 23 µs    │ 22.95 µs       │ 23 µs        │
      ├─ (10000000, 0.25) │ 4.254 ms  │ 4.393 ms │ 4.239 ms       │ 4.28 ms      │
      ├─ (10000000, 0.95) │ 4.703 ms  │ 4.639 ms │ 4.27 ms        │ 4.437 ms     │
      ╰─ (10000000, 1.0)  │ 4.479 ms  │ 4.58 ms  │ 4.684 ms       │ 4.618 ms     │
```

</details>
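For readers checking the `ratio` column of the latest-commit benchmark, the values are consistent with PR median divided by develop median, flagged when outside 0.9 to 1.1. The sketch below reproduces that rule; the helper name `ratio_cell` is hypothetical, both medians are assumed to be in the same unit, and recomputing from the rounded medians shown here can differ from the published ratio in the last digit.

```rust
/// Hypothetical helper: formats the PR/develop ratio to four decimals and
/// appends " ***" when it is greater than 1.1 or less than 0.9.
/// Both medians must be expressed in the same unit (e.g. both in µs).
fn ratio_cell(pr_median: f64, develop_median: f64) -> String {
    let ratio = pr_median / develop_median;
    let flag = if ratio > 1.1 || ratio < 0.9 { " ***" } else { "" };
    format!("{ratio:.4}{flag}")
}
```

For example, `ratio_cell(160.4, 159.6)` yields `1.0050` with no flag, while a pair such as 238.2 µs against 269.8 µs is flagged because the ratio falls below 0.9.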
fixes #920
Consistently cuts encoding time by 10-50%.
Before the change:
After: