feat: teach ALPArray to store validity only in the encoded array & other minor changes#2053
feat: teach ALPArray to store validity only in the encoded array & other minor changes#2053danking wants to merge 15 commits into
Conversation
|
previously: #1951 |
Benchmarks: random_accessTable of Results
|
Benchmarks: datafusionTable of Results
|
Benchmarks: ClickbenchTable of Results
|
| let (encoded, exceptional_positions) = T::chunked_encode(values.as_slice::<T>(), exponents); | ||
|
|
||
| let encoded_array = PrimitiveArray::new(encoded, values.validity()).into_array(); | ||
| let exceptional_positions = match values.logical_validity() { |
There was a problem hiding this comment.
could we add like 1-2 comments in this section just to make it clear what is going on
There was a problem hiding this comment.
LMK what you think, I added a comment above this line.
b5b44e0 to
ca17b75
Compare
This comment was marked as outdated.
This comment was marked as outdated.
Benchmarks: TPC-HTable of Results
|
|
@danking you should make sure you run with RUSTFLAGS='-C target-cpu=native' we had to axe it from our repo cargo config since it messed up cross compilation |
This comment was marked as outdated.
This comment was marked as outdated.
Benchmarks: compressTable of Results
|
The patches are now always non-nullable. This required PrimitiveArray::patch to gracefully handle non-nullable patches when the array is nullable. I modified the benchmarks to include patch manipulation time, but notice that the test data has no patches. The benchmarks measure the overhead of `is_valid`. If we had test data where the invalid positions contained exceptional values, I would expect a modest improvement in both decompression and compression time.
0351f19 to
e5709ef
Compare
| if let Some(fill_value) = | ||
| Self::first_non_patched_encoded_value(&encoded, &patch_indices) |
There was a problem hiding this comment.
just use unwrap_or_default() to fallback to zero
There was a problem hiding this comment.
the original code also does this: https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L379-L385
| encoded_output.extend(chunk.iter().map(|v| { | ||
| let encoded = unsafe { T::encode_single_unchecked(*v, exp) }; | ||
| let decoded = T::decode_single(encoded, exp); | ||
| let neq = (decoded != *v) as usize; | ||
| chunk_patch_count += neq; | ||
| encoded | ||
| })); |
There was a problem hiding this comment.
This loop structure is actually pretty important for performance.
You can refer to the original code where this is actually separated out into two loops:
Loop 1: calculate and materialize encoded + decoded vectors
Loop 2: find the exceptions by looping through the zipped vectors to find mismatches
https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L338-L346
https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L370-L376
I'm not sure i personally find the new version on the right more readable, but I want to make sure we keep a similar structure that the branch predictor likes
e0e2528 to
c2580f0
Compare
|
Moved here: #2216 |
The original change of this PR trims invalid values from the patches and makes the patches validity either AllValid (for nullable arrays) or NonNullable.
Benchmarks on latest commit:
parameter is: (number of elements, fraction patched, fraction valid).
Any ratio greater than 1.1 or less than 0.9 has a
***Benchmarks before reverting to develop's chunking code
Details
[1] Seems like this PR is about the same except for compressing really large f64 arrays. The PR that introduced chunking, #924, reported substantially larger reductions (~5ms of 29ms) in time than this increase of ~1ms (of 17ms).