Skip to content

feat: teach ALPArray to store validity only in the encoded array#2216

Merged
danking merged 19 commits into
developfrom
dk/alp-validity-in-encoded-only-v2
Feb 3, 2025
Merged

feat: teach ALPArray to store validity only in the encoded array#2216
danking merged 19 commits into
developfrom
dk/alp-validity-in-encoded-only-v2

Conversation

@danking

@danking danking commented Feb 3, 2025

Copy link
Copy Markdown
Contributor

This PR trims invalid values from the patches and makes the patches validity either AllValid (for nullable arrays) or NonNullable.

This microbenchmark doesn't reveal any clear improvements or degradations. It seems to me mostly noise. In theory, this change should make decompression a bit faster because validity is one place, but my primary goal here is to make ALP array simpler: validity is in one place, the encoded array.

Benchmarks on latest commit:

parameter is: (number of elements, fraction patched, fraction valid).

Any ratio greater than 1.1 or less than 0.9 has a ***

alp_compress                    │ PR median     │ develop median │ ratio
├─ compress_alp                 │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 160.4 µs      │ 159.6 µs       │ 1.0050
│  │  ├─ (100000, 0.0, 0.95)    │ 145.9 µs      │ 143.8 µs       │ 1.0146
│  │  ├─ (100000, 0.0, 1.0)     │ 137.0 µs      │ 135.5 µs       │ 1.0110
│  │  ├─ (100000, 0.01, 0.25)   │ 227.7 µs      │ 230.7 µs       │ 0.9869
│  │  ├─ (100000, 0.01, 0.95)   │ 227.9 µs      │ 227.2 µs       │ 1.0030
│  │  ├─ (100000, 0.01, 1.0)    │ 226.6 µs      │ 227.5 µs       │ 0.9960
│  │  ├─ (100000, 0.1, 0.25)    │ 238.3 µs      │ 248.9 µs       │ 0.9574
│  │  ├─ (100000, 0.1, 0.95)    │ 238.2 µs      │ 269.8 µs       │ 0.8828  ***
│  │  ├─ (100000, 0.1, 1.0)     │ 230.6 µs      │ 231.9 µs       │ 0.9943
│  │  ├─ (10000000, 0.0, 0.25)  │ 14.17 ms      │ 13.77 ms       │ 1.0290
│  │  ├─ (10000000, 0.0, 0.95)  │ 14.16 ms      │ 13.8 ms        │ 1.0260
│  │  ├─ (10000000, 0.0, 1.0)   │ 14.0 ms       │ 12.47 ms       │ 1.1226  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 22.29 ms      │ 23.13 ms       │ 0.9636
│  │  ├─ (10000000, 0.01, 0.95) │ 22.26 ms      │ 23.78 ms       │ 0.9360
│  │  ├─ (10000000, 0.01, 1.0)  │ 22.19 ms      │ 21.79 ms       │ 1.0183
│  │  ├─ (10000000, 0.1, 0.25)  │ 23.31 ms      │ 27.72 ms       │ 0.8409  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 23.4 ms       │ 27.47 ms       │ 0.8518  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 22.99 ms      │ 22.31 ms       │ 1.0304
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 165.2 µs      │ 165.4 µs       │ 0.9987
│     ├─ (100000, 0.0, 0.95)    │ 166.1 µs      │ 163.4 µs       │ 1.0165
│     ├─ (100000, 0.0, 1.0)     │ 164.7 µs      │ 179.9 µs       │ 0.9155
│     ├─ (100000, 0.01, 0.25)   │ 269.7 µs      │ 259.1 µs       │ 1.0409
│     ├─ (100000, 0.01, 0.95)   │ 270.5 µs      │ 259.6 µs       │ 1.0419
│     ├─ (100000, 0.01, 1.0)    │ 268.9 µs      │ 270.6 µs       │ 0.9937
│     ├─ (100000, 0.1, 0.25)    │ 281.7 µs      │ 281.3 µs       │ 1.0014
│     ├─ (100000, 0.1, 0.95)    │ 279.1 µs      │ 315.3 µs       │ 0.8851  ***
│     ├─ (100000, 0.1, 1.0)     │ 273.0 µs      │ 275.7 µs       │ 0.9902
│     ├─ (10000000, 0.0, 0.25)  │ 16.16 ms      │ 15.86 ms       │ 1.0189
│     ├─ (10000000, 0.0, 0.95)  │ 16.19 ms      │ 15.75 ms       │ 1.0279
│     ├─ (10000000, 0.0, 1.0)   │ 16.2 ms       │ 15.83 ms       │ 1.0233
│     ├─ (10000000, 0.01, 0.25) │ 25.29 ms      │ 25.77 ms       │ 0.9813
│     ├─ (10000000, 0.01, 0.95) │ 25.74 ms      │ 25.94 ms       │ 0.9922
│     ├─ (10000000, 0.01, 1.0)  │ 25.54 ms      │ 25.32 ms       │ 1.0086
│     ├─ (10000000, 0.1, 0.25)  │ 26.89 ms      │ 30.73 ms       │ 0.8750  ***
│     ├─ (10000000, 0.1, 0.95)  │ 27.05 ms      │ 30.53 ms       │ 0.8860  ***
│     ╰─ (10000000, 0.1, 1.0)   │ 26.22 ms      │ 25.98 ms       │ 1.0092
├─ decompress_alp               │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 12.24 µs      │ 12.33 µs       │ 0.9927
│  │  ├─ (100000, 0.0, 0.95)    │ 12.24 µs      │ 12.16 µs       │ 1.0065
│  │  ├─ (100000, 0.0, 1.0)     │ 12.2 µs       │ 12.16 µs       │ 1.0032
│  │  ├─ (100000, 0.01, 0.25)   │ 15.12 µs      │ 14.04 µs       │ 1.0769
│  │  ├─ (100000, 0.01, 0.95)   │ 14.95 µs      │ 14.81 µs       │ 1.0094
│  │  ├─ (100000, 0.01, 1.0)    │ 13.43 µs      │ 13.24 µs       │ 1.0143
│  │  ├─ (100000, 0.1, 0.25)    │ 26.08 µs      │ 17.41 µs       │ 1.4979  ***
│  │  ├─ (100000, 0.1, 0.95)    │ 25.87 µs      │ 25.04 µs       │ 1.0331
│  │  ├─ (100000, 0.1, 1.0)     │ 19.33 µs      │ 21.08 µs       │ 0.9169
│  │  ├─ (10000000, 0.0, 0.25)  │ 2.067 ms      │ 2.057 ms       │ 1.0048
│  │  ├─ (10000000, 0.0, 0.95)  │ 2.068 ms      │ 2.055 ms       │ 1.0063
│  │  ├─ (10000000, 0.0, 1.0)   │ 2.07 ms       │ 1.261 ms       │ 1.6415  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 1.51 ms       │ 2.113 ms       │ 0.7146  ***
│  │  ├─ (10000000, 0.01, 0.95) │ 1.477 ms      │ 2.621 ms       │ 0.5635  ***
│  │  ├─ (10000000, 0.01, 1.0)  │ 1.35 ms       │ 1.346 ms       │ 1.0029
│  │  ├─ (10000000, 0.1, 0.25)  │ 3.765 ms      │ 2.58 ms        │ 1.4593  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 2.784 ms      │ 3.28 ms        │ 0.8487  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 1.764 ms      │ 1.754 ms       │ 1.0057
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 23.33 µs      │ 23.45 µs       │ 0.9948
│     ├─ (100000, 0.0, 0.95)    │ 23.41 µs      │ 23.33 µs       │ 1.0034
│     ├─ (100000, 0.0, 1.0)     │ 23.33 µs      │ 23.49 µs       │ 0.9931
│     ├─ (100000, 0.01, 0.25)   │ 25.58 µs      │ 24.66 µs       │ 1.0373
│     ├─ (100000, 0.01, 0.95)   │ 25.58 µs      │ 25.79 µs       │ 0.9918
│     ├─ (100000, 0.01, 1.0)    │ 24.2 µs       │ 24.62 µs       │ 0.9829
│     ├─ (100000, 0.1, 0.25)    │ 39.83 µs      │ 27.87 µs       │ 1.4291  ***
│     ├─ (100000, 0.1, 0.95)    │ 39.7 µs       │ 39.56 µs       │ 1.0035
│     ├─ (100000, 0.1, 1.0)     │ 34.43 µs      │ 31.66 µs       │ 1.0874
│     ├─ (10000000, 0.0, 0.25)  │ 4.246 ms      │ 4.239 ms       │ 1.0016
│     ├─ (10000000, 0.0, 0.95)  │ 4.227 ms      │ 4.292 ms       │ 0.9848
│     ├─ (10000000, 0.0, 1.0)   │ 4.227 ms      │ 4.246 ms       │ 0.9955
│     ├─ (10000000, 0.01, 0.25) │ 4.696 ms      │ 4.356 ms       │ 1.0780
│     ├─ (10000000, 0.01, 0.95) │ 4.933 ms      │ 4.637 ms       │ 1.0638
│     ├─ (10000000, 0.01, 1.0)  │ 4.538 ms      │ 4.545 ms       │ 0.9984
│     ├─ (10000000, 0.1, 0.25)  │ 7.23 ms       │ 5.304 ms       │ 1.3631  ***
│     ├─ (10000000, 0.1, 0.95)  │ 6.227 ms      │ 5.913 ms       │ 1.0531
│     ╰─ (10000000, 0.1, 1.0)   │ 5.207 ms      │ 5.29 ms        │ 0.9843

Benchmarks before reverting to develop's chunking code

Details

[1] Seems like this PR is about the same except for compressing really large f64 arrays. The PR that introduced chunking, #924, reported substantially larger reductions (~5ms of 29ms) in time than this increase of ~1ms (of 17ms).

alp_compress               │ PR median     │ PR mean   │ develop median │ develop mean │
├─ compress_alp            │               │           │                │              │
│  ├─ f32                  │               │           │                │              │
│  │  ├─ (100000, 0.25)    │ 136.4 µs      │ 137.9 µs  │ 143 µs         │ 145.9 µs     │
│  │  ├─ (100000, 0.95)    │ 136.3 µs      │ 137.1 µs  │ 133.1 µs       │ 134.3 µs     │
│  │  ├─ (100000, 1.0)     │ 136 µs        │ 137.3 µs  │ 133.6 µs       │ 134.6 µs     │
│  │  ├─ (10000000, 0.25)  │ 13.54 ms      │ 13.67 ms  │ 13.74 ms       │ 13.84 ms     │
│  │  ├─ (10000000, 0.95)  │ 13.54 ms      │ 13.64 ms  │ 13.49 ms       │ 13.59 ms     │
│  │  ╰─ (10000000, 1.0)   │ 13.47 ms      │ 13.57 ms  │ 13.58 ms       │ 13.73 ms     │
│  ╰─ f64                  │               │           │                │              │
│     ├─ (100000, 0.25)    │ 152.5 µs      │ 153.9 µs  │ 166.1 µs       │ 167.2 µs     │
│     ├─ (100000, 0.95)    │ 152.5 µs      │ 154.3 µs  │ 166.4 µs       │ 167 µs       │
│     ├─ (100000, 1.0)     │ 151.5 µs      │ 153 µs    │ 166.2 µs       │ 166.9 µs     │
│     ├─ (10000000, 0.25)  │ 16.89 ms      │ 17 ms     │ 15.87 ms       │ 15.91 ms     │
│     ├─ (10000000, 0.95)  │ 16.96 ms      │ 17.19 ms  │ 16.14 ms       │ 16.12 ms     │
│     ╰─ (10000000, 1.0)   │ 16.93 ms      │ 16.99 ms  │ 16.15 ms       │ 16.18 ms     │
╰─ decompress_alp          │               │           │                │              │
   ├─ f32                  │               │           │                │              │
   │  ├─ (100000, 0.25)    │ 12.33 µs      │ 12.4 µs   │ 12.37 µs       │ 12.55 µs     │
   │  ├─ (100000, 0.95)    │ 11.99 µs      │ 12.01 µs  │ 12.45 µs       │ 12.58 µs     │
   │  ├─ (100000, 1.0)     │ 11.95 µs      │ 11.98 µs  │ 11.91 µs       │ 11.96 µs     │
   │  ├─ (10000000, 0.25)  │ 1.233 ms      │ 1.24 ms   │ 2.064 ms       │ 2.088 ms     │
   │  ├─ (10000000, 0.95)  │ 1.232 ms      │ 1.235 ms  │ 2.063 ms       │ 2.094 ms     │
   │  ╰─ (10000000, 1.0)   │ 1.233 ms      │ 1.236 ms  │ 2.061 ms       │ 2.088 ms     │
   ╰─ f64                  │               │           │                │              │
      ├─ (100000, 0.25)    │ 23.29 µs      │ 23.46 µs  │ 23.33 µs       │ 23.4 µs      │
      ├─ (100000, 0.95)    │ 22.87 µs      │ 22.92 µs  │ 22.99 µs       │ 23.06 µs     │
      ├─ (100000, 1.0)     │ 22.87 µs      │ 23 µs     │ 22.95 µs       │ 23 µs        │
      ├─ (10000000, 0.25)  │ 4.254 ms      │ 4.393 ms  │ 4.239 ms       │ 4.28 ms      │
      ├─ (10000000, 0.95)  │ 4.703 ms      │ 4.639 ms  │ 4.27 ms        │ 4.437 ms     │
      ╰─ (10000000, 1.0)   │ 4.479 ms      │ 4.58 ms   │ 4.684 ms       │ 4.618 ms     │

The patches are now always non-nullable.

This required PrimitiveArray::patch to gracefully handle non-nullable patches when the array is
nullable.

I modified the benchmarks to include patch manipulation time, but notice that the test data has no
patches. The benchmarks measure the overhead of `is_valid`. If we had test data where the invalid
positions contained exceptional values, I would expect a modest improvement in both decompression
and compression time.
finish revert
@danking danking requested review from a10y and gatesn February 3, 2025 18:52
@danking danking marked this pull request as ready for review February 3, 2025 18:56
Comment thread encodings/alp/src/alp/array.rs Outdated
vortex_bail!(MismatchedTypes: dtype, patches.dtype());
}

if patches.values().validity_mask()?.false_count() != 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling validity_mask here triggers a "canonicalization" of the validity buffer. You should instead use patches.values().all_valid()? which should short-circuit if possible

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

Comment thread encodings/alp/Cargo.toml Outdated
workspace = true

[dependencies]
arrow-array = { workspace = true }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suspect?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed cruft. Removed.

Comment thread encodings/alp/src/alp/compress.rs Outdated
// exceptional_positions may contain exceptions at invalid positions (which contain garbage
// data). We remove invalid exceptional positions in order to keep the Patches small.
let (valid_exceptional_positions, valid_exceptional_values): (Buffer<u64>, Buffer<T>) =
if n_valid == 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do match validity.boolean_buffer() to switch over alltrue / allfalse and a buffer

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@danking danking requested a review from gatesn February 3, 2025 21:05
}

#[test]
#[allow(clippy::approx_constant)] // Clippy objects to 2.718, an approximation of e, the base of the natural logarithm.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's so funny

@danking danking merged commit 2bf84b0 into develop Feb 3, 2025
@danking danking deleted the dk/alp-validity-in-encoded-only-v2 branch February 3, 2025 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants