@alamb
Contributor

alamb commented Nov 17, 2025

Which issue does this PR close?

Rationale for this change

@jhorstmann has a suggestion on how to improve this code

What changes are included in this PR?

Implement said suggestion

Are these changes tested?

By CI, and I will run benchmarks

Are there any user-facing changes?

No

@alamb
Contributor Author

alamb commented Nov 17, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/horstmann_special (3111d1f) to d13e46a diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_horstmann_special
Results will be posted here when complete

@alamb marked this pull request as ready for review November 17, 2025 21:59
    }))
    .map(|(index, valid)| {
        if valid {
            values[index.as_usize()]
Contributor

This also panics the same way? But it is more readable 👍
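For readers without the diff at hand, the shape of the iterator under review can be sketched with plain slices. This is a minimal illustration, not the arrow-rs code: `take_with_null_indices`, the `&[bool]` validity slice, and the `u32` index type are all assumptions made for the example. It shows why the reviewer's point holds: the slice access still panics for a valid but out-of-bounds index, and only null slots skip the access.

```rust
/// Hypothetical helper (not part of arrow-rs): gather `values[indices[i]]`
/// for each position, writing `T::default()` where the index slot is null.
///
/// A *valid* index that is out of bounds still panics on the slice access,
/// just like the snippet quoted in the review.
fn take_with_null_indices<T: Copy + Default>(
    values: &[T],
    indices: &[u32],
    validity: &[bool], // assumption: one bool per index, true = valid
) -> Vec<T> {
    assert_eq!(indices.len(), validity.len());
    indices
        .iter()
        .zip(validity.iter())
        .map(|(&index, &valid)| {
            if valid {
                // Bounds-checked access: panics if `index` is out of range.
                values[index as usize]
            } else {
                // Null slot: the stored index value is never used as an offset.
                T::default()
            }
        })
        .collect()
}
```

With `values = [10, 20, 30]`, indices `[2, 99, 0]`, and validity `[true, false, true]`, the out-of-range `99` is harmless because its slot is null; marking it valid would trigger the panic.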

@alamb
Contributor Author

alamb commented Nov 17, 2025

🤖: Benchmark completed

Details

group                                                                     alamb_horstmann_special                main
-----                                                                     -----------------------                ----
take bool 1024                                                            1.00   1328.1±1.97ns        ? ?/sec    1.01   1337.2±2.08ns        ? ?/sec
take bool 512                                                             1.00    728.9±1.48ns        ? ?/sec    1.01    733.7±1.16ns        ? ?/sec
take bool null indices 1024                                               1.00   1627.7±5.24ns        ? ?/sec    1.00   1622.1±4.46ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.01µs        ? ?/sec    1.00      2.6±0.05µs        ? ?/sec
take bool null values null indices 1024                                   1.00      3.1±0.02µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
take check bounds i32 1024                                                1.01   1640.7±5.91ns        ? ?/sec    1.00   1620.0±3.99ns        ? ?/sec
take check bounds i32 512                                                 1.02    825.7±2.03ns        ? ?/sec    1.00    807.9±2.09ns        ? ?/sec
take i32 1024                                                             1.01    717.3±1.35ns        ? ?/sec    1.00    711.2±1.21ns        ? ?/sec
take i32 512                                                              1.01    393.4±1.09ns        ? ?/sec    1.00    390.1±1.33ns        ? ?/sec
take i32 null indices 1024                                                1.28   1275.0±7.11ns        ? ?/sec    1.00   995.3±13.62ns        ? ?/sec
take i32 null values 1024                                                 1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take i32 null values null indices 1024                                    1.11      2.9±0.02µs        ? ?/sec    1.00      2.6±0.01µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.06      8.4±0.09µs        ? ?/sec    1.00      7.9±0.02µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.17     10.4±0.13µs        ? ?/sec    1.00      8.9±0.07µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.01     20.8±0.19µs        ? ?/sec    1.00     20.6±0.10µs        ? ?/sec
take str 1024                                                             1.02     11.3±0.04µs        ? ?/sec    1.00     11.1±0.04µs        ? ?/sec
take str 512                                                              1.03      5.6±0.02µs        ? ?/sec    1.00      5.5±0.01µs        ? ?/sec
take str null indices 1024                                                1.02      7.9±0.03µs        ? ?/sec    1.00      7.7±0.03µs        ? ?/sec
take str null indices 512                                                 1.00      3.8±0.01µs        ? ?/sec    1.00      3.8±0.01µs        ? ?/sec
take str null values 1024                                                 1.00      8.7±0.08µs        ? ?/sec    1.00      8.7±0.06µs        ? ?/sec
take str null values null indices 1024                                    1.00      7.3±0.05µs        ? ?/sec    1.00      7.3±0.04µs        ? ?/sec
take stringview 1024                                                      1.00    780.2±1.13ns        ? ?/sec    1.00    777.5±1.67ns        ? ?/sec
take stringview 512                                                       1.02    481.9±3.53ns        ? ?/sec    1.00    474.4±0.88ns        ? ?/sec
take stringview null indices 1024                                         1.07  1518.5±11.25ns        ? ?/sec    1.00  1413.8±15.31ns        ? ?/sec
take stringview null indices 512                                          1.09    842.1±1.53ns        ? ?/sec    1.00    771.1±0.89ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.00µs        ? ?/sec    1.00      2.1±0.00µs        ? ?/sec
take stringview null values null indices 1024                             1.03      3.0±0.04µs        ? ?/sec    1.00      2.9±0.03µs        ? ?/sec

@alamb
Contributor Author

alamb commented Nov 17, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/horstmann_special (3111d1f) to d13e46a diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_horstmann_special
Results will be posted here when complete

@Dandandan changed the title from "Clean up collect_bool to avoid a panic" to "Clean up take_native to avoid a panic" Nov 17, 2025
@Dandandan
Contributor

Dandandan commented Nov 17, 2025

take i32 null indices 1024 1.28 1275.0±7.11ns ? ?/sec 1.00 995.3±13.62ns ? ?/sec

Looks a bit slower
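The 1.28 in the quoted row can be checked by hand. Assuming the first number in each column is that side's mean time divided by the faster side's mean (an assumption about the table layout, which the thread does not spell out), the arithmetic is as follows; `relative_slowdown` is a hypothetical helper name.

```rust
/// Hypothetical helper: ratio of the branch's mean time to main's mean time.
/// A value above 1.0 means the branch is slower.
fn relative_slowdown(branch_ns: f64, main_ns: f64) -> f64 {
    branch_ns / main_ns
}

/// Mean times for "take i32 null indices 1024" copied from the quoted row.
fn quoted_row_ratio() -> f64 {
    relative_slowdown(1275.0, 995.3)
}
```

Rounded to two decimals this gives 1.28, that is, roughly a 28% slowdown on that benchmark.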

@alamb
Contributor Author

alamb commented Nov 17, 2025

🤖: Benchmark completed

Details

group                                                                     alamb_horstmann_special                main
-----                                                                     -----------------------                ----
take bool 1024                                                            1.00  1330.9±10.80ns        ? ?/sec    1.00   1335.5±3.45ns        ? ?/sec
take bool 512                                                             1.00    725.1±0.92ns        ? ?/sec    1.02    737.0±1.09ns        ? ?/sec
take bool null indices 1024                                               1.00  1639.6±23.18ns        ? ?/sec    1.00   1646.9±4.98ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.02µs        ? ?/sec    1.00      2.6±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.07      3.3±0.01µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
take check bounds i32 1024                                                1.00   1614.7±2.16ns        ? ?/sec    1.00   1621.4±2.35ns        ? ?/sec
take check bounds i32 512                                                 1.00    808.6±1.08ns        ? ?/sec    1.00    809.0±3.04ns        ? ?/sec
take i32 1024                                                             1.00    714.3±1.47ns        ? ?/sec    1.00    712.7±1.36ns        ? ?/sec
take i32 512                                                              1.00    387.2±1.10ns        ? ?/sec    1.01    392.7±0.71ns        ? ?/sec
take i32 null indices 1024                                                1.28   1275.7±7.99ns        ? ?/sec    1.00   997.8±16.77ns        ? ?/sec
take i32 null values 1024                                                 1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take i32 null values null indices 1024                                    1.11      2.9±0.01µs        ? ?/sec    1.00      2.6±0.01µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.06      8.5±0.01µs        ? ?/sec    1.00      8.0±0.03µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.21     11.1±0.08µs        ? ?/sec    1.00      9.2±0.04µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.01     20.6±0.09µs        ? ?/sec    1.00     20.5±0.08µs        ? ?/sec
take str 1024                                                             1.00     11.2±0.04µs        ? ?/sec    1.00     11.2±0.04µs        ? ?/sec
take str 512                                                              1.00      5.4±0.03µs        ? ?/sec    1.02      5.5±0.02µs        ? ?/sec
take str null indices 1024                                                1.02      7.9±0.02µs        ? ?/sec    1.00      7.7±0.02µs        ? ?/sec
take str null indices 512                                                 1.00      3.8±0.01µs        ? ?/sec    1.03      3.9±0.01µs        ? ?/sec
take str null values 1024                                                 1.00      8.6±0.04µs        ? ?/sec    1.00      8.6±0.04µs        ? ?/sec
take str null values null indices 1024                                    1.03      7.5±0.04µs        ? ?/sec    1.00      7.3±0.03µs        ? ?/sec
take stringview 1024                                                      1.00    776.1±1.10ns        ? ?/sec    1.09    847.1±1.24ns        ? ?/sec
take stringview 512                                                       1.01    478.1±0.53ns        ? ?/sec    1.00    473.4±0.79ns        ? ?/sec
take stringview null indices 1024                                         1.09  1516.9±14.03ns        ? ?/sec    1.00  1387.9±10.10ns        ? ?/sec
take stringview null indices 512                                          1.08    835.9±2.98ns        ? ?/sec    1.00    773.3±1.46ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take stringview null values null indices 1024                             1.03      2.9±0.01µs        ? ?/sec    1.00      2.9±0.01µs        ? ?/sec

@jhorstmann
Contributor

Thanks for trying this out! It seems to be a bigger regression than I thought, and there are good reasons the current code is structured the way it is.

@alamb
Contributor Author

alamb commented Nov 18, 2025

I am surprised by this too

We can probably do better by handling nulls 64 bits at a time, but for now I will just drop it, as I have plenty of other places to chase performance wins
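One way to read "handling nulls 64 bits at a time" is to inspect a whole 64-bit validity word per 64 indices and only fall back to per-bit checks when the word is mixed. The sketch below is an illustration under that reading, not code from this PR or from arrow-rs: `take_by_words`, the `&[u64]` bitmap layout (bit `i % 64` of word `i / 64`, least significant bit first), and the `u32` index type are assumptions. Whether it would recover the regression shown in the benchmarks is untested here.

```rust
/// Hypothetical helper: gather with a packed validity bitmap, looking at one
/// 64-bit word per 64 indices. Bit `i % 64` of `validity[i / 64]` is 1 when
/// `indices[i]` is valid (assumed layout for this sketch).
fn take_by_words<T: Copy + Default>(
    values: &[T],
    indices: &[u32],
    validity: &[u64],
) -> Vec<T> {
    assert!(validity.len() * 64 >= indices.len());
    let mut out = Vec::with_capacity(indices.len());
    for (chunk, &word) in indices.chunks(64).zip(validity.iter()) {
        // Ignore bits past the end of a short final chunk.
        let mask = if chunk.len() == 64 {
            u64::MAX
        } else {
            (1u64 << chunk.len()) - 1
        };
        let word = word & mask;
        if word == mask {
            // All 64 (or all remaining) slots valid: no per-element null test.
            out.extend(chunk.iter().map(|&i| values[i as usize]));
        } else if word == 0 {
            // All slots null: fill defaults without touching `values`.
            out.extend(std::iter::repeat(T::default()).take(chunk.len()));
        } else {
            // Mixed word: test each bit individually.
            out.extend(chunk.iter().enumerate().map(|(bit, &i)| {
                if (word >> bit) & 1 == 1 {
                    values[i as usize]
                } else {
                    T::default()
                }
            }));
        }
    }
    out
}
```

The design choice is that the validity test moves from once per element to once per 64 elements on the all-valid and all-null paths; valid indices are still bounds-checked, so an out-of-range valid index panics as before.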

Thanks @jhorstmann and @Dandandan

@alamb closed this Nov 18, 2025
Labels

arrow Changes to the arrow crate

3 participants