Skip to content

Use hashbrown::RawTable API#2765

Closed
tustvold wants to merge 1 commit into
apache:masterfrom
tustvold:hashbrown-raw-table
Closed

Use hashbrown::RawTable API#2765
tustvold wants to merge 1 commit into
apache:masterfrom
tustvold:hashbrown-raw-table

Conversation

@tustvold

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

The use of HashMap without a hasher nor keys, is a little perverse. It turns out hashbrown has a RawTable API that we can use directly instead. As an added bonus this provides a very minor performance improvement

string_dictionary_builder/(dict_size:20, len:1000, key_len: 5)
                        time:   [18.563 µs 18.605 µs 18.647 µs]
                        change: [-4.5058% -3.6392% -2.7326%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
string_dictionary_builder/(dict_size:100, len:1000, key_len: 5)
                        time:   [21.649 µs 21.693 µs 21.738 µs]
                        change: [-2.2389% -1.8649% -1.5032%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
string_dictionary_builder/(dict_size:100, len:1000, key_len: 10)
                        time:   [21.696 µs 21.715 µs 21.734 µs]
                        change: [-2.0519% -1.8956% -1.6960%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
string_dictionary_builder/(dict_size:100, len:10000, key_len: 10)
                        time:   [176.06 µs 176.14 µs 176.22 µs]
                        change: [-0.9251% -0.6795% -0.4273%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
string_dictionary_builder/(dict_size:100, len:10000, key_len: 100)
                        time:   [463.94 µs 464.12 µs 464.30 µs]
                        change: [-0.1715% -0.0733% +0.0237%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

What changes are included in this PR?

Switches to using hashbrown::RawTable in place of HashMap and the raw entry API

Are there any user-facing changes?

No

@github-actions github-actions Bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Sep 21, 2022
@tustvold tustvold marked this pull request as draft September 21, 2022 18:20
@tustvold

Copy link
Copy Markdown
Contributor Author

Just realised this API is marked experimental

@Dandandan

Copy link
Copy Markdown
Contributor

I think you need to explicitly enable this feature?

I also used it in the hash join implementation (and I think in hash aggregate as well):
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/Cargo.toml#L72

@tustvold

Copy link
Copy Markdown
Contributor Author

Yeah, I'm just a little bit unsure what the implications of this being an experimental API are... I'll revisit this after #2769

@tustvold tustvold closed this Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants