Bring dev branch up to date with master branch by jermp · Pull Request #95 · jermp/pthash

jermp · 2026-03-09T14:43:10Z

No description provided.

@rurban

* some toys
* more
* added opt_bucketer
* test opt_bucketer
* minor
* added TODO list
* added TODO list
* bump external/essentials
* option to specify bucketer type
* output bucketer type in json format
* better include paths
* removed external memory code
* minor
* using a new parameter lambda (avg. bucket size) to govern the number of buckets: m = ceil(n/lambda)
* added dense_partitioned_phf and interleaved encoding
* more encoders
* added missing file
* fixed corner case in Rice coding
* note
* note
* added opt2 bucketer that uses a lookup table
* added secondary sort option
* some experiments
* timings
* updated timings with microsecond resolution
* using p parameter as avg_partition_size rather than num_partitions
* dual interleaved encoders
* removed results.md and updated todos
* alpha < 1.0 for dense partitioning too
* added dual tradeoff param
* revisited bucketers. removed opt2. removed spline.
* code style; revised example.cpp
* removed correlation between seeds
* sec sort in res line
* sec sort in res line
* up
* Do not specify c++ version again

  The cpp version is already handled in `target_compile_features`. No need to specify it again as compiler option in `target_compile_options`. When using `target_compile_options` and including PTHash in a project that needs c++20, this otherwise breaks c++20 support for the parent project.
* added additive displacement pilot search
* additive displacement fix
* clang format
* added a new template search_type to choose xor-based displacement or add-based displacement
* implemented add-based displacement also for single mphf construction with multiple threads
* Fix Rice parameter calculation

  `std::max(0, ...)` does nothing for unsigned numbers. I saw crashes because it tried to allocate memory close to UINT64_MAX
* Assert that both occurrences of the same parameters are set to the same
* Use remix instead of XOR based hashing
* query perfromance improvements
* query perfromance improvements
* query perfromance improvements
* align query algo to benchmark framework and select block size to GPU
* align query algo to benchmark framework and select block size to GPU
* fixed dangerous arithmetic
* Update util.hpp

  adding static to avoid linking problem
* fixed tons of warnings
* fixed xor
* improved query performance for alpha=1 improved bucketers for alpha<1
* making init in range_bucketer consistant with Bucketer init
* fixed additive displacement for single phf and parallel
* Revert "fixing init in range_bucketer"
* fixed many warnings (again)
* parallel hashing and partitioning
* Revert "parallel hashing and partitioning"
* first try
* first try
* first try
* first try
* first try
* first try
* revert other files
* readded seq algo
* align bucket count per partition to GPU impl
* anything that is independent of the input keys should not count towards the total space consumption
* corrected space
* corrected visitor
* multithreaded encoding
* aligned input strings to benchmark framework
* fixed several bugs related to table bucketer for large n when no partitioning is used
* revert bucketer changed
* revert bucketer changed
* refactor multi--> inter
* bucketer align to theory section
* bucketer align to theory section
* Add some assertions and PHOBIC typedef
* Mention PHOBIC in README
* paper link in readme
* essentials updated
* fixed inter serialization
* readme updated
* .
* merge pthash and phobic. alpha<1 doesnt work
* fixes
* external
* example runs fine
* more fixes: examples run with any alpha
* removed NeedsFreeArray; now all functions have a common build interface and same template parameters
* more
* added test for dense_partitioned: still need to fix encoders
* added test for dense_partitioned_phf
* towards making compile
* towards making compile
* build compiles and runs fine but needs more testing
* added all encoders
* all good; did some tests
* minor
* added back choice of minimal during building
* tested build command
* more
* readme and license updated
* perf and bugfix
* fixed compilation on Mac OS
* very minor
* README updated with some build examples; minor fix to build
* fixed bug in perf
* Build CI with gcc and clang, on all branches

  Also treat warnings as errors when compiling PTHash itself, not as a library
* Avoid warning in clang
* perf and bugfix
* Fix uninitialized memory access
* Move to default constructors for bucketer initialization
* style; status badge on README
* very minor to README
* Reference PHOBIC a bit more prominently
* added a small dev note about c vs. lambda: point to release v2
* Tweak cmake

  - When setting the include path to the main folder, projects depending on pthash would have stuff like `README.md` in their include paths. Also, using absolute paths (instead of include directories) causes problems when using pthash as a dependency, especially when running into diamond dependencies.
  - Explicitly set compiler warnings only when compiling PTHash itself. Otherwise these get propagated to projects depending on it.
  - With modern cmake, there is no need to specify "-pthread". Cmake does that for us if necessary and supported by the OS.
* bug fix in compute_num_buckets()
* added script to reproduce the (internal mem.) benchmarks from SIGIR paper, mapping c into lambda values
* more configs to benchmark script
* more logging
* better benchmarking script
* make seed to generate the input dependant from construction seed
* minor
* fix to script
* added an example benchmark
* center text in column
* added results on 1B keys
* templated select index
* fixed typo as spotted by @rurban
* results updated with sparser DArray structure: a bit less space for phobic; query time unchanged
* improve parser redability as suggested by rurban
* unsigned difference is always positive: > 0 should be != 0 instead
* visit methods order in sdc and sdc_sequence

---------

Co-authored-by: Stefan Hermann <stefan.a.hermann@gmail.com>
Co-authored-by: ByteHamster <info@bytehamster.com>
Co-authored-by: KHODOR HANNOUSH <49639688+khodor14@users.noreply.github.com>
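For readers unfamiliar with the pitfall behind the "Fix Rice parameter calculation" message (`std::max(0, ...)` does nothing for unsigned numbers), here is a minimal sketch. The helper names `clamp_with_max` and `saturating_sub` are hypothetical and are not taken from the PTHash sources; the actual fix in the repository may look different.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not PTHash code): the pattern that silently fails.
// a - b is evaluated in uint64_t, so when b > a it wraps around to a huge
// value instead of going negative; taking the max with 0 changes nothing.
inline uint64_t clamp_with_max(uint64_t a, uint64_t b) {
    return std::max<uint64_t>(0, a - b);
}

// Hypothetical helper (not PTHash code): compare first, then subtract,
// so the result saturates at 0 instead of wrapping.
inline uint64_t saturating_sub(uint64_t a, uint64_t b) {
    return a > b ? a - b : 0;
}
```

With `a = 3` and `b = 5`, the first helper returns a value just below `UINT64_MAX`, which is consistent with the near-`UINT64_MAX` allocation described in the commit message, while the second returns 0.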

- using fixed table size for partition when using --dense mode;
- removed additive search: xor is faster and now handles power-of-2 table sizes.
- simplified optimal bucketing;
- refactored encoders and delegated more to external/bits;
- less and more elegant code.
- benchmarks included.

…ror (#94)

* Add inline to compute_empirical_entropy to fix multiple definition errors

  When pthash headers are included in multiple translation units and linked together, the non-inline compute_empirical_entropy function causes linker errors due to multiple definitions. Adding inline allows the linker to keep a single copy.
* Add inline to find_avg_partition_size for macOS Clang compatibility
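As an illustration of the fix described above, here is a minimal sketch of a header-defined function marked `inline`. The header name `my_util.hpp`, the namespace `my_util`, and the function `average` are hypothetical stand-ins; the real signatures of `compute_empirical_entropy` and `find_avg_partition_size` are not shown in this PR and are not reproduced here.

```cpp
// my_util.hpp (hypothetical header, not part of PTHash)
#ifndef MY_UTIL_HPP
#define MY_UTIL_HPP

#include <cstdint>
#include <vector>

namespace my_util {

// Without `inline`, each translation unit that includes this header would
// emit its own external definition of the function, and linking those
// units together would fail with a multiple-definition error.
// With `inline`, the definitions are merged into a single one.
inline double average(std::vector<uint64_t> const& values) {
    if (values.empty()) return 0.0;
    uint64_t sum = 0;
    for (uint64_t v : values) sum += v;
    return static_cast<double>(sum) / static_cast<double>(values.size());
}

}  // namespace my_util

#endif  // MY_UTIL_HPP
```

Including such a header from two or more `.cpp` files that are linked into one binary is the situation the commit messages refer to.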

jermp and others added 30 commits December 9, 2024 14:57

fixed compilation with PTHASH_ENABLE_ALL_ENCODERS=On

a747347

better CMakeLists.txt

e1be88a

fix silly duplicated method

57d501c

fixed template in typedef phobic

fa5bfca

Avoid cmake warning by declaring support for 3.31 (#83)

373b6bf

Upgrade external/bits to support cmake 4.0 (#85)

fa03d5e

logos

43469e8

Merge branch 'master' of https://github.com/jermp/pthash

7d7b243

logos updated

d0c5962

updated external/bits

193a035

updated external/bits/essentials

f6fef64

fixed partitioned_phf::num_bits()

cc4c9c9

minor cleanup

dfe4832

point to benchmarks folder

a8ae919

added query time comparison: Intel x86 vs. Apple-M1 arm

063e84c

updated external/bits

fff7ea9

updated external/bits

6c30dca

Fix Clang compilation: make static functions inline (#92)

65fe3a0

Add inline keyword to static functions in headers to comply with Clang's stricter ODR checks. This fixes compilation errors on macOS arm64 builds with Apple Clang.

Truncate existing temp file in meta_partition constructor (#93)

231b62b

Ensure the temp file is truncated if it exists.

clang format

c06ee0c

updated external/bits

3e2a531

minor stylistic change

35c5734

try another seed if num. keys in partition is larger than the table size

c94d16e

updated external/bits

217917b

minor and updated external/bits

304378a

updated external/bits

f03e9ae

updated external/bits

13e0572

updated external/bits

cdeee6e

jermp closed this Mar 9, 2026

jermp wants to merge 31 commits into dev from master

3 participants
