Conversation
* some toys
* more
* added opt_bucketer
* test opt_bucketer
* minor
* added TODO list
* added TODO list
* bump external/essentials
* option to specify bucketer type
* output bucketer type in json format
* better include paths
* removed external memory code
* minor
* using a new parameter lambda (avg. bucket size) to govern the number of buckets: m = ceil(n/lambda)
* added dense_partitioned_phf and interleaved encoding
* more encoders
* added missing file
* fixed corner case in Rice coding
* note
* note
* added opt2 bucketer that uses a lookup table
* added secondary sort option
* some experiments
* timings
* updated timings with microsecond resolution
* using p parameter as avg_partition_size rather than num_partitions
* dual interleaved encoders
* removed results.md and updated todos
* alpha < 1.0 for dense partitioning too
* added dual tradeoff param
* revisited bucketers. removed opt2. removed spline.
* code style; revised example.cpp
* removed correlation between seeds
* sec sort in res line
* sec sort in res line
* up
* Do not specify c++ version again
  The cpp version is already handled in `target_compile_features`. No need to specify it again as compiler option in `target_compile_options`. When using `target_compile_options` and including PTHash in a project that needs c++20, this otherwise breaks c++20 support for the parent project.
* added additive displacement pilot search
* additive displacement fix
* clang format
* added a new template search_type to choose xor-based displacement or add-based displacement
* implemented add-based displacement also for single mphf construction with multiple threads
* Fix Rice parameter calculation
  `std::max(0, ...)` does nothing for unsigned numbers. I saw crashes because it tried to allocate memory close to UINT64_MAX
* Assert that both occurrences of the same parameters are set to the same value
* Use remix instead of XOR based hashing
* query performance improvements
* query performance improvements
* query performance improvements
* align query algo to benchmark framework and select block size to GPU
* align query algo to benchmark framework and select block size to GPU
* fixed dangerous arithmetic
* Update util.hpp
  adding static to avoid linking problem
* fixed tons of warnings
* fixed xor
* improved query performance for alpha=1; improved bucketers for alpha<1
* making init in range_bucketer consistent with Bucketer init
* fixed additive displacement for single phf and parallel
* Revert "fixing init in range_bucketer"
* fixed many warnings (again)
* parallel hashing and partitioning
* Revert "parallel hashing and partitioning"
* first try
* first try
* first try
* first try
* first try
* first try
* revert other files
* readded seq algo
* align bucket count per partition to GPU impl
* anything that is independent of the input keys should not count towards the total space consumption
* corrected space
* corrected visitor
* multithreaded encoding
* aligned input strings to benchmark framework
* fixed several bugs related to table bucketer for large n when no partitioning is used
* revert bucketer changed
* revert bucketer changed
* refactor multi --> inter
* bucketer align to theory section
* bucketer align to theory section
* Add some assertions and PHOBIC typedef
* Mention PHOBIC in README
* paper link in readme
* essentials updated
* fixed inter serialization
* readme updated
* .
* merge pthash and phobic. alpha<1 doesn't work
* fixes
* external
* example runs fine
* more fixes: examples run with any alpha
* removed NeedsFreeArray; now all functions have a common build interface and same template parameters
* more
* added test for dense_partitioned: still need to fix encoders
* added test for dense_partitioned_phf
* towards making compile
* towards making compile
* build compiles and runs fine but needs more testing
* added all encoders
* all good; did some tests
* minor
* added back choice of minimal during building
* tested build command
* more
* readme and license updated
* perf and bugfix
* fixed compilation on Mac OS
* very minor
* README updated with some build examples; minor fix to build
* fixed bug in perf
* Build CI with gcc and clang, on all branches
  Also treat warnings as errors when compiling PTHash itself, not as a library
* Avoid warning in clang
* perf and bugfix
* Fix uninitialized memory access
* Move to default constructors for bucketer initialization
* style; status badge on README
* very minor to README
* Reference PHOBIC a bit more prominently
* added a small dev note about c vs. lambda: point to release v2
* Tweak cmake
  - When setting the include path to the main folder, projects depending on pthash would have stuff like `README.md` in their include paths. Also, using absolute paths (instead of include directories) causes problems when using pthash as a dependency, especially when running into diamond dependencies.
  - Explicitly set compiler warnings only when compiling PTHash itself. Otherwise these get propagated to projects depending on it.
  - With modern CMake, there is no need to specify "-pthread". CMake does that for us if necessary and supported by the OS.
* bug fix in compute_num_buckets()
* added script to reproduce the (internal mem.) benchmarks from SIGIR paper, mapping c into lambda values
* more configs to benchmark script
* more logging
* better benchmarking script
* make the seed used to generate the input dependent on the construction seed
* minor
* fix to script
* added an example benchmark
* center text in column
* added results on 1B keys
* templated select index
* fixed typo as spotted by @rurban
* results updated with sparser DArray structure: a bit less space for phobic; query time unchanged
* improve parser readability as suggested by rurban
* unsigned difference is never negative: > 0 should be != 0 instead
* visit methods order in sdc and sdc_sequence

---------

Co-authored-by: Stefan Hermann <stefan.a.hermann@gmail.com>
Co-authored-by: ByteHamster <info@bytehamster.com>
Co-authored-by: KHODOR HANNOUSH <49639688+khodor14@users.noreply.github.com>
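
For readers unfamiliar with the pitfall behind "Fix Rice parameter calculation", here is a minimal sketch of why clamping an unsigned subtraction with `std::max` has no effect. The helper names `clamped_diff_buggy` and `clamped_diff_safe` are hypothetical and are not taken from the PTHash sources. The sketch only illustrates the general C++ behaviour the commit message describes.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper showing the faulty pattern: the subtraction is done in
// uint64_t, so for a < b it wraps around to a huge value *before* std::max
// sees it. Taking the max with 0 then changes nothing.
uint64_t clamped_diff_buggy(uint64_t a, uint64_t b) {
    return std::max<uint64_t>(0, a - b);
}

// Hypothetical fix: compare first, subtract only when the result is
// representable, otherwise return 0.
uint64_t clamped_diff_safe(uint64_t a, uint64_t b) {
    return a > b ? a - b : 0;
}
```

With `a = 3` and `b = 5`, the first version returns a value just below `UINT64_MAX`, which is consistent with the near-`UINT64_MAX` allocation mentioned in the commit message. The second version returns 0.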
- using fixed table size for partition when using --dense mode;
- removed additive search: xor is faster and now handles power-of-2 table sizes;
- simplified optimal bucketing;
- refactored encoders and delegated more to external/bits;
- shorter and more elegant code;
- benchmarks included.
Add inline keyword to static functions in headers to comply with Clang's stricter ODR checks. This fixes compilation errors on macOS arm64 builds with Apple Clang.
Ensure the temp file is truncated if it exists.
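
The commit message does not say how the truncation is done, so the following is only a sketch of one common way to get that behaviour with the standard library. The helper names `write_fresh` and `read_all` are hypothetical and do not come from the PTHash sources.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: open `path` for writing and discard any previous
// contents, so stale bytes from an earlier run cannot survive.
void write_fresh(const std::string& path, const std::string& payload) {
    std::ofstream out(path, std::ios::binary | std::ios::out | std::ios::trunc);
    out << payload;
}

// Hypothetical helper: read the whole file back, for checking the result.
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A `std::ofstream` opened with the default mode already truncates. The explicit `std::ios::trunc` flag matters mainly when a file is opened for both reading and writing with `std::fstream`, where `in | out` alone keeps the old contents.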
…ror (#94)
* Add inline to compute_empirical_entropy to fix multiple definition errors
  When pthash headers are included in multiple translation units and linked together, the non-inline compute_empirical_entropy function causes linker errors due to multiple definitions. Adding inline allows the linker to keep a single copy.
* Add inline to find_avg_partition_size for macOS Clang compatibility
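
The pattern behind this fix can be sketched as follows. The function name `empirical_entropy_sketch` and its body are illustrative assumptions, not the actual `compute_empirical_entropy` from PTHash. Only the role of the `inline` keyword is the point here.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Imagine this definition lives in a header that several .cpp files include.
// Without `inline`, every translation unit would emit its own externally
// visible definition and the link step would report multiple definitions.
// With `inline`, identical definitions are allowed and one copy is kept.
// Illustrative body: Shannon entropy (bits per symbol) of symbol counts.
inline double empirical_entropy_sketch(const std::vector<uint64_t>& counts) {
    uint64_t total = 0;
    for (uint64_t c : counts) total += c;
    if (total == 0) return 0.0;
    double h = 0.0;
    for (uint64_t c : counts) {
        if (c == 0) continue;  // a zero count contributes nothing
        const double p = static_cast<double>(c) / static_cast<double>(total);
        h -= p * std::log2(p);
    }
    return h;
}
```

Marking the function `static` instead would also link, but each translation unit would then carry its own private copy.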