In order to support fully asynchronous bulk operations, e.g., for multi-GPU hash tables, as requested in #65, we need to rethink the way we compute the table's size.
For now, `insert` tracks the number of successful insertions on the fly. After the kernel has finished, we copy the number back to the host and add it to the `size_` member. This implies that `insert` synchronizes with the host.
In order to overcome this limitation for `cuco::static_reduction_map` (PR #98), 902b93a proposes a standalone `size` computation based on `thrust::count_if` and also implements fully asynchronous bulk operations.
The `size` computation using `thrust::count_if` achieves near speed-of-light (SOL) throughput.
Additionally, since we do not need to reduce the number of added pairs during insertion, the overall performance of the `insert` bulk operation improves by ~3-5%.
IMHO we should also add this feature to the other hash table implementations.
This feature also implicitly solves issue #39.