
Left join could use a bitmap instead of Vec<bool> #240

Description

@Dandandan

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To save some memory, and potentially also to be faster, the visited_left_side data in the HashJoinStream could be stored in a bitmap instead of a Vec<bool>. This would save ~7/8 of a byte per left row (1 bit instead of 1 byte).
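A minimal sketch of the idea, assuming a hand-rolled bitmap over Vec<u64> words. The VisitedBitmap type is hypothetical and only stands in for whichever bitmap type ends up being used. Each row takes one bit, so 1,000,000 left rows need 125,000 bytes instead of 1,000,000.

```rust
/// Hypothetical minimal bitmap: one bit per left-side row instead of one byte.
struct VisitedBitmap {
    words: Vec<u64>,
    len: usize,
}

impl VisitedBitmap {
    /// Creates a bitmap for `len` rows, all initially unvisited (bit = 0).
    fn new(len: usize) -> Self {
        VisitedBitmap { words: vec![0; (len + 63) / 64], len }
    }

    /// Marks row `i` as visited (matched at least once).
    fn set(&mut self, i: usize) {
        assert!(i < self.len, "row index out of bounds");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Returns whether row `i` has been visited.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "row index out of bounds");
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }

    /// Bytes used by the bit storage itself (ignores the struct header).
    fn storage_bytes(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}

fn main() {
    let rows = 1_000_000;
    let mut bitmap = VisitedBitmap::new(rows);
    bitmap.set(0);
    bitmap.set(999_999);
    assert!(bitmap.get(0) && bitmap.get(999_999) && !bitmap.get(1));

    // Vec<bool> uses one byte per row.
    let as_bools = vec![false; rows];
    let bool_bytes = as_bools.len() * std::mem::size_of::<bool>();
    // prints "Vec<bool>: 1000000 bytes, bitmap: 125000 bytes"
    println!("Vec<bool>: {} bytes, bitmap: {} bytes", bool_bytes, bitmap.storage_bytes());
}
```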
If we store only 32-bit integers on the left side, the savings would be ~4-5%, assuming 4 bytes per item and roughly 16 bytes per left-side row for the hashmap. That is not huge, but a nice win in some cases. The relative savings would be larger if we used a more memory-efficient data structure for the hashmap.
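The ~4-5% figure can be checked with a rough per-row estimate, assuming the baseline counts 4 bytes for the item, 16 bytes for the hashmap and 1 byte for the Vec<bool> entry, and ignores any other per-row overhead:

```math
\text{savings} = \frac{1 - \tfrac{1}{8}}{4 + 16 + 1} = \frac{0.875}{21} \approx 4.2\%
```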

Additionally, there could be a fast path for the cases where no left row is matched or every left row is matched.
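One way such a fast path might look, as a sketch assuming the bitmap is stored as u64 words with unused trailing bits left at 0. The Unmatched enum and unmatched_rows function are hypothetical names. A popcount over the words decides whether a per-row scan is needed at all.

```rust
/// Hypothetical summary of which left rows still need to be emitted.
#[derive(Debug, PartialEq)]
enum Unmatched {
    /// Every left row was matched: nothing extra to emit.
    AllMatched,
    /// No left row was matched: emit all left rows without per-row checks.
    NoneMatched,
    /// Only some rows are unmatched: their indices.
    Partial(Vec<usize>),
}

/// `words` holds one bit per left row (1 = visited); `len` is the row count.
/// Assumes bits at positions >= `len` are 0.
fn unmatched_rows(words: &[u64], len: usize) -> Unmatched {
    // count_ones is a popcount covering 64 rows per word.
    let visited: usize = words.iter().map(|w| w.count_ones() as usize).sum();
    if visited == len {
        // Also covers an empty left side.
        return Unmatched::AllMatched;
    }
    if visited == 0 {
        return Unmatched::NoneMatched;
    }
    // Slow path: scan individual bits for unvisited rows.
    let indices: Vec<usize> = (0..len)
        .filter(|&i| words[i / 64] & (1u64 << (i % 64)) == 0)
        .collect();
    Unmatched::Partial(indices)
}

fn main() {
    // Rows 0 and 2 matched, row 1 not.
    let words = [0b101u64];
    // prints "Partial([1])"
    println!("{:?}", unmatched_rows(&words, 3));
}
```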

Describe the solution you'd like
Use a bitmap instead of Vec<bool>. The bitmap could be from arrow or maybe the bitvec crate.
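To show where the bitmap sits in a left join, here is a toy sketch on u32 keys. It is not the HashJoinStream code. The Bitmap type and left_join function are hypothetical; in a real implementation something like arrow's BooleanBufferBuilder or bitvec's BitVec would presumably take the place of the hand-rolled type. The probe phase marks matched left rows, and the final step emits the unmarked ones padded with None.

```rust
use std::collections::HashMap;

/// Hypothetical minimal bitmap used as visited_left_side.
struct Bitmap {
    words: Vec<u64>,
}

impl Bitmap {
    fn new(len: usize) -> Self {
        Bitmap { words: vec![0; (len + 63) / 64] }
    }
    fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }
    fn get(&self, i: usize) -> bool {
        ((self.words[i / 64] >> (i % 64)) & 1) == 1
    }
}

/// Toy left join: returns (left_index, Option<right_index>) pairs.
fn left_join(left: &[u32], right: &[u32]) -> Vec<(usize, Option<usize>)> {
    // Build side: key -> left row indices.
    let mut map: HashMap<u32, Vec<usize>> = HashMap::new();
    for (i, k) in left.iter().enumerate() {
        map.entry(*k).or_default().push(i);
    }
    let mut visited_left_side = Bitmap::new(left.len());
    let mut out = Vec::new();
    // Probe side: emit matches and mark the matched left rows.
    for (j, k) in right.iter().enumerate() {
        if let Some(rows) = map.get(k) {
            for &i in rows {
                visited_left_side.set(i);
                out.push((i, Some(j)));
            }
        }
    }
    // Final step: emit left rows that were never matched.
    for i in 0..left.len() {
        if !visited_left_side.get(i) {
            out.push((i, None));
        }
    }
    out
}

fn main() {
    // prints "[(1, Some(0)), (1, Some(1)), (0, None), (2, None)]"
    println!("{:?}", left_join(&[1, 2, 3], &[2, 2, 4]));
}
```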

Describe alternatives you've considered
Keep using a Vec<bool>

Additional context
n/a

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)
