[SPARK-2240][SQL] Spark SQL add LeftSemiBloomFilterBroadcastJoin #1127
YanjieGao wants to merge 5 commits into
Hi all. I want to submit a join operator called LeftSemiBloomFilterBroadcastJoin (LeftSemiJoinBFB). Sometimes the broadcast table of a semi join can't fit in memory. In that case we can build a Bloom filter from it to reduce its size, broadcast the filter, and do a map-side join. Some of the code references the HashJoin and BroadcastNestedLoopJoin implementations. The Bloom filter code uses Shark's BloomFilter class implementation.
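For readers unfamiliar with the idea, here is a minimal sketch of a Bloom-filter-based left semi join in plain Scala, with no Spark dependency. It is not the code in this patch: `SimpleBloomFilter`, `BloomSemiJoinSketch`, `leftSemiJoin`, and the `bitsPerKey` and `numHashes` values are all hypothetical names and settings chosen for illustration, and the filter is a stand-in for Shark's BloomFilter class. Because a Bloom filter can report false positives, filtering with it alone may return a superset of the exact semi join result; an exact answer would presumably need a follow-up check against the real keys.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical minimal Bloom filter (illustration only, not Shark's BloomFilter).
final class SimpleBloomFilter(numBits: Int, numHashes: Int) extends Serializable {
  require(numBits > 0 && numHashes > 0, "numBits and numHashes must be positive")

  private val bits = new mutable.BitSet(numBits)

  // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
  private def indexes(key: Any): Seq[Int] = {
    val s = key.toString
    val h1 = MurmurHash3.stringHash(s, 0)
    val h2 = MurmurHash3.stringHash(s, h1)
    (0 until numHashes).map(i => Math.floorMod(h1 + i * h2, numBits))
  }

  def add(key: Any): Unit = indexes(key).foreach(i => bits += i)

  // May return true for a key that was never added, never false for one that was.
  def mightContain(key: Any): Boolean = indexes(key).forall(i => bits.contains(i))
}

object BloomSemiJoinSketch {
  // Keeps the left rows whose join key might appear among rightKeys.
  // In a distributed setting the filter, not the right table, would be broadcast.
  def leftSemiJoin[K, L](left: Seq[L], rightKeys: Seq[K], bitsPerKey: Int = 10)(
      leftKey: L => K): Seq[L] = {
    val numBits = math.max(64, rightKeys.size * bitsPerKey) // assumed sizing rule
    val filter = new SimpleBloomFilter(numBits, numHashes = 7)
    rightKeys.foreach(k => filter.add(k))                     // build side
    left.filter(row => filter.mightContain(leftKey(row)))    // map-side probe
  }
}
```

Calling `BloomSemiJoinSketch.leftSemiJoin(rows, keys)(_._1)` on a sequence of tuples keeps every row whose first field is in `keys`, plus the occasional false positive. The space saving comes from broadcasting only a few bits per key instead of the keys themselves.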
Can one of the admins verify this patch?
Mind formatting as the rest of the code base?
Thanks a lot, I will reformat it.
Reformat the code as intent 4
Reformat the intent and annotation
Hi Zongheng, I reformatted the code. I don't know if that is OK, and I hope you can give me more suggestions. Thanks a lot.
Indent 4 spaces. Also I'd go with the full more descriptive name instead of BFB since we are only going to have to type it out in like 2 places.
Hi all, I have resolved the conflict. I don't know whether this PR is worth merging.
Hi @YanjieGao, thanks for working on this! I think it would be great to support this optimization. However, I think the hardest part here is going to be figuring out how to hook this into the planner such that it is chosen when the data requires it, and otherwise we use the standard join algorithms. Since I think that is going to be a pretty large task, perhaps it would be best to close this issue for now and revisit it when we have a full design for how to choose join operators.
Hi marmbrus, got it. If I have any other good ideas I will try to communicate with you. Thanks, I will close it later.
apache#1127) Co-authored-by: Rostyslav Sotnychenko <rsotnychenko@mapr.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-2240