[GLUTEN-6650] Remove bloomfilter from partition filter #6652
WangGuangxin wants to merge 1 commit
- partitionFilters.filter(isDynamicPruningFilter)
+ partitionFilters
+   .filter(isDynamicPruningFilter)
+   .filterNot(isBloomFilterMightContain)
Do bloom filters still take effect in Gluten if we do this?
It doesn't take effect in this case, and the partition filter will not be re-evaluated in a following FilterExec or any other operator.
So the PR completely disables runtime bloom-filter in Gluten? Am I missing something?
> completely
@zhztheplayer It only disables the case where the bloom filter is applied on a Hive table's partition column (that is, the join key is the Hive table's partition column).
In most cases, the bloom filter is applied on a data column, which is not affected by this PR.
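The scope described above can be sketched roughly as follows. This is a minimal illustration, not Gluten's or Spark's actual code: the `Expr` types, the predicate bodies, and the column names `dt` and `user_id` are hypothetical stand-ins, and it assumes a bloom-filter expression also counts as a dynamic pruning filter because it is built from a subquery.

```scala
// Hypothetical stand-ins for Catalyst expressions (not Spark's real classes).
sealed trait Expr
case class InSubquery(column: String) extends Expr              // classic dynamic partition pruning
case class BloomFilterMightContain(column: String) extends Expr // runtime bloom filter
case class GreaterThan(column: String, value: Int) extends Expr // static predicate

object PartitionFilterSketch {
  // Assumption: anything backed by a subquery is treated as a dynamic pruning filter.
  def isDynamicPruningFilter(e: Expr): Boolean = e match {
    case _: InSubquery | _: BloomFilterMightContain => true
    case _ => false
  }

  def isBloomFilterMightContain(e: Expr): Boolean = e match {
    case _: BloomFilterMightContain => true
    case _ => false
  }

  // Same shape as the changed lines in the diff: keep dynamic pruning
  // filters on partition columns, but drop bloom-filter checks.
  def runtimePartitionFilters(partitionFilters: Seq[Expr]): Seq[Expr] =
    partitionFilters
      .filter(isDynamicPruningFilter)
      .filterNot(isBloomFilterMightContain)

  // Filters on the partition column `dt` (hypothetical name).
  val partitionFilters: Seq[Expr] = Seq(
    InSubquery("dt"),
    BloomFilterMightContain("dt"),
    GreaterThan("dt", 20240101))

  // Filters on a data column never pass through the method above,
  // so a bloom filter here is left untouched.
  val dataFilters: Seq[Expr] = Seq(BloomFilterMightContain("user_id"))
}
```

In this sketch only the bloom filter on the partition column is dropped; the bloom filter on the data column stays in `dataFilters` and would still be evaluated by the scan or a following filter operator.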
I remember we made the Velox bloom-filter functions runnable in vanilla Spark operators, so there shouldn't be compatibility issues. See #5435. Am I missing something, or is there a bug?
@zhztheplayer Got it. Let me check.
@zhztheplayer I dug into it. The main reason is that in this case it's a partition filter and the bloom filter is evaluated on the driver side, so we need to change VeloxBloomFilterMightContain's codegen from [...] to [...]. Otherwise, the [...]. But after this change, the [...] and [...]
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.
Sorry for missing the latest message and I am now revisiting. Thanks! @WangGuangxin
What changes were proposed in this pull request?
If BloomFilterMightContain is pushed down to a partition filter, it is executed in the Spark driver using codegen. But since the bloom filter constructed by the native side differs from Spark's, an NPE is thrown in this case. (Fixes: #6650)
How was this patch tested?
UT