[GLUTEN-6650] Remove bloomfilter from partition filter by WangGuangxin · Pull Request #6652 · apache/gluten

WangGuangxin · 2024-07-31T05:14:17Z

What changes were proposed in this pull request?

If BloomFilterMightContain is pushed down to a partition filter, it is executed in the Spark driver using codegen. But since the bloom filter constructed by native code is different from Spark's, an NPE is thrown in this case.

(Fixes: #6650)

How was this patch tested?

UT

github-actions · 2024-07-31T05:14:34Z

#6650

github-actions · 2024-07-31T05:14:52Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-07-31T07:30:34Z

-      partitionFilters.filter(isDynamicPruningFilter)
+      partitionFilters
+        .filter(isDynamicPruningFilter)
+        .filterNot(isBloomFilterMightContain)


Do bloom filters still take effect in Gluten if we do this?

It doesn't take effect in this case, and the partition filter will not be re-evaluated in a following FilterExec or other operator.

So the PR completely disables runtime bloom-filter in Gluten? Am I missing something?

completely

@zhztheplayer It only disables the case where the bloom filter is applied to a Hive table's partition column (that is, the join key is the Hive table's partition column).

In most cases, the bloom filter is applied to a data column, which is not affected by this PR.
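
To make the effect of the diff above concrete, here is a minimal sketch of the filtering step. It uses a toy expression tree rather than Spark's Catalyst classes, so every type and helper below (Expr, Subquery, InSubquery, and the two predicates) is a hypothetical stand-in. It also assumes that a filter counts as "dynamic pruning" when it contains a subquery node, which is why a bloom-filter predicate built from a subquery would pass the first check and needs the extra filterNot.

```scala
object PartitionFilterSketch {
  // Toy expression tree standing in for Catalyst expressions (hypothetical types).
  sealed trait Expr { def children: Seq[Expr] }
  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Literal(value: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Subquery(id: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class InSubquery(value: Expr, query: Subquery) extends Expr {
    def children: Seq[Expr] = Seq(value, query)
  }
  final case class BloomFilterMightContain(bloom: Expr, value: Expr) extends Expr {
    def children: Seq[Expr] = Seq(bloom, value)
  }

  // True if the predicate holds for the node itself or any descendant.
  private def exists(e: Expr)(p: Expr => Boolean): Boolean =
    p(e) || e.children.exists(c => exists(c)(p))

  // Assumption: a filter is "dynamic" when it contains a subquery node.
  def isDynamicPruningFilter(e: Expr): Boolean =
    exists(e)(_.isInstanceOf[Subquery])

  def isBloomFilterMightContain(e: Expr): Boolean =
    exists(e)(_.isInstanceOf[BloomFilterMightContain])

  // Same shape as the change in the diff: keep dynamic filters, drop bloom-filter ones.
  def dynamicPartitionFilters(filters: Seq[Expr]): Seq[Expr] =
    filters.filter(isDynamicPruningFilter).filterNot(isBloomFilterMightContain)
}
```

In this sketch, a static filter such as EqualTo(Attr("dt"), Literal("2024-07-31")) is dropped by the first check, an InSubquery filter on the partition column is kept, and a BloomFilterMightContain on the partition column is dropped by the second check. A bloom filter on a data column never reaches this list, which matches the point that only the partition-column case is affected.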

zhztheplayer · 2024-08-13T09:01:15Z

But since the bloom filter constructed by native code is different from Spark's, an NPE is thrown in this case

I remember we made the Velox bloom-filter functions runnable in vanilla Spark operators, so there shouldn't be compatibility issues (#5435). Am I missing something, or is there a bug?

WangGuangxin · 2024-08-14T07:33:22Z

@zhztheplayer Got it. Let me check.

WangGuangxin · 2024-08-14T09:54:14Z

@zhztheplayer I dug into it. The main reason is that in this case it's a partition filter and the bloom filter is evaluated on the driver side, so we need to change VeloxBloomFilterMightContain's codegen from

val bf = ctx.addMutableState(className, "bloomFilter")
ctx.addPartitionInitializationStatement(s"$bf = $className.readFrom($bfData);")

to

val bf = ctx.addMutableState(className, "bloomFilter", bf => s"$bf = $className.readFrom($bfData);")

Otherwise, the bf object is not initialized when executed on the driver, and an NPE is thrown.
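
The difference between the two codegen variants can be illustrated with a toy model. This is only a sketch: the classes below are hypothetical and do not reproduce Spark's generated code. It assumes that a partition initialization statement runs only when the generated class's per-partition init hook is called, while an initializer passed to addMutableState runs when the object is constructed.

```scala
object MutableStateInitSketch {
  // Hypothetical stand-in for a deserialized bloom filter.
  final class ToyBloomFilter(values: Set[Long]) {
    def mightContain(v: Long): Boolean = values.contains(v)
  }

  // Models addPartitionInitializationStatement: the field is assigned only in init().
  final class PartitionInitPredicate(data: Set[Long]) {
    private var bf: ToyBloomFilter = null
    def init(partitionIndex: Int): Unit = { bf = new ToyBloomFilter(data) }
    // Throws NullPointerException if init() was never called.
    def eval(v: Long): Boolean = bf.mightContain(v)
  }

  // Models the initializer passed to addMutableState: assigned at construction time.
  final class ConstructorInitPredicate(data: Set[Long]) {
    private val bf: ToyBloomFilter = new ToyBloomFilter(data)
    def eval(v: Long): Boolean = bf.mightContain(v)
  }
}
```

Under these assumptions, a caller that evaluates the predicate without first calling init, as described for the driver-side partition filter path, hits the null field in the first class but not in the second.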

But even after this change, VeloxBloomFilter still cannot be evaluated on the driver side, since most of the JNI and resource management code checks whether it is running in a Spark task, such as

Runtimes.contextInstance

and

Spillable reservation listener must be used in a Spark task.
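
The kind of guard described above can be sketched as follows. This is a toy model, not Gluten's code: the thread-local context, runAsTask, and acquireNativeRuntime are hypothetical names, and the error message is shortened. It only shows why a call that works inside a task fails when the same code path is reached from the driver.

```scala
object TaskGuardSketch {
  // Hypothetical stand-in for a per-thread task context; empty on the driver.
  private val currentTask = new ThreadLocal[Option[Long]] {
    override protected def initialValue(): Option[Long] = None
  }

  // Runs `body` with a task context set, then clears it again.
  def runAsTask[T](taskAttemptId: Long)(body: => T): T = {
    currentTask.set(Some(taskAttemptId))
    try body
    finally currentTask.remove()
  }

  def inTask: Boolean = currentTask.get().isDefined

  // Hypothetical native-resource entry point that refuses to run outside a task.
  def acquireNativeRuntime(): String = currentTask.get() match {
    case Some(id) => s"runtime-for-task-$id"
    case None => throw new IllegalStateException("must be used in a Spark task")
  }
}
```

Calling acquireNativeRuntime() directly models driver-side evaluation and throws, while runAsTask(7L)(acquireNativeRuntime()) succeeds.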

github-actions · 2024-09-29T02:02:43Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions · 2024-10-10T01:58:14Z

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

zhztheplayer · 2025-06-26T06:15:32Z

Sorry for missing the latest message and I am now revisiting. Thanks! @WangGuangxin

Remove bloomfilter from partition filter

c9f6f57

zhztheplayer reviewed Jul 31, 2024

github-actions Bot added the stale label Sep 29, 2024

github-actions Bot closed this Oct 10, 2024

zhouyuan mentioned this pull request Jun 23, 2025

[GLUTEN-9849][VL] Avoid VeloxBloomFilterMightContain being applied to FileSourceScan partition filters #9850

Merged

WangGuangxin wants to merge 1 commit into apache:main from WangGuangxin:fix_bloomfilter
