[VL] Add a bad test case when bloom_filter_agg is fallen back while might_contain is not #5433
Run Gluten Clickhouse CI
36b4524 to 1ec4c7e
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
.toDF("col")
.createOrReplaceTempView(table)
withSQLConf(
  GlutenConfig.COLUMNAR_HASHAGG_ENABLED.key -> "false"
Is this the only case that triggers bloom_filter_agg fallback?
If yes, we could change the flag for the native bloom filter as below:
def enableNativeBloomFilter: Boolean = conf.getConf(COLUMNAR_NATIVE_BLOOMFILTER_ENABLED) && conf.getConf(COLUMNAR_HASHAGG_ENABLED)
> Is this the only case that triggers bloom_filter_agg fallback?
There are probably other cases that make the agg fall back, e.g., validation failures caused by other agg functions. Since the agg and might_contain are not in the same query/sub-query, and AQE on/off plus other validation/transformation rules also have to be taken into account, implementing such co-fallback could get very messy. Let's continue with the new approach introduced in #5435, which lets vanilla Spark run Velox's bloom filter; then we can thoroughly solve all the issues related to bloom filter mismatch, including these fallback problems.
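To make the concern above concrete, here is a toy model in plain Scala. It is only a sketch under stated assumptions: `Conf`, `aggOffloaded`, `mismatch` and the `validationPasses` parameter are hypothetical names invented for illustration and are not Gluten code; only `enableNativeBloomFilter` mirrors the expression suggested in the review comment. It assumes the aggregate is offloaded only when hash agg is enabled and validation passes.

```scala
// Toy model only: apart from the shape of enableNativeBloomFilter,
// every name here is hypothetical and does not exist in Gluten.
object CoFallbackSketch {
  // Simplified stand-ins for the two config flags discussed above.
  final case class Conf(nativeBloomFilter: Boolean, hashAgg: Boolean)

  // The combined flag proposed in the review comment.
  def enableNativeBloomFilter(c: Conf): Boolean =
    c.nativeBloomFilter && c.hashAgg

  // Assumption: the aggregate is offloaded only if hash agg is enabled
  // and every aggregate function in the operator passes validation.
  def aggOffloaded(c: Conf, validationPasses: Boolean): Boolean =
    c.hashAgg && validationPasses

  // Mismatch: might_contain stays native while bloom_filter_agg fell back.
  def mismatch(c: Conf, validationPasses: Boolean): Boolean =
    enableNativeBloomFilter(c) && !aggOffloaded(c, validationPasses)

  def main(args: Array[String]): Unit = {
    // Config-driven fallback: the combined flag also disables might_contain.
    println(mismatch(Conf(nativeBloomFilter = true, hashAgg = false), validationPasses = true))
    // Validation-driven fallback: the combined flag cannot see it.
    println(mismatch(Conf(nativeBloomFilter = true, hashAgg = true), validationPasses = false))
  }
}
```

In this model the combined flag removes the mismatch only when the fallback comes from the hash agg config; a validation failure still leaves the two sides out of sync, which is the case the reply above is worried about.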
So far the case fails with error
Refer to our previous effort on co-fallback of bloom_filter_agg and might_contain: #3994