
[VL] Add a bad test case when bloom_filter_agg is fallen back while might_contain is not#5433

Merged: zhztheplayer merged 1 commit into apache:main from zhztheplayer:wip-bloomagg on Apr 17, 2024


zhztheplayer (Member) commented on Apr 17, 2024

So far the case fails with the following error:

org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: (1 vs. 0)
Retriable: False
Expression: kBloomFilterV1 == version
Function: merge
File: ../.././velox/common/base/BloomFilter.h
Line: 67
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxUserError::VeloxUserError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxUserError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::BloomFilter<std::allocator<unsigned long> >::merge(char const*)
# 5  facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>::initialize(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, facebook::velox::StringView const*, long const*)
# 6  void facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> >::unpackInitialize<2, facebook::velox::StringView, long>(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&, facebook::velox::StringView const*, long const*) const
# 7  void facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> >::unpackInitialize<1, facebook::velox::StringView>(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&, facebook::velox::StringView const*) const
# 8  void facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> >::unpackInitialize<0>(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&) const
# 9  facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> >::SimpleFunctionAdapter(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&)
# 10 std::_MakeUniq<facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> > >::__single_object std::make_unique<facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> >, std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&>(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, facebook::velox::core::QueryConfig const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&)
# 11 facebook::velox::exec::SimpleFunctionAdapterFactoryImpl<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::BloomFilterMightContainFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, bool, facebook::velox::ConstantChecker<facebook::velox::Varbinary, long>, facebook::velox::Varbinary, long> >::createVectorFunction(std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&, facebook::velox::core::QueryConfig const&) const
# 12 facebook::velox::exec::(anonymous namespace)::compileRewrittenExpression(std::shared_ptr<facebook::velox::core::ITypedExpr const> const&, facebook::velox::exec::(anonymous namespace)::Scope*, facebook::velox::core::QueryConfig const&, facebook::velox::memory::MemoryPool*, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool)
# 13 facebook::velox::exec::(anonymous namespace)::compileExpression(std::shared_ptr<facebook::velox::core::ITypedExpr const> const&, facebook::velox::exec::(anonymous namespace)::Scope*, facebook::velox::core::QueryConfig const&, facebook::velox::memory::MemoryPool*, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool)
# 14 facebook::velox::exec::compileExpressions(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, facebook::velox::core::ExecCtx*, facebook::velox::exec::ExprSet*, bool)
# 15 facebook::velox::exec::ExprSet::ExprSet(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, facebook::velox::core::ExecCtx*, bool)
# 16 std::_MakeUniq<facebook::velox::exec::ExprSet>::__single_object std::make_unique<facebook::velox::exec::ExprSet, std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > >, facebook::velox::core::ExecCtx*&>(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > >&&, facebook::velox::core::ExecCtx*&)
# 17 facebook::velox::exec::makeExprSetFromFlag(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > >&&, facebook::velox::core::ExecCtx*)
# 18 facebook::velox::exec::FilterProject::initialize()
# 19 facebook::velox::exec::Driver::initializeOperators()
# 20 facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
# 21 facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)
# 22 facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)
# 23 gluten::WholeStageResultIterator::next()
# 24 gluten::ResultIterator::getNext()
# 25 gluten::ResultIterator::hasNext()
# 26 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 27 0x00007ff779018507

	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.utils.IteratorCompleter.hasNext(Iterators.scala:69)
	at org.apache.gluten.utils.PayloadCloser.hasNext(Iterators.scala:35)
	at org.apache.gluten.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:98)
	at scala.collection.Iterator.isEmpty(Iterator.scala:387)
	at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
	at org.apache.gluten.utils.PipelineTimeAccumulator.isEmpty(Iterators.scala:88)
	at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:119)
	at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:83)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Refer to our previous effort on co-fallback of bloom_filter_agg and might_contain: #3994
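The failed check (`kBloomFilterV1 == version`, reason `1 vs. 0`) is consistent with a serialization-format mismatch between the two engines. A minimal sketch, assuming (as the check suggests) that Velox expects a one-byte version tag equal to 1, while vanilla Spark's `BloomFilter.writeTo` begins with a 4-byte big-endian version number (1) followed by the number of hash functions. Under that assumption, a filter built by a fallen-back `bloom_filter_agg` starts with byte `0`, which is exactly what the Velox-side `might_contain` rejects. The object and method names are hypothetical; only the `java.io` calls are real APIs.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

object BloomFilterHeaderMismatch {
  // Assumption: the Velox-side reader expects a single version byte equal to 1.
  val VeloxVersionV1: Byte = 1

  // Mimics the header vanilla Spark writes for a V1 bloom filter:
  // a 4-byte big-endian version (1), then the number of hash functions.
  // The bit array that follows in the real format is omitted here.
  def sparkStyleHeader(numHashFunctions: Int): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(1) // version, big-endian bytes: 00 00 00 01
    out.writeInt(numHashFunctions)
    out.flush()
    bytes.toByteArray
  }

  // Hypothetical reader that, like the failing check, treats the first byte as the version.
  def veloxStyleVersion(serialized: Array[Byte]): Byte = serialized(0)

  def main(args: Array[String]): Unit = {
    val version = veloxStyleVersion(sparkStyleHeader(numHashFunctions = 3))
    println(s"expected $VeloxVersionV1, got $version") // prints "expected 1, got 0"
  }
}
```

This is only an illustration of why the two formats cannot be mixed; it does not reproduce either engine's full serialization.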

github-actions commented:

Run Gluten Clickhouse CI

zhztheplayer (Member Author) commented:

Just a nit change.

apache deleted a comment from github-actions bot on Apr 17, 2024
github-actions commented:

Run Gluten Clickhouse CI

GlutenPerfBot (Contributor) commented:

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_5433_time.csv | log/native_master_04_16_2024_33d183203_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 38.23 | 38.05 | -0.183 | 99.52% |
| q2 | 23.70 | 24.39 | 0.691 | 102.91% |
| q3 | 37.27 | 37.50 | 0.237 | 100.64% |
| q4 | 37.43 | 38.79 | 1.366 | 103.65% |
| q5 | 70.88 | 70.62 | -0.256 | 99.64% |
| q6 | 7.23 | 9.68 | 2.446 | 133.81% |
| q7 | 83.45 | 83.76 | 0.312 | 100.37% |
| q8 | 85.60 | 86.47 | 0.874 | 101.02% |
| q9 | 123.56 | 124.25 | 0.691 | 100.56% |
| q10 | 45.86 | 47.10 | 1.248 | 102.72% |
| q11 | 19.80 | 20.01 | 0.211 | 101.07% |
| q12 | 27.65 | 30.02 | 2.373 | 108.58% |
| q13 | 54.96 | 53.95 | -1.007 | 98.17% |
| q14 | 19.35 | 17.11 | -2.244 | 88.41% |
| q15 | 30.26 | 29.43 | -0.834 | 97.24% |
| q16 | 13.82 | 13.85 | 0.032 | 100.23% |
| q17 | 99.56 | 101.18 | 1.623 | 101.63% |
| q18 | 143.81 | 144.50 | 0.682 | 100.47% |
| q19 | 16.22 | 16.75 | 0.523 | 103.23% |
| q20 | 26.87 | 26.88 | 0.006 | 100.02% |
| q21 | 289.97 | 287.81 | -2.166 | 99.25% |
| q22 | 16.05 | 16.10 | 0.041 | 100.26% |
| total | 1311.52 | 1318.19 | 6.667 | 100.51% |
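The report does not define its last two columns. Judging from the numbers, "difference" appears to be master time minus PR time and "percentage" appears to be master time divided by PR time; this reading is an inference, not something the report states. The sketch below applies those inferred formulas to the "total" row. Small mismatches in the last digit (e.g. 6.670 vs. 6.667) are expected because the report presumably computes from unrounded timings.

```scala
object PerfReportColumns {
  // Inferred definitions (assumption, not documented by the report):
  // difference = master - PR, percentage = master / PR * 100.
  def difference(pr: Double, master: Double): Double = master - pr
  def percentage(pr: Double, master: Double): Double = master / pr * 100.0

  def main(args: Array[String]): Unit = {
    val (pr, master) = (1311.52, 1318.19) // "total" row
    println(f"difference=${difference(pr, master)}%.3f percentage=${percentage(pr, master)}%.2f%%")
  }
}
```

If these definitions hold, a percentage above 100% means the PR run finished that query faster than master.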

.toDF("col")
.createOrReplaceTempView(table)
withSQLConf(
GlutenConfig.COLUMNAR_HASHAGG_ENABLED.key -> "false"

A contributor commented:

If this is the only case that triggers bloom_filter_agg fallback?
If yes, we may change the flag for the native bloom filter as below:
def enableNativeBloomFilter: Boolean = conf.getConf(COLUMNAR_NATIVE_BLOOMFILTER_ENABLED) && conf.getConf(COLUMNAR_HASHAGG_ENABLED)
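To make the proposed gating concrete, here is a minimal sketch of the rule as a pure function. `Settings` is a hypothetical stand-in for the two Gluten configurations named in the suggestion (`COLUMNAR_NATIVE_BLOOMFILTER_ENABLED` and `COLUMNAR_HASHAGG_ENABLED`); it is not Gluten's actual config API.

```scala
object NativeBloomFilterGate {
  // Hypothetical stand-in for the two Gluten settings named in the review comment.
  final case class Settings(nativeBloomFilterEnabled: Boolean, columnarHashAggEnabled: Boolean)

  // Mirrors the suggested rule: use the native bloom filter only when the
  // columnar hash aggregate is also enabled, so that disabling it (which the
  // new test does to force bloom_filter_agg to fall back) disables both sides.
  def enableNativeBloomFilter(s: Settings): Boolean =
    s.nativeBloomFilterEnabled && s.columnarHashAggEnabled

  def main(args: Array[String]): Unit =
    for (n <- Seq(true, false); h <- Seq(true, false))
      println(s"native=$n hashAgg=$h -> ${enableNativeBloomFilter(Settings(n, h))}")
}
```

This covers only the hash-aggregate switch; any other reason bloom_filter_agg might fall back would not be caught by such a flag.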

zhztheplayer (Member Author) commented:

If this is the only case that triggers bloom_filter_agg fallback?

Probably there are still other cases that make the agg fall back, e.g., validation failures caused by other agg functions. Since the agg and might_contain are not in the same query/sub-query, and we would also have to take AQE on/off and other validation/transformation rules into account, doing such co-fallback can be very dirty work. Let's continue with the new approach introduced in #5435, which lets vanilla Spark run Velox's bloom filter; then we can thoroughly solve all the issues related to bloom filter mismatch, including these fallback problems.
