fix: split "union" and "interleave"#6045
Conversation
Silently interleaving partitions is confusing and causes bugs. Fixes apache#5970.
846a867 to
6c0ae5f
Compare
|
Could you please add a UT for the rule itself ? |
|
I'm afraid the current approach will not work in below case, additional Hash Repartition will still be added: select count(*) from {
select distinct name from table t1
union all
select distinct name from table t2
) group by name |
|
will add a unit test for the optimizer. will also try to add a test for the query listed above. I think this can be fixed by pushing the |
|
I have this working locally by changing the optimizer. Instead of having an extra optimizer, this has to be integrated into the "dist enforcement" transform-up system. I'll need to add more tests though and clean up the code, but I think the result is satisfying. |
Yes, this should works, thanks! |
|
LGTM, except for some wrong comments. |
4b0e2ba to
5a03a96
Compare
This avoids a top-level repartition in the following query: ```sql SELECT c1, c9 FROM aggregate_test_100 UNION ALL SELECT c1, c3 FROM aggregate_test_100 ORDER BY c9 DESC LIMIT 5 ```
5a03a96 to
7818c18
Compare
|
Does the new @mustafasrepo, PTAL |
|
|
I think this PR can be merged to main. |
|
Thanks @mingmwang |
Which issue does this PR close?
Fixes #5970.
This is an alternative to #6009. Closes #6009
Rationale for this change
UnionExecshall be a simple node and not perform some optimizations on its own. If repartition / partition concatenation is desired, then there should be an optimizer pass for that. Being a simple node avoids confusion and optimizer bugs like #5970.What changes are included in this PR?
Remove the entire "partition aware" logic from
UnionExecAre these changes tested?
Are there any user-facing changes?
Less bugs.