[fix](nereids) partition topn optimization requires all window expressions are in the same order#56622
Merged
Merged
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
morrySnow
reviewed
Sep 29, 2025
| } | ||
| } | ||
| // check all windowExpression's order key is empty or is the same as chosenWindowFunc's order key | ||
| for (NamedExpression windowExpr : windowExpressions) { |
ClickBench: Total hot run time: 30.55 s |
Contributor
FE UT Coverage ReportIncrement line coverage |
6f2b4a0 to
6ca52cc
Compare
Contributor
Author
|
run buildall |
TPC-DS: Total hot run time: 190309 ms |
ClickBench: Total hot run time: 30.35 s |
Contributor
FE UT Coverage ReportIncrement line coverage |
Contributor
FE Regression Coverage ReportIncrement line coverage |
starocean999
approved these changes
Oct 14, 2025
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
zhangstar333
approved these changes
Oct 15, 2025
github-actions Bot
pushed a commit
that referenced
this pull request
Oct 15, 2025
…sions are in the same order (#56622) ### What problem does this PR solve? fix bug: partition topn optimization requires all window expressions are in the same order
Closed
github-actions Bot
pushed a commit
that referenced
this pull request
Nov 17, 2025
…sions are in the same order (#56622) ### What problem does this PR solve? fix bug: partition topn optimization requires all window expressions are in the same order
yiguolei
pushed a commit
to yiguolei/incubator-doris
that referenced
this pull request
Dec 30, 2025
… requires all window expressions are in the same order apache#56622 (apache#5752) cherry-picks from apache#56622 Related to apache#5098 Co-authored-by: minghong <zhouminghong@selectdb.com>
5 tasks
CalvinKirs
added a commit
to CalvinKirs/incubator-doris
that referenced
this pull request
Jun 24, 2026
…on key to be a subset of co-located ones
When a single LogicalWindow holds multiple window functions, a filter such as
`rn <= k` may be turned into a partitionTopN and pushed below the whole window
node. The generated partitionTopN keeps the per-partition top-k of the chosen
window function and thus prunes the input rows shared by ALL co-located window
functions.
This is only correct when the chosen window function's partition key is a
SUBSET of every other co-located window function's partition key (i.e. the
chosen one is coarser). Then any row that could change another window's value
for a surviving row lies in the same chosen-partition with an order value not
greater than the surviving row's, so its chosen-rank is within top-k and it is
kept. When this does not hold the pruning drops rows the other windows still
need and produces a wrong result, e.g.
row_number() over (partition by g1 order by c) as rn1, -- chosen
row_number() over (partition by g2 order by c) as rn2 -- independent
row_number() over (partition by c1, c2 order by c) as rn, -- chosen (finer)
rank() over (partition by c1 order by c) as rk -- coarser
getPushDownWindowFuncAndLimit() previously only required the order keys of all
co-located window functions to be compatible (apache#56622). It now also requires the
partition keys to satisfy the above subset relation, otherwise the optimization
is disabled.
CalvinKirs
added a commit
that referenced
this pull request
Jun 27, 2026
…on key to be a subset of co-located ones (#64764) ### What problem does this PR solve? When a single `LogicalWindow` holds multiple window functions, a filter like `rn <= k` may be converted into a `partitionTopN` and pushed **below the whole window node**. The generated `partitionTopN` keeps the per-partition top-k of the *chosen* window function, so it prunes the input rows that are **shared by all co-located window functions**. This is only correct when the chosen window function's partition key is a **subset** of every other co-located window function's partition key (i.e. the chosen one is *coarser*). Otherwise the pruning drops rows the other windows still need and produces a wrong result, e.g. ```sql -- independent partitions: chosen rn1(g1) prunes rows rn2(g2) needs row_number() over (partition by g1 order by ord_key) as rn1, -- chosen row_number() over (partition by g2 order by ord_key) as rn2 -- chosen is finer than a co-located window: chosen rn(c1,c2) prunes rows rk(c1) needs row_number() over (partition by c1, c2 order by c3) as rn, -- chosen (finer) rank() over (partition by c1 order by c3) as rk -- coarser ``` `LogicalWindow.getPushDownWindowFuncAndLimit()` already required the **order keys** of all co-located window functions to be compatible (#56622), but it never checked the **partition keys**. This PR additionally requires `windowFunc.getPartitionKeys().containsAll(chosenWindowFunc.getPartitionKeys())` for every co-located window function; otherwise the partition-topn optimization is disabled. ### Why the subset rule is correct (and not just equality) The pruning keeps the per-`P0` (chosen partition) top-k by the order key. For another window function `W` partitioned by `P1`: - **`P0 ⊆ P1` (chosen coarser) → safe.** Any row that could change `W`'s value for a surviving row `r` is in the same `P1` partition with a smaller order value; being in the same `P1` it is also in the same `P0` partition with order `<= r`, so its `P0`-rank is within top-k and it is **kept**. Nothing `W` needs is pruned. - **`P0 ⊋ P1` or independent → unsafe.** A finer/independent `P0` can drop a row that ranks early in `P1`, corrupting `W`. So equality is unnecessarily strict; the precise safe condition is the **subset** relation, expressed with `containsAll` (which also makes the check order-insensitive in the partition-key list, matching the set semantics of `PARTITION BY`). ### When the optimization still applies Single window, multiple windows with the **same** partition key, and the **subset** case above all keep firing. Example where the chosen `rank(partition by g1)` is coarser than `row_number(partition by g1, g2)`, so a `VPartitionTopN(partition by g1)` is still generated: ```sql select id, g1, g2, ord_key, rk, rn from ( select id, g1, g2, ord_key, rank() over (partition by g1 order by ord_key) as rk, -- chosen (coarser) row_number() over (partition by g1, g2 order by ord_key) as rn from multi_window_cases ) q where rk <= 2; ``` ``` 6:VANALYTIC partition by: g1, g2 order by: ord_key <- computes rn 4:VANALYTIC partition by: g1 order by: ord_key <- computes rk 1:VPartitionTopN partition by: g1 order by: ord_key partition limit: 2 <- safe (g1 ⊆ {g1,g2}) 0:VOlapScanNode ``` ### Reproduce (the wrong-result case) ```sql CREATE TABLE multi_window_cases ( id INT, g1 VARCHAR(8), g2 VARCHAR(8), ord_key INT, amt INT ) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1"); INSERT INTO multi_window_cases VALUES (1,'A','X',1,10),(2,'A','X',2,20),(3,'A','Y',3,30),(4,'B','X',4,40), (5,'B','Y',5,50),(6,'B','Y',6,60),(7,'C','X',7,70),(8,'C','Z',8,80); SELECT id, g1, g2, ord_key, rn1, rn2 FROM ( SELECT id, g1, g2, ord_key, row_number() OVER (PARTITION BY g1 ORDER BY ord_key) AS rn1, row_number() OVER (PARTITION BY g2 ORDER BY ord_key) AS rn2 FROM multi_window_cases ) q WHERE rn1 <= 1 ORDER BY id; ``` Wrong result (before), `rn2` should be `1,3,4`: ``` +----+----+----+---------+-----+-----+ | id | g1 | g2 | ord_key | rn1 | rn2 | +----+----+----+---------+-----+-----+ | 1 | A | X | 1 | 1 | 1 | | 4 | B | X | 4 | 1 | 2 | <- wrong (should be 3) | 7 | C | X | 7 | 1 | 3 | <- wrong (should be 4) +----+----+----+---------+-----+-----+ ``` Correct result (after, matches MySQL 8.4): ``` +------+------+------+---------+-----+-----+ | id | g1 | g2 | ord_key | rn1 | rn2 | +------+------+------+---------+-----+-----+ | 1 | A | X | 1 | 1 | 1 | | 4 | B | X | 4 | 1 | 3 | | 7 | C | X | 7 | 1 | 4 | +------+------+------+---------+-----+-----+ ``` ### EXPLAIN before (buggy) A `VPartitionTopN(partition by g1)` is inserted **below both analytic nodes**, so it prunes rows before `rn2` is computed: ``` 8:VSORT order by: id 7:VANALYTIC partition by: g2, order by: ord_key <- computes rn2 | predicates: (rn1 <= 1) 6:VSORT order by: g2, ord_key 4:VANALYTIC partition by: g1, order by: ord_key <- computes rn1 3:VSORT order by: g1, ord_key 1:VPartitionTopN partition by: g1, order by: ord_key <- prunes input (WRONG) 0:VOlapScanNode ``` ### EXPLAIN after (fixed) No `VPartitionTopN`; both window functions are computed over the full input and `rn1 <= 1` stays as an ordinary predicate above them: ``` 7:VSORT order by: id 6:VANALYTIC partition by: g2, order by: ord_key <- computes rn2 (full input) | predicates: (rn1 <= 1) 5:VSORT order by: g2, ord_key 3:VANALYTIC partition by: g1, order by: ord_key <- computes rn1 2:VSORT order by: g1, ord_key 0:VOlapScanNode ``` ### Release note Fix wrong result of multiple window functions (`row_number`/`rank`/`dense_rank`) with incompatible partition keys when a top-n filter (e.g. `rn <= k`) is applied; the partition-topn pushdown is now restricted to the cases where it is provably safe (the chosen function's partition key is a subset of the co-located ones). ### Check List (For author) - [x] Test - [x] Regression test (`regression-test/suites/query_p0/partition_topn/check_partitionkey.groovy`, `.../push_down_filter_through_window/push_down_multi_filter_through_window.groovy`) - [x] Unit test (`GeneratePartitionTopnFromWindowTest`: `testMultipleWindowsWithDifferentPartitions`, `testMultipleWindowsSubsetPartitionGeneratesTopn`) - [x] Behavior changed: - [x] Function behavior changed (returns correct results for the cases above)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)