[fix](nereids) partition topn optimization requires all window expressions are in the same order by englefly · Pull Request #56622 · apache/doris

englefly · 2025-09-29T06:01:05Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Thearas · 2025-09-29T06:01:11Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

englefly · 2025-09-29T06:01:23Z

run buildall

morrySnow · 2025-09-29T06:02:28Z

            }
        }
+        // check all windowExpression's order key is empty or is the same as chosenWindowFunc's order key
+        for (NamedExpression windowExpr : windowExpressions) {


doris-robot · 2025-09-29T07:05:42Z

ClickBench: Total hot run time: 30.55 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6f2b4a036c2a40e8a6da6a05532d415cded85e82, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.05	0.06
query3	0.25	0.08	0.08
query4	1.61	0.12	0.12
query5	0.28	0.26	0.25
query6	1.19	0.66	0.64
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.64	0.52	0.53
query10	0.57	0.58	0.58
query11	0.17	0.12	0.11
query12	0.15	0.12	0.12
query13	0.63	0.62	0.62
query14	1.04	1.05	1.05
query15	0.87	0.88	0.86
query16	0.41	0.42	0.40
query17	1.04	1.07	1.07
query18	0.21	0.20	0.20
query19	2.00	1.85	1.86
query20	0.01	0.01	0.01
query21	15.41	0.94	0.57
query22	0.76	1.20	0.98
query23	14.77	1.40	0.68
query24	6.95	1.46	0.49
query25	0.46	0.13	0.05
query26	0.66	0.16	0.14
query27	0.07	0.06	0.05
query28	9.44	1.35	0.93
query29	12.57	3.92	3.24
query30	0.28	0.13	0.11
query31	2.82	0.60	0.40
query32	3.24	0.57	0.47
query33	3.12	3.09	3.07
query34	16.19	5.57	4.85
query35	4.91	4.95	4.95
query36	0.68	0.52	0.51
query37	0.10	0.08	0.08
query38	0.07	0.05	0.05
query39	0.04	0.03	0.03
query40	0.17	0.15	0.14
query41	0.08	0.03	0.03
query42	0.04	0.04	0.03
query43	0.05	0.04	0.04
Total cold run time: 104.19 s
Total hot run time: 30.55 s

hello-stephen · 2025-09-29T07:06:28Z

FE UT Coverage Report

Increment line coverage 44.44% (4/9) 🎉
Increment coverage report
Complete coverage report

englefly · 2025-09-29T08:11:06Z

run buildall

doris-robot · 2025-09-29T08:52:46Z

TPC-DS: Total hot run time: 190309 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6ca52cc92c1d462f3dbd2cd3bb21fd1af9b252ef, data reload: false

query1	1080	441	407	407
query2	6558	1755	1692	1692
query3	6750	221	223	221
query4	26605	23660	23222	23222
query5	5223	691	494	494
query6	347	254	230	230
query7	4674	519	297	297
query8	334	275	264	264
query9	8749	2574	2579	2574
query10	557	345	302	302
query11	16070	15370	14773	14773
query12	193	126	114	114
query13	1689	573	460	460
query14	12665	9132	9237	9132
query15	219	194	172	172
query16	7659	680	466	466
query17	1585	784	644	644
query18	2157	447	376	376
query19	240	211	191	191
query20	136	130	133	130
query21	228	154	121	121
query22	4637	4752	4627	4627
query23	34801	34758	33897	33897
query24	8336	2538	2538	2538
query25	592	545	480	480
query26	1308	295	169	169
query27	3188	520	394	394
query28	4357	2213	2194	2194
query29	817	673	525	525
query30	315	247	221	221
query31	926	884	799	799
query32	84	77	108	77
query33	591	407	344	344
query34	863	879	535	535
query35	841	860	769	769
query36	1035	1075	943	943
query37	133	127	88	88
query38	3596	3546	3530	3530
query39	1483	1401	1405	1401
query40	232	137	122	122
query41	79	76	58	58
query42	122	121	110	110
query43	486	503	461	461
query44	1383	865	829	829
query45	192	180	170	170
query46	887	1066	654	654
query47	1755	1871	1702	1702
query48	391	436	325	325
query49	773	494	401	401
query50	684	722	409	409
query51	4023	4084	3764	3764
query52	109	111	102	102
query53	247	273	210	210
query54	603	588	521	521
query55	89	95	86	86
query56	320	313	318	313
query57	1183	1211	1115	1115
query58	288	283	310	283
query59	2583	2687	2480	2480
query60	356	341	349	341
query61	162	159	187	159
query62	809	730	673	673
query63	246	201	207	201
query64	4540	1256	949	949
query65	4047	3972	3963	3963
query66	1060	424	337	337
query67	15707	15316	15218	15218
query68	8193	971	591	591
query69	500	323	300	300
query70	1406	1264	1349	1264
query71	509	341	316	316
query72	6032	4873	4791	4791
query73	685	596	360	360
query74	9192	9072	8869	8869
query75	4074	3357	2894	2894
query76	3791	1177	744	744
query77	822	423	313	313
query78	9712	9809	8870	8870
query79	2618	878	615	615
query80	694	580	512	512
query81	524	336	227	227
query82	479	160	143	143
query83	269	267	262	262
query84	261	110	93	93
query85	908	469	425	425
query86	393	323	328	323
query87	3793	3752	3670	3670
query88	3707	2254	2216	2216
query89	400	317	295	295
query90	2019	223	228	223
query91	246	163	142	142
query92	95	70	65	65
query93	2141	992	647	647
query94	705	444	335	335
query95	401	322	313	313
query96	495	577	286	286
query97	2928	2967	2861	2861
query98	245	219	218	218
query99	1351	1417	1357	1357
Total cold run time: 284565 ms
Total hot run time: 190309 ms

doris-robot · 2025-09-29T08:58:16Z

ClickBench: Total hot run time: 30.35 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6ca52cc92c1d462f3dbd2cd3bb21fd1af9b252ef, data reload: false

query1	0.06	0.05	0.04
query2	0.09	0.05	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.12
query5	0.27	0.26	0.24
query6	1.15	0.65	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.61	0.52	0.51
query10	0.58	0.59	0.58
query11	0.18	0.11	0.14
query12	0.16	0.12	0.11
query13	0.64	0.62	0.61
query14	1.03	1.03	1.03
query15	0.88	0.84	0.87
query16	0.40	0.40	0.39
query17	1.06	1.05	1.06
query18	0.22	0.20	0.20
query19	1.92	1.93	1.82
query20	0.02	0.01	0.01
query21	15.42	0.92	0.56
query22	0.77	1.16	0.65
query23	15.00	1.40	0.61
query24	7.09	1.04	0.85
query25	0.50	0.09	0.06
query26	0.65	0.16	0.14
query27	0.06	0.06	0.05
query28	9.53	1.34	0.93
query29	12.62	3.99	3.26
query30	0.28	0.14	0.11
query31	2.82	0.60	0.39
query32	3.26	0.56	0.48
query33	3.09	3.07	3.13
query34	16.18	5.48	4.94
query35	4.91	4.88	4.94
query36	0.70	0.52	0.50
query37	0.10	0.07	0.08
query38	0.07	0.05	0.04
query39	0.04	0.04	0.03
query40	0.19	0.17	0.13
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 104.67 s
Total hot run time: 30.35 s

hello-stephen · 2025-09-29T09:10:39Z

FE UT Coverage Report

Increment line coverage 66.67% (6/9) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2025-09-29T10:52:39Z

FE Regression Coverage Report

Increment line coverage 66.67% (6/9) 🎉
Increment coverage report
Complete coverage report

github-actions · 2025-10-14T13:25:30Z

PR approved by at least one committer and no changes requested.

github-actions · 2025-10-14T13:25:32Z

PR approved by anyone and no changes requested.

…sions are in the same order (#56622) ### What problem does this PR solve? fix bug: partition topn optimization requires all window expressions are in the same order

…indow expressions are in the same order #56622 (#56976) Cherry-picked from #56622 Co-authored-by: minghong <zhouminghong@selectdb.com>

…sions are in the same order (#56622) ### What problem does this PR solve? fix bug: partition topn optimization requires all window expressions are in the same order

…indow expressions are in the same order #56622 (#58080) Cherry-picked from #56622 Co-authored-by: minghong <zhouminghong@selectdb.com>

… requires all window expressions are in the same order apache#56622 (apache#5752) cherry-picks from apache#56622 Related to apache#5098 Co-authored-by: minghong <zhouminghong@selectdb.com>

…on key to be a subset of co-located ones When a single LogicalWindow holds multiple window functions, a filter such as `rn <= k` may be turned into a partitionTopN and pushed below the whole window node. The generated partitionTopN keeps the per-partition top-k of the chosen window function and thus prunes the input rows shared by ALL co-located window functions. This is only correct when the chosen window function's partition key is a SUBSET of every other co-located window function's partition key (i.e. the chosen one is coarser). Then any row that could change another window's value for a surviving row lies in the same chosen-partition with an order value not greater than the surviving row's, so its chosen-rank is within top-k and it is kept. When this does not hold the pruning drops rows the other windows still need and produces a wrong result, e.g. row_number() over (partition by g1 order by c) as rn1, -- chosen row_number() over (partition by g2 order by c) as rn2 -- independent row_number() over (partition by c1, c2 order by c) as rn, -- chosen (finer) rank() over (partition by c1 order by c) as rk -- coarser getPushDownWindowFuncAndLimit() previously only required the order keys of all co-located window functions to be compatible (apache#56622). It now also requires the partition keys to satisfy the above subset relation, otherwise the optimization is disabled.

…on key to be a subset of co-located ones (#64764) ### What problem does this PR solve? When a single `LogicalWindow` holds multiple window functions, a filter like `rn <= k` may be converted into a `partitionTopN` and pushed **below the whole window node**. The generated `partitionTopN` keeps the per-partition top-k of the *chosen* window function, so it prunes the input rows that are **shared by all co-located window functions**. This is only correct when the chosen window function's partition key is a **subset** of every other co-located window function's partition key (i.e. the chosen one is *coarser*). Otherwise the pruning drops rows the other windows still need and produces a wrong result, e.g. ```sql -- independent partitions: chosen rn1(g1) prunes rows rn2(g2) needs row_number() over (partition by g1 order by ord_key) as rn1, -- chosen row_number() over (partition by g2 order by ord_key) as rn2 -- chosen is finer than a co-located window: chosen rn(c1,c2) prunes rows rk(c1) needs row_number() over (partition by c1, c2 order by c3) as rn, -- chosen (finer) rank() over (partition by c1 order by c3) as rk -- coarser ``` `LogicalWindow.getPushDownWindowFuncAndLimit()` already required the **order keys** of all co-located window functions to be compatible (#56622), but it never checked the **partition keys**. This PR additionally requires `windowFunc.getPartitionKeys().containsAll(chosenWindowFunc.getPartitionKeys())` for every co-located window function; otherwise the partition-topn optimization is disabled. ### Why the subset rule is correct (and not just equality) The pruning keeps the per-`P0` (chosen partition) top-k by the order key. For another window function `W` partitioned by `P1`: - **`P0 ⊆ P1` (chosen coarser) → safe.** Any row that could change `W`'s value for a surviving row `r` is in the same `P1` partition with a smaller order value; being in the same `P1` it is also in the same `P0` partition with order `<= r`, so its `P0`-rank is within top-k and it is **kept**. Nothing `W` needs is pruned. - **`P0 ⊋ P1` or independent → unsafe.** A finer/independent `P0` can drop a row that ranks early in `P1`, corrupting `W`. So equality is unnecessarily strict; the precise safe condition is the **subset** relation, expressed with `containsAll` (which also makes the check order-insensitive in the partition-key list, matching the set semantics of `PARTITION BY`). ### When the optimization still applies Single window, multiple windows with the **same** partition key, and the **subset** case above all keep firing. Example where the chosen `rank(partition by g1)` is coarser than `row_number(partition by g1, g2)`, so a `VPartitionTopN(partition by g1)` is still generated: ```sql select id, g1, g2, ord_key, rk, rn from ( select id, g1, g2, ord_key, rank() over (partition by g1 order by ord_key) as rk, -- chosen (coarser) row_number() over (partition by g1, g2 order by ord_key) as rn from multi_window_cases ) q where rk <= 2; ``` ``` 6:VANALYTIC partition by: g1, g2 order by: ord_key <- computes rn 4:VANALYTIC partition by: g1 order by: ord_key <- computes rk 1:VPartitionTopN partition by: g1 order by: ord_key partition limit: 2 <- safe (g1 ⊆ {g1,g2}) 0:VOlapScanNode ``` ### Reproduce (the wrong-result case) ```sql CREATE TABLE multi_window_cases ( id INT, g1 VARCHAR(8), g2 VARCHAR(8), ord_key INT, amt INT ) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1"); INSERT INTO multi_window_cases VALUES (1,'A','X',1,10),(2,'A','X',2,20),(3,'A','Y',3,30),(4,'B','X',4,40), (5,'B','Y',5,50),(6,'B','Y',6,60),(7,'C','X',7,70),(8,'C','Z',8,80); SELECT id, g1, g2, ord_key, rn1, rn2 FROM ( SELECT id, g1, g2, ord_key, row_number() OVER (PARTITION BY g1 ORDER BY ord_key) AS rn1, row_number() OVER (PARTITION BY g2 ORDER BY ord_key) AS rn2 FROM multi_window_cases ) q WHERE rn1 <= 1 ORDER BY id; ``` Wrong result (before), `rn2` should be `1,3,4`: ``` +----+----+----+---------+-----+-----+ | id | g1 | g2 | ord_key | rn1 | rn2 | +----+----+----+---------+-----+-----+ | 1 | A | X | 1 | 1 | 1 | | 4 | B | X | 4 | 1 | 2 | <- wrong (should be 3) | 7 | C | X | 7 | 1 | 3 | <- wrong (should be 4) +----+----+----+---------+-----+-----+ ``` Correct result (after, matches MySQL 8.4): ``` +------+------+------+---------+-----+-----+ | id | g1 | g2 | ord_key | rn1 | rn2 | +------+------+------+---------+-----+-----+ | 1 | A | X | 1 | 1 | 1 | | 4 | B | X | 4 | 1 | 3 | | 7 | C | X | 7 | 1 | 4 | +------+------+------+---------+-----+-----+ ``` ### EXPLAIN before (buggy) A `VPartitionTopN(partition by g1)` is inserted **below both analytic nodes**, so it prunes rows before `rn2` is computed: ``` 8:VSORT order by: id 7:VANALYTIC partition by: g2, order by: ord_key <- computes rn2 | predicates: (rn1 <= 1) 6:VSORT order by: g2, ord_key 4:VANALYTIC partition by: g1, order by: ord_key <- computes rn1 3:VSORT order by: g1, ord_key 1:VPartitionTopN partition by: g1, order by: ord_key <- prunes input (WRONG) 0:VOlapScanNode ``` ### EXPLAIN after (fixed) No `VPartitionTopN`; both window functions are computed over the full input and `rn1 <= 1` stays as an ordinary predicate above them: ``` 7:VSORT order by: id 6:VANALYTIC partition by: g2, order by: ord_key <- computes rn2 (full input) | predicates: (rn1 <= 1) 5:VSORT order by: g2, ord_key 3:VANALYTIC partition by: g1, order by: ord_key <- computes rn1 2:VSORT order by: g1, ord_key 0:VOlapScanNode ``` ### Release note Fix wrong result of multiple window functions (`row_number`/`rank`/`dense_rank`) with incompatible partition keys when a top-n filter (e.g. `rn <= k`) is applied; the partition-topn pushdown is now restricted to the cases where it is provably safe (the chosen function's partition key is a subset of the co-located ones). ### Check List (For author) - [x] Test - [x] Regression test (`regression-test/suites/query_p0/partition_topn/check_partitionkey.groovy`, `.../push_down_filter_through_window/push_down_multi_filter_through_window.groovy`) - [x] Unit test (`GeneratePartitionTopnFromWindowTest`: `testMultipleWindowsWithDifferentPartitions`, `testMultipleWindowsSubsetPartitionGeneratesTopn`) - [x] Behavior changed: - [x] Function behavior changed (returns correct results for the cases above)

morrySnow added the dev/4.0.x label Sep 29, 2025

morrySnow reviewed Sep 29, 2025

View reviewed changes

englefly added 2 commits September 29, 2025 15:23

winExpr order key

b697714

ut

6ca52cc

englefly force-pushed the partition_topn_orderkey branch from 6f2b4a0 to 6ca52cc Compare September 29, 2025 08:09

starocean999 approved these changes Oct 14, 2025

View reviewed changes

github-actions Bot added the approved Indicates a PR has been approved by one committer. label Oct 14, 2025

github-actions Bot added the reviewed label Oct 14, 2025

zhangstar333 approved these changes Oct 15, 2025

View reviewed changes

englefly merged commit fffbeab into apache:master Oct 15, 2025
27 of 28 checks passed

englefly deleted the partition_topn_orderkey branch October 15, 2025 02:17

github-actions Bot mentioned this pull request Oct 15, 2025

branch-4.0: [fix](nereids) partition topn optimization requires all window expressions are in the same order #56622 #56976

Merged

yiguolei pushed a commit that referenced this pull request Oct 16, 2025

branch-4.0: [fix](nereids) partition topn optimization requires all w…

a41f409

…indow expressions are in the same order #56622 (#56976) Cherry-picked from #56622 Co-authored-by: minghong <zhouminghong@selectdb.com>

wm1581066 added the p0_w label Oct 16, 2025

yiguolei mentioned this pull request Nov 5, 2025

4.0.1 Release Notes #57717

Closed

morrySnow added dev/4.0.1-merged dev/3.1.x and removed dev/4.0.x labels Nov 11, 2025

github-actions Bot mentioned this pull request Nov 17, 2025

branch-3.1: [fix](nereids) partition topn optimization requires all window expressions are in the same order #56622 #58080

Merged

morrySnow pushed a commit that referenced this pull request Nov 25, 2025

branch-3.1: [fix](nereids) partition topn optimization requires all w…

bfca441

…indow expressions are in the same order #56622 (#58080) Cherry-picked from #56622 Co-authored-by: minghong <zhouminghong@selectdb.com>

morrySnow added dev/3.1.4-merged and removed dev/3.1.x labels Nov 25, 2025

CalvinKirs mentioned this pull request Jun 24, 2026

[fix](nereids) partition topn opt requires chosen window func partition key to be a subset of co-located ones #64764

Merged

5 tasks

Uh oh!

Conversation

englefly commented Sep 29, 2025

What problem does this PR solve?

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

Thearas commented Sep 29, 2025

Uh oh!

englefly commented Sep 29, 2025

Uh oh!

morrySnow Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

doris-robot commented Sep 29, 2025

Uh oh!

hello-stephen commented Sep 29, 2025

FE UT Coverage Report

Uh oh!

englefly commented Sep 29, 2025

Uh oh!

doris-robot commented Sep 29, 2025

Uh oh!

doris-robot commented Sep 29, 2025

Uh oh!

hello-stephen commented Sep 29, 2025

FE UT Coverage Report

Uh oh!

hello-stephen commented Sep 29, 2025

FE Regression Coverage Report

Uh oh!

github-actions Bot commented Oct 14, 2025

Uh oh!

github-actions Bot commented Oct 14, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants