Skip to content

[fix](nereids) partition topn optimization requires all window expressions are in the same order#56622

Merged
englefly merged 2 commits into
apache:masterfrom
englefly:partition_topn_orderkey
Oct 15, 2025
Merged

[fix](nereids) partition topn optimization requires all window expressions are in the same order#56622
englefly merged 2 commits into
apache:masterfrom
englefly:partition_topn_orderkey

Conversation

@englefly

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas

Thearas commented Sep 29, 2025

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly

Copy link
Copy Markdown
Contributor Author

run buildall

}
}
// check all windowExpression's order key is empty or is the same as chosenWindowFunc's order key
for (NamedExpression windowExpr : windowExpressions) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add feut

@doris-robot

Copy link
Copy Markdown
ClickBench: Total hot run time: 30.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6f2b4a036c2a40e8a6da6a05532d415cded85e82, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.05	0.06
query3	0.25	0.08	0.08
query4	1.61	0.12	0.12
query5	0.28	0.26	0.25
query6	1.19	0.66	0.64
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.64	0.52	0.53
query10	0.57	0.58	0.58
query11	0.17	0.12	0.11
query12	0.15	0.12	0.12
query13	0.63	0.62	0.62
query14	1.04	1.05	1.05
query15	0.87	0.88	0.86
query16	0.41	0.42	0.40
query17	1.04	1.07	1.07
query18	0.21	0.20	0.20
query19	2.00	1.85	1.86
query20	0.01	0.01	0.01
query21	15.41	0.94	0.57
query22	0.76	1.20	0.98
query23	14.77	1.40	0.68
query24	6.95	1.46	0.49
query25	0.46	0.13	0.05
query26	0.66	0.16	0.14
query27	0.07	0.06	0.05
query28	9.44	1.35	0.93
query29	12.57	3.92	3.24
query30	0.28	0.13	0.11
query31	2.82	0.60	0.40
query32	3.24	0.57	0.47
query33	3.12	3.09	3.07
query34	16.19	5.57	4.85
query35	4.91	4.95	4.95
query36	0.68	0.52	0.51
query37	0.10	0.08	0.08
query38	0.07	0.05	0.05
query39	0.04	0.03	0.03
query40	0.17	0.15	0.14
query41	0.08	0.03	0.03
query42	0.04	0.04	0.03
query43	0.05	0.04	0.04
Total cold run time: 104.19 s
Total hot run time: 30.55 s

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 44.44% (4/9) 🎉
Increment coverage report
Complete coverage report

@englefly englefly force-pushed the partition_topn_orderkey branch from 6f2b4a0 to 6ca52cc Compare September 29, 2025 08:09
@englefly

Copy link
Copy Markdown
Contributor Author

run buildall

@doris-robot

Copy link
Copy Markdown
TPC-DS: Total hot run time: 190309 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6ca52cc92c1d462f3dbd2cd3bb21fd1af9b252ef, data reload: false

query1	1080	441	407	407
query2	6558	1755	1692	1692
query3	6750	221	223	221
query4	26605	23660	23222	23222
query5	5223	691	494	494
query6	347	254	230	230
query7	4674	519	297	297
query8	334	275	264	264
query9	8749	2574	2579	2574
query10	557	345	302	302
query11	16070	15370	14773	14773
query12	193	126	114	114
query13	1689	573	460	460
query14	12665	9132	9237	9132
query15	219	194	172	172
query16	7659	680	466	466
query17	1585	784	644	644
query18	2157	447	376	376
query19	240	211	191	191
query20	136	130	133	130
query21	228	154	121	121
query22	4637	4752	4627	4627
query23	34801	34758	33897	33897
query24	8336	2538	2538	2538
query25	592	545	480	480
query26	1308	295	169	169
query27	3188	520	394	394
query28	4357	2213	2194	2194
query29	817	673	525	525
query30	315	247	221	221
query31	926	884	799	799
query32	84	77	108	77
query33	591	407	344	344
query34	863	879	535	535
query35	841	860	769	769
query36	1035	1075	943	943
query37	133	127	88	88
query38	3596	3546	3530	3530
query39	1483	1401	1405	1401
query40	232	137	122	122
query41	79	76	58	58
query42	122	121	110	110
query43	486	503	461	461
query44	1383	865	829	829
query45	192	180	170	170
query46	887	1066	654	654
query47	1755	1871	1702	1702
query48	391	436	325	325
query49	773	494	401	401
query50	684	722	409	409
query51	4023	4084	3764	3764
query52	109	111	102	102
query53	247	273	210	210
query54	603	588	521	521
query55	89	95	86	86
query56	320	313	318	313
query57	1183	1211	1115	1115
query58	288	283	310	283
query59	2583	2687	2480	2480
query60	356	341	349	341
query61	162	159	187	159
query62	809	730	673	673
query63	246	201	207	201
query64	4540	1256	949	949
query65	4047	3972	3963	3963
query66	1060	424	337	337
query67	15707	15316	15218	15218
query68	8193	971	591	591
query69	500	323	300	300
query70	1406	1264	1349	1264
query71	509	341	316	316
query72	6032	4873	4791	4791
query73	685	596	360	360
query74	9192	9072	8869	8869
query75	4074	3357	2894	2894
query76	3791	1177	744	744
query77	822	423	313	313
query78	9712	9809	8870	8870
query79	2618	878	615	615
query80	694	580	512	512
query81	524	336	227	227
query82	479	160	143	143
query83	269	267	262	262
query84	261	110	93	93
query85	908	469	425	425
query86	393	323	328	323
query87	3793	3752	3670	3670
query88	3707	2254	2216	2216
query89	400	317	295	295
query90	2019	223	228	223
query91	246	163	142	142
query92	95	70	65	65
query93	2141	992	647	647
query94	705	444	335	335
query95	401	322	313	313
query96	495	577	286	286
query97	2928	2967	2861	2861
query98	245	219	218	218
query99	1351	1417	1357	1357
Total cold run time: 284565 ms
Total hot run time: 190309 ms

@doris-robot

Copy link
Copy Markdown
ClickBench: Total hot run time: 30.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6ca52cc92c1d462f3dbd2cd3bb21fd1af9b252ef, data reload: false

query1	0.06	0.05	0.04
query2	0.09	0.05	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.12
query5	0.27	0.26	0.24
query6	1.15	0.65	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.61	0.52	0.51
query10	0.58	0.59	0.58
query11	0.18	0.11	0.14
query12	0.16	0.12	0.11
query13	0.64	0.62	0.61
query14	1.03	1.03	1.03
query15	0.88	0.84	0.87
query16	0.40	0.40	0.39
query17	1.06	1.05	1.06
query18	0.22	0.20	0.20
query19	1.92	1.93	1.82
query20	0.02	0.01	0.01
query21	15.42	0.92	0.56
query22	0.77	1.16	0.65
query23	15.00	1.40	0.61
query24	7.09	1.04	0.85
query25	0.50	0.09	0.06
query26	0.65	0.16	0.14
query27	0.06	0.06	0.05
query28	9.53	1.34	0.93
query29	12.62	3.99	3.26
query30	0.28	0.14	0.11
query31	2.82	0.60	0.39
query32	3.26	0.56	0.48
query33	3.09	3.07	3.13
query34	16.18	5.48	4.94
query35	4.91	4.88	4.94
query36	0.70	0.52	0.50
query37	0.10	0.07	0.08
query38	0.07	0.05	0.04
query39	0.04	0.04	0.03
query40	0.19	0.17	0.13
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 104.67 s
Total hot run time: 30.35 s

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 66.67% (6/9) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (6/9) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Oct 14, 2025
@github-actions

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@englefly englefly merged commit fffbeab into apache:master Oct 15, 2025
27 of 28 checks passed
@englefly englefly deleted the partition_topn_orderkey branch October 15, 2025 02:17
github-actions Bot pushed a commit that referenced this pull request Oct 15, 2025
…sions are in the same order (#56622)

### What problem does this PR solve?
fix bug: partition topn optimization requires all window expressions are in the same order
yiguolei pushed a commit that referenced this pull request Oct 16, 2025
…indow expressions are in the same order #56622 (#56976)

Cherry-picked from #56622

Co-authored-by: minghong <zhouminghong@selectdb.com>
@wm1581066 wm1581066 added the p0_w label Oct 16, 2025
@yiguolei yiguolei mentioned this pull request Nov 5, 2025
github-actions Bot pushed a commit that referenced this pull request Nov 17, 2025
…sions are in the same order (#56622)

### What problem does this PR solve?
fix bug: partition topn optimization requires all window expressions are in the same order
morrySnow pushed a commit that referenced this pull request Nov 25, 2025
…indow expressions are in the same order #56622 (#58080)

Cherry-picked from #56622

Co-authored-by: minghong <zhouminghong@selectdb.com>
yiguolei pushed a commit to yiguolei/incubator-doris that referenced this pull request Dec 30, 2025
… requires all window expressions are in the same order apache#56622 (apache#5752)

cherry-picks from apache#56622
Related to apache#5098

Co-authored-by: minghong <zhouminghong@selectdb.com>
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Jun 24, 2026
…on key to be a subset of co-located ones

When a single LogicalWindow holds multiple window functions, a filter such as
`rn <= k` may be turned into a partitionTopN and pushed below the whole window
node. The generated partitionTopN keeps the per-partition top-k of the chosen
window function and thus prunes the input rows shared by ALL co-located window
functions.

This is only correct when the chosen window function's partition key is a
SUBSET of every other co-located window function's partition key (i.e. the
chosen one is coarser). Then any row that could change another window's value
for a surviving row lies in the same chosen-partition with an order value not
greater than the surviving row's, so its chosen-rank is within top-k and it is
kept. When this does not hold the pruning drops rows the other windows still
need and produces a wrong result, e.g.

    row_number() over (partition by g1 order by c) as rn1,  -- chosen
    row_number() over (partition by g2 order by c) as rn2   -- independent

    row_number() over (partition by c1, c2 order by c) as rn,  -- chosen (finer)
    rank()       over (partition by c1 order by c) as rk       -- coarser

getPushDownWindowFuncAndLimit() previously only required the order keys of all
co-located window functions to be compatible (apache#56622). It now also requires the
partition keys to satisfy the above subset relation, otherwise the optimization
is disabled.
CalvinKirs added a commit that referenced this pull request Jun 27, 2026
…on key to be a subset of co-located ones (#64764)

### What problem does this PR solve?

When a single `LogicalWindow` holds multiple window functions, a filter
like `rn <= k` may be converted into a `partitionTopN` and pushed
**below the whole window node**. The generated `partitionTopN` keeps the
per-partition top-k of the *chosen* window function, so it prunes the
input rows that are **shared by all co-located window functions**.

This is only correct when the chosen window function's partition key is
a **subset** of every other co-located window function's partition key
(i.e. the chosen one is *coarser*). Otherwise the pruning drops rows the
other windows still need and produces a wrong result, e.g.

```sql
-- independent partitions: chosen rn1(g1) prunes rows rn2(g2) needs
row_number() over (partition by g1 order by ord_key) as rn1,  -- chosen
row_number() over (partition by g2 order by ord_key) as rn2

-- chosen is finer than a co-located window: chosen rn(c1,c2) prunes rows rk(c1) needs
row_number() over (partition by c1, c2 order by c3) as rn,    -- chosen (finer)
rank()       over (partition by c1 order by c3) as rk         -- coarser
```

`LogicalWindow.getPushDownWindowFuncAndLimit()` already required the
**order keys** of all co-located window functions to be compatible
(#56622), but it never checked the **partition keys**. This PR
additionally requires
`windowFunc.getPartitionKeys().containsAll(chosenWindowFunc.getPartitionKeys())`
for every co-located window function; otherwise the partition-topn
optimization is disabled.

### Why the subset rule is correct (and not just equality)

The pruning keeps the per-`P0` (chosen partition) top-k by the order
key. For another window function `W` partitioned by `P1`:

- **`P0 ⊆ P1` (chosen coarser) → safe.** Any row that could change `W`'s
value for a surviving row `r` is in the same `P1` partition with a
smaller order value; being in the same `P1` it is also in the same `P0`
partition with order `<= r`, so its `P0`-rank is within top-k and it is
**kept**. Nothing `W` needs is pruned.
- **`P0 ⊋ P1` or independent → unsafe.** A finer/independent `P0` can
drop a row that ranks early in `P1`, corrupting `W`.

So equality is unnecessarily strict; the precise safe condition is the
**subset** relation, expressed with `containsAll` (which also makes the
check order-insensitive in the partition-key list, matching the set
semantics of `PARTITION BY`).

### When the optimization still applies

Single window, multiple windows with the **same** partition key, and the
**subset** case above all keep firing. Example where the chosen
`rank(partition by g1)` is coarser than `row_number(partition by g1,
g2)`, so a `VPartitionTopN(partition by g1)` is still generated:

```sql
select id, g1, g2, ord_key, rk, rn
from (
  select id, g1, g2, ord_key,
    rank()       over (partition by g1 order by ord_key) as rk,   -- chosen (coarser)
    row_number() over (partition by g1, g2 order by ord_key) as rn
  from multi_window_cases
) q
where rk <= 2;
```

```
  6:VANALYTIC          partition by: g1, g2   order by: ord_key   <- computes rn
  4:VANALYTIC          partition by: g1       order by: ord_key   <- computes rk
  1:VPartitionTopN     partition by: g1       order by: ord_key   partition limit: 2   <- safe (g1 ⊆ {g1,g2})
  0:VOlapScanNode
```

### Reproduce (the wrong-result case)

```sql
CREATE TABLE multi_window_cases (
  id      INT,
  g1      VARCHAR(8),
  g2      VARCHAR(8),
  ord_key INT,
  amt     INT
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO multi_window_cases VALUES
(1,'A','X',1,10),(2,'A','X',2,20),(3,'A','Y',3,30),(4,'B','X',4,40),
(5,'B','Y',5,50),(6,'B','Y',6,60),(7,'C','X',7,70),(8,'C','Z',8,80);

SELECT id, g1, g2, ord_key, rn1, rn2
FROM (
  SELECT id, g1, g2, ord_key,
    row_number() OVER (PARTITION BY g1 ORDER BY ord_key) AS rn1,
    row_number() OVER (PARTITION BY g2 ORDER BY ord_key) AS rn2
  FROM multi_window_cases
) q
WHERE rn1 <= 1
ORDER BY id;
```

Wrong result (before), `rn2` should be `1,3,4`:

```
+----+----+----+---------+-----+-----+
| id | g1 | g2 | ord_key | rn1 | rn2 |
+----+----+----+---------+-----+-----+
|  1 | A  | X  |       1 |   1 |   1 |
|  4 | B  | X  |       4 |   1 |   2 |   <- wrong (should be 3)
|  7 | C  | X  |       7 |   1 |   3 |   <- wrong (should be 4)
+----+----+----+---------+-----+-----+
```

Correct result (after, matches MySQL 8.4):

```
+------+------+------+---------+-----+-----+
| id   | g1   | g2   | ord_key | rn1 | rn2 |
+------+------+------+---------+-----+-----+
|    1 | A    | X    |       1 |   1 |   1 |
|    4 | B    | X    |       4 |   1 |   3 |
|    7 | C    | X    |       7 |   1 |   4 |
+------+------+------+---------+-----+-----+
```

### EXPLAIN before (buggy)

A `VPartitionTopN(partition by g1)` is inserted **below both analytic
nodes**, so it prunes rows before `rn2` is computed:

```
  8:VSORT              order by: id
  7:VANALYTIC          partition by: g2,  order by: ord_key   <- computes rn2
  |  predicates: (rn1 <= 1)
  6:VSORT              order by: g2, ord_key
  4:VANALYTIC          partition by: g1,  order by: ord_key   <- computes rn1
  3:VSORT              order by: g1, ord_key
  1:VPartitionTopN     partition by: g1,  order by: ord_key   <- prunes input (WRONG)
  0:VOlapScanNode
```

### EXPLAIN after (fixed)

No `VPartitionTopN`; both window functions are computed over the full
input and `rn1 <= 1` stays as an ordinary predicate above them:

```
  7:VSORT              order by: id
  6:VANALYTIC          partition by: g2,  order by: ord_key   <- computes rn2 (full input)
  |  predicates: (rn1 <= 1)
  5:VSORT              order by: g2, ord_key
  3:VANALYTIC          partition by: g1,  order by: ord_key   <- computes rn1
  2:VSORT              order by: g1, ord_key
  0:VOlapScanNode
```

### Release note

Fix wrong result of multiple window functions
(`row_number`/`rank`/`dense_rank`) with incompatible partition keys when
a top-n filter (e.g. `rn <= k`) is applied; the partition-topn pushdown
is now restricted to the cases where it is provably safe (the chosen
function's partition key is a subset of the co-located ones).

### Check List (For author)

- [x] Test
- [x] Regression test
(`regression-test/suites/query_p0/partition_topn/check_partitionkey.groovy`,
`.../push_down_filter_through_window/push_down_multi_filter_through_window.groovy`)
- [x] Unit test (`GeneratePartitionTopnFromWindowTest`:
`testMultipleWindowsWithDifferentPartitions`,
`testMultipleWindowsSubsetPartitionGeneratesTopn`)

- [x] Behavior changed:
- [x] Function behavior changed (returns correct results for the cases
above)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.4-merged dev/4.0.1-merged p0_w reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants