[feature](Nereids): Pushdown TopN-Distinct through Union#27628

Merged: jackwener merged 1 commit into apache:master from jackwener:topn_union on Nov 28, 2023

@jackwener (Member) commented Nov 27, 2023

Proposed changes

TopN-Distinct
-> Union All
  -> child plan1
  -> child plan2
  -> child plan3

rewritten to

TopN-Distinct
-> Union All
  -> TopN-Distinct
    -> child plan1
  -> TopN-Distinct
    -> child plan2
  -> TopN-Distinct
    -> child plan3
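
To see why the rewrite above preserves results, here is a minimal Java sketch that models the two plan shapes over in-memory lists of integers. It is not the Nereids rule itself: the class name `TopNDistinctThroughUnionSketch` and its methods are hypothetical, rows are plain integers sorted ascending, and any OFFSET on the TopN is left out for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the plan shapes; not Doris code.
class TopNDistinctThroughUnionSketch {
    // TopN-Distinct: deduplicate, sort ascending, keep the first `limit` rows.
    static List<Integer> topNDistinct(List<Integer> rows, int limit) {
        return rows.stream().distinct().sorted().limit(limit).collect(Collectors.toList());
    }

    // Union All: concatenate all children without deduplication.
    static List<Integer> unionAll(List<List<Integer>> children) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> child : children) {
            out.addAll(child);
        }
        return out;
    }

    // Original shape: TopN-Distinct -> Union All -> children.
    static List<Integer> original(List<List<Integer>> children, int limit) {
        return topNDistinct(unionAll(children), limit);
    }

    // Rewritten shape: each child is pruned by its own TopN-Distinct first,
    // and the outer TopN-Distinct is kept on top of the Union All.
    static List<Integer> rewritten(List<List<Integer>> children, int limit) {
        List<List<Integer>> pruned = new ArrayList<>();
        for (List<Integer> child : children) {
            pruned.add(topNDistinct(child, limit));
        }
        return topNDistinct(unionAll(pruned), limit);
    }

    public static void main(String[] args) {
        List<List<Integer>> children = List.of(
                List.of(5, 1, 1, 9, 3),
                List.of(2, 2, 8, 1),
                List.of(7, 4, 4, 6));
        System.out.println(original(children, 3));  // [1, 2, 3]
        System.out.println(rewritten(children, 3)); // [1, 2, 3]
    }
}
```

Each branch can contribute at most `limit` distinct rows to the final answer, so pruning a branch early never drops a row the outer TopN-Distinct would have kept. The outer operator has to stay because the same value may survive in several branches.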

@jackwener (Member, Author)

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.17 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 32 seconds loaded 2358488459 Bytes, about 70 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17099033376 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 66c0ec073ce0d70ae760f2c6ef51521445f14ac2, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4910	4663	4686	4663
q2	378	179	158	158
q3	2041	1983	1901	1901
q4	1411	1287	1289	1287
q5	3939	3979	4018	3979
q6	257	132	131	131
q7	1400	888	897	888
q8	2810	2815	2774	2774
q9	9794	9754	9799	9754
q10	3467	3519	3507	3507
q11	382	240	236	236
q12	442	295	297	295
q13	4614	3844	3835	3835
q14	311	288	289	288
q15	585	535	527	527
q16	668	594	581	581
q17	1136	954	914	914
q18	7917	7554	7449	7449
q19	1694	1677	1685	1677
q20	543	331	311	311
q21	4480	4072	4051	4051
q22	486	384	388	384
Total cold run time: 53665 ms
Total hot run time: 49590 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4571	4548	4623	4548
q2	333	234	246	234
q3	4060	4022	4011	4011
q4	2747	2731	2737	2731
q5	9678	9691	9739	9691
q6	249	124	123	123
q7	3066	2509	2496	2496
q8	4426	4423	4433	4423
q9	13337	13223	13202	13202
q10	4062	4177	4186	4177
q11	837	646	656	646
q12	992	817	800	800
q13	4331	3563	3569	3563
q14	384	345	350	345
q15	581	531	520	520
q16	740	664	686	664
q17	3946	3949	3836	3836
q18	9565	9191	9072	9072
q19	1825	1819	1770	1770
q20	2407	2093	2042	2042
q21	8860	8841	8360	8360
q22	857	822	796	796
Total cold run time: 81854 ms
Total hot run time: 78050 ms

@github-actions (bot) added the "approved" label ("Indicates a PR has been approved by one committer.") on Nov 28, 2023
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions (Contributor)

PR approved by anyone and no changes requested.

@jackwener merged commit 91f56ce into apache:master on Nov 28, 2023
@jackwener deleted the topn_union branch on November 28, 2023 07:23
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
```
  TopN-Distinct
  -> Union All
  -> child plan1
  -> child plan2
  -> child plan3
 
  rewritten to
 
  TopN-Distinct
  -> Union All
    -> TopN-Distinct
      -> child plan1
    -> TopN-Distinct
      -> child plan2
    -> TopN-Distinct
      -> child plan3
```
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jun 26, 2026
Issue Number: close #xxx

Related PR: apache#27628

Problem Summary: PushDownTopNDistinctThroughUnion used the rewritten child plan output to map UNION output slots when pushing TopN distinct into UNION branches. For UNION DISTINCT queries with duplicate projection columns and window output, the child plan output can be shorter than the UNION regular child output, causing IndexOutOfBoundsException during rewrite. Use the UNION regular child output for slot remapping and add focused FE and regression coverage for duplicate output with window plus outer ORDER BY LIMIT.

None

- Test: Regression test / Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushDownTopNDistinctThroughUnionTest
    - ./run-regression-test.sh --run -d nereids_rules_p0/push_down_top_n -s push_down_top_n_distinct_through_union -forceGenOut
- Behavior changed: No
- Does this need documentation: No
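
The failure described in the Problem Summary above can be pictured with a small Java sketch of position-based slot remapping. Everything here is illustrative rather than taken from Doris: the class `UnionSlotRemapSketch`, the method `remapByPosition`, and the slot names are hypothetical, and the sketch assumes the child plan exposes a duplicated projection column only once while the UNION's regular child output keeps one entry per UNION output position.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of position-based slot remapping; not Doris code.
class UnionSlotRemapSketch {
    // Pair each UNION output slot with the branch slot at the same position.
    // Throws IndexOutOfBoundsException if branchSlots is shorter than unionOutput.
    static Map<String, String> remapByPosition(List<String> unionOutput, List<String> branchSlots) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < unionOutput.size(); i++) {
            mapping.put(unionOutput.get(i), branchSlots.get(i));
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> unionOutput = List.of("u0", "u1", "u2");
        // One entry per UNION output position; the duplicated column appears twice.
        List<String> regularChildOutput = List.of("k1", "k1", "rn");
        // Assumed child plan output: the duplicated column appears once, so it is shorter.
        List<String> childPlanOutput = List.of("k1", "rn");

        System.out.println(remapByPosition(unionOutput, regularChildOutput)); // {u0=k1, u1=k1, u2=rn}
        try {
            remapByPosition(unionOutput, childPlanOutput);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("remap against the shorter child plan output failed");
        }
    }
}
```

Because the regular child output always has the same length as the UNION output, indexing it by position is safe, which matches the direction of the fix the commit message describes.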