[fix](nereids) partition topn opt requires chosen window func partition key to be a subset of co-located ones#64764
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
I reviewed the partition-topn/window change and did not find a blocking issue. The new guard in LogicalWindow.getPushDownWindowFuncAndLimit() covers the shared-input pruning hazard for all current callers that insert LogicalPartitionTopN below a LogicalWindow, including filter-derived and limit/topn-derived paths. The added unit test and regression test both target the prior wrong-result shape, and the regression expected rn2 values prove full-window evaluation before filtering.
Critical checkpoint conclusions:
- Goal/test proof: the PR fixes the different-partition co-located window wrong-result case and includes both FE unit and regression coverage for that shape.
- Scope/focus: the code change is small and localized to the partition-topn eligibility check.
- Concurrency/lifecycle/config/compatibility/persistence/write paths: not involved beyond reading the existing
enable_partition_topnsession variable. - Parallel paths:
CreatePartitionTopNFromWindow,PushDownLimit, andPushDownTopNThroughWindowall flow through the updated helper; physical window grouping later separates partition groups, but the logical rewrite can run before that, so the guard is needed. - Tests/results: regression uses a hardcoded table, drop-before-use, deterministic
order by, an EXPLAINnotContains("VPartitionTopN")guard, and generated expected output. I did not run tests locally. - Performance/observability: this intentionally disables an unsafe optimization for mixed partition keys; no new observability is needed.
Subagent conclusions: optimizer-rewrite reported no candidates after checking rewrite semantics and partition-topn execution paths. tests-session-config reported no candidates after checking regression/session/config/style risks. No candidates were accepted, dismissed, or deduplicated into inline comments. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.
User focus: no additional user-provided review focus was supplied.
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 28948 ms |
TPC-DS: Total hot run time: 172179 ms |
ClickBench: Total hot run time: 25.26 s |
FE Regression Coverage ReportIncrement line coverage |
…on key to be a subset of co-located ones
When a single LogicalWindow holds multiple window functions, a filter such as
`rn <= k` may be turned into a partitionTopN and pushed below the whole window
node. The generated partitionTopN keeps the per-partition top-k of the chosen
window function and thus prunes the input rows shared by ALL co-located window
functions.
This is only correct when the chosen window function's partition key is a
SUBSET of every other co-located window function's partition key (i.e. the
chosen one is coarser). Then any row that could change another window's value
for a surviving row lies in the same chosen-partition with an order value not
greater than the surviving row's, so its chosen-rank is within top-k and it is
kept. When this does not hold the pruning drops rows the other windows still
need and produces a wrong result, e.g.
row_number() over (partition by g1 order by c) as rn1, -- chosen
row_number() over (partition by g2 order by c) as rn2 -- independent
row_number() over (partition by c1, c2 order by c) as rn, -- chosen (finer)
rank() over (partition by c1 order by c) as rk -- coarser
getPushDownWindowFuncAndLimit() previously only required the order keys of all
co-located window functions to be compatible (apache#56622). It now also requires the
partition keys to satisfy the above subset relation, otherwise the optimization
is disabled.
0a20399 to
d3f7594
Compare
|
run buildall |
TPC-H: Total hot run time: 28618 ms |
TPC-DS: Total hot run time: 173856 ms |
ClickBench: Total hot run time: 25.37 s |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
run external |
|
run nonConcurrent |
FE Regression Coverage ReportIncrement line coverage |
1 similar comment
FE Regression Coverage ReportIncrement line coverage |
|
run external |
|
/review |
There was a problem hiding this comment.
Reviewed PR 64764. I found one small but concrete hygiene issue to fix before merging: the new generated regression .out file leaves an extra blank line at EOF, and git diff --check flags the patch.
Critical checkpoints:
- Goal and behavior: the PR adds a partition-key compatibility guard for partition top-n generation from co-located window functions. The changed helper is used by the filter, limit, and top-n pushdown paths, and the intended subset/incompatible partition cases were reviewed.
- Scope: the code change is focused in
LogicalWindow; the remaining changes are targeted FE unit/regression coverage and expected output. - Concurrency, lifecycle, persistence, FE/BE protocol, and storage compatibility: not involved in this PR.
- Configuration/session behavior: no new config is added; the existing
enable_partition_topnsession variable remains the gate. - Parallel paths:
CreatePartitionTopNFromWindow,PushDownLimit, andPushDownTopNThroughWindowall route through the updated helper, so the new guard applies across the relevant generation paths. - Tests/results: the new tests cover incompatible partition keys and subset-safe cases, but the new
.outfile needs the EOF whitespace fix below. - Existing comments and user focus: there were no existing inline review threads and no additional user-provided focus points.
Subagent conclusions: optimizer-rewrite found no optimizer correctness candidate. tests-session-config proposed TSC-001, which was merged as MAIN-001 and submitted below. Java test line-length was dismissed because test LineLength is suppressed by the FE checkstyle suppressions. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.
What problem does this PR solve?
When a single
LogicalWindowholds multiple window functions, a filter likern <= kmay be converted into apartitionTopNand pushed below the whole window node. The generatedpartitionTopNkeeps the per-partition top-k of the chosen window function, so it prunes the input rows that are shared by all co-located window functions.This is only correct when the chosen window function's partition key is a subset of every other co-located window function's partition key (i.e. the chosen one is coarser). Otherwise the pruning drops rows the other windows still need and produces a wrong result, e.g.
LogicalWindow.getPushDownWindowFuncAndLimit()already required the order keys of all co-located window functions to be compatible (#56622), but it never checked the partition keys. This PR additionally requireswindowFunc.getPartitionKeys().containsAll(chosenWindowFunc.getPartitionKeys())for every co-located window function; otherwise the partition-topn optimization is disabled.Why the subset rule is correct (and not just equality)
The pruning keeps the per-
P0(chosen partition) top-k by the order key. For another window functionWpartitioned byP1:P0 ⊆ P1(chosen coarser) → safe. Any row that could changeW's value for a surviving rowris in the sameP1partition with a smaller order value; being in the sameP1it is also in the sameP0partition with order<= r, so itsP0-rank is within top-k and it is kept. NothingWneeds is pruned.P0 ⊋ P1or independent → unsafe. A finer/independentP0can drop a row that ranks early inP1, corruptingW.So equality is unnecessarily strict; the precise safe condition is the subset relation, expressed with
containsAll(which also makes the check order-insensitive in the partition-key list, matching the set semantics ofPARTITION BY).When the optimization still applies
Single window, multiple windows with the same partition key, and the subset case above all keep firing. Example where the chosen
rank(partition by g1)is coarser thanrow_number(partition by g1, g2), so aVPartitionTopN(partition by g1)is still generated:Reproduce (the wrong-result case)
Wrong result (before),
rn2should be1,3,4:Correct result (after, matches MySQL 8.4):
EXPLAIN before (buggy)
A
VPartitionTopN(partition by g1)is inserted below both analytic nodes, so it prunes rows beforern2is computed:EXPLAIN after (fixed)
No
VPartitionTopN; both window functions are computed over the full input andrn1 <= 1stays as an ordinary predicate above them:Release note
Fix wrong result of multiple window functions (
row_number/rank/dense_rank) with incompatible partition keys when a top-n filter (e.g.rn <= k) is applied; the partition-topn pushdown is now restricted to the cases where it is provably safe (the chosen function's partition key is a subset of the co-located ones).Check List (For author)
Test
regression-test/suites/query_p0/partition_topn/check_partitionkey.groovy,.../push_down_filter_through_window/push_down_multi_filter_through_window.groovy)GeneratePartitionTopnFromWindowTest:testMultipleWindowsWithDifferentPartitions,testMultipleWindowsSubsetPartitionGeneratesTopn)Behavior changed: