[fix](fe) Merge TopN with child prefix order keys#64685
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Consecutive TopN nodes were merged only when the child order key list was a prefix of the parent order key list. When the parent order key list was shorter and was instead a prefix of the child list, the rule kept both TopN nodes even though the child ordering can serve as a deterministic tie-breaker for the parent ordering. This change allows that prefix direction, keeps the longer order key list in the merged TopN, and adjusts LogicalTopN.withOrderKeys typing so callers preserve their child type.
### Release note
None
### Check List (For Author)
- Test: Regression test / Unit Test
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.MergeTopNsTest
- ./run-regression-test.sh --run -d nereids_rules_p0/limit_push_down -s merge_topn_prefix_key -forceGenOut
- Behavior changed: No
- Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Review result: request changes.
I found one test determinism issue in the new regression coverage. The optimizer rewrite itself is small and focused: the prefix-order merge keeps the longer ordering as a valid tie-order refinement, preserves output slots, and reuses the existing consecutive-TopN offset/limit formula. I did not find a concurrency, lifecycle, config propagation, compatibility, transaction/persistence, or observability concern in the changed FE paths.
Critical checkpoints:
- Goal/test proof: the implementation goal is covered by unit and regression tests, but the new value regression asserts exact rows after an unordered tie.
- Scope: code changes are focused on MergeTopNs and the LogicalTopN helper typing.
- Parallel paths: join/project TopN paths were checked; no distinct missed optimizer path was found.
- Test results: the new .out is generated, but one expected result is not SQL-deterministic.
User focus: no additional user-provided focus points.
Subagent conclusions: optimizer-rewrite reported no inline candidates after dismissing semantic, ExprId/slot, and rule-placement concerns with evidence. tests-session-config proposed the accepted nondeterministic regression issue; its LSan candidate was dismissed as outside the actual GitHub PR file set. Convergence round 1 ended with both subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.
TPC-H: Total hot run time: 28588 ms |
TPC-DS: Total hot run time: 173944 ms |
FE UT Coverage ReportIncrement line coverage |
ClickBench: Total hot run time: 25.3 s |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Consecutive TopN nodes were merged only when the child order key list was a prefix of the parent order key list. When the parent order key list was shorter and was instead a prefix of the child list, the rule kept both TopN nodes even though the child ordering can serve as a deterministic tie-breaker for the parent ordering. This change allows that prefix direction, keeps the longer order key list in the merged TopN, and adjusts LogicalTopN.withOrderKeys typing so callers preserve their child type.
Release note
None
Check List (For Author)