[fix](fe) Merge TopN with child prefix order keys #64685

Merged
morrySnow merged 1 commit into apache:master from morrySnow:merge-topn
Jun 23, 2026

@morrySnow

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Consecutive TopN nodes were merged only when the child order key list was a prefix of the parent order key list. When the parent order key list was shorter and was instead a prefix of the child list, the rule kept both TopN nodes even though the child ordering can serve as a deterministic tie-breaker for the parent ordering. This change allows that prefix direction, keeps the longer order key list in the merged TopN, and adjusts LogicalTopN.withOrderKeys typing so callers preserve their child type.
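The two prefix directions described in the summary can be sketched as follows. This is a minimal illustration, not the code from this PR: order keys are modeled as plain strings, and `PrefixOrderKeysSketch`, `isPrefix`, and `mergedOrderKeys` are hypothetical names, not Doris APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: order keys are modeled as strings such as "a ASC".
final class PrefixOrderKeysSketch {

    // True when `shorter` equals the leading elements of `longer`.
    static boolean isPrefix(List<String> shorter, List<String> longer) {
        return shorter.size() <= longer.size()
                && longer.subList(0, shorter.size()).equals(shorter);
    }

    // Returns the order keys a merged TopN would keep, or empty when neither
    // list is a prefix of the other and the two nodes stay separate.
    static Optional<List<String>> mergedOrderKeys(List<String> parent, List<String> child) {
        if (isPrefix(child, parent)) {
            return Optional.of(parent); // direction that was already merged
        }
        if (isPrefix(parent, child)) {
            return Optional.of(child);  // direction this change adds
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> parent = Arrays.asList("a ASC");
        List<String> child = Arrays.asList("a ASC", "b DESC");
        System.out.println(mergedOrderKeys(parent, child));
    }
}
```

In both mergeable cases the longer list is the one returned, which matches the summary's statement that the merged TopN keeps the longer order key list.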

Release note

None

Check List (For Author)

  • Test: Regression test / Unit Test
    • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.MergeTopNsTest
    • ./run-regression-test.sh --run -d nereids_rules_p0/limit_push_down -s merge_topn_prefix_key -forceGenOut
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow

Contributor Author

run buildall

@morrySnow

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor


Review result: request changes.

I found one test determinism issue in the new regression coverage. The optimizer rewrite itself is small and focused: the prefix-order merge keeps the longer ordering as a valid tie-order refinement, preserves output slots, and reuses the existing consecutive-TopN offset/limit formula. I did not find a concurrency, lifecycle, config propagation, compatibility, transaction/persistence, or observability concern in the changed FE paths.
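The review refers to the existing consecutive-TopN offset/limit formula without quoting it. As a hedged sketch of the arithmetic any such merge has to satisfy (an assumption based on how stacked offset/limit windows compose, not code copied from MergeTopNs), the hypothetical `mergeWindow` below combines a parent window applied on top of a child window, assuming both share a compatible ordering:

```java
// Hypothetical sketch of composing two stacked (offset, limit) windows.
final class TopNWindowSketch {

    // Returns {offset, limit} of the single window equivalent to applying the
    // child window first and then the parent window to the child's output.
    static long[] mergeWindow(long parentOffset, long parentLimit,
                              long childOffset, long childLimit) {
        long offset = childOffset + parentOffset;
        // Rows the child still provides once the parent has skipped its offset.
        long remaining = Math.max(childLimit - parentOffset, 0L);
        long limit = Math.min(parentLimit, remaining);
        return new long[] {offset, limit};
    }

    public static void main(String[] args) {
        long[] merged = mergeWindow(2, 5, 10, 4);
        System.out.println("offset=" + merged[0] + ", limit=" + merged[1]);
    }
}
```

For example, a child that emits 4 rows starting at offset 10, under a parent that skips 2 and takes up to 5, is equivalent to a single window with offset 12 and limit 2.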

Critical checkpoints:

  • Goal/test proof: the implementation goal is covered by unit and regression tests, but the new value regression asserts exact rows after an unordered tie.
  • Scope: code changes are focused on MergeTopNs and the LogicalTopN helper typing.
  • Parallel paths: join/project TopN paths were checked; no distinct missed optimizer path was found.
  • Test results: the new .out is generated, but one expected result is not SQL-deterministic.
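The determinism concern in the checkpoints above can be illustrated outside of Doris. The sketch below is an assumption-laden toy, not the regression test from this PR: `TieOrderSketch`, `Row`, and `topN` are hypothetical names, and the two input lists stand in for two scan orders an engine could legally produce. Ordering by `a` alone leaves the tied rows in whatever order they arrived, so a limit that cuts through the tie returns different rows; adding `b` as a tie-breaker makes the result independent of input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: a limit cutting through tied rows is not deterministic.
final class TieOrderSketch {

    static final class Row {
        final int a;
        final int b;

        Row(int a, int b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public String toString() {
            return "(" + a + "," + b + ")";
        }
    }

    // Order by a only: rows with equal a are tied.
    static final Comparator<Row> BY_A = Comparator.comparingInt(r -> r.a);
    // Order by a, then b: b breaks the tie.
    static final Comparator<Row> BY_A_B = BY_A.thenComparingInt(r -> r.b);

    // Sorts a copy of the input (List.sort is stable) and keeps the first `limit` rows.
    static List<String> topN(List<Row> input, Comparator<Row> order, int limit) {
        List<Row> sorted = new ArrayList<>(input);
        sorted.sort(order);
        List<String> out = new ArrayList<>();
        for (Row row : sorted.subList(0, Math.min(limit, sorted.size()))) {
            out.add(row.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> scan1 = Arrays.asList(new Row(1, 20), new Row(1, 10), new Row(2, 30));
        List<Row> scan2 = Arrays.asList(new Row(1, 10), new Row(1, 20), new Row(2, 30));
        System.out.println(topN(scan1, BY_A, 1) + " vs " + topN(scan2, BY_A, 1));
        System.out.println(topN(scan1, BY_A_B, 1) + " vs " + topN(scan2, BY_A_B, 1));
    }
}
```

A value-asserting regression case is therefore only stable when its outermost ordering is total over the returned rows, or when the limit does not cut through a group of tied rows.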

User focus: no additional user-provided focus points.

Subagent conclusions: optimizer-rewrite reported no inline candidates after dismissing semantic, ExprId/slot, and rule-placement concerns with evidence. tests-session-config proposed the accepted nondeterministic regression issue; its LSan candidate was dismissed as outside the actual GitHub PR file set. Convergence round 1 ended with both subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

@hello-stephen

Contributor
TPC-H: Total hot run time: 28588 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 864b8962ab8f403b0de9478adc243b29a7704df0, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17868	4029	3986	3986
q2	2033	309	184	184
q3	10344	1373	799	799
q4	4678	460	338	338
q5	7516	873	569	569
q6	195	171	133	133
q7	746	830	620	620
q8	9743	1467	1505	1467
q9	6357	4475	4445	4445
q10	6863	1798	1516	1516
q11	432	270	233	233
q12	652	416	291	291
q13	18100	3459	2766	2766
q14	272	251	237	237
q15	q16	780	784	705	705
q17	1274	1037	856	856
q18	6965	5670	5529	5529
q19	1374	1286	988	988
q20	498	399	253	253
q21	5636	2658	2374	2374
q22	425	358	299	299
Total cold run time: 102751 ms
Total hot run time: 28588 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4419	4336	4403	4336
q2	332	351	223	223
q3	4626	4942	4417	4417
q4	2068	2152	1354	1354
q5	4457	4317	4337	4317
q6	233	179	130	130
q7	1744	1799	1785	1785
q8	2515	2260	2106	2106
q9	7924	8002	7899	7899
q10	4826	4739	4283	4283
q11	612	411	369	369
q12	733	738	531	531
q13	3455	3485	2926	2926
q14	324	315	285	285
q15	q16	763	729	684	684
q17	1366	1324	1316	1316
q18	7993	7387	7288	7288
q19	1193	1088	1108	1088
q20	2216	2222	1951	1951
q21	5279	4633	4450	4450
q22	519	450	393	393
Total cold run time: 57597 ms
Total hot run time: 52131 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173944 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 864b8962ab8f403b0de9478adc243b29a7704df0, data reload: false

query5	4315	630	479	479
query6	440	189	180	180
query7	4836	523	299	299
query8	366	220	200	200
query9	8733	4085	4052	4052
query10	440	315	254	254
query11	5924	2330	2136	2136
query12	154	137	95	95
query13	1232	610	419	419
query14	6531	5374	5050	5050
query14_1	4368	4371	4381	4371
query15	205	195	173	173
query16	1010	445	410	410
query17	1098	686	571	571
query18	2708	467	332	332
query19	191	180	138	138
query20	114	107	104	104
query21	217	138	119	119
query22	13660	13667	13423	13423
query23	17403	16605	16191	16191
query23_1	16286	16226	16325	16226
query24	7487	1809	1307	1307
query24_1	1332	1321	1328	1321
query25	542	425	366	366
query26	1301	316	168	168
query27	2671	566	335	335
query28	4489	2052	2026	2026
query29	1115	619	479	479
query30	311	231	191	191
query31	1107	1069	969	969
query32	105	58	57	57
query33	502	311	246	246
query34	1185	1180	645	645
query35	744	775	684	684
query36	1367	1394	1243	1243
query37	150	106	89	89
query38	1888	1726	1712	1712
query39	930	918	904	904
query39_1	874	916	882	882
query40	225	127	104	104
query41	69	68	67	67
query42	89	94	87	87
query43	328	330	293	293
query44	1513	795	780	780
query45	197	201	175	175
query46	1086	1193	759	759
query47	2393	2352	2245	2245
query48	421	436	314	314
query49	637	482	366	366
query50	1008	385	264	264
query51	4352	4318	4274	4274
query52	83	84	74	74
query53	262	273	192	192
query54	286	230	229	229
query55	76	73	68	68
query56	243	242	237	237
query57	1438	1438	1336	1336
query58	238	220	216	216
query59	1645	1682	1429	1429
query60	300	260	241	241
query61	172	173	170	170
query62	696	654	589	589
query63	234	195	203	195
query64	2578	807	664	664
query65	4930	4834	4831	4831
query66	1788	470	352	352
query67	29791	29654	29593	29593
query68	3199	1572	1000	1000
query69	428	312	277	277
query70	1089	974	959	959
query71	297	241	225	225
query72	2967	2599	2317	2317
query73	847	828	439	439
query74	5114	5003	4774	4774
query75	2619	2586	2216	2216
query76	2328	1222	780	780
query77	364	389	282	282
query78	12414	12487	11885	11885
query79	2227	1194	733	733
query80	1355	484	382	382
query81	528	276	249	249
query82	600	158	120	120
query83	310	274	248	248
query84	267	145	119	119
query85	919	525	415	415
query86	427	302	293	293
query87	1836	1841	1762	1762
query88	3776	2802	2799	2799
query89	425	371	329	329
query90	1852	191	180	180
query91	171	155	137	137
query92	65	57	53	53
query93	1652	1457	854	854
query94	737	350	307	307
query95	675	479	336	336
query96	1061	786	355	355
query97	2708	2701	2557	2557
query98	211	205	196	196
query99	1174	1143	1034	1034
Total cold run time: 260372 ms
Total hot run time: 173944 ms

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 37.50% (3/8) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 864b8962ab8f403b0de9478adc243b29a7704df0, data reload: false

query1	0.00	0.00	0.00
query2	0.10	0.04	0.05
query3	0.25	0.14	0.13
query4	1.62	0.14	0.14
query5	0.26	0.23	0.22
query6	1.23	1.08	1.08
query7	0.04	0.01	0.01
query8	0.11	0.04	0.04
query9	0.38	0.33	0.32
query10	0.56	0.60	0.54
query11	0.19	0.14	0.15
query12	0.18	0.15	0.15
query13	0.46	0.48	0.46
query14	1.01	1.00	1.01
query15	0.63	0.61	0.60
query16	0.33	0.31	0.33
query17	1.08	1.07	1.10
query18	0.23	0.21	0.22
query19	1.98	1.95	1.94
query20	0.02	0.01	0.01
query21	15.44	0.23	0.14
query22	4.82	0.05	0.05
query23	16.14	0.32	0.12
query24	2.98	0.44	0.32
query25	0.10	0.07	0.04
query26	0.73	0.20	0.15
query27	0.04	0.04	0.04
query28	3.50	0.85	0.51
query29	12.47	4.33	3.47
query30	0.29	0.15	0.16
query31	2.78	0.58	0.31
query32	3.22	0.59	0.49
query33	3.21	3.24	3.30
query34	15.58	4.19	3.57
query35	3.51	3.50	3.50
query36	0.54	0.44	0.42
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.17	0.15
query41	0.08	0.04	0.03
query42	0.04	0.03	0.03
query43	0.03	0.04	0.03
Total cold run time: 96.51 s
Total hot run time: 25.3 s

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 4.26% (4/94) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions Bot added the "approved" label (Indicates a PR has been approved by one committer.) Jun 23, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 4b85732 into apache:master Jun 23, 2026
40 checks passed
@morrySnow morrySnow deleted the merge-topn branch June 23, 2026 06:59
3 participants