[fix](fe) Add MergeProjectable after ColumnPruning in DPHyp join reorder to merge consecutive projects#64409

Draft
starocean999 wants to merge 1 commit into apache:master from starocean999:master_0424

@starocean999

Contributor

Related PR: (#61146)

After DPHyp join reorder, the ColumnPruning rule may produce consecutive Project nodes in the plan tree. Subsequent optimization rules expect normalized plan shapes and may not handle chains of consecutive projects correctly, leading to plan corruption or incorrect results.

This happens because the DPHyp reorder path runs a separate rewrite pipeline (pushDownRewrite → columnPrune) on the reordered plan before re-inserting it into the memo. Unlike the main rewrite pipeline, which includes MergeProjectable in its standard rule sequence, the DPHyp post-reorder pipeline omitted this cleanup step.

Fix: Add MergeProjectable after ColumnPruning in the DPHyp rewrite pipeline within Optimizer.dpHypOptimize(). This ensures that any consecutive Project nodes generated by column pruning are merged into a single project, maintaining a normalized plan shape for downstream rules.
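
To make the intended plan shape concrete, here is a minimal Python sketch of what merging consecutive projects means. It is a toy model, not the Doris implementation: the `Scan`, `Join`, and `Project` classes and the `merge_projects` and `has_consecutive_projects` helpers are hypothetical names, and projections are restricted to plain column renames, whereas the real MergeProjectable rule operates on Nereids plan nodes and general expressions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class Project:
    # Pairs of (output column, input column read from the child).
    # Assumption: only renames, no computed expressions.
    outputs: Tuple[Tuple[str, str], ...]
    child: "Plan"


Plan = Union[Scan, Join, Project]


def merge_projects(plan: Plan) -> Plan:
    """Collapse every chain of Project nodes into a single Project."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Join):
        return Join(merge_projects(plan.left), merge_projects(plan.right))
    # Normalize the subtree first, so the child is at most one Project deep.
    child = merge_projects(plan.child)
    if isinstance(child, Project):
        inner = dict(child.outputs)
        # Rewrite each outer column in terms of the inner project's inputs.
        merged = tuple((out, inner[src]) for out, src in plan.outputs)
        return Project(merged, child.child)
    return Project(plan.outputs, child)


def has_consecutive_projects(plan: Plan) -> bool:
    """Return True if any Project sits directly on top of another Project."""
    if isinstance(plan, Scan):
        return False
    if isinstance(plan, Join):
        return (has_consecutive_projects(plan.left)
                or has_consecutive_projects(plan.right))
    return (isinstance(plan.child, Project)
            or has_consecutive_projects(plan.child))


# A plan shaped like the problematic case: stacked projects above a join.
join = Join(Scan("t1"), Scan("t2"))
pruned = Project(
    (("k", "b"),),
    Project(
        (("b", "a"), ("c", "a2")),
        Project((("a", "id"), ("a2", "name")), join),
    ),
)
merged = merge_projects(pruned)
```

In this sketch, `pruned` contains three stacked projects and `merged` contains a single project that reads `id` directly from the join, which is the normalized shape that downstream rules are described as expecting.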

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@starocean999

Contributor Author

run buildall

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally include the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact the change might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@hello-stephen

Contributor
TPC-H: Total hot run time: 29202 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 1ee6eea51fb2c366ccacbf68237543819b3bc0ca, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17681	4140	4023	4023
q2
q3	10766	1385	821	821
q4	4685	472	340	340
q5	7527	891	590	590
q6	178	172	137	137
q7	765	887	623	623
q8	9337	1580	1656	1580
q9	5793	4494	4493	4493
q10	6743	1797	1518	1518
q11	445	269	255	255
q12	626	420	289	289
q13	18140	3427	2829	2829
q14	268	261	238	238
q15
q16	815	764	707	707
q17	986	841	975	841
q18	6850	5839	5570	5570
q19	1309	1293	1111	1111
q20	515	412	256	256
q21	6109	2878	2657	2657
q22	459	368	324	324
Total cold run time: 99997 ms
Total hot run time: 29202 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5236	4902	4835	4835
q2
q3	4891	5364	4604	4604
q4	2169	2220	1392	1392
q5	4789	4936	4658	4658
q6	235	184	134	134
q7	1885	1765	1552	1552
q8	2410	2094	2130	2094
q9	7904	7392	7449	7392
q10	4759	4677	4183	4183
q11	522	384	353	353
q12	719	743	526	526
q13	3064	3362	2842	2842
q14	268	280	250	250
q15
q16	675	698	609	609
q17	1282	1265	1254	1254
q18	7271	6793	6863	6793
q19	1184	1077	1091	1077
q20	2229	2213	1928	1928
q21	5388	4642	4514	4514
q22	533	458	417	417
Total cold run time: 57413 ms
Total hot run time: 51407 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168511 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1ee6eea51fb2c366ccacbf68237543819b3bc0ca, data reload: false

query5	4338	629	478	478
query6	429	192	173	173
query7	4809	549	292	292
query8	375	213	207	207
query9	8729	3943	4021	3943
query10	437	320	262	262
query11	5927	2331	2150	2150
query12	150	103	100	100
query13	1301	570	437	437
query14	6349	5409	5050	5050
query14_1	4344	4365	4292	4292
query15	199	194	171	171
query16	982	476	404	404
query17	1054	677	561	561
query18	2414	472	327	327
query19	192	175	140	140
query20	110	104	102	102
query21	207	135	112	112
query22	13651	13896	13427	13427
query23	17343	16526	16101	16101
query23_1	16323	16260	16286	16260
query24	7527	1754	1304	1304
query24_1	1260	1304	1288	1288
query25	561	488	397	397
query26	1303	333	166	166
query27	2677	575	334	334
query28	4400	2012	2000	2000
query29	1094	667	495	495
query30	309	239	198	198
query31	1117	1076	969	969
query32	111	62	59	59
query33	532	321	249	249
query34	1170	1154	659	659
query35	762	783	681	681
query36	1409	1405	1250	1250
query37	157	103	96	96
query38	3204	3142	3021	3021
query39	936	933	911	911
query39_1	878	904	865	865
query40	220	125	106	106
query41	68	66	64	64
query42	95	95	98	95
query43	334	322	278	278
query44	
query45	201	188	181	181
query46	1064	1186	753	753
query47	2354	2382	2235	2235
query48	415	415	293	293
query49	639	497	359	359
query50	991	365	254	254
query51	4350	4362	4272	4272
query52	89	89	79	79
query53	245	262	196	196
query54	304	221	203	203
query55	87	79	70	70
query56	245	229	216	216
query57	1440	1417	1358	1358
query58	244	232	211	211
query59	1555	1637	1471	1471
query60	294	241	245	241
query61	175	169	197	169
query62	709	644	589	589
query63	227	194	191	191
query64	2541	761	604	604
query65	
query66	1808	456	331	331
query67	29628	29676	29437	29437
query68	
query69	416	297	250	250
query70	941	969	961	961
query71	301	216	204	204
query72	2874	2794	2390	2390
query73	916	766	421	421
query74	5116	4954	4772	4772
query75	2640	2528	2232	2232
query76	2344	1144	761	761
query77	348	383	282	282
query78	12478	12402	11856	11856
query79	1405	1114	755	755
query80	591	464	390	390
query81	454	283	239	239
query82	592	150	121	121
query83	353	275	250	250
query84	
query85	843	506	406	406
query86	364	305	277	277
query87	3401	3344	3204	3204
query88	3582	2748	2677	2677
query89	421	375	329	329
query90	2000	182	176	176
query91	171	156	141	141
query92	66	62	62	62
query93	1494	1542	951	951
query94	569	366	302	302
query95	661	480	345	345
query96	1008	822	372	372
query97	2739	2698	2592	2592
query98	208	205	196	196
query99	1133	1174	1031	1031
Total cold run time: 250114 ms
Total hot run time: 168511 ms

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉
Increment coverage report
Complete coverage report
