
[feat](olap) Support lazy reading mode for pruned complex columns #59263

Open
mrhhsg wants to merge 8 commits into apache:master from mrhhsg:opt_lazy_read

Conversation

@mrhhsg

@mrhhsg mrhhsg commented Dec 22, 2025

Member

What problem does this PR solve?

The subcolumns of a pruned complex-type column can fall into two categories:

  1. Predicate columns — columns required to evaluate filter predicates, which need to be read upfront.
  2. Non-predicate columns — columns that are not needed when evaluating filter predicates.

For non-predicate columns, Doris can defer reading them until after predicate evaluation, reducing unnecessary I/O for nested columns.
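The two-phase read described above can be sketched as a toy model. This is not the BE implementation: `STORAGE`, `read_subcolumn`, and `lazy_scan` are hypothetical names that only stand in for segment I/O, and the struct column `s` with subcolumns `id` and `payload` is made up for illustration.

```python
# Toy model of lazy reading for a pruned struct column `s` with two
# subcolumns. STORAGE and every function name are hypothetical stand-ins
# for segment I/O, not Doris APIs.
STORAGE = {
    "s.id": [1, 2, 3, 4, 5, 6],
    "s.payload": ["a", "b", "c", "d", "e", "f"],
}
cells_read = {"s.id": 0, "s.payload": 0}


def read_subcolumn(path, rowids):
    """Read the requested rows of one subcolumn and count the cells touched."""
    cells_read[path] += len(rowids)
    return [STORAGE[path][r] for r in rowids]


def lazy_scan(predicate_path, lazy_path, predicate):
    all_rows = list(range(len(STORAGE[predicate_path])))
    # Phase 1: the predicate subcolumn is read upfront for every row.
    pred_values = read_subcolumn(predicate_path, all_rows)
    selected = [r for r in all_rows if predicate(pred_values[r])]
    # Phase 2: the non-predicate subcolumn is read only for surviving rows.
    lazy_values = read_subcolumn(lazy_path, selected)
    return [(pred_values[r], v) for r, v in zip(selected, lazy_values)]


rows = lazy_scan("s.id", "s.payload", lambda v: v > 4)
print(rows)        # rows that pass the filter
print(cells_read)  # cells touched per subcolumn
```

In this toy run the `payload` subcolumn is touched for two rows instead of six. The real saving depends on how selective the predicate is.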

This update also preserves predicate metadata paths separately from final lazy-materialization data paths. Predicate evaluation may still need current-level metadata, such as OFFSET for cardinality()/length() and NULL for IS NULL, even when a covering data path is also needed later. The BE nested column iterators consume those current-level metadata paths at the correct iterator level without forwarding them to child iterators or incorrectly switching mixed data reads into metadata-only mode.
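One way to picture the iterator-level handling is the sketch below. It is an illustrative model only: paths are tuples relative to the current iterator, `OFFSET` and `NULL` mark current-level metadata as in the text above, while the `*` item component and the `route_paths` function are assumptions made for this example.

```python
# Hypothetical model of one nested iterator level. A path is a tuple of
# components relative to the current iterator. "OFFSET" and "NULL" stand for
# current-level metadata; anything else addresses child data. The "*" item
# component and the function name are illustrative assumptions.
META_COMPONENTS = {"OFFSET", "NULL"}


def route_paths(paths):
    """Split paths into current-level metadata and paths for the child."""
    current_meta = set()
    child_paths = []
    for path in paths:
        if len(path) == 1 and path[0] in META_COMPONENTS:
            # Consumed at this level and never forwarded to the child.
            current_meta.add(path[0])
        else:
            # Drop the leading component and hand the rest to the child.
            child_paths.append(path[1:])
    # Metadata-only mode is valid only when no data path is present.
    meta_only = bool(current_meta) and not child_paths
    return current_meta, child_paths, meta_only


# Mixed read: cardinality() needs OFFSET, and a later data read needs item field x.
mixed = route_paths([("OFFSET",), ("*", "x")])
# Pure metadata read: only OFFSET is requested.
only_meta = route_paths([("OFFSET",)])
print(mixed)
print(only_meta)
```

The mixed case keeps the `OFFSET` request at the current level and still reports a data read, so a redundant metadata path does not flip the iterator into metadata-only mode.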

This PR also handles predicate-only nested access paths. A complex iterator should skip access-path setup only when both final access paths and predicate access paths are empty. Otherwise predicate-only child or metadata paths may be ignored in lazy read mode, causing predicates such as array element IS NULL to evaluate on unread placeholder data.
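The skip condition can be written down as a small sketch. Both function names are hypothetical, and the "old" variant is only a reading of the earlier behavior as this description states it, not a copy of the previous code.

```python
def old_should_skip(final_paths, predicate_paths):
    """Earlier behavior as described: only the final access paths are checked."""
    return not final_paths


def should_skip_access_path_setup(final_paths, predicate_paths):
    """Skip only when neither phase needs anything at this iterator level."""
    return not final_paths and not predicate_paths


# A predicate such as "array element IS NULL" with no final data path.
final_paths = []
predicate_paths = [("*", "NULL")]

skipped_before = old_should_skip(final_paths, predicate_paths)
skipped_now = should_skip_access_path_setup(final_paths, predicate_paths)
print(skipped_before, skipped_now)
```

With the combined check, setup is skipped only when both lists are empty, so the predicate-only path above is still routed.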

This PR also removes includes of olap/rowset/segment_v2/column_reader.h from many header files, so that changes to ColumnReader/ColumnIterator no longer force large-scale recompilation of source files.

Issue Number: None

Related PR: None

Release note

For non-predicate columns, Doris can defer reading them until after predicate evaluation, which may significantly reduce the amount of data read.

Check List (For Author)

  • Test

    • Regression test
      • doris-local-regression --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s string_length_column_pruning
      • doris-local-regression --network 10.26.20.3/24 run -d datatype_p0/complex_types -s test_pruned_columns
      • doris-local-regression --network 10.26.20.3/24 run -d inverted_index_p0/array_contains -s test_index_compaction_null_arr
    • Unit Test
      • ./run-be-ut.sh --run --filter=ColumnReaderTest.*
    • Manual test
      • build-support/clang-format.sh
      • git diff --check
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. Non-predicate nested columns can be lazily materialized after predicate evaluation; predicate metadata paths and predicate-only nested access paths are preserved for correctness.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas

Thearas commented Dec 22, 2025

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg

mrhhsg commented Dec 22, 2025

Member Author

run buildall

@doris-robot


Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.50% (1765/2220)
Line Coverage 65.03% (31062/47763)
Region Coverage 65.53% (15491/23639)
Branch Coverage 56.17% (8239/14668)

@mrhhsg

mrhhsg commented Dec 23, 2025

Member Author

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.48% (1766/2222)
Line Coverage 64.80% (31237/48205)
Region Coverage 65.30% (15539/23798)
Branch Coverage 55.99% (8266/14764)

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@mrhhsg

mrhhsg commented Dec 29, 2025

Member Author

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.52% (1771/2227)
Line Coverage 64.85% (31324/48299)
Region Coverage 65.42% (15591/23831)
Branch Coverage 56.02% (8288/14796)

@doris-robot


BE UT Coverage Report

Increment line coverage 38.84% (127/327) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.37% (18956/35520)
Line Coverage 39.26% (175857/447953)
Region Coverage 33.83% (136113/402320)
Branch Coverage 34.74% (58761/169125)

@mrhhsg

mrhhsg commented Dec 30, 2025

Member Author

run buildall

@mrhhsg

mrhhsg commented Dec 30, 2025

Member Author

run buildall

@doris-robot


Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.38% (1771/2231)
Line Coverage 64.68% (31323/48424)
Region Coverage 65.24% (15584/23887)
Branch Coverage 55.89% (8289/14832)

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 34393 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1734d3f540e3ae83d84d4a498a6e5ff594aee288, data reload: false

------ Round 1 ----------------------------------
q1	17660	4235	4069	4069
q2	2021	377	254	254
q3	10179	1305	747	747
q4	10239	925	338	338
q5	7543	2162	1874	1874
q6	188	170	135	135
q7	934	830	686	686
q8	9280	1451	1115	1115
q9	6880	5153	5160	5153
q10	6754	1795	1436	1436
q11	521	323	271	271
q12	686	724	591	591
q13	17805	3808	3090	3090
q14	288	298	278	278
q15	572	507	497	497
q16	704	695	637	637
q17	730	743	640	640
q18	7569	7378	7345	7345
q19	887	970	608	608
q20	407	374	256	256
q21	4167	3922	3411	3411
q22	1081	1017	962	962
Total cold run time: 107095 ms
Total hot run time: 34393 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4084	4035	4022	4022
q2	335	394	318	318
q3	2070	2602	2243	2243
q4	1315	1764	1355	1355
q5	4121	4062	5061	4062
q6	209	174	133	133
q7	2093	1942	1719	1719
q8	2592	2349	2464	2349
q9	7346	7154	7025	7025
q10	2522	2692	2321	2321
q11	573	485	469	469
q12	719	815	641	641
q13	3713	4018	3314	3314
q14	292	307	279	279
q15	563	521	503	503
q16	681	713	645	645
q17	1204	1355	1373	1355
q18	7918	7921	7766	7766
q19	901	873	966	873
q20	1995	2109	1931	1931
q21	4768	4640	4268	4268
q22	1086	1014	1000	1000
Total cold run time: 51100 ms
Total hot run time: 48591 ms

@doris-robot

TPC-DS: Total hot run time: 174201 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1734d3f540e3ae83d84d4a498a6e5ff594aee288, data reload: false

query5	4811	607	457	457
query6	347	238	221	221
query7	4252	457	263	263
query8	362	258	241	241
query9	8776	2641	2644	2641
query10	513	378	326	326
query11	15095	14979	14757	14757
query12	192	119	111	111
query13	1258	489	411	411
query14	6500	2985	2725	2725
query14_1	2666	2607	2653	2607
query15	207	191	179	179
query16	997	479	453	453
query17	1071	701	591	591
query18	2695	438	341	341
query19	224	226	199	199
query20	122	119	120	119
query21	217	156	119	119
query22	3870	4005	3778	3778
query23	15915	15862	15371	15371
query23_1	15575	15427	15401	15401
query24	7551	1595	1208	1208
query24_1	1239	1197	1205	1197
query25	580	479	421	421
query26	1249	276	172	172
query27	2727	457	297	297
query28	4494	2193	2197	2193
query29	828	598	429	429
query30	306	237	218	218
query31	807	649	550	550
query32	81	71	69	69
query33	532	327	285	285
query34	889	878	533	533
query35	750	781	720	720
query36	872	877	811	811
query37	139	94	82	82
query38	2667	2699	2655	2655
query39	771	763	728	728
query39_1	705	701	722	701
query40	213	131	116	116
query41	70	64	64	64
query42	102	103	99	99
query43	440	473	383	383
query44	1376	770	767	767
query45	187	183	174	174
query46	875	992	613	613
query47	1413	1442	1283	1283
query48	323	324	261	261
query49	611	420	352	352
query50	647	282	205	205
query51	3784	3963	3788	3788
query52	110	119	100	100
query53	295	325	283	283
query54	279	263	241	241
query55	78	74	76	74
query56	280	293	278	278
query57	1037	1026	938	938
query58	259	257	250	250
query59	2152	2180	2146	2146
query60	315	319	294	294
query61	162	162	157	157
query62	401	371	310	310
query63	301	264	277	264
query64	4963	1293	994	994
query65	3776	3763	3741	3741
query66	1362	428	314	314
query67	15339	14707	15389	14707
query68	4810	1045	734	734
query69	495	337	302	302
query70	1075	979	930	930
query71	365	304	271	271
query72	6353	5150	5040	5040
query73	757	681	324	324
query74	8773	8824	8571	8571
query75	2903	2886	2491	2491
query76	3887	1064	658	658
query77	520	381	274	274
query78	9857	9951	9139	9139
query79	1018	864	610	610
query80	1141	568	492	492
query81	549	260	228	228
query82	405	147	109	109
query83	362	257	246	246
query84	251	117	99	99
query85	913	528	454	454
query86	389	287	321	287
query87	2887	2855	2739	2739
query88	3273	2286	2305	2286
query89	378	359	341	341
query90	1935	163	151	151
query91	173	173	143	143
query92	73	68	64	64
query93	1081	943	564	564
query94	634	326	287	287
query95	593	327	298	298
query96	594	461	217	217
query97	2311	2418	2263	2263
query98	214	199	204	199
query99	595	572	508	508
Total cold run time: 252828 ms
Total hot run time: 174201 ms

@doris-robot

ClickBench: Total hot run time: 28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1734d3f540e3ae83d84d4a498a6e5ff594aee288, data reload: false

query1	0.05	0.04	0.05
query2	0.10	0.04	0.05
query3	0.25	0.08	0.08
query4	1.61	0.11	0.11
query5	0.28	0.26	0.27
query6	1.15	0.67	0.65
query7	0.03	0.02	0.03
query8	0.06	0.04	0.05
query9	0.56	0.50	0.49
query10	0.56	0.54	0.55
query11	0.16	0.11	0.11
query12	0.16	0.13	0.13
query13	0.61	0.60	0.61
query14	0.99	0.98	0.98
query15	0.80	0.79	0.80
query16	0.41	0.40	0.41
query17	1.09	1.05	1.03
query18	0.23	0.21	0.22
query19	1.84	1.87	1.85
query20	0.02	0.02	0.01
query21	15.41	0.26	0.14
query22	5.10	0.05	0.05
query23	15.92	0.30	0.11
query24	0.94	0.70	0.71
query25	0.07	0.08	0.09
query26	0.14	0.14	0.14
query27	0.08	0.08	0.08
query28	5.38	1.07	0.88
query29	12.67	4.01	3.16
query30	0.28	0.13	0.12
query31	2.83	0.65	0.38
query32	3.23	0.56	0.47
query33	2.93	2.99	3.01
query34	16.71	5.14	4.51
query35	4.50	4.92	5.01
query36	0.70	0.55	0.56
query37	0.11	0.07	0.06
query38	0.07	0.05	0.03
query39	0.04	0.03	0.03
query40	0.18	0.14	0.14
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 98.41 s
Total hot run time: 28 s

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 38.81% (130/335) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.36% (18957/35529)
Line Coverage 39.25% (175956/448305)
Region Coverage 33.80% (136092/402611)
Branch Coverage 34.74% (58816/169288)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 82.39% (276/335) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.21% (25080/34734)
Line Coverage 58.96% (263613/447113)
Region Coverage 53.85% (219041/406774)
Branch Coverage 55.38% (94059/169850)

@mrhhsg

mrhhsg commented Dec 31, 2025

Member Author

run buildall

@doris-robot


Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.52% (1774/2231)
Line Coverage 64.82% (31417/48466)
Region Coverage 65.38% (15626/23901)
Branch Coverage 56.05% (8320/14844)

@doris-robot

TPC-H: Total hot run time: 35535 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 91706c6ce9208368a2edf2c6aa3cb7d5471ef1dd, data reload: false

------ Round 1 ----------------------------------
q1	17625	4313	4063	4063
q2	2028	360	246	246
q3	10166	1308	748	748
q4	10240	853	323	323
q5	7750	2177	1873	1873
q6	221	167	134	134
q7	934	801	669	669
q8	9276	1419	1130	1130
q9	6924	5202	5210	5202
q10	6842	1824	1422	1422
q11	519	295	278	278
q12	727	739	591	591
q13	17810	3833	3070	3070
q14	290	296	281	281
q15	583	506	502	502
q16	723	680	665	665
q17	706	764	608	608
q18	7671	7522	8044	7522
q19	1137	1048	641	641
q20	431	378	265	265
q21	4509	4232	4238	4232
q22	1158	1095	1070	1070
Total cold run time: 108270 ms
Total hot run time: 35535 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4336	4261	4188	4188
q2	328	410	321	321
q3	2236	2949	2460	2460
q4	1446	1874	1401	1401
q5	4565	4271	4497	4271
q6	223	171	128	128
q7	2003	1909	1722	1722
q8	2521	2359	2314	2314
q9	7153	7495	7047	7047
q10	2477	2518	2153	2153
q11	527	460	452	452
q12	672	696	563	563
q13	3368	3813	3036	3036
q14	274	291	268	268
q15	531	488	490	488
q16	605	641	625	625
q17	1075	1295	1374	1295
q18	7230	7495	7128	7128
q19	828	822	855	822
q20	1910	1994	1816	1816
q21	4555	4308	4153	4153
q22	1113	1021	982	982
Total cold run time: 49976 ms
Total hot run time: 47633 ms

@doris-robot

TPC-DS: Total hot run time: 175439 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 91706c6ce9208368a2edf2c6aa3cb7d5471ef1dd, data reload: false

query5	4423	588	449	449
query6	347	245	221	221
query7	4207	453	276	276
query8	356	256	240	240
query9	8766	2615	2636	2615
query10	508	346	327	327
query11	15259	14989	14821	14821
query12	177	117	112	112
query13	1273	505	399	399
query14	6285	2952	2700	2700
query14_1	2593	2589	2595	2589
query15	203	195	189	189
query16	1007	393	464	393
query17	1100	699	590	590
query18	2439	445	352	352
query19	223	231	198	198
query20	123	117	117	117
query21	222	139	119	119
query22	4170	4224	4248	4224
query23	15999	15676	15231	15231
query23_1	15603	15487	15412	15412
query24	7590	1589	1208	1208
query24_1	1242	1201	1228	1201
query25	570	479	432	432
query26	1271	277	167	167
query27	2750	459	300	300
query28	4523	2209	2191	2191
query29	811	549	471	471
query30	308	239	215	215
query31	780	642	555	555
query32	77	67	63	63
query33	518	319	290	290
query34	898	876	527	527
query35	732	781	693	693
query36	835	835	832	832
query37	124	87	74	74
query38	2762	2730	2656	2656
query39	764	742	736	736
query39_1	730	720	718	718
query40	223	131	115	115
query41	67	63	65	63
query42	106	103	104	103
query43	435	438	388	388
query44	1330	745	749	745
query45	193	184	173	173
query46	861	975	609	609
query47	1353	1513	1456	1456
query48	314	367	246	246
query49	599	411	327	327
query50	638	278	210	210
query51	3783	3834	3790	3790
query52	101	104	94	94
query53	287	328	270	270
query54	288	251	245	245
query55	75	73	70	70
query56	286	285	290	285
query57	1008	1031	964	964
query58	273	253	252	252
query59	2070	2119	2187	2119
query60	323	312	304	304
query61	156	159	157	157
query62	407	350	301	301
query63	295	266	270	266
query64	4982	1274	986	986
query65	3831	3662	3740	3662
query66	1434	437	317	317
query67	15140	15677	15814	15677
query68	6000	1023	724	724
query69	500	348	314	314
query70	1054	875	954	875
query71	361	302	273	273
query72	6336	4806	4759	4759
query73	681	571	304	304
query74	8783	8824	8552	8552
query75	2910	2878	2534	2534
query76	3874	1062	652	652
query77	516	380	278	278
query78	9831	9886	9182	9182
query79	1016	958	620	620
query80	945	585	473	473
query81	536	263	228	228
query82	407	139	111	111
query83	272	267	248	248
query84	251	120	109	109
query85	892	527	449	449
query86	377	320	318	318
query87	2931	2849	2756	2756
query88	3294	2260	2321	2260
query89	401	364	342	342
query90	1982	159	154	154
query91	169	163	141	141
query92	68	65	63	63
query93	1095	979	570	570
query94	638	321	294	294
query95	581	329	313	313
query96	595	472	208	208
query97	2323	2378	2291	2291
query98	210	201	198	198
query99	587	577	512	512
Total cold run time: 252888 ms
Total hot run time: 175439 ms

@doris-robot

ClickBench: Total hot run time: 27.99 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 91706c6ce9208368a2edf2c6aa3cb7d5471ef1dd, data reload: false

query1	0.05	0.04	0.04
query2	0.14	0.08	0.07
query3	0.32	0.08	0.07
query4	1.61	0.12	0.11
query5	0.26	0.24	0.24
query6	1.15	0.66	0.65
query7	0.02	0.02	0.03
query8	0.07	0.06	0.07
query9	0.57	0.51	0.48
query10	0.56	0.57	0.55
query11	0.26	0.13	0.14
query12	0.27	0.15	0.15
query13	0.63	0.61	0.61
query14	1.01	1.00	1.01
query15	0.88	0.81	0.81
query16	0.40	0.41	0.42
query17	1.07	1.00	1.07
query18	0.23	0.22	0.21
query19	1.96	1.81	1.86
query20	0.02	0.02	0.02
query21	15.39	0.28	0.25
query22	5.00	0.10	0.10
query23	15.44	0.41	0.23
query24	2.42	0.46	0.31
query25	0.10	0.10	0.10
query26	0.18	0.17	0.17
query27	0.09	0.09	0.09
query28	3.64	1.13	0.97
query29	12.59	4.03	3.23
query30	0.32	0.12	0.11
query31	2.81	0.66	0.44
query32	3.25	0.61	0.50
query33	3.07	3.08	3.05
query34	16.82	5.14	4.50
query35	4.51	4.53	4.53
query36	0.61	0.49	0.49
query37	0.25	0.09	0.08
query38	0.22	0.05	0.06
query39	0.07	0.05	0.05
query40	0.21	0.17	0.15
query41	0.13	0.07	0.06
query42	0.07	0.05	0.05
query43	0.06	0.05	0.04
Total cold run time: 98.73 s
Total hot run time: 27.99 s

@mrhhsg

mrhhsg commented Jun 25, 2026

Member Author

run cloud_p0

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 65.69% (381/580) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.97% (21624/39338)
Line Coverage 38.42% (206572/537705)
Region Coverage 34.50% (162664/471471)
Branch Coverage 35.52% (71285/200681)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 69.48% (403/580) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.33% (24700/38396)
Line Coverage 48.03% (256727/534507)
Region Coverage 44.81% (212191/473572)
Branch Coverage 45.91% (92250/200924)

mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jun 25, 2026
Issue Number: None

Related PR: apache#59263

Problem Summary: Struct lazy-read access-path routing could skip a child that only appeared in predicate access paths. For example, with projection path `s.a` and predicate path `s.b`, `StructFileColumnIterator::set_access_paths()` decided whether to route a child only from ordinary projection access paths, so child `b` was marked as skipped before its predicate path was forwarded. This change collects both projection and predicate subpaths for each child first, and routes the child when either side requires it.
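The collect-then-route step can be sketched as follows. This is a simplified model, not the code in `StructFileColumnIterator::set_access_paths()`: the function `route_struct_children`, the tuple path encoding, and the extra child `c` are assumptions for illustration, and paths are assumed to be non-empty and relative to the struct.

```python
from collections import defaultdict


def route_struct_children(children, projection_paths, predicate_paths):
    """Return per-child (projection, predicate) subpaths, or None if skipped."""
    proj = defaultdict(list)
    pred = defaultdict(list)
    # Collect both kinds of subpaths first, keyed by the child they address.
    for path in projection_paths:
        proj[path[0]].append(path[1:])
    for path in predicate_paths:
        pred[path[0]].append(path[1:])
    routed = {}
    for child in children:
        if proj[child] or pred[child]:
            # Route the child when either side requires it.
            routed[child] = (proj[child], pred[child])
        else:
            routed[child] = None  # neither phase needs this child
    return routed


# Projection path s.a and predicate path s.b, as in the example above.
routed = route_struct_children(["a", "b", "c"], [("a",)], [("b",)])
print(routed)
```

Child `b` is routed with an empty projection list and one predicate subpath, while `c`, which appears on neither side, is the only child marked as skipped.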

None

- Test: Unit Test
    - `./run-be-ut.sh --run --filter=ColumnReaderTest.StructPredicateOnlyChildPathStillRoutesToChild:ColumnReaderTest.MapFullProjectionStillRoutesPredicateSubPaths:ColumnReaderTest.AccessPathsPropagatePredicateToChildren`
    - `./run-be-ut.sh --run --filter=ColumnReaderTest.*`
- Behavior changed: No
- Does this need documentation: No

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (14/21) 🎉
Increment coverage report
Complete coverage report

mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jun 26, 2026

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.66% (617/645) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.23% (28509/38404)
Line Coverage 58.09% (310577/534645)
Region Coverage 54.92% (260142/473633)
Branch Coverage 56.23% (113009/200970)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 46.67% (14/30) 🎉
Increment coverage report
Complete coverage report

mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jun 26, 2026

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 42.86% (9/21) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 4.42% (13/294) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 68.81% (450/654) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.92% (21632/39391)
Line Coverage 38.42% (206729/538146)
Region Coverage 34.49% (162707/471696)
Branch Coverage 35.53% (71339/200781)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 72.02% (471/654) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.26% (24708/38449)
Line Coverage 48.00% (256748/534939)
Region Coverage 44.77% (212096/473779)
Branch Coverage 45.88% (92217/201014)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 3.85% (13/338) 🎉
Increment coverage report
Complete coverage report

mrhhsg added 7 commits June 27, 2026 18:05

Issue Number: None

Related PR: apache#59263

Problem Summary: Pruned complex columns can separate predicate subcolumns from non-predicate subcolumns. Predicate branches must be read before filter evaluation, while non-predicate branches can be deferred until after filtering to avoid unnecessary IO and materialization. This change adds the scan and iterator state needed to keep predicate, lazy, meta-only, and skipped branches distinct for struct/map/array columns, recovers placeholder columns after lazy materialization, and keeps nested iterator reading flags explicit. It also avoids generating TopN runtime filters for unsupported complex order key types so BE does not receive invalid runtime predicate keys.
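The four branch states named above can be pictured with a small classifier. This is a sketch under stated assumptions: the `ReadMode` enum, the `classify_branch` function, its three boolean inputs, and the precedence between the states are all hypothetical and are not taken from the BE code.

```python
from enum import Enum


class ReadMode(Enum):
    PREDICATE = "predicate"  # read before filter evaluation
    LAZY = "lazy"            # read after filtering, for surviving rows only
    META_ONLY = "meta_only"  # only offsets or the null map are needed
    SKIPPED = "skipped"      # not needed by either phase


def classify_branch(needs_predicate_data, needs_final_data, needs_metadata):
    """Pick a read mode for one nested branch (assumed precedence)."""
    if needs_predicate_data:
        return ReadMode.PREDICATE
    if needs_final_data:
        return ReadMode.LAZY
    if needs_metadata:
        return ReadMode.META_ONLY
    return ReadMode.SKIPPED


modes = {
    "filtered_field": classify_branch(True, True, False),
    "projected_field": classify_branch(False, True, False),
    "counted_array": classify_branch(False, False, True),
    "unused_field": classify_branch(False, False, False),
}
print(modes)
```

Keeping the states as distinct values, rather than a pair of booleans, mirrors the commit's stated goal of keeping nested iterator reading flags explicit.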

Support lazy reading of non-predicate fields for pruned complex columns, and avoid unsupported complex TopN runtime filter keys.

- Test: Unit Test / Build / Format
    - `./run-fe-ut.sh --run org.apache.doris.nereids.postprocess.TopNRuntimeFilterTest#testNotUseTopNRfForUnsupportedComplexOrderKey`
    - `ninja -C be/ut_build_ASAN src/exec/CMakeFiles/Exec.dir/operator/olap_scan_operator.cpp.o src/exec/CMakeFiles/Exec.dir/scan/olap_scanner.cpp.o src/storage/CMakeFiles/Storage.dir/segment/column_reader.cpp.o src/storage/CMakeFiles/Storage.dir/segment/segment_iterator.cpp.o test/CMakeFiles/doris_be_test.dir/storage/segment/column_reader_test.cpp.o`
    - `build-support/clang-format.sh be/src/exec/operator/olap_scan_operator.cpp be/src/exec/operator/olap_scan_operator.h be/src/exec/scan/olap_scanner.cpp be/src/runtime/runtime_state.h be/src/storage/olap_common.h be/src/storage/segment/column_reader.cpp be/src/storage/segment/column_reader.h be/src/storage/segment/segment_iterator.cpp be/src/storage/segment/segment_iterator.h be/test/storage/segment/column_reader_test.cpp`
    - `git diff --check && git diff --cached --check`
- Behavior changed: Yes. Non-predicate subcolumns of pruned complex columns can be read lazily, and unsupported complex TopN order keys no longer create runtime filters.
- Does this need documentation: No

Issue Number: None

Related PR: apache#59263

Problem Summary: Nested column pruning stripped predicate metadata paths when a non-predicate data path covered the same nested container for lazy materialization. A query that projected a nested map-array-struct field but filtered with cardinality(map['key']) could evaluate the predicate without value-array offsets and filter out matching rows. Keep predicate metadata paths covered by predicate-phase paths, preserve final predicate metadata paths even when lazy materialization covers the corresponding data path, and avoid forwarding current-level array metadata predicate paths to item iterators.
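A small worked example shows why the offsets matter for a `cardinality()` predicate. It assumes offsets are stored as cumulative end positions and that an unread offsets column looks like all zeros. Both are assumptions for illustration, as are the function names.

```python
def cardinality_from_offsets(offsets):
    """Per-row element counts derived from cumulative end offsets."""
    counts = []
    prev = 0
    for end in offsets:
        counts.append(end - prev)
        prev = end
    return counts


def rows_with_nonempty_array(offsets):
    """Row ids that satisfy the predicate cardinality(arr) > 0."""
    counts = cardinality_from_offsets(offsets)
    return [row for row, n in enumerate(counts) if n > 0]


real_offsets = [2, 2, 5]         # rows hold 2, 0, and 3 elements
placeholder_offsets = [0, 0, 0]  # assumed shape of an unread offsets column

matched_with_offsets = rows_with_nonempty_array(real_offsets)
matched_with_placeholder = rows_with_nonempty_array(placeholder_offsets)
print(matched_with_offsets)
print(matched_with_placeholder)
```

With the real offsets, rows 0 and 2 pass the filter. With placeholder offsets, no row passes, which is the kind of wrongly filtered result the commit describes.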

Fix nested complex column pruning for predicates that need array/map metadata.

- Test:

    - Build: `~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 all -d nereids_rules_p0/column_pruning -s string_length_column_pruning` built FE/BE successfully.

    - Regression test: `~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 start && ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s string_length_column_pruning`.

    - Code style: `build-support/clang-format.sh be/src/storage/segment/column_reader.cpp`, `build-support/check-format.sh be/src/storage/segment/column_reader.cpp`, `git diff --check`, and `git diff --cached --check`.

    - Static analysis: attempted `build-support/run-clang-tidy.sh --build-dir be/build_Release`, but it failed on existing analysis diagnostics outside the changed lines and a missing system `stddef.h` resource include.

- Behavior changed: Yes. Nested predicate metadata paths are preserved so filtered lazy reads evaluate predicates with the required array/map metadata.

- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Clarify the NestedColumnPruning phase 1.5 comment so it matches the current predicate metadata path stripping logic. Predicate access paths are stripped only by predicate-phase paths, while all access paths are stripped by self-covering paths.

### Release note

None

### Check List (For Author)

- Test: No need to test (comment-only change); ran git diff --check and git diff --cached --check

- Behavior changed: No

- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: apache#59263

Problem Summary:
Nested column pruning could treat metadata paths used by predicate evaluation as redundant when a covering data path also existed for final lazy materialization. This is unsafe because predicate evaluation still needs current-level metadata, such as OFFSET for cardinality()/length() and NULL for IS NULL, before row filtering. Removing these paths could make BE enter metadata-only modes incorrectly or forward current-level metadata paths to child iterators.

This change keeps predicate metadata paths in the FE plan, removes the obsolete MetaPathStriper logic, and updates BE nested column iterators to consume current-level metadata paths at the correct iterator level without letting redundant metadata paths switch mixed data reads into meta-only mode. Regression expectations are updated to assert the retained metadata-path contract.

### Release note

None

### Check List (For Author)

- Test:
    - Regression test: `doris-local-regression --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning`
    - Manual test: `build-support/clang-format.sh`
    - Manual test: `git diff --check`
- Behavior changed: No
- Does this need documentation: No

Issue Number: None

Related PR: None

Problem Summary: Predicate evaluation can require nested access paths even when final materialization does not need any data access path at the current complex iterator level. The complex column iterators previously returned early when all access paths were empty, which ignored predicate-only metadata or child access paths. This could leave predicate-only nested columns unread in lazy read mode and produce incorrect predicate results for cases such as array element IS NULL predicates. This change only skips access-path setup when both final access paths and predicate access paths are empty.

None

- Test: Unit Test and Regression test
    - Unit Test: `./run-be-ut.sh --run --filter=ColumnReaderTest.*`
    - Regression test: `doris-local-regression --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s string_length_column_pruning`
    - Regression test: `doris-local-regression --network 10.26.20.3/24 run -d datatype_p0/complex_types -s test_pruned_columns`
    - Regression test: `doris-local-regression --network 10.26.20.3/24 run -d inverted_index_p0/array_contains -s test_index_compaction_null_arr`
    - Manual test: `build-support/clang-format.sh`
    - Manual test: `git diff --check`
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: apache#59263

Problem Summary: Struct lazy-read access-path routing could skip a child that only appeared in predicate access paths. For example, with projection path `s.a` and predicate path `s.b`, `StructFileColumnIterator::set_access_paths()` decided whether to route a child only from ordinary projection access paths, so child `b` was marked as skipped before its predicate path was forwarded. This change collects both projection and predicate subpaths for each child first, and routes the child when either side requires it.
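
The collect-then-route idea can be sketched as below. This is an illustrative model rather than the code in `StructFileColumnIterator::set_access_paths()`: `Path`, `ChildRoute`, and `route_struct_children` are hypothetical names, paths are assumed to be component lists such as `{"s", "b"}`, and a path that ends at the struct itself is ignored for brevity.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model: {"s", "b"} means child "b" of struct column "s".
using Path = std::vector<std::string>;

struct ChildRoute {
    std::vector<Path> projection_sub_paths;
    std::vector<Path> predicate_sub_paths;
    // A child is routed when either side requires it.
    bool needed() const {
        return !projection_sub_paths.empty() || !predicate_sub_paths.empty();
    }
};

std::map<std::string, ChildRoute> route_struct_children(
        const std::vector<std::string>& child_names,
        const std::vector<Path>& projection_paths,
        const std::vector<Path>& predicate_paths) {
    std::map<std::string, ChildRoute> routes;
    for (const std::string& name : child_names) {
        routes[name];  // every child starts with no sub-paths
    }
    auto collect = [&routes](const std::vector<Path>& paths, bool is_predicate) {
        for (const Path& path : paths) {
            if (path.size() < 2) {
                continue;  // simplified: whole-struct paths are not modeled
            }
            auto it = routes.find(path[1]);
            if (it == routes.end()) {
                continue;  // unknown child name
            }
            Path sub(path.begin() + 1, path.end());  // drop the struct name
            if (is_predicate) {
                it->second.predicate_sub_paths.push_back(sub);
            } else {
                it->second.projection_sub_paths.push_back(sub);
            }
        }
    };
    // Gather both kinds of sub-paths first; the skip decision comes afterwards.
    collect(projection_paths, false);
    collect(predicate_paths, true);
    return routes;
}
```

For projection path `s.a` and predicate path `s.b`, child `b` ends up with an empty projection list but a non-empty predicate list, so `needed()` is true and the child is not skipped.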

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - `./run-be-ut.sh --run --filter=ColumnReaderTest.StructPredicateOnlyChildPathStillRoutesToChild:ColumnReaderTest.MapFullProjectionStillRoutesPredicateSubPaths:ColumnReaderTest.AccessPathsPropagatePredicateToChildren`
    - `./run-be-ut.sh --run --filter=ColumnReaderTest.*`
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add focused coverage for ColumnReader nested iterator routing, including access-path modes, prefetcher collection, next_batch behavior, and read_by_rowids behavior for STRUCT/ARRAY/MAP nullable and meta-only cases. Extend the nested container offset pruning regression case to cover mixed empty/non-empty containers and predicate/output access-path combinations.

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test
    - `DORIS_HOME=/mnt/disk7/hushenggang/doris-fix-spill ninja -C be/ut_build_ASAN test/CMakeFiles/doris_be_test.dir/storage/segment/column_reader_test.cpp.o`
    - `DORIS_HOME=/mnt/disk7/hushenggang/doris-fix-spill ./run-be-ut.sh --run --filter=ColumnReaderTest.*`
    - `doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s nested_container_offset_pruning -forceGenOut`
    - `doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s nested_container_offset_pruning`
    - `build-support/clang-format.sh be/test/storage/segment/column_reader_test.cpp`
    - `build-support/check-format.sh`
    - `git diff --check`
- Behavior changed: No
- Does this need documentation: No

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 76.30% (499/654) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 64.27% (24710/38449) |
| Line Coverage | 48.00% (256789/534948) |
| Region Coverage | 44.78% (212153/473797) |
| Branch Coverage | 45.89% (92249/201024) |

### What problem does this PR solve?

Issue Number: None

Related PR: apache#59263

Problem Summary: Extract lazy-pruned column recovery in SegmentIterator into a private helper so the internal rowid mapping, lazy read phase, read_by_rowids invocation, and finalize_lazy_phase behavior can be tested directly. Add white-box UT coverage for non-empty filtered selection and empty selection finalize path. Also add a regression profile assertion with a SQL shape that filters on one nested struct path and projects another path from the same parent column, so SegmentIterator must recover lazy-pruned output data after common-expression filtering.
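
The recovery flow named above (rowid mapping, `read_by_rowids`, `finalize_lazy_phase`) can be illustrated with a small self-contained model. Everything here is a hypothetical sketch, not the SegmentIterator helper itself: `LazyColumnSketch`, `map_selection_to_rowids`, and `recover_lazy_pruned_column` are invented names, the column is modeled as an in-memory vector indexed by rowid, and skipping the read for an empty selection is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using RowId = uint32_t;

// Map post-filter selection indices (positions in the block) to segment rowids.
std::vector<RowId> map_selection_to_rowids(const std::vector<RowId>& block_rowids,
                                           const std::vector<uint16_t>& selected) {
    std::vector<RowId> rowids;
    rowids.reserve(selected.size());
    for (uint16_t idx : selected) {
        assert(idx < block_rowids.size());
        rowids.push_back(block_rowids[idx]);
    }
    return rowids;
}

// Hypothetical stand-in for a lazily read, pruned column.
struct LazyColumnSketch {
    std::vector<int64_t> storage;  // pretend on-disk values, indexed by rowid
    std::vector<int64_t> output;   // values materialized for surviving rows
    int reads = 0;
    bool finalized = false;

    void read_by_rowids(const std::vector<RowId>& rowids) {
        ++reads;
        for (RowId rowid : rowids) {
            output.push_back(storage.at(rowid));
        }
    }
    void finalize_lazy_phase() { finalized = true; }
};

// Read the deferred column only for rows that survived filtering, then finalize.
void recover_lazy_pruned_column(LazyColumnSketch& column,
                                const std::vector<RowId>& block_rowids,
                                const std::vector<uint16_t>& selected) {
    if (!selected.empty()) {
        column.read_by_rowids(map_selection_to_rowids(block_rowids, selected));
    }
    // The finalize step runs even when nothing was selected.
    column.finalize_lazy_phase();
}
```

The two branches correspond to the two cases the unit tests cover: a non-empty filtered selection, and an empty selection that only exercises the finalize path.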

### Release note

None

### Check List (For Author)

- Test: Unit Test / Regression test
    - `./run-be-ut.sh --run --filter=SegmentIteratorLazyPrunedTest.*`
    - Attempted: `doris-local-regression run -d datatype_p0/complex_types -s test_pruned_columns`; did not complete because the local FE query port refused connections before the suite executed.
- Behavior changed: No
- Does this need documentation: No

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 76.90% (506/658) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 54.94% (21640/39392) |
| Line Coverage | 38.44% (206887/538150) |
| Region Coverage | 34.56% (163029/471704) |
| Branch Coverage | 35.56% (71389/200784) |

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 92.71% (610/658) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 74.18% (28521/38450) |
| Line Coverage | 58.09% (310773/534952) |
| Region Coverage | 54.99% (260553/473805) |
| Branch Coverage | 56.26% (113090/201027) |

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (14/21) 🎉
Increment coverage report
Complete coverage report
