
[fix](load) only load source scanners update load counters #63781

Merged
liaoxin01 merged 2 commits into apache:master from liaoxin01:fix-cir-20393-master on Jun 25, 2026


@liaoxin01 (Contributor) commented May 28, 2026

What

  • Gate the load counter update in Scanner::_collect_profile_before_close() on a new virtual _should_update_load_counters():
    • Base Scanner reports only when _is_load (classic stream/broker/routine load scanners with src tuple desc).
    • FileScanner additionally reports for FILE_STREAM scans: TVF-based loads (http_stream, group commit) plan the load source as a TVF query scan without a src tuple desc (_is_load is false), but rows filtered by their WHERE clause must still be reported as NumberUnselectedRows and counted into NumberTotalRows.
  • Add a deterministic regression case covering INSERT-SELECT / DELETE / UPDATE whose scans filter out all rows.
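The gate described above can be sketched in C++ as follows. This is a simplified model, not the Doris source: RuntimeState, TFileType and the counter fields are pared-down stand-ins, and add_filtered_rows / add_unselected_rows are hypothetical helpers replacing the scanner's real per-block accounting. Only the shape of the fix is taken from the PR: a virtual _should_update_load_counters() that defaults to _is_load and is overridden in FileScanner to also accept FILE_STREAM scans.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical stand-ins for the BE types named in the PR.
enum class TFileType { FILE_LOCAL, FILE_HDFS, FILE_S3, FILE_STREAM };

struct RuntimeState {
    std::atomic<int64_t> num_rows_load_filtered{0};
    std::atomic<int64_t> num_rows_load_unselected{0};
    void update_num_rows_load_filtered(int64_t n) { num_rows_load_filtered += n; }
    void update_num_rows_load_unselected(int64_t n) { num_rows_load_unselected += n; }
};

class Scanner {
public:
    Scanner(RuntimeState* state, bool is_load) : _state(state), _is_load(is_load) {}
    virtual ~Scanner() = default;

    // Hypothetical helpers: rows rejected by data-quality checks / by WHERE conjuncts.
    void add_filtered_rows(int64_t n) {
        assert(n >= 0);
        _num_rows_filtered += n;
    }
    void add_unselected_rows(int64_t n) {
        assert(n >= 0);
        _num_rows_unselected += n;
    }

    // Called once at close. Scan-profile counters (omitted here) are collected for
    // every scanner; only the RuntimeState load counters are gated.
    void _collect_profile_before_close() {
        if (_should_update_load_counters()) {
            _state->update_num_rows_load_filtered(_num_rows_filtered);
            _state->update_num_rows_load_unselected(_num_rows_unselected);
        }
    }

protected:
    // Base rule: only classic load scanners (with a src tuple desc) report.
    virtual bool _should_update_load_counters() const { return _is_load; }

    RuntimeState* _state;
    bool _is_load;
    int64_t _num_rows_filtered = 0;
    int64_t _num_rows_unselected = 0;
};

class FileScanner : public Scanner {
public:
    FileScanner(RuntimeState* state, bool is_load, TFileType file_type)
            : Scanner(state, is_load), _file_type(file_type) {}

protected:
    // TVF-based loads (http_stream, group commit) have _is_load == false but read
    // a FILE_STREAM source, so they keep reporting unselected rows.
    bool _should_update_load_counters() const override {
        return Scanner::_should_update_load_counters() || _file_type == TFileType::FILE_STREAM;
    }

private:
    TFileType _file_type;
};
```

Keeping the predicate virtual lets FileScanner own the FILE_STREAM exception, so the base Scanner does not need to know about file types.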

Why

For DELETE/UPDATE/INSERT INTO ... SELECT executed through the insert path, rows filtered by query scan predicates (including runtime filters) were added to the RuntimeState load counters. When all scanned rows are filtered, num_rows_load_success() (total - filtered - unselected) goes negative, BE reports a negative dpp.norm.ALL, and FE fails the insert_max_filter_ratio check with errors like:

Insert has too many filtered data 0/-2 insert_max_filter_ratio is 1.000000
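To see how the success count goes negative, the arithmetic of num_rows_load_success() (total - filtered - unselected) is written out below as a minimal C++ sketch. LoadCounters and all_rows_filtered are hypothetical names, and the assumption that nothing on the insert path adds the rejected rows to total is inferred from the negative result in the error, not taken from the Doris source. With two rows rejected by a scan predicate and reported by the scanner, the result is 0 - 0 - 2 = -2, which matches the -2 in the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the load counters kept in RuntimeState.
struct LoadCounters {
    int64_t total = 0;       // rows counted as read by the load
    int64_t filtered = 0;    // rows rejected by data-quality checks
    int64_t unselected = 0;  // rows rejected by WHERE / scan predicates

    int64_t num_rows_load_success() const { return total - filtered - unselected; }
};

// A DELETE/UPDATE/INSERT-SELECT whose scan rejects every row.
// Assumption: on the insert path nothing adds the rejected rows to `total`,
// so only the scanner's report (if any) changes the counters.
LoadCounters all_rows_filtered(int64_t rows_rejected_by_scan, bool scanner_reports) {
    LoadCounters c;
    if (scanner_reports) {
        c.unselected += rows_rejected_by_scan;
    }
    return c;
}
```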

This only triggers with enable_insert_strict=false and insert_max_filter_ratio > 0 (the strict branch only checks filteredRows > 0). The intermittency in the field comes from runtime filter arrival timing: rows are only counted when the RF arrives in time to filter inside the scanner.
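The error text is consistent with a non-strict check of the form filtered > insert_max_filter_ratio * (filtered + loaded). The sketch below transliterates that assumed check into C++; the real check lives in FE (Java) and may differ in detail, and insert_fails_filter_check is a hypothetical name. It shows why a negative loaded count fails even with a ratio of 1.0, while insert_max_filter_ratio = 0 or strict mode does not trigger the error.

```cpp
#include <cassert>
#include <cstdint>

// Assumed shape of the FE filter check, reconstructed from the error message
// "Insert has too many filtered data <filtered>/<filtered + loaded> ...".
bool insert_fails_filter_check(bool enable_insert_strict, double insert_max_filter_ratio,
                               int64_t filtered_rows, int64_t loaded_rows) {
    if (enable_insert_strict) {
        // Strict mode only asks whether any row was filtered.
        return filtered_rows > 0;
    }
    // Non-strict mode: fail when filtered rows exceed the allowed share of all rows.
    return static_cast<double>(filtered_rows) >
           insert_max_filter_ratio * static_cast<double>(filtered_rows + loaded_rows);
}
```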

This update was historically guarded by if (!enable_profile && !_is_load) return; (already buggy when enable_profile=true), and it became unconditional after #57314 removed that early return.
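To make the history concrete, the following hypothetical C++ helper compares, for one scanner, whether its rows reach the load counters under the historical early return, after #57314, and with this PR. Gate and updates_load_counters are illustrative names; the conditions are transcribed from the description, not copied from the Doris source. For a non-load OLAP scanner, the historical gate already leaked counters whenever enable_profile was true.

```cpp
#include <cassert>

// Three versions of the load-counter gate described in the PR.
enum class Gate { Historical, AfterPr57314, ThisPr };

// is_file_stream only matters for FileScanner under the new gate.
bool updates_load_counters(Gate gate, bool enable_profile, bool is_load, bool is_file_stream) {
    switch (gate) {
    case Gate::Historical:
        // if (!enable_profile && !_is_load) return;  => update in every other case
        return enable_profile || is_load;
    case Gate::AfterPr57314:
        // Early return removed: every scanner updates the counters.
        return true;
    case Gate::ThisPr:
        // Base Scanner: _is_load; FileScanner additionally accepts FILE_STREAM.
        return is_load || is_file_stream;
    }
    return false;
}
```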

Compared to the previous revision of this PR (thrift skip_query_scan_load_counters option set by FE for DELETE/UPDATE):

  • No thrift / FE changes needed.
  • Also fixes plain INSERT INTO ... SELECT: AbstractInsertExecutor sets query_type=LOAD for all insert-path commands, so the query-type based gate still let OlapScanner predicate-filtered rows pollute the counters there.
  • FILE_STREAM is a precise discriminator: only the http_stream / group_commit TVFs use it, and they require a backend id on the ConnectContext (only present for stream-load style HTTP requests), so they can never appear in a normal query/DELETE/UPDATE.

FileScanner::_counter.num_rows_filtered is only accumulated in _convert_to_output_block (load-only path), so for http_stream the NumberFilteredRows reported to clients comes from sink validation and is unaffected by this gate.

Test

  • New regression case regression-test/suites/load_p0/insert/test_scan_filtered_rows_not_pollute_load_counter.groovy: uses value-column predicates on an AGGREGATE KEY table (which cannot be pushed down to storage and are evaluated by scanner conjuncts) to reproduce the bug deterministically; before this fix, the INSERT-SELECT/DELETE/UPDATE statements failed with 0/-10.
  • test_group_commit_http_stream semantics preserved: insert into ... select from http_stream(...) where ... still reports NumberUnselectedRows/NumberTotalRows (the previous _is_load-only attempt broke this with expected: <6> but was: <5>).

Issue: CIR-20393

Copilot AI review requested due to automatic review settings May 28, 2026 03:30
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes load-quality counter accounting in BE scanners by ensuring only load scanners update RuntimeState’s load filtered/unselected row counters, preventing query-side predicate filtering from polluting INSERT/DELETE load statistics (especially when enable_profile is enabled).

Changes:

  • Gate RuntimeState::update_num_rows_load_filtered/unselected() updates behind _is_load in Scanner::_collect_profile_before_close().
  • Clarify via comments that query predicate filtering must not contribute to load-quality counters.


Comment thread be/src/exec/scan/scanner.cpp Outdated
@liaoxin01 force-pushed the fix-cir-20393-master branch from 6a52d13 to 3b09c76 May 28, 2026 15:39
@liaoxin01

Contributor Author

/review

@github-actions bot left a comment

Contributor

Review summary: no blocking issues found.

Critical checkpoint conclusions:

  • Goal/test: The PR restricts load quality counter updates to scanners that are actually part of load processing. This matches the described DELETE/subquery failure mode, and the added regression enables profile and exercises DELETE with a TVF subquery predicate.
  • Scope/focus: The production change is small and focused; it preserves scan profile counter updates while excluding only load-quality counters for non-load scanners.
  • Concurrency/lifecycle: No new shared mutable state, threads, locks, lifecycle ownership, or static initialization concerns were introduced. The existing atomic RuntimeState counters remain used only for statistics/accounting.
  • Compatibility/configuration: No protocol, storage format, persisted metadata, or configuration changes are introduced.
  • Parallel paths: The central Scanner::_collect_profile_before_close path covers OlapScanner/FileScanner subclasses that delegate to it; load scanners still update load filtered/unselected counters through the existing _is_load classification.
  • Error handling/data correctness: No ignored Status or visibility-version/delete-bitmap changes. The change prevents query-side predicate filtering from corrupting load success/filtered row accounting.
  • Performance/observability: The added branch is trivial and not on a hot per-row path; existing scan profile counters remain collected for observability.
  • Test coverage: Regression coverage was added for the profile-enabled DELETE + TVF subquery case. I did not run the regression suite in this runner.

User focus: No additional user-provided review focus was specified.

@liaoxin01

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 31896 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b09c7607eaad3d8e7fa7bc922df315d9faa71e1, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17848	4295	4176	4176
q2	q3	10752	1388	825	825
q4	4685	479	354	354
q5	7702	2296	2104	2104
q6	238	183	139	139
q7	985	789	637	637
q8	9353	1688	1604	1604
q9	5205	4960	4958	4958
q10	6386	2224	1878	1878
q11	431	277	246	246
q12	632	424	296	296
q13	18118	3431	2746	2746
q14	271	261	246	246
q15	q16	818	776	709	709
q17	938	975	971	971
q18	6999	5792	5530	5530
q19	1345	1356	1236	1236
q20	547	435	314	314
q21	6193	2878	2595	2595
q22	460	372	332	332
Total cold run time: 99906 ms
Total hot run time: 31896 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5100	4974	4963	4963
q2	q3	4921	5377	4559	4559
q4	2134	2209	1365	1365
q5	4980	4727	4744	4727
q6	229	185	135	135
q7	1877	1739	1636	1636
q8	2476	2210	2207	2207
q9	7879	7450	7404	7404
q10	4786	4692	4291	4291
q11	531	404	382	382
q12	714	736	538	538
q13	3084	3370	2829	2829
q14	279	272	265	265
q15	q16	685	705	613	613
q17	1288	1259	1255	1255
q18	7314	6899	6849	6849
q19	1101	1100	1086	1086
q20	2236	2228	1963	1963
q21	5301	4731	4580	4580
q22	538	492	411	411
Total cold run time: 57453 ms
Total hot run time: 52058 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173476 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b09c7607eaad3d8e7fa7bc922df315d9faa71e1, data reload: false

query5	4352	689	509	509
query6	338	237	200	200
query7	4220	593	317	317
query8	335	238	224	224
query9	8846	4082	4096	4082
query10	458	355	298	298
query11	5806	2638	2222	2222
query12	193	129	130	129
query13	1315	608	430	430
query14	6279	5541	5252	5252
query14_1	4563	4590	4520	4520
query15	216	206	180	180
query16	994	460	381	381
query17	1062	740	599	599
query18	2521	496	360	360
query19	216	205	159	159
query20	138	143	129	129
query21	217	148	122	122
query22	13746	13681	13427	13427
query23	17360	16700	16258	16258
query23_1	16372	16381	16420	16381
query24	7495	1842	1349	1349
query24_1	1346	1347	1338	1338
query25	592	533	454	454
query26	1309	327	180	180
query27	2677	556	361	361
query28	4474	2045	1997	1997
query29	1039	623	522	522
query30	305	248	196	196
query31	1151	1090	968	968
query32	87	76	77	76
query33	564	383	293	293
query34	1183	1176	653	653
query35	823	832	715	715
query36	1374	1420	1202	1202
query37	159	106	91	91
query38	3259	3242	3145	3145
query39	926	903	919	903
query39_1	882	868	866	866
query40	232	146	126	126
query41	63	63	62	62
query42	116	110	111	110
query43	343	350	307	307
query44	
query45	218	207	195	195
query46	1104	1243	775	775
query47	2339	2341	2215	2215
query48	418	433	310	310
query49	665	518	403	403
query50	955	362	263	263
query51	4404	4356	4308	4308
query52	112	117	101	101
query53	254	285	208	208
query54	317	275	262	262
query55	95	91	86	86
query56	306	322	305	305
query57	1430	1407	1287	1287
query58	304	280	279	279
query59	1679	1738	1514	1514
query60	329	328	314	314
query61	163	160	184	160
query62	709	679	579	579
query63	246	203	212	203
query64	2446	816	648	648
query65	
query66	1750	520	365	365
query67	29217	29795	29669	29669
query68	
query69	478	345	313	313
query70	1053	1074	1030	1030
query71	307	279	274	274
query72	3003	2773	2554	2554
query73	836	832	460	460
query74	5139	4985	4903	4903
query75	2816	2669	2296	2296
query76	2276	1224	804	804
query77	431	437	358	358
query78	12472	12517	11910	11910
query79	1319	1028	752	752
query80	630	601	488	488
query81	457	291	243	243
query82	245	162	126	126
query83	295	289	263	263
query84	273	148	121	121
query85	936	648	548	548
query86	391	335	334	334
query87	3469	3440	3295	3295
query88	3721	2780	2746	2746
query89	432	402	346	346
query90	2167	194	202	194
query91	182	170	143	143
query92	83	83	82	82
query93	1406	1442	856	856
query94	556	346	290	290
query95	678	395	353	353
query96	1112	801	362	362
query97	2711	2751	2576	2576
query98	238	233	228	228
query99	1158	1143	1033	1033
Total cold run time: 253645 ms
Total hot run time: 173476 ms

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.99% (20982/38863)
Line Coverage 37.55% (198950/529853)
Region Coverage 33.83% (155935/460884)
Branch Coverage 34.84% (67915/194936)

@liaoxin01 force-pushed the fix-cir-20393-master branch 2 times, most recently from 1568a28 to 989d36e May 29, 2026 06:57
@liaoxin01 marked this pull request as draft May 30, 2026 03:08
@liaoxin01 force-pushed the fix-cir-20393-master branch from 989d36e to 4e2557f June 11, 2026 07:54
@liaoxin01 changed the title from "[fix](load) avoid query scanner updating load counters" to "[fix](load) only load source scanners update load counters" Jun 11, 2026
For DELETE/UPDATE/INSERT INTO ... SELECT executed through the insert
path, rows filtered by query scan predicates (including runtime
filters) were added to RuntimeState load counters. When all scanned
rows are filtered, num_rows_load_success() (total - filtered -
unselected) goes negative and FE fails the insert_max_filter_ratio
check with errors like:

    Insert has too many filtered data 0/-2 insert_max_filter_ratio is 1.000000

Fix by gating the counter update in Scanner::_collect_profile_before_close()
on a new virtual _should_update_load_counters():
- Base Scanner only reports when _is_load (classic stream/broker/routine
  load scanners with src tuple desc).
- FileScanner additionally reports for FILE_STREAM scans, because TVF
  based loads (http_stream, group commit) plan the load source as a tvf
  query scan without src tuple desc, but their WHERE clause filtered
  rows must still be reported as NumberUnselectedRows. FILE_STREAM is
  only reachable from such load entries, never from normal queries.
@liaoxin01 force-pushed the fix-cir-20393-master branch from 4e2557f to b4f208d June 11, 2026 08:00
@liaoxin01

Contributor Author

/review

@liaoxin01

Contributor Author

run buildall

@liaoxin01 marked this pull request as ready for review June 11, 2026 08:01

@github-actions bot left a comment

Contributor

I found one regression-test convention issue that should be fixed before merging.

Critical checkpoint conclusions:

  • Goal/test proof: The scanner counter change addresses the negative load-success accounting for INSERT/DELETE/UPDATE scan predicates, and the new regression covers the profile-enabled INSERT plus no-op DELETE/UPDATE path. Existing http_stream/group_commit tests cover the FILE_STREAM unselected-row branch.
  • Scope: The BE change is small and focused around close-time scanner counter accounting.
  • Concurrency/lifecycle: Scanner close can run from scanner worker threads, but the changed RuntimeState counter updates are atomic and the new predicate reads scanner-local scan metadata; I did not find a new locking or lifecycle issue.
  • Compatibility/config/storage/transactions: No new config, storage format, FE-BE protocol field, or persistence path is introduced.
  • Parallel paths: FileScanner keeps the FILE_STREAM load-source exception while the base Scanner blocks ordinary query scanners from polluting load counters.
  • Test coverage/results: Functional coverage is present, but the new regression drops tables at the end, which violates the Doris regression-test standard to preserve state for debugging.
  • Observability/performance: No additional observability is needed for this counter gate, and the added check is constant-time at scanner close.

User focus: no additional user-provided review focus was supplied.

@liaoxin01

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29794 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b4f208d2ee451c83786597f3748b6e7a1b8f6b29, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17624	4312	4238	4238
q2	q3	10736	1426	862	862
q4	4694	487	356	356
q5	7582	872	593	593
q6	183	184	146	146
q7	785	828	620	620
q8	9613	1531	1618	1531
q9	6622	4528	4521	4521
q10	6820	1872	1547	1547
q11	438	275	253	253
q12	667	441	300	300
q13	18156	3567	2751	2751
q14	282	277	258	258
q15	q16	833	786	720	720
q17	966	901	977	901
q18	7070	5899	5745	5745
q19	1285	1227	1109	1109
q20	510	401	255	255
q21	6186	2803	2742	2742
q22	471	390	346	346
Total cold run time: 101523 ms
Total hot run time: 29794 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5195	5116	5091	5091
q2	q3	5059	5360	4695	4695
q4	2413	2489	1618	1618
q5	5061	5175	4991	4991
q6	262	187	135	135
q7	2036	1882	1734	1734
q8	2793	2398	2126	2126
q9	7675	7719	7670	7670
q10	4945	4814	4318	4318
q11	594	437	395	395
q12	816	796	587	587
q13	3058	3396	2851	2851
q14	276	293	270	270
q15	q16	708	738	651	651
q17	1332	1302	1295	1295
q18	7754	7160	7074	7074
q19	1096	1096	1105	1096
q20	2282	2273	1998	1998
q21	5731	4941	4743	4743
q22	546	494	447	447
Total cold run time: 59632 ms
Total hot run time: 53785 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 171572 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b4f208d2ee451c83786597f3748b6e7a1b8f6b29, data reload: false

query5	4325	649	511	511
query6	436	193	172	172
query7	4815	517	309	309
query8	368	214	205	205
query9	8756	4094	4077	4077
query10	450	313	262	262
query11	5927	2405	2205	2205
query12	163	101	99	99
query13	1260	620	420	420
query14	6526	5886	5399	5399
query14_1	4752	4731	4685	4685
query15	207	205	180	180
query16	1049	480	430	430
query17	1091	701	563	563
query18	2426	470	336	336
query19	193	178	137	137
query20	110	110	104	104
query21	211	141	115	115
query22	13570	13713	13418	13418
query23	17580	16639	16237	16237
query23_1	16319	16470	16372	16372
query24	7573	1779	1335	1335
query24_1	1332	1329	1368	1329
query25	607	459	392	392
query26	1319	332	167	167
query27	2713	562	346	346
query28	4469	2038	2027	2027
query29	1070	664	492	492
query30	319	240	205	205
query31	1134	1101	979	979
query32	101	64	68	64
query33	568	342	258	258
query34	1166	1147	654	654
query35	818	835	739	739
query36	1372	1404	1215	1215
query37	154	114	93	93
query38	3245	3173	3095	3095
query39	933	922	893	893
query39_1	882	881	877	877
query40	232	125	108	108
query41	70	66	69	66
query42	99	97	95	95
query43	344	361	299	299
query44	
query45	203	191	187	187
query46	1108	1195	760	760
query47	2397	2374	2304	2304
query48	408	423	285	285
query49	622	476	358	358
query50	1046	348	251	251
query51	4418	4362	4209	4209
query52	86	88	77	77
query53	241	279	190	190
query54	260	215	192	192
query55	80	78	68	68
query56	235	224	208	208
query57	1420	1397	1326	1326
query58	261	219	218	218
query59	1599	1695	1467	1467
query60	283	238	228	228
query61	155	152	142	142
query62	705	652	591	591
query63	242	183	184	183
query64	2524	763	594	594
query65	
query66	1775	446	347	347
query67	29745	29860	29600	29600
query68	
query69	416	313	268	268
query70	980	933	963	933
query71	292	217	205	205
query72	2959	2634	2331	2331
query73	852	761	439	439
query74	5115	4998	4810	4810
query75	2684	2578	2221	2221
query76	2295	1238	788	788
query77	370	397	289	289
query78	12539	12425	11892	11892
query79	1424	1028	737	737
query80	578	473	392	392
query81	460	299	246	246
query82	579	163	129	129
query83	359	281	251	251
query84	
query85	834	522	444	444
query86	373	318	298	298
query87	3425	3372	3226	3226
query88	3661	2750	2760	2750
query89	434	392	333	333
query90	1841	181	187	181
query91	173	159	136	136
query92	67	61	60	60
query93	1488	1470	860	860
query94	542	353	308	308
query95	663	388	350	350
query96	1080	847	346	346
query97	2756	2710	2591	2591
query98	218	208	210	208
query99	1165	1183	1014	1014
Total cold run time: 251936 ms
Total hot run time: 171572 ms

@hello-stephen

Contributor
TPC-H: Total hot run time: 29485 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be4389bbea27cfc015db33774d14c179d3f1eea9, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17616	4130	4122	4122
q2	q3	10784	1388	832	832
q4	4676	475	350	350
q5	7536	911	568	568
q6	182	176	141	141
q7	771	844	622	622
q8	9330	1526	1486	1486
q9	5891	4531	4503	4503
q10	6773	1808	1519	1519
q11	425	269	249	249
q12	632	430	290	290
q13	18110	3383	2783	2783
q14	270	257	239	239
q15	q16	826	767	706	706
q17	898	952	974	952
q18	6851	5769	5647	5647
q19	1346	1270	1169	1169
q20	507	403	255	255
q21	6355	2809	2747	2747
q22	464	367	305	305
Total cold run time: 100243 ms
Total hot run time: 29485 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5153	4735	4716	4716
q2	q3	4881	5173	4738	4738
q4	2122	2187	1401	1401
q5	4828	4800	4713	4713
q6	245	183	129	129
q7	1900	1705	1538	1538
q8	2471	2092	2073	2073
q9	8012	7810	7420	7420
q10	4722	4687	4217	4217
q11	536	386	348	348
q12	723	794	528	528
q13	2981	3361	2828	2828
q14	281	293	246	246
q15	q16	674	696	613	613
q17	1264	1253	1263	1253
q18	7246	6939	6641	6641
q19	1142	1072	1104	1072
q20	2226	2204	1985	1985
q21	5237	4550	4384	4384
q22	523	442	397	397
Total cold run time: 57167 ms
Total hot run time: 51240 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168341 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be4389bbea27cfc015db33774d14c179d3f1eea9, data reload: false

query5	4331	599	486	486
query6	447	188	194	188
query7	4828	533	299	299
query8	362	208	188	188
query9	8718	3994	3999	3994
query10	443	320	266	266
query11	5904	2343	2138	2138
query12	158	102	98	98
query13	1275	583	436	436
query14	6336	5389	5001	5001
query14_1	4335	4314	4336	4314
query15	204	194	173	173
query16	1020	443	427	427
query17	1112	677	549	549
query18	2442	452	326	326
query19	190	179	136	136
query20	110	103	104	103
query21	218	134	115	115
query22	13528	13573	13445	13445
query23	17361	16435	16050	16050
query23_1	16278	16313	16204	16204
query24	7508	1754	1303	1303
query24_1	1312	1302	1308	1302
query25	536	426	379	379
query26	1294	314	174	174
query27	2678	563	345	345
query28	4433	2038	2006	2006
query29	1091	615	500	500
query30	310	242	198	198
query31	1132	1092	960	960
query32	115	64	60	60
query33	553	346	242	242
query34	1164	1128	658	658
query35	753	769	692	692
query36	1406	1386	1252	1252
query37	150	96	85	85
query38	3217	3136	3041	3041
query39	934	907	896	896
query39_1	866	861	850	850
query40	218	118	98	98
query41	63	60	60	60
query42	93	93	93	93
query43	320	323	284	284
query44	
query45	197	181	176	176
query46	1033	1184	758	758
query47	2330	2371	2256	2256
query48	391	418	289	289
query49	617	462	352	352
query50	976	349	247	247
query51	4317	4233	4258	4233
query52	88	90	79	79
query53	242	269	189	189
query54	287	212	190	190
query55	76	73	68	68
query56	232	228	203	203
query57	1436	1408	1333	1333
query58	249	211	213	211
query59	1547	1627	1414	1414
query60	269	243	231	231
query61	152	151	149	149
query62	702	643	593	593
query63	220	187	186	186
query64	2533	763	626	626
query65	
query66	1802	457	344	344
query67	29677	29704	29420	29420
query68	
query69	429	299	259	259
query70	951	935	961	935
query71	307	219	208	208
query72	2958	2593	2295	2295
query73	882	829	469	469
query74	5177	4988	4742	4742
query75	2618	2563	2229	2229
query76	2348	1133	768	768
query77	341	392	283	283
query78	12391	12417	11959	11959
query79	1255	1051	768	768
query80	570	503	417	417
query81	452	284	243	243
query82	240	160	121	121
query83	276	283	253	253
query84	
query85	837	492	408	408
query86	318	290	275	275
query87	3339	3336	3173	3173
query88	3586	2738	2723	2723
query89	413	374	330	330
query90	2179	180	174	174
query91	167	163	135	135
query92	59	62	57	57
query93	1477	1438	853	853
query94	529	349	267	267
query95	668	460	332	332
query96	1008	812	387	387
query97	2668	2680	2552	2552
query98	209	204	197	197
query99	1155	1182	1013	1013
Total cold run time: 249323 ms
Total hot run time: 168341 ms

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/12) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.91% (21116/39171)
Line Coverage 37.63% (201361/535150)
Region Coverage 33.63% (157776/469128)
Branch Coverage 34.71% (69155/199223)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (12/12) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.88% (28264/38258)
Line Coverage 57.83% (307675/532031)
Region Coverage 54.66% (257590/471295)
Branch Coverage 56.02% (111752/199486)

@liaoxin01

Contributor Author

/review

@github-actions bot left a comment

Contributor

I found two regression-artifact issues that should be fixed before merging.

Checkpoint conclusions:

  • Goal/test: the C++ change addresses the intended load-counter pollution path, and the new suite covers INSERT SELECT, profile-enabled INSERT, DELETE, and UPDATE no-op cases. The test artifacts need cleanup to satisfy repository standards.
  • Scope/focus: the load-counter gate itself is small and focused. No additional user focus points were provided.
  • Concurrency/lifecycle: no new concurrency path or lifecycle ownership issue is introduced by this PR's reviewed changes; the existing RuntimeState counters remain atomic and are still updated during scanner close.
  • Config/compatibility/persistence: no new config, thrift field, storage format, or edit-log compatibility concern found.
  • Parallel paths: base Scanner now suppresses non-load scanners, while FileScanner preserves FILE_STREAM TVF load reporting; I did not find another scanner path that requires the same special case.
  • Test standards/results: the new regression suite should hardcode simple table names instead of using variables, and the generated .out currently fails git diff --check due to a trailing blank line.
  • Observability/performance: no additional observability or performance issue found for this narrow counter-gating change.

Review only; I did not run the regression suite in this runner.

@sollhui left a comment

Contributor

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Jun 25, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@liaoxin01 merged commit a786cba into apache:master Jun 25, 2026
31 of 32 checks passed
@liaoxin01 deleted the fix-cir-20393-master branch June 25, 2026 08:02
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request Jun 26, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63781, apache#64671

Problem Summary: File scanner v2 did not carry the same fixes as the existing file scanner path. Predicate rows filtered inside v2 file readers were still reported through scanner load counters unless the scanner was a real load source, and Hive TEXTFILE empty physical lines were still skipped unless read_csv_empty_line_as_null was enabled. This change gates v2 load counter reporting with the same FILE_STREAM exception used by FileScanner and adds a delimited text hook so Hive Text v2 treats empty physical lines as records while CSV keeps the old default behavior.
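A minimal C++ sketch of the "delimited text hook" idea, assuming a base reader that asks a per-format virtual function what an empty physical line should produce. All class and method names here are hypothetical, not the Doris API. Hive text treats an empty line like any other line (split into one empty-string field); CSV skips it by default and, as an assumption for this sketch, yields an all-NULL row when read_csv_empty_line_as_null is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// A row is a list of fields; std::nullopt stands for SQL NULL.
using Row = std::vector<std::optional<std::string>>;

class DelimitedTextReader {
public:
    DelimitedTextReader(char sep, std::size_t num_columns) : _num_columns(num_columns), _sep(sep) {}
    virtual ~DelimitedTextReader() = default;

    std::vector<Row> read_all(const std::string& data) {
        std::vector<Row> rows;
        std::istringstream in(data);
        std::string line;
        while (std::getline(in, line)) {
            if (line.empty()) {
                // Ask the format-specific hook; nullopt means "skip this line".
                if (auto row = _empty_line_record()) rows.push_back(*row);
                continue;
            }
            rows.push_back(_split(line));
        }
        return rows;
    }

protected:
    // Hook: the record an empty physical line produces, or nullopt to skip it.
    virtual std::optional<Row> _empty_line_record() const = 0;
    std::size_t _num_columns;

private:
    Row _split(const std::string& line) const {
        Row row;
        std::string field;
        for (char ch : line) {
            if (ch == _sep) {
                row.emplace_back(field);
                field.clear();
            } else {
                field.push_back(ch);
            }
        }
        row.emplace_back(field);
        return row;
    }
    char _sep;
};

class CsvReader : public DelimitedTextReader {
public:
    CsvReader(char sep, std::size_t num_columns, bool read_csv_empty_line_as_null)
            : DelimitedTextReader(sep, num_columns), _empty_as_null(read_csv_empty_line_as_null) {}

protected:
    std::optional<Row> _empty_line_record() const override {
        if (!_empty_as_null) return std::nullopt;  // old default: skip empty lines
        return Row(_num_columns, std::nullopt);     // all-NULL row
    }

private:
    bool _empty_as_null;
};

class HiveTextReader : public DelimitedTextReader {
public:
    using DelimitedTextReader::DelimitedTextReader;

protected:
    // An empty line is a record with a single empty-string field.
    std::optional<Row> _empty_line_record() const override { return Row{std::string()}; }
};
```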

### Release note

Fix file scanner v2 load counter reporting and Hive TEXTFILE empty-line handling.

### Check List (For Author)

- Test: Unit Test / Manual test
    - Added TextV2ReaderTest coverage for Hive TEXTFILE empty line records, single-column empty string fields, and COUNT pushdown.
    - Ran git diff --check.
    - Ran clang-format v16 through build-support/run_clang_format.py for changed files.
    - Attempted ./run-be-ut.sh --run --filter='TextV2ReaderTest.*:FileScannerV2Test.*', but the local run was blocked because the script needed to update/download datasketches-cpp and network access was unavailable; no BE UT binary was already built.
    - Attempted clang-tidy with the available compile_commands.json, but it pointed at a stale /mnt/disk3/gabriel path; the project clang-tidy wrapper also requires bash 4+ while only system bash is available.
- Behavior changed: Yes. File scanner v2 now matches v1 load counter gating and Hive TEXTFILE empty-line semantics.
- Does this need documentation: No
Gabriel39 added a commit that referenced this pull request Jun 26, 2026