[fix](serde) fix split_by_delimiter missing backslash escape handling #61995

Merged
eldenmoon merged 1 commit into apache:master from csun5285:fix-map-serde-escape
Apr 29, 2026

@csun5285

@csun5285 csun5285 commented Apr 1, 2026

Contributor

Bug

Stream Load (JSON format) writing to Map<String, String> columns silently sets the column to NULL when the map value contains both \" (escaped quotes) and : or ,. Stream Load returns Success with FilteredRows=0 — data loss with no error indication.

Fix

Added \ escape skip in split_by_delimiter's character loop, before quote detection — consistent with the existing escape handling in deserialize_one_cell_from_json.
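
A minimal sketch of the escape skip described above, written as a standalone function rather than the actual Doris helper. The name `split_escape_aware`, its signature, and the use of `std::string_view` are assumptions for illustration; only the order of checks (backslash first, then quotes) follows the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for split_by_delimiter: splits `s` on `delim`,
// ignoring delimiters that appear inside single or double quotes.
std::vector<std::string> split_escape_aware(std::string_view s, char delim) {
    std::vector<std::string> parts;
    std::string cur;
    char quote = 0;  // 0 = outside quotes, otherwise the opening quote char
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        char c = s[pos];
        if (c == '\\' && pos + 1 < s.size()) {
            // Escape skip: copy the backslash and the next char verbatim,
            // so an escaped quote never reaches the quote check below.
            cur += c;
            cur += s[++pos];
            continue;
        }
        if (c == '"' || c == '\'') {
            if (quote == 0) {
                quote = c;  // entering a quoted region
            } else if (quote == c) {
                quote = 0;  // leaving the quoted region
            }
        } else if (c == delim && quote == 0) {
            parts.push_back(cur);  // delimiter outside quotes ends a piece
            cur.clear();
            continue;
        }
        cur += c;
    }
    parts.push_back(cur);
    return parts;
}
```

With this ordering, splitting `"k":"a\"b,c","x":"y"` on `,` yields two pieces, because the comma after `b` is still inside the quoted value.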

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

@Thearas

Thearas commented Apr 1, 2026

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285

csun5285 commented Apr 1, 2026

Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 29317 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 493abfc60486787a1cf5b7506654e9ca2344d28e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17650	3883	3721	3721
q2	q3	10673	864	600	600
q4	4677	465	379	379
q5	7461	1318	1139	1139
q6	194	168	143	143
q7	914	939	797	797
q8	9313	1444	1315	1315
q9	5532	5277	5309	5277
q10	6263	2046	1773	1773
q11	480	281	276	276
q12	812	694	509	509
q13	18063	2805	2164	2164
q14	281	280	261	261
q15	q16	855	865	787	787
q17	1007	1117	817	817
q18	6469	5618	5501	5501
q19	1141	1226	1070	1070
q20	588	557	415	415
q21	4377	2447	2018	2018
q22	480	397	355	355
Total cold run time: 97230 ms
Total hot run time: 29317 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4547	4476	4755	4476
q2	q3	4663	4836	4217	4217
q4	2119	2193	1344	1344
q5	4916	5000	5184	5000
q6	198	166	138	138
q7	1998	1740	1633	1633
q8	3291	3066	3055	3055
q9	8267	8479	8680	8479
q10	4502	4430	4318	4318
q11	570	397	373	373
q12	652	804	484	484
q13	2669	3169	2365	2365
q14	295	299	278	278
q15	q16	762	811	696	696
q17	1275	1332	1240	1240
q18	7915	7043	7094	7043
q19	1132	1175	1111	1111
q20	2205	2180	1914	1914
q21	6027	5305	4929	4929
q22	560	517	449	449
Total cold run time: 58563 ms
Total hot run time: 53542 ms

@doris-robot

TPC-DS: Total hot run time: 179264 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 493abfc60486787a1cf5b7506654e9ca2344d28e, data reload: false

query5	4333	650	503	503
query6	341	270	205	205
query7	4231	576	344	344
query8	330	244	219	219
query9	8762	3898	3856	3856
query10	500	419	374	374
query11	6647	5485	5111	5111
query12	185	130	133	130
query13	1292	587	446	446
query14	5676	5153	4746	4746
query14_1	4082	4066	4051	4051
query15	210	195	174	174
query16	1007	462	343	343
query17	925	739	630	630
query18	2430	494	358	358
query19	249	223	189	189
query20	137	133	128	128
query21	212	144	122	122
query22	13909	14966	14357	14357
query23	18011	17225	16723	16723
query23_1	16769	17014	16735	16735
query24	7467	1730	1348	1348
query24_1	1332	1351	1360	1351
query25	562	485	435	435
query26	1265	308	172	172
query27	2729	624	363	363
query28	4494	1865	1842	1842
query29	936	662	546	546
query30	299	230	187	187
query31	1102	1049	931	931
query32	89	71	66	66
query33	517	340	291	291
query34	1203	1154	673	673
query35	730	771	655	655
query36	1232	1236	1076	1076
query37	150	94	84	84
query38	3110	3034	2966	2966
query39	908	892	863	863
query39_1	844	841	834	834
query40	235	157	140	140
query41	62	65	58	58
query42	277	275	269	269
query43	311	317	277	277
query44	
query45	214	193	190	190
query46	1155	1293	770	770
query47	2356	2355	2250	2250
query48	362	450	298	298
query49	643	527	434	434
query50	708	286	224	224
query51	4377	4270	4203	4203
query52	280	284	269	269
query53	328	347	270	270
query54	322	298	266	266
query55	99	94	88	88
query56	327	344	319	319
query57	1729	1700	1569	1569
query58	300	272	276	272
query59	2894	2989	2742	2742
query60	339	338	330	330
query61	159	150	160	150
query62	664	624	569	569
query63	318	277	276	276
query64	5322	1453	1085	1085
query65	
query66	1472	468	364	364
query67	24319	24274	24108	24108
query68	
query69	459	334	316	316
query70	955	983	969	969
query71	374	333	309	309
query72	3136	2820	2441	2441
query73	828	825	427	427
query74	9889	9756	9590	9590
query75	3570	3361	3025	3025
query76	2295	1136	760	760
query77	401	411	340	340
query78	11333	11288	10765	10765
query79	1553	1030	829	829
query80	820	772	658	658
query81	456	281	235	235
query82	1390	154	123	123
query83	376	288	261	261
query84	309	147	118	118
query85	883	532	456	456
query86	390	339	317	317
query87	3316	3188	3087	3087
query88	3546	2692	2678	2678
query89	470	410	374	374
query90	1972	186	178	178
query91	174	166	136	136
query92	80	74	69	69
query93	897	892	508	508
query94	532	339	289	289
query95	656	379	422	379
query96	1020	777	339	339
query97	2715	2663	2574	2574
query98	239	236	224	224
query99	1068	1075	974	974
Total cold run time: 258388 ms
Total hot run time: 179264 ms

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.93% (20043/37869)
Line Coverage 36.51% (188117/515243)
Region Coverage 32.71% (145749/445629)
Branch Coverage 33.92% (63933/188472)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.90% (26665/37088)
Line Coverage 54.80% (281479/513686)
Region Coverage 51.80% (232969/449736)
Branch Coverage 53.27% (100694/189028)

split_by_delimiter in complex_type_deserialize_util.h did not handle '\'
escape characters. When Stream Load JSON writes Map<String,String> columns,
the JSON reader stores the map as a raw JSON string (via to_json_string),
which is later parsed back by from_string → split_by_delimiter. Without
escape handling, '\"' inside a value flips the quote tracker, causing
subsequent ':' or ',' to be mis-identified as delimiters. This leads to a
"Map does not have even number of key-value pairs" error, which is silently
swallowed by the nullable serde and results in the Map column being NULL.
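
The failure mode can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not the Doris code: `count_pieces` is a hypothetical helper that splits on `:` or `,` outside quotes and reports how many pieces result, with a flag that turns the backslash skip on or off. Treating keys and values as one flat list of pieces is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts the pieces produced by splitting `s` on ':' or
// ',' outside quotes. `honor_escapes` toggles the backslash skip.
std::size_t count_pieces(std::string_view s, bool honor_escapes) {
    std::size_t pieces = 1;
    char quote = 0;  // 0 = outside quotes, otherwise the opening quote char
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        char c = s[pos];
        if (honor_escapes && c == '\\' && pos + 1 < s.size()) {
            ++pos;  // skip the escaped character entirely
            continue;
        }
        if (c == '"' || c == '\'') {
            if (quote == 0) {
                quote = c;
            } else if (quote == c) {
                quote = 0;
            }
        } else if ((c == ':' || c == ',') && quote == 0) {
            ++pieces;  // a delimiter outside quotes starts a new piece
        }
    }
    return pieces;
}
```

For the input `"k":"a \"b, c\" d"`, the scan without escape handling produces three pieces (an odd count, which a key-value parser would reject), while the escape-aware scan produces two.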

Also fix test_jsonb_cast and test_jsonb_with_unescaped_string test data:
the CSV input 'foo\\'bar' was ambiguous under escape rules (\\' means
escaped backslash followed by quote-close, leaving 'bar' dangling).
Changed to 'foo\'bar' and updated .out files to reflect correct parsing
with backslash escape handling.
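
The ambiguity in the old test data can be checked with a small scan. This is a sketch only; `has_unclosed_quote` is a hypothetical helper, not part of the Doris test suite, and it assumes the same rule as above: a backslash consumes the character that follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if a quote is still open when the scan
// reaches the end of `s`, treating backslash as an escape character.
bool has_unclosed_quote(std::string_view s) {
    char quote = 0;  // 0 = outside quotes, otherwise the opening quote char
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        char c = s[pos];
        if (c == '\\' && pos + 1 < s.size()) {
            ++pos;  // the escaped character never toggles quote state
            continue;
        }
        if (c == '"' || c == '\'') {
            if (quote == 0) {
                quote = c;
            } else if (quote == c) {
                quote = 0;
            }
        }
    }
    return quote != 0;
}
```

Under this rule the old input `'foo\\'bar'` ends with a quote still open, since `\\` is consumed as an escaped backslash and the next `'` closes the string early. The replacement `'foo\'bar'` scans as a single balanced string.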

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@csun5285 csun5285 force-pushed the fix-map-serde-escape branch from 493abfc to e4dfb2c on April 2, 2026 11:11
@csun5285

csun5285 commented Apr 2, 2026

Contributor Author

run buildall

@csun5285 csun5285 closed this Apr 2, 2026
@csun5285 csun5285 reopened this Apr 2, 2026
@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.93% (20052/37884)
Line Coverage 36.54% (188361/515493)
Region Coverage 32.81% (146296/445860)
Branch Coverage 33.96% (64054/188606)

@doris-robot

TPC-H: Total hot run time: 29264 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit e4dfb2c0ecf9e809382213d50baf4ccb7c68d68f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17649	3754	3796	3754
q2	q3	10686	849	623	623
q4	4685	493	368	368
q5	7476	1332	1156	1156
q6	181	164	135	135
q7	903	949	769	769
q8	9307	1400	1315	1315
q9	5511	5282	5234	5234
q10	6243	2022	1797	1797
q11	466	275	290	275
q12	864	690	516	516
q13	18016	2800	2179	2179
q14	282	283	260	260
q15	q16	858	870	776	776
q17	954	1168	723	723
q18	6368	5685	5535	5535
q19	1130	1267	1036	1036
q20	622	539	399	399
q21	4512	2471	2041	2041
q22	503	432	373	373
Total cold run time: 97216 ms
Total hot run time: 29264 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4698	4430	4587	4430
q2	q3	4623	4833	4149	4149
q4	2002	2166	1355	1355
q5	4929	4995	5174	4995
q6	198	164	136	136
q7	2063	1790	1697	1697
q8	3290	3074	3059	3059
q9	8342	8257	8296	8257
q10	4477	4461	4255	4255
q11	578	418	380	380
q12	654	745	490	490
q13	2736	3420	2381	2381
q14	299	306	263	263
q15	q16	801	796	685	685
q17	1304	1315	1169	1169
q18	7894	7396	7135	7135
q19	1146	1156	1120	1120
q20	2218	2184	1937	1937
q21	6010	5422	4794	4794
q22	547	496	424	424
Total cold run time: 58809 ms
Total hot run time: 53111 ms

@doris-robot

TPC-DS: Total hot run time: 180400 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e4dfb2c0ecf9e809382213d50baf4ccb7c68d68f, data reload: false

query5	4326	654	502	502
query6	358	235	214	214
query7	4225	616	340	340
query8	335	250	226	226
query9	8721	3917	3956	3917
query10	511	409	351	351
query11	6653	5530	5154	5154
query12	188	136	129	129
query13	1285	622	457	457
query14	5745	5257	4827	4827
query14_1	4165	4135	4122	4122
query15	216	207	183	183
query16	994	496	457	457
query17	980	814	668	668
query18	2467	521	386	386
query19	250	237	196	196
query20	145	132	132	132
query21	229	152	122	122
query22	14164	14960	14559	14559
query23	17999	17515	16775	16775
query23_1	16794	16778	16647	16647
query24	7459	1780	1363	1363
query24_1	1345	1362	1370	1362
query25	623	493	430	430
query26	1251	332	184	184
query27	2676	652	395	395
query28	4470	1883	1898	1883
query29	957	703	558	558
query30	291	241	194	194
query31	1096	1059	946	946
query32	86	72	73	72
query33	544	352	296	296
query34	1195	1153	658	658
query35	764	790	675	675
query36	1220	1213	1062	1062
query37	152	98	85	85
query38	3106	3061	2982	2982
query39	922	907	841	841
query39_1	849	831	850	831
query40	237	159	142	142
query41	63	60	60	60
query42	280	271	273	271
query43	321	324	283	283
query44	
query45	206	200	188	188
query46	1162	1260	809	809
query47	2359	2365	2207	2207
query48	414	427	347	347
query49	640	545	419	419
query50	752	286	221	221
query51	4347	4237	4191	4191
query52	290	282	270	270
query53	334	363	279	279
query54	333	294	282	282
query55	97	92	89	89
query56	331	334	325	325
query57	1702	1619	1533	1533
query58	291	282	268	268
query59	2917	2982	2759	2759
query60	340	340	329	329
query61	156	146	148	146
query62	685	623	551	551
query63	320	269	269	269
query64	5205	1408	1098	1098
query65	
query66	1515	492	371	371
query67	24412	24351	24209	24209
query68	
query69	439	344	316	316
query70	1047	982	962	962
query71	378	327	310	310
query72	3024	2692	2501	2501
query73	809	781	444	444
query74	9890	9743	9563	9563
query75	3635	3371	3017	3017
query76	2278	1197	803	803
query77	412	423	353	353
query78	11455	11310	10772	10772
query79	1340	1102	828	828
query80	840	785	677	677
query81	446	286	243	243
query82	240	165	126	126
query83	291	286	261	261
query84	294	149	114	114
query85	841	530	468	468
query86	361	320	337	320
query87	3251	3191	3080	3080
query88	3646	2773	2746	2746
query89	471	417	372	372
query90	2201	183	176	176
query91	175	172	141	141
query92	79	77	68	68
query93	917	923	517	517
query94	522	365	300	300
query95	668	471	343	343
query96	1033	822	351	351
query97	2688	2686	2579	2579
query98	246	229	228	228
query99	1067	1070	964	964
Total cold run time: 257655 ms
Total hot run time: 180400 ms

@csun5285

csun5285 commented Apr 3, 2026

Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 29194 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit e4dfb2c0ecf9e809382213d50baf4ccb7c68d68f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17596	3703	3678	3678
q2	q3	10694	882	594	594
q4	4672	460	365	365
q5	7439	1321	1142	1142
q6	182	162	134	134
q7	909	941	792	792
q8	9325	1400	1325	1325
q9	5542	5272	5227	5227
q10	6301	2007	1767	1767
q11	474	277	275	275
q12	828	687	518	518
q13	18057	2767	2160	2160
q14	285	283	259	259
q15	q16	903	836	779	779
q17	1004	1016	880	880
q18	6476	5616	5501	5501
q19	1294	1270	1081	1081
q20	595	553	397	397
q21	5046	2408	1951	1951
q22	466	460	369	369
Total cold run time: 98088 ms
Total hot run time: 29194 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4486	4256	4339	4256
q2	q3	4628	4713	4161	4161
q4	1989	2093	1348	1348
q5	4895	4992	5155	4992
q6	193	172	137	137
q7	1991	1779	1661	1661
q8	3537	3094	3115	3094
q9	8156	8212	8298	8212
q10	4472	4527	4233	4233
q11	589	405	373	373
q12	669	812	612	612
q13	2653	3085	2409	2409
q14	316	335	287	287
q15	q16	779	798	705	705
q17	1300	1269	1212	1212
q18	7729	6927	6871	6871
q19	1159	1158	1110	1110
q20	2237	2233	1957	1957
q21	5909	5367	4969	4969
q22	545	507	418	418
Total cold run time: 58232 ms
Total hot run time: 53017 ms

@doris-robot

TPC-DS: Total hot run time: 181088 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e4dfb2c0ecf9e809382213d50baf4ccb7c68d68f, data reload: false

query5	4361	685	510	510
query6	330	237	211	211
query7	4240	575	334	334
query8	329	240	226	226
query9	8774	3966	3971	3966
query10	505	411	366	366
query11	6829	5508	5124	5124
query12	208	137	129	129
query13	1291	614	435	435
query14	5692	5237	4812	4812
query14_1	4292	4203	4156	4156
query15	215	206	186	186
query16	1016	464	435	435
query17	1150	771	644	644
query18	2726	504	389	389
query19	258	235	199	199
query20	140	139	134	134
query21	220	145	125	125
query22	14126	14579	15141	14579
query23	17887	17440	16743	16743
query23_1	16800	16889	17316	16889
query24	8011	1741	1405	1405
query24_1	1367	1373	1376	1373
query25	573	493	435	435
query26	1239	344	187	187
query27	2651	592	371	371
query28	4422	1911	1919	1911
query29	970	665	556	556
query30	297	243	201	201
query31	1096	1045	945	945
query32	95	69	69	69
query33	545	348	292	292
query34	1220	1156	696	696
query35	735	772	675	675
query36	1209	1243	1140	1140
query37	147	98	85	85
query38	3087	3079	3001	3001
query39	912	895	875	875
query39_1	825	846	843	843
query40	234	159	137	137
query41	61	59	58	58
query42	277	277	273	273
query43	320	315	288	288
query44	
query45	200	209	183	183
query46	1134	1289	814	814
query47	2354	2313	2249	2249
query48	411	411	299	299
query49	632	541	425	425
query50	705	292	219	219
query51	4408	4336	4245	4245
query52	282	290	270	270
query53	339	369	296	296
query54	327	294	284	284
query55	105	97	89	89
query56	335	321	333	321
query57	1747	1637	1595	1595
query58	298	282	276	276
query59	2905	2991	2726	2726
query60	330	336	311	311
query61	156	157	157	157
query62	703	626	568	568
query63	313	271	267	267
query64	5256	1430	1091	1091
query65	
query66	1426	473	367	367
query67	24525	24388	24108	24108
query68	
query69	435	334	320	320
query70	1048	1003	1023	1003
query71	367	329	324	324
query72	2989	2738	2652	2652
query73	811	831	461	461
query74	9844	9780	9560	9560
query75	3576	3391	3012	3012
query76	2306	1166	758	758
query77	410	436	361	361
query78	11294	11390	10747	10747
query79	1485	1111	842	842
query80	1394	760	677	677
query81	504	277	238	238
query82	1316	158	123	123
query83	363	294	257	257
query84	264	149	115	115
query85	941	500	453	453
query86	427	344	324	324
query87	3298	3241	3074	3074
query88	3650	2716	2727	2716
query89	513	435	378	378
query90	1922	183	179	179
query91	177	170	137	137
query92	74	75	71	71
query93	889	909	522	522
query94	663	323	298	298
query95	668	449	331	331
query96	983	754	355	355
query97	2709	2667	2584	2584
query98	248	229	230	229
query99	1074	1075	1002	1002
Total cold run time: 260389 ms
Total hot run time: 181088 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.98% (20070/37885)
Line Coverage 36.57% (188529/515490)
Region Coverage 32.83% (146376/445843)
Branch Coverage 33.99% (64110/188605)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 77.78% (7/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27307/37105)
Line Coverage 57.21% (294009/513920)
Region Coverage 54.54% (245406/449982)
Branch Coverage 56.18% (106286/189187)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (12/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27307/37105)
Line Coverage 57.21% (294009/513920)
Region Coverage 54.54% (245406/449982)
Branch Coverage 56.18% (106286/189187)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.13% (393/422) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27307/37105)
Line Coverage 57.21% (294002/513920)
Region Coverage 54.55% (245453/449982)
Branch Coverage 56.18% (106281/189187)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.13% (393/422) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27306/37105)
Line Coverage 57.20% (293952/513920)
Region Coverage 54.51% (245301/449982)
Branch Coverage 56.16% (106238/189187)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.13% (393/422) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27306/37105)
Line Coverage 57.20% (293968/513920)
Region Coverage 54.51% (245293/449982)
Branch Coverage 56.16% (106245/189187)

@eldenmoon eldenmoon left a comment

Member

LGTM

@eldenmoon

Member

/review

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Apr 27, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions Bot left a comment

Contributor

I found one blocking issue in the current patch.

  1. The code change does not appear to touch the Stream Load JSON execution path described in the PR body. When _is_load is true, NewJsonReader::_simdjson_write_data_to_column() still routes complex JSON values through deserialize_one_cell_from_json() (be/src/format/json/new_json_reader.cpp:1084-1121), and DataTypeMapSerDe::deserialize_one_cell_from_json() already has its own backslash-skip logic (be/src/core/data_type_serde/data_type_map_serde.cpp:244-245). The new split_by_delimiter() handling only affects from_string() callers such as string/JSONB-to-complex conversions, and the added unit test exercises only that path. So the current change does not yet prove, or appear to implement, the Stream Load JSON fix claimed by the PR description.

Critical checkpoints:

  • Goal / proof: Not yet satisfied. The patch demonstrates a from_string() fix, not the stated Stream Load JSON bug, and there is no test covering the real load path.
  • Scope / minimality: The code change is small, but it is currently applied to a different path than the reported bug.
  • Concurrency: Not applicable in the touched code.
  • Lifecycle / static initialization: Not applicable.
  • Configuration: No config changes.
  • Compatibility: No storage or protocol compatibility issue found in the touched lines.
  • Parallel code paths: This is the main problem. The JSON load path uses deserialize_one_cell_from_json(), while this patch changes only the shared from_string() splitter path.
  • Special conditional checks: The new escape check itself is straightforward.
  • Test coverage: Insufficient for the reported bug; an end-to-end reproducer on the actual JSON Stream Load path is still missing.
  • Observability: Not applicable.
  • Transaction / persistence: Not applicable.
  • Data writes / modifications: This is a user-visible parsing fix, so it needs validation on the real ingestion path before merge.
  • FE / BE variable passing: Not applicable.
  • Performance: No obvious concern.
  • Other issues: None beyond the path mismatch above.

User focus points: no additional focus points were provided.

for (int pos = 0; pos < str.size; ++pos) {
char c = str.data[pos];
if (c == '"' || c == '\'') {
if (c == '\\' && pos + 1 < static_cast<int>(str.size)) {

Contributor

This helper is only used by the *_serde::from_string() implementations. The Stream Load JSON path described in the PR body still goes through NewJsonReader::_simdjson_write_data_to_column() and then DataTypeMapSerDe::deserialize_one_cell_from_json() when _is_load is true (be/src/format/json/new_json_reader.cpp:1084-1121). That map JSON path already has its own \ handling at be/src/core/data_type_serde/data_type_map_serde.cpp:244-245, and it never calls split_by_delimiter(). So this change does not appear to fix the reported Stream Load regression; it fixes the from_string() path instead. Can you either add a reproducer on the real load path and patch that path if needed, or narrow the PR description/tests to the string or JSONB-to-map conversion flow that this code actually changes?

Contributor Author

The new_json_reader doesn't parse the input directly into a map — it parses it into a string, and then casts that string into a map. The fix is exactly on the path that casts a string into a map.

@eldenmoon eldenmoon merged commit 0c9fff3 into apache:master Apr 29, 2026
33 of 34 checks passed
zhaorongsheng pushed a commit to zhaorongsheng/doris that referenced this pull request Jun 4, 2026
…apache#61995)

### Bug

Stream Load (JSON format) writing to `Map<String, String>` columns
silently sets the column to NULL when the map value contains both `\"`
(escaped quotes) and `:` or `,`. Stream Load returns Success with
FilteredRows=0 — data loss with no error indication.

### Fix

Added `\` escape skip in `split_by_delimiter`'s character loop, before
quote detection — consistent with the existing escape handling in
`deserialize_one_cell_from_json`.
github-actions Bot pushed a commit that referenced this pull request Jun 11, 2026
…#61995)

yiguolei pushed a commit that referenced this pull request Jun 16, 2026
…ape handling #61995 (#64432)

Cherry-picked from #61995

Co-authored-by: Chenyang Sun <sunchenyang@selectdb.com>
morningman pushed a commit to csun5285/doris that referenced this pull request Jun 24, 2026
…apache#61995)

Manual backport of apache#61995 to branch-4.0. The serde util still lives under
be/src/vec/data_types/serde/ on this branch (namespace doris::vectorized),
so the core fix and the new UT are applied at the vec/ paths and the test's
added include points to vec/data_types/serde/complex_type_deserialize_util.h.
The core change, the UT body, and the jsonb regression data are unchanged.

### Bug
Stream Load (JSON format) writing to Map<String, String> columns silently
sets the column to NULL when the map value contains both \" (escaped quotes)
and : or ,. Stream Load returns Success with FilteredRows=0 — data loss with
no error indication.

### Fix
Added \ escape skip in split_by_delimiter's character loop, before quote
detection — consistent with the existing escape handling in
deserialize_one_cell_from_json.

(cherry picked from commit 0c9fff3)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yiguolei pushed a commit that referenced this pull request Jun 26, 2026
#61995 ([fix](serde) fix split_by_delimiter missing backslash escape
handling) made the complex-type splitter skip escaped characters so that
'\"' inside a value no longer flips quote state.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot pushed a commit that referenced this pull request Jun 26, 2026
#61995 ([fix](serde) fix split_by_delimiter missing backslash escape
handling) made the complex-type splitter skip escaped characters so that
'\"' inside a value no longer flips quote state.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, dev/4.0.x-conflict, dev/4.1.3-merged, reviewed

6 participants