[fix](streaming-job) drop neighbour-table rows leaked by JDBC LIKE wildcards in JdbcPostgreSQLClient by JNSimba · Pull Request #63402 · apache/doris

JNSimba · 2026-05-19T09:13:52Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

JdbcPostgreSQLClient.getJdbcColumnsInfo calls DatabaseMetaData.getColumns(catalog, schemaPattern, tableNamePattern, columnNamePattern). Per the JDBC spec the 3rd argument is a SQL LIKE pattern, so literal _ / % characters in the requested table name are interpreted as wildcards by the Postgres driver. When a streaming job is created with include_tables = "user_info_pg_normal1" and a neighbour table like userXinfo_pg_normal1 happens to coexist in the same schema, the metadata query returns columns from both tables. The combined result then trips CREATE TABLE on the Doris side with errors such as errCode = 2, detailMessage = Duplicate column name 'name', or pollutes the auto-created table schema with stray columns.
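To make the wildcard behaviour concrete, here is a minimal sketch of SQL LIKE matching written as a plain Java helper. It is not the Postgres driver's code: the class `LikePatternDemo` and its methods are hypothetical names used only for illustration, and escape sequences are deliberately ignored. It only shows why the requested name, read as a pattern, also accepts the neighbour table.

```java
import java.util.regex.Pattern;

final class LikePatternDemo {
    // Translate a SQL LIKE pattern into a regex:
    // '_' matches exactly one character, '%' matches any run of characters.
    // Escape characters are not handled in this simplified sketch.
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '_') {
                sb.append('.');
            } else if (c == '%') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    static boolean likeMatches(String pattern, String name) {
        return likeToRegex(pattern).matcher(name).matches();
    }

    public static void main(String[] args) {
        String requested = "user_info_pg_normal1";
        System.out.println(likeMatches(requested, "user_info_pg_normal1")); // true: the target itself
        System.out.println(likeMatches(requested, "userXinfo_pg_normal1")); // true: '_' matched 'X'
        System.out.println(likeMatches(requested, "orders"));               // false
    }
}
```

Every `_` in the requested name acts as a one-character wildcard, so any table whose name differs only at those positions is pulled into the result.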

The repro is trivial: in the same Postgres schema create

- user_info_pg_normal1(name varchar, age int2): the table we want to capture
- userXinfo_pg_normal1(name varchar, weight float8): a decoy whose name differs from the target by a single character that _ matches

Then run CREATE JOB ... include_tables = "user_info_pg_normal1". Without the fix, the schema fetched for the target leaks weight (or CREATE TABLE fails with Duplicate column name 'name', depending on column order).

Fix: after fetching the ResultSet, drop rows whose TABLE_NAME does not exactly equal the requested remoteTableName. We deliberately do not escape _ / % at the source — relying on DatabaseMetaData.getSearchStringEscape() is driver-version dependent (older Oracle drivers don't honour escape sequences in getTables), while filtering on the consumer side is deterministic and driver-agnostic.
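The consumer-side filter described above might look roughly like the following sketch. It does not reproduce the patched method: `ExactTableFilterSketch`, `ColumnRow`, `sampleRows` and `columnsOf` are hypothetical names, and an in-memory list stands in for the `ResultSet` so the example runs without a database. Only the `TABLE_NAME` comparison against `remoteTableName` is taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExactTableFilterSketch {
    /** Hypothetical stand-in for one row of the getColumns() ResultSet. */
    static final class ColumnRow {
        final String tableName;   // value of TABLE_NAME
        final String columnName;  // value of COLUMN_NAME

        ColumnRow(String tableName, String columnName) {
            this.tableName = tableName;
            this.columnName = columnName;
        }
    }

    // Rows as the metadata query could return them for the repro:
    // columns of the target and of the decoy, mixed in one result.
    static List<ColumnRow> sampleRows() {
        return Arrays.asList(
                new ColumnRow("user_info_pg_normal1", "name"),
                new ColumnRow("user_info_pg_normal1", "age"),
                new ColumnRow("userXinfo_pg_normal1", "name"),
                new ColumnRow("userXinfo_pg_normal1", "weight"));
    }

    // Keep only rows whose TABLE_NAME is exactly the requested table.
    static List<String> columnsOf(List<ColumnRow> rows, String remoteTableName) {
        List<String> columns = new ArrayList<>();
        for (ColumnRow row : rows) {
            if (!remoteTableName.equals(row.tableName)) {
                continue; // row leaked in through a LIKE wildcard, drop it
            }
            columns.add(row.columnName);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Prints [name, age]; without the filter the list would also hold
        // a second "name" and the stray "weight".
        System.out.println(columnsOf(sampleRows(), "user_info_pg_normal1"));
    }
}
```

Because the comparison is a plain `String.equals`, the result does not depend on whether a given driver honours escape sequences in the pattern argument.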

Scope:

- Only JdbcPostgreSQLClient is patched. This is the path used by Postgres streaming jobs (the failing case). MySQL streaming jobs were checked against the same decoy pattern and do not reproduce the bug because MySQL Connector/J doesn't pull neighbour rows here in practice, so JdbcMySQLClient is left untouched in this PR.
- The JDBC catalog path lives in a separate module (fe-connector-jdbc/.../JdbcConnectorClient) and is not part of this PR. It already does partial escape but intentionally skips _ / % for driver-compatibility reasons; a follow-up can apply the same after-the-fact filter there.

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason

A decoy table userXinfo_pg_normal1 (with a different column shape: weight float8) is added to test_streaming_postgres_job.groovy, plus a guard asserting that createTalInfo does not contain the weight column. Without the fix the case fails (either Duplicate column name during CREATE TABLE, or the weight assert trips). The same decoy is mirrored into test_streaming_mysql_job.groovy as a baseline so any future regression in MySQL Connector/J's behaviour is caught immediately.

Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-05-19T09:13:58Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

liaoxin01

LGTM

github-actions · 2026-05-26T07:27:28Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-05-26T07:27:31Z

PR approved by anyone and no changes requested.

JNSimba · 2026-05-26T07:28:46Z

run buildall

JNSimba · 2026-05-26T07:28:54Z

/review

github-actions

I found one correctness issue in the PR's main fix. The approach is directionally correct for table-name wildcard leakage, but the same JDBC API also treats the schema argument as a LIKE pattern, so the current filter is incomplete for Postgres schemas containing _ or %.

Critical checkpoint conclusions:

Goal/test: The PR targets leaked columns from PostgreSQL JDBC LIKE matching and adds a regression for table-name _; it does not cover schema-pattern leakage, so the goal is only partially satisfied.
Scope: The change is small and focused.
Concurrency/lifecycle/config/compatibility: No new concurrency, special lifecycle, config, storage format, or FE-BE protocol compatibility concerns found.
Parallel paths: The modified PostgreSQL columns path is the relevant streaming-job path described by the PR; primary-key lookup uses a different JDBC call.
Tests: The added regression covers the table-name wildcard case but misses the schema wildcard case.
Observability/performance/transactions: No material concerns found.

User focus: No additional user-provided review focus was specified.
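The review's point is that the schema argument of `getColumns` is a LIKE pattern as well, so an exact check on `TABLE_NAME` alone still lets rows from a wildcard-matching schema through. A sketch of a filter covering both columns follows. `SchemaAndTableFilterSketch`, `isExactMatch` and `readColumnNames` are hypothetical names and the schema names in the usage note are invented for illustration; `TABLE_SCHEM`, `TABLE_NAME` and `COLUMN_NAME` are the standard column labels of the `getColumns` result.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class SchemaAndTableFilterSketch {
    // True only when both the schema and the table of a metadata row are
    // exactly the requested ones. Null row values never match.
    static boolean isExactMatch(String rowSchema, String rowTable,
                                String remoteDbName, String remoteTableName) {
        return remoteDbName.equals(rowSchema) && remoteTableName.equals(rowTable);
    }

    // How the check could sit in a ResultSet loop (assumed shape, not the
    // actual Doris method): skip leaked rows, collect the rest.
    static List<String> readColumnNames(ResultSet rs, String remoteDbName,
                                        String remoteTableName) throws SQLException {
        List<String> names = new ArrayList<>();
        while (rs.next()) {
            if (!isExactMatch(rs.getString("TABLE_SCHEM"), rs.getString("TABLE_NAME"),
                    remoteDbName, remoteTableName)) {
                continue;
            }
            names.add(rs.getString("COLUMN_NAME"));
        }
        return names;
    }
}
```

With a requested schema `cdc_test` (an invented name), a row reported for schema `cdcXtest` is rejected even when its table name is identical to the requested one.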

JNSimba · 2026-05-26T09:07:06Z

/review

JNSimba · 2026-05-26T09:07:11Z

run buildall

hello-stephen · 2026-05-26T09:45:45Z

TPC-H: Total hot run time: 31614 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b9107947bb0fd67973c72d4792cd91618a3b2a1f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17685	4067	4010	4010
q2	q3	10773	1376	851	851
q4	4690	475	354	354
q5	7627	2271	2096	2096
q6	261	176	144	144
q7	942	821	635	635
q8	9352	1780	1650	1650
q9	6500	4955	4881	4881
q10	6458	2252	1858	1858
q11	440	270	242	242
q12	688	430	295	295
q13	18212	3366	2787	2787
q14	265	254	235	235
q15	q16	816	779	700	700
q17	883	894	875	875
q18	6947	5785	5556	5556
q19	1182	1292	1212	1212
q20	535	454	271	271
q21	5989	2782	2638	2638
q22	458	386	324	324
Total cold run time: 100703 ms
Total hot run time: 31614 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4776	4753	4976	4753
q2	q3	4901	5274	4665	4665
q4	2155	2218	1440	1440
q5	4928	4746	4746	4746
q6	240	181	124	124
q7	1799	1978	1579	1579
q8	2453	1978	1961	1961
q9	7404	7504	7375	7375
q10	4807	4699	4253	4253
q11	543	391	358	358
q12	738	753	547	547
q13	2962	3351	2789	2789
q14	274	278	253	253
q15	q16	684	707	614	614
q17	1302	1277	1263	1263
q18	7347	6734	6847	6734
q19	1151	1109	1109	1109
q20	2229	2219	1969	1969
q21	5343	4656	4528	4528
q22	521	482	405	405
Total cold run time: 56557 ms
Total hot run time: 51465 ms

hello-stephen · 2026-05-26T09:56:48Z

TPC-DS: Total hot run time: 172798 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b9107947bb0fd67973c72d4792cd91618a3b2a1f, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4317	675	518	518
query6	335	232	200	200
query7	4211	546	313	313
query8	330	242	222	222
query9	8849	4126	4129	4126
query10	460	350	306	306
query11	5841	2600	2230	2230
query12	187	130	127	127
query13	1332	612	465	465
query14	6170	5578	5256	5256
query14_1	4578	4571	4555	4555
query15	217	207	184	184
query16	983	452	460	452
query17	1068	757	622	622
query18	2456	503	376	376
query19	218	211	167	167
query20	142	144	133	133
query21	219	143	117	117
query22	13713	13719	13376	13376
query23	17430	16557	16268	16268
query23_1	16462	16398	16432	16398
query24	7459	1796	1324	1324
query24_1	1339	1327	1344	1327
query25	592	527	442	442
query26	1309	317	180	180
query27	2692	571	361	361
query28	4441	1979	2001	1979
query29	1051	668	522	522
query30	313	253	202	202
query31	1135	1092	989	989
query32	95	82	78	78
query33	567	395	309	309
query34	1176	1134	658	658
query35	820	809	689	689
query36	1420	1364	1245	1245
query37	162	117	88	88
query38	3239	3180	3113	3113
query39	936	940	909	909
query39_1	886	864	881	864
query40	244	153	131	131
query41	71	70	70	70
query42	120	113	118	113
query43	351	346	312	312
query44	
query45	218	212	205	205
query46	1119	1199	756	756
query47	2364	2426	2351	2351
query48	406	409	302	302
query49	640	504	383	383
query50	1004	354	261	261
query51	4345	4304	4231	4231
query52	103	105	95	95
query53	261	288	206	206
query54	316	274	257	257
query55	93	92	90	90
query56	310	309	292	292
query57	1442	1432	1357	1357
query58	305	275	270	270
query59	1597	1665	1473	1473
query60	334	327	310	310
query61	164	147	159	147
query62	692	649	599	599
query63	240	205	213	205
query64	2397	816	628	628
query65	
query66	1732	486	360	360
query67	29920	29845	29755	29755
query68	
query69	463	339	306	306
query70	1016	1079	997	997
query71	304	273	267	267
query72	3032	2660	2405	2405
query73	822	755	418	418
query74	5145	5031	4844	4844
query75	2707	2636	2319	2319
query76	2294	1147	781	781
query77	411	417	347	347
query78	12508	12392	11876	11876
query79	1523	1029	770	770
query80	1236	547	461	461
query81	506	292	243	243
query82	1449	155	125	125
query83	383	284	247	247
query84	287	142	112	112
query85	946	545	454	454
query86	435	335	334	334
query87	3496	3440	3247	3247
query88	3616	2735	2730	2730
query89	454	392	348	348
query90	1777	186	184	184
query91	179	167	137	137
query92	80	81	72	72
query93	1438	1516	883	883
query94	662	343	303	303
query95	697	399	356	356
query96	1125	775	339	339
query97	2724	2747	2575	2575
query98	245	229	231	229
query99	1184	1146	1052	1052
Total cold run time: 256058 ms
Total hot run time: 172798 ms

github-actions

I found one remaining correctness gap in a parallel PostgreSQL JDBC metadata path. The new exact schema/table filtering fixes fe-core's JdbcPostgreSQLClient, and the added regression covers the streaming-job path including the schema-wildcard case that was already raised in the existing thread. However, fe/fe-connector/fe-connector-jdbc/src/main/java/org/apache/doris/connector/jdbc/client/JdbcPostgreSQLConnectorClient.java is an adapted PostgreSQL client and still builds schemas directly from DatabaseMetaData.getColumns(cat, remoteDbName, remoteTableName, null) without exact TABLE_SCHEM/TABLE_NAME filtering. Users of that connector path can still get columns mixed from wildcard-matching schemas/tables.

Critical checkpoint conclusions:

Goal: prevent JDBC LIKE wildcard leakage in PostgreSQL column discovery. Partially accomplished; legacy FE datasource path is fixed, connector path remains vulnerable.
Scope/focus: the code change itself is small and clear, but incomplete across functionally parallel code paths.
Concurrency/lifecycle/config/compatibility: no new concurrency, lifecycle, config, storage-format, or FE-BE protocol concerns found.
Parallel paths: issue found in JdbcPostgreSQLConnectorClient.getJdbcColumnsInfo.
Special checks: exact filtering is appropriate at the ResultSet consumption point.
Tests: regression covers the streaming-job path and both table/schema wildcard decoys; no coverage for the fe-connector PostgreSQL path.
Observability/transactions/persistence/data writes: not applicable to this metadata-only change.
Performance: the added per-row string checks are negligible relative to JDBC metadata IO.
User focus: no additional user-provided review focus was specified.

hello-stephen · 2026-05-26T12:08:50Z

FE Regression Coverage Report

Increment line coverage 0.00% (0/87) 🎉
Increment coverage report
Complete coverage report

fixed

#63404 #63471 #63480 #63490 #63514 #63618 (#63812)

Cherry-picked from:
- #63079 [improve](streaming-job) async chunk splitting for cdc source job
- #63404 [test](streaming-job) refine cdc data-type and boundary regression cases for mysql/pg
- #63471 [regression-test](streaming-job) add cdc cases (composite/concurrent-dml/id-gap/decimal/datetime pk) and fix split-bound java.time deserialize
- #63480 [fix](streaming-job) misc fixes for typo/log/validation/visibility
- #63402 [fix](streaming-job) drop neighbour-table rows leaked by JDBC LIKE wildcards in JdbcPostgreSQLClient
- #63514 [regression-test](streaming-job) add cdc operational cases for offset modes and pg slot lifecycle
- #63618 [fix](streaming-job) fix postgres historical-date timestamp handling in cdc-client
- #63490 [improve](streaming-job) support user-specified mysql server_id with per-reader assignment

[fix](streaming-job) drop neighbour-table rows leaked by JDBC LIKE wildcards in JdbcPostgreSQLClient.getJdbcColumnsInfo

8af6e19

[fix](streaming-job) trim LIKE-wildcard fix comment in JdbcPostgreSQLClient and revert out-of-scope decoy added to mysql case

626b408

liaoxin01 previously approved these changes May 26, 2026

github-actions Bot added the approved label (indicates a PR has been approved by one committer) May 26, 2026

github-actions Bot added the reviewed label May 26, 2026

github-actions Bot requested changes May 26, 2026

Comment thread fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/JdbcPostgreSQLClient.java Outdated

Merge upstream/master into fix/jdbc-escape-like-wildcards (resolve test_streaming_postgres_job.groovy decoy block)

122b341

JNSimba dismissed liaoxin01’s stale review via 122b341 May 26, 2026 08:22

github-actions Bot removed the approved label (indicates a PR has been approved by one committer) May 26, 2026

[fix](streaming-job) also drop neighbour-schema rows leaked by JDBC LIKE wildcards in JdbcPostgreSQLClient.getJdbcColumnsInfo

b910794

github-actions Bot previously requested changes May 26, 2026

Comment thread fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/JdbcPostgreSQLClient.java

liaoxin01 approved these changes May 26, 2026

JNSimba merged commit 41581e5 into apache:master May 26, 2026
31 checks passed

JNSimba mentioned this pull request May 28, 2026

branch-4.1: [pick](streamingjob) pick streaming-job batch #63079 #63402 #63404 #63471 #63480 #63490 #63514 #63618 #63812

Merged

yiguolei added the dev/4.1.2-merged label May 29, 2026

yiguolei mentioned this pull request Jun 14, 2026

4.1.2 Release Notes #64485

Open
