Search before asking
Version
4.1.1
What's Wrong?
When using the multi-table CDC mode of CREATE STREAMING JOB in Doris 4.1.1 to synchronize tables and data from a specified MySQL database with auto table creation enabled, all Chinese characters in the synchronized data are garbled and displayed as ???.
For the same MySQL table, the TVF mode of CREATE STREAMING JOB (manually creating a Doris primary key model table first, then synchronizing the data) displays the Chinese characters correctly.
We ran SHOW CREATE TABLE on the Doris tables produced by the two modes and found no obvious differences between the DDL statements.
In the CDC mode job, the following parameters are already appended to the MySQL jdbc_url:
?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull
Chinese content is still replaced with ???.
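For context, a literal ? is usually what a lossy charset conversion leaves behind: when text is encoded into a charset that cannot represent a character, many encoders substitute ?. The Python sketch below only illustrates that general mechanism. That something similar happens somewhere in the CDC path is an assumption, not a confirmed root cause.

```python
# Illustration only: how a lossy charset conversion turns Chinese text into "?".
text = "中文测试"

# UTF-8 can represent Chinese characters, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# latin-1 cannot represent them; with errors="replace" every character
# is substituted by a single "?" byte (0x3F) and the original is lost.
lossy_bytes = text.encode("latin-1", errors="replace")

print(round_trip == text)  # True
print(lossy_bytes)         # b'????'
```

Once the substitution has happened, the original characters cannot be recovered from the stored value, which is why it matters where in the pipeline the conversion takes place.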
The schema configuration of the corresponding MySQL table is as follows:
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC
What You Expected?
When synchronizing MySQL tables and data with a Doris 4.1.1 multi-table CDC streaming job and auto table creation, Chinese text should be parsed and stored correctly instead of being replaced with ???.
The result should match the TVF streaming job mode, where manually created Doris primary key tables store Chinese content correctly.
How to Reproduce?
Environment: Apache Doris 4.1.1, source MySQL table with charset utf8mb4
MySQL table config: ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC
Create a multi-table CDC streaming job with auto table creation enabled to sync all tables from a specified MySQL database.
Append the charset and timezone parameters to the MySQL jdbc_url:
?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull
Insert rows containing Chinese text into the MySQL source table (or use existing rows), then wait for CDC synchronization to complete.
Query the synced data in Doris: all Chinese characters show as ???.
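To narrow down whether the ??? is stored in Doris or only appears in the client, one option is to compare the raw bytes of an affected value on both sides, for example with a HEX() query on the column in MySQL and in Doris. The helper below is a hypothetical sketch for interpreting such a hex string; the function name and the three labels are made up for illustration, and it assumes the hex output is a plain string of hex digits.

```python
def classify_hex(hex_value: str) -> str:
    """Classify the hex dump of a stored string value (hypothetical helper)."""
    raw = bytes.fromhex(hex_value)

    # Every byte is 0x3F ("?"): the characters were already lost before
    # or during the write, so this is not just a display problem.
    if raw and set(raw) == {0x3F}:
        return "stored-as-question-marks"

    # Otherwise check whether the bytes form valid UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-valid-utf8"
    return "valid-utf8"


print(classify_hex("3F3F"))          # stored-as-question-marks
print(classify_hex("E4B8ADE69687"))  # valid-utf8 (UTF-8 bytes of "中文")
print(classify_hex("D6D0"))          # not-valid-utf8 (GBK bytes of "中")
```

If the Doris side reports only 3F bytes while MySQL returns valid UTF-8, the conversion happened inside the synchronization path rather than in the query client.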
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct