[Bug] Chinese characters garbled to ??? when synchronizing MySQL data with auto table creation via multi-table CDC mode of CREATE STREAMING JOB #64806

Description

@brantyou

Search before asking

  • I had searched in the issues and found no similar issues.

Version

4.1.1

What's Wrong?

In Doris 4.1.1, when a CREATE STREAMING JOB in multi-table CDC mode synchronizes the tables and data of a specified MySQL database with auto table creation enabled, all Chinese characters in the synchronized data are garbled and displayed as ???.
For the identical MySQL table, the TVF mode of CREATE STREAMING JOB (manually create a Doris primary key model table first, then synchronize the data) displays the Chinese characters correctly.
We ran SHOW CREATE TABLE on the Doris tables produced by the two modes and found no obvious differences between the DDL statements.
In the CDC mode job, the following parameters are already appended to the MySQL jdbc_url:
?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull
The Chinese content is still garbled to ???.
The MySQL source table is defined with the following table options:
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC
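
For triage, it may help to note what the ??? pattern usually indicates. The sketch below is plain Python and does not involve Doris or MySQL; it only illustrates the general difference between a lossy encode (each character becomes one ?) and a mis-decode (mojibake, still recoverable). Treating this as the cause of the bug is an assumption, not something confirmed here.

```python
# Illustrative only: reproduces two common failure signatures in plain Python.
text = "中文测试"

# Lossy encode: the target charset cannot represent the characters, so each
# one is replaced by a single "?" and the original text cannot be recovered.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

# Mis-decode: correct UTF-8 bytes read with the wrong charset give mojibake
# (several odd characters per original character), and the bytes stay intact.
mojibake = text.encode("utf-8").decode("latin-1")
recovered = mojibake.encode("latin-1").decode("utf-8")

print(lossy)               # one "?" per original character
print(len(mojibake))       # longer than the original text
print(recovered == text)   # mojibake can be reversed, "?" cannot
```

One ? per character, as reported above, matches the lossy-encode signature, which would suggest the text passes through a non-Unicode charset somewhere between the MySQL source and the Doris table.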

What You Expected?

When a Doris 4.1.1 multi-table CDC streaming job with auto table creation synchronizes MySQL tables and data, Chinese text should be parsed and stored correctly and displayed as normal Chinese characters instead of ???.
The result should be consistent with the TVF streaming job mode, where manually created Doris primary key tables handle Chinese content correctly.

How to Reproduce?

Environment: Apache Doris 4.1.1, source MySQL table with charset utf8mb4.
MySQL table config: ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC
Create a multi-table CDC streaming job with auto table creation enabled to sync all tables from a specified MySQL database.
Append the charset and timezone parameters to the MySQL jdbc_url:
?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull
Insert rows containing Chinese text into the MySQL source table (or use existing rows), then wait for CDC synchronization to complete.
Query the synced data in Doris: all Chinese characters show as ???.
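
To tell whether the ??? is actually stored or only a client display problem, one option is to compare the raw bytes on both sides. The helper below is a hypothetical sketch: it assumes the hex string comes from something like a HEX(column) query run against the MySQL source and against the Doris target, and the function name and return labels are made up for illustration.

```python
def classify_hex(hex_value: str) -> str:
    """Classify HEX(column) output for a value expected to hold Chinese text."""
    raw = bytes.fromhex(hex_value)
    # 0x3F is the ASCII question mark: the original characters are gone.
    if raw and all(b == 0x3F for b in raw):
        return "literal question marks stored"
    try:
        decoded = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    # CJK Unified Ideographs basic block.
    if any("\u4e00" <= ch <= "\u9fff" for ch in decoded):
        return "valid UTF-8 with CJK characters"
    return "valid UTF-8 without CJK characters"


print(classify_hex("E4B8ADE69687"))  # bytes of "中文"
print(classify_hex("3F3F3F"))        # three "?" bytes
```

If the Doris side yields only 3F bytes while the MySQL side yields valid UTF-8, the characters were lost before or during the write in CDC mode rather than at query time.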

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
