fix: allow Unicode letters in column names (NameRegex) #10

Closed
0x5143 wants to merge 1 commit into master from fix/unicode-column-name

0x5143 (Contributor) commented May 17, 2026

Problem

The NameRegex in RowTableParser accepts only ASCII letters:

private static readonly Regex NameRegex = new Regex(@"^[A-Za-z][A-Za-z0-9_]*$");

When a column name contains Chinese characters (or other non-ASCII Unicode letters), such as Name文本, parsing throws:

System.FormatException: 数据列名称不合法: Name文本
("Invalid data column name: Name文本")
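A minimal C# sketch that reproduces the rejection. The pattern is copied from the original NameRegex; the OldColumnNameCheck class and its Validate helper are hypothetical stand-ins for the check inside RowTableParser (whose code is not shown in this PR), and the exception text simply mirrors the reported message.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the column-name check in RowTableParser.
public static class OldColumnNameCheck
{
    // Pattern copied from the original NameRegex: ASCII letters only.
    private static readonly Regex NameRegex = new Regex(@"^[A-Za-z][A-Za-z0-9_]*$");

    public static bool IsValid(string name) => NameRegex.IsMatch(name);

    // Assumed behavior: throw FormatException for an invalid name,
    // reusing the message text from the reported exception.
    public static void Validate(string name)
    {
        if (!IsValid(name))
            throw new FormatException($"数据列名称不合法: {name}");
    }
}
```

With this sketch, IsValid("Name") is true, while IsValid("Name文本") is false and Validate("Name文本") throws FormatException.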

Root cause

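[A-Za-z0-9_] matches only ASCII characters, so it rejects every non-ASCII letter. Chinese characters, for example, belong to Unicode category Lo ("Letter, other").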
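The root cause can be checked directly with the .NET Unicode category APIs. This sketch uses a hypothetical RootCauseDemo class. It shows that 文 is classified as OtherLetter (Lo), is not matched by the ASCII class, and is matched by \p{L}.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the project.
public static class RootCauseDemo
{
    // General category of a single BMP character, e.g. '文' -> OtherLetter (Lo).
    public static UnicodeCategory CategoryOf(char c) => CharUnicodeInfo.GetUnicodeCategory(c);

    // True if the character is matched by the old ASCII-only class.
    public static bool MatchesAsciiClass(char c) => Regex.IsMatch(c.ToString(), "^[A-Za-z0-9_]$");

    // True if the character is in the Unicode "Letter" category group (\p{L}).
    public static bool MatchesLetterClass(char c) => Regex.IsMatch(c.ToString(), @"^\p{L}$");
}
```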

Fix

Switch to Unicode property escapes:

private static readonly Regex NameRegex = new Regex(@"^[\p{L}_][\p{L}\p{N}_]*$");
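  • \p{L} = any Unicode letter (including Chinese, Japanese, Russian, etc.)
  • \p{N} = any Unicode number (categories Nd, Nl, and No, so not only decimal digits)
  • Closer to the C# identifier rules, since C# itself allows Unicode letters in identifiers. The two are not identical: C# also permits combining marks (Mn, Mc) and formatting characters (Cf) after the first character, and allows only decimal digits (Nd) rather than all of \p{N}.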
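A minimal sketch of the proposed pattern in use. NewColumnNameCheck is a hypothetical wrapper; only the regex itself comes from this PR.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern proposed in this PR.
public static class NewColumnNameCheck
{
    // A letter or underscore first, then any mix of Unicode letters,
    // Unicode numbers, and underscores.
    private static readonly Regex NameRegex = new Regex(@"^[\p{L}_][\p{L}\p{N}_]*$");

    public static bool IsValid(string name) => NameRegex.IsMatch(name);
}
```

Under this pattern, Name文本, 名前, Имя, and 列1 are accepted. 1列 (leading digit), Name-1 (hyphen), Name 文本 (space), and the empty string are rejected.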

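Backward compatibility: every ASCII column name that was valid before remains valid. The new pattern also accepts names that begin with an underscore (e.g. _id), which the old pattern rejected.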
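Backward compatibility follows from class containment. [A-Za-z] is a subset of [\p{L}_], and [A-Za-z0-9_] is a subset of [\p{L}\p{N}_]. Because the old classes contain only ASCII characters, checking all 128 ASCII code points is enough to confirm this. The CompatCheck class below is a hypothetical sketch of that check.

```csharp
using System.Text.RegularExpressions;

// Hypothetical check comparing the old and new character classes.
public static class CompatCheck
{
    // "First character" and "remaining characters" parts of each NameRegex.
    private static readonly Regex OldFirst = new Regex("^[A-Za-z]$");
    private static readonly Regex OldRest  = new Regex("^[A-Za-z0-9_]$");
    private static readonly Regex NewFirst = new Regex(@"^[\p{L}_]$");
    private static readonly Regex NewRest  = new Regex(@"^[\p{L}\p{N}_]$");

    // The old classes are ASCII-only, so if every ASCII character they
    // accept is also accepted by the new classes, every name matching the
    // old NameRegex also matches the new one.
    public static bool OldClassesAreSubsets()
    {
        for (int c = 0; c < 128; c++)
        {
            string s = ((char)c).ToString();
            if (OldFirst.IsMatch(s) && !NewFirst.IsMatch(s)) return false;
            if (OldRest.IsMatch(s) && !NewRest.IsMatch(s)) return false;
        }
        return true;
    }
}
```

The converse does not hold. Names such as _id and Name文本 are newly accepted, which is the intended widening.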

The previous regex ^[A-Za-z][A-Za-z0-9_]*$ only accepted ASCII letters,
rejecting column names such as "Name文本" that contain CJK or other
non-ASCII letters.

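C# identifiers may start with any Unicode letter (categories Lu, Ll, Lt,
Lm, Lo, Nl) or an underscore, and may continue with those plus Nd, Pc,
Mn, Mc, and Cf characters, so restricting column names to ASCII is
unnecessarily strict.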
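To see where the new regex and the C# identifier rules diverge, here is a rough sketch of the C# letter/part categories expressed with UnicodeCategory. CSharpIdentifierSketch is hypothetical. It approximates only the character-category rules and ignores keywords, the @ prefix, \u escapes, and surrogate pairs.

```csharp
using System.Globalization;

// Hypothetical approximation of C# identifier character rules.
public static class CSharpIdentifierSketch
{
    // letter-character in the C# spec: Lu, Ll, Lt, Lm, Lo, Nl.
    private static bool IsLetterCharacter(UnicodeCategory cat) =>
        cat == UnicodeCategory.UppercaseLetter ||
        cat == UnicodeCategory.LowercaseLetter ||
        cat == UnicodeCategory.TitlecaseLetter ||
        cat == UnicodeCategory.ModifierLetter ||
        cat == UnicodeCategory.OtherLetter ||
        cat == UnicodeCategory.LetterNumber;

    // Additional identifier-part categories: Nd, Pc, Mn, Mc, Cf.
    private static bool IsPartOnly(UnicodeCategory cat) =>
        cat == UnicodeCategory.DecimalDigitNumber ||
        cat == UnicodeCategory.ConnectorPunctuation ||
        cat == UnicodeCategory.NonSpacingMark ||
        cat == UnicodeCategory.SpacingCombiningMark ||
        cat == UnicodeCategory.Format;

    public static bool IsIdentifierLike(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;
        // First character: underscore or a letter-character.
        UnicodeCategory first = CharUnicodeInfo.GetUnicodeCategory(name[0]);
        if (name[0] != '_' && !IsLetterCharacter(first)) return false;
        // Remaining characters: letter-character or one of the part-only categories.
        for (int i = 1; i < name.Length; i++)
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(name[i]);
            if (!IsLetterCharacter(cat) && !IsPartOnly(cat)) return false;
        }
        return true;
    }
}
```

Under this sketch, the Hindi name नाम (which contains the spacing combining mark U+093E, category Mc) is identifier-like but rejected by ^[\p{L}_][\p{L}\p{N}_]*$. Conversely, x² (U+00B2, category No) is accepted by the regex but is not a valid C# identifier.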

Change NameRegex to use Unicode property escapes:
  ^[\p{L}_][\p{L}\p{N}_]*$

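This still matches every name the old pattern accepted. It also matches
any name that starts with a Unicode letter or an underscore and continues
with Unicode letters, Unicode numbers, or underscores. That brings it
closer to the C# identifier rules, though not exactly: combining marks
(Mn, Mc) are still rejected, and \p{N} also admits non-decimal numbers
such as superscripts (No).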

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0x5143 closed this May 17, 2026
0x5143 deleted the fix/unicode-column-name branch May 17, 2026 14:09