fix: allow Unicode letters in column names (NameRegex) #10
Closed
0x5143 wants to merge 1 commit into
The previous regex ^[A-Za-z][A-Za-z0-9_]*$ only accepted ASCII letters,
rejecting valid column names like "Name文本" that contain CJK characters or
other non-ASCII letters.
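C# identifiers may start with any Unicode letter (categories Lu, Ll, Lt,
Lm, Lo, Nl) or an underscore, and may continue with letters, decimal digits
(Nd), connecting characters (Pc), combining marks (Mn, Mc), and formatting
characters (Cf), so restricting column names to ASCII is unnecessarily strict.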
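To make these rules concrete, here is a minimal C# sketch that classifies each character of a name with `Rune.GetUnicodeCategory` and applies the start and part categories from the C# specification. `IsValidCSharpIdentifier` and its helpers are hypothetical names for illustration, not code from this repository; the sketch checks only the character rules and ignores keywords and the `@` verbatim prefix.

```csharp
using System;
using System.Globalization;
using System.Text;

// letter-character in the C# spec: Lu, Ll, Lt, Lm, Lo, Nl.
static bool IsLetterCharacter(UnicodeCategory c) =>
    c is UnicodeCategory.UppercaseLetter or UnicodeCategory.LowercaseLetter
      or UnicodeCategory.TitlecaseLetter or UnicodeCategory.ModifierLetter
      or UnicodeCategory.OtherLetter or UnicodeCategory.LetterNumber;

// identifier-part-character: letter-character, Nd, Pc, Mn, Mc, Cf.
static bool IsPartCharacter(UnicodeCategory c) =>
    IsLetterCharacter(c)
    || c is UnicodeCategory.DecimalDigitNumber or UnicodeCategory.ConnectorPunctuation
         or UnicodeCategory.NonSpacingMark or UnicodeCategory.SpacingCombiningMark
         or UnicodeCategory.Format;

// Hypothetical helper: checks character categories only (no keywords, no '@').
// Walks Unicode scalar values (Rune), so a surrogate pair counts as one character.
static bool IsValidCSharpIdentifier(string name)
{
    if (string.IsNullOrEmpty(name)) return false;
    bool first = true;
    foreach (Rune r in name.EnumerateRunes())
    {
        UnicodeCategory cat = Rune.GetUnicodeCategory(r);
        // First character: underscore or letter-character; later: any part character.
        bool ok = first ? (r.Value == '_' || IsLetterCharacter(cat)) : IsPartCharacter(cat);
        if (!ok) return false;
        first = false;
    }
    return true;
}

foreach (var name in new[] { "Name文本", "_id", "1abc", "x²" })
{
    Console.WriteLine($"{name}: {IsValidCSharpIdentifier(name)}");
}
```

Under these rules "Name文本" and "_id" are valid identifiers, while "1abc" (starts with a digit) and "x²" ('²' is category No) are not.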
Change NameRegex to use Unicode property escapes:
^[\p{L}_][\p{L}\p{N}_]*$
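Every name that matched the old pattern still matches. In addition, the
new pattern accepts names that start with a Unicode letter or an underscore,
followed by Unicode letters, Unicode number characters, or underscores.
This is close to, but not identical with, the C# identifier rules: \p{N}
also admits number characters such as "²" (category No) that C# rejects,
while combining marks (Mn, Mc) and formatting characters (Cf) are still
not accepted.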
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
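A minimal sketch comparing the old and new patterns on a few sample names, using `System.Text.RegularExpressions.Regex` directly. The variable names and sample inputs are illustrative assumptions; the actual `RowTableParser` code is not shown in this PR.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern before this PR: an ASCII letter, then ASCII letters, digits or '_'.
var oldNameRegex = new Regex(@"^[A-Za-z][A-Za-z0-9_]*$");

// Pattern after this PR: a Unicode letter or '_', then Unicode letters,
// Unicode number characters (\p{N}) or '_'.
var newNameRegex = new Regex(@"^[\p{L}_][\p{L}\p{N}_]*$");

// Sample column names (illustrative only).
foreach (var name in new[] { "Name", "Name文本", "_id", "名前", "Имя", "1st", "a b" })
{
    Console.WriteLine($"{name,-8} old={oldNameRegex.IsMatch(name)} new={newNameRegex.IsMatch(name)}");
}
```

Names that passed the old pattern, such as "Name", still pass; "Name文本", "名前", "Имя" and "_id" now pass as well; "1st" and "a b" are rejected by both. In .NET, `$` without `RegexOptions.Multiline` also matches before a final newline, so both patterns accept "Name\n"; `\z` would be the stricter anchor if that matters for column names.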
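Problem
NameRegex in RowTableParser only allows ASCII letters: when a column name
contains Chinese characters (or other Unicode letters), such as Name文本,
an exception is thrown.
Root cause
[A-Za-z0-9_] only matches ASCII and does not match any Unicode letter
(Chinese characters belong to Unicode category Lo).
Fix
Use Unicode property escapes instead:
\p{L} = all Unicode letters (including Chinese, Japanese, Russian, etc.)
\p{N} = all Unicode number characters
Backward compatibility: all previously valid ASCII column names remain valid.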
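The root cause and the fix can be checked character by character. This minimal sketch uses plain `System.Text.RegularExpressions` with illustrative sample characters (an assumption, not data from the PR) and prints each character's `char.GetUnicodeCategory` value and whether the old ASCII class, `\p{L}`, and `\p{N}` match it. Note that `\p{N}` covers all number categories (Nd, Nl, No), so it also matches characters like '²', not only decimal digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Each class is anchored so it must match exactly one character.
var asciiWord = new Regex(@"^[A-Za-z0-9_]$"); // old character class
var letter = new Regex(@"^\p{L}$");           // any Unicode letter (Lu, Ll, Lt, Lm, Lo)
var number = new Regex(@"^\p{N}$");           // any Unicode number (Nd, Nl, No)

// 'A' Latin, '文' CJK, 'я' Cyrillic, 'カ' Katakana, '３' fullwidth digit, '²' superscript two
foreach (char c in "A文яカ３²")
{
    string s = c.ToString();
    Console.WriteLine(
        $"{c} {char.GetUnicodeCategory(c)} ascii={asciiWord.IsMatch(s)} " +
        $"L={letter.IsMatch(s)} N={number.IsMatch(s)}");
}
```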