Give tip for unicode chars which might not be visible when rendered #100439
chenyukang wants to merge 1 commit into
(rust-highfive has picked a reviewer for you, use r? to override)
    let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
    if c == '\x00' {
        err.help("source files must contain UTF-8 encoded text, unexpected null bytes might occur when a different encoding is used");
    } else if let Ok(v) = &err.suggestions && v.len() == 0 && !c.is_ascii() {
I think this needs a different heuristic for "is not visible when rendered", since this triggers on every non-ASCII character.
Also, why are you checking `err.suggestions` here?
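To make the point about the heuristic concrete, here is a minimal sketch that isolates the character test `!c.is_ascii()` from the diff (the function name `current_heuristic` is hypothetical). It flags clearly visible characters such as é (U+00E9), “ (U+201C) and λ (U+03BB) exactly the same way it flags the invisible ZERO WIDTH SPACE (U+200B).

```rust
/// Hypothetical helper: only the character test from the diff,
/// without the `err.suggestions` part of the condition.
fn current_heuristic(c: char) -> bool {
    !c.is_ascii()
}

fn main() {
    // Visible non-ASCII characters are flagged just like a zero-width one.
    for c in ['\u{E9}', '\u{201C}', '\u{3BB}', '\u{200B}'] {
        println!("U+{:04X} flagged: {}", c as u32, current_heuristic(c));
    }
}
```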
Yes, I'm still looking for a better way to check for such Unicode characters.
I'm checking `err.suggestions` here because we already have a strategy for suggesting possible substitutions:
rust/compiler/rustc_parse/src/lexer/mod.rs, line 322 in d76952f
My idea is that if we already have this kind of suggestion, we shouldn't add another help message, since the existing one is clear enough:
    --> /home/cat/code/rust/src/test/ui/parser/unicode-quote-chars.rs:2:14
    |
    2 | println!(“hello world”);
    | ^
    |
    = help: Unicode character \u{323} might not be visible when rendered
    help: Unicode characters '“' (Left Double Quotation Mark) and '”' (Right Double Quotation Mark) look like '"' (Quotation Mark), but are not
I think the `= help: Unicode character \u{323} might not be visible when rendered` line here would be redundant.
> I'm checking the `err.suggestions` here, because we already have a strategy to give suggestions for those possible substitution,
I think if you change your PR to only flag characters that have a rendered width of zero (e.g. combining characters), then this check becomes redundant.
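A minimal sketch of the zero-width heuristic, assuming a small hand-picked list: a few well-known zero-width format characters plus the Combining Diacritical Marks block (U+0300..U+036F). The names `is_zero_width` and `needs_invisible_char_help` are hypothetical, and the list is not exhaustive. Under this test the curly quote “ (U+201C) is never flagged, so the help no longer overlaps with the substitution suggestion for such characters, while U+0323 (COMBINING DOT BELOW) is flagged.

```rust
/// Hypothetical zero-width check: a handful of well-known zero-width code
/// points plus the Combining Diacritical Marks block. Not exhaustive.
fn is_zero_width(c: char) -> bool {
    matches!(
        c,
        '\u{200B}' // ZERO WIDTH SPACE
            | '\u{200C}' // ZERO WIDTH NON-JOINER
            | '\u{200D}' // ZERO WIDTH JOINER
            | '\u{2060}' // WORD JOINER
            | '\u{FEFF}' // ZERO WIDTH NO-BREAK SPACE (BOM)
            | '\u{0300}'..='\u{036F}' // Combining Diacritical Marks
    )
}

/// Hypothetical gate mirroring the diff: emit the "might not be visible"
/// help only when there is no substitution suggestion and the character
/// renders with zero width.
fn needs_invisible_char_help(c: char, has_substitution_suggestion: bool) -> bool {
    !has_substitution_suggestion && is_zero_width(c)
}

fn main() {
    for c in ['\u{0323}', '\u{200B}', '\u{201C}', '\u{E9}'] {
        println!(
            "U+{:04X} zero-width: {}, help without suggestion: {}",
            c as u32,
            is_zero_width(c),
            needs_invisible_char_help(c, false)
        );
    }
}
```

With a zero-width test, the `has_substitution_suggestion` argument makes no difference for characters like “, which is the redundancy the comment above points out.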
I haven't found a better way to identify characters that are "not visible when rendered".
We have a test case in rust/src/test/ui/issues/issue-29227.rs (line 125 in b998821).
The `unicode_space_separator` and `combining_spacing_mark` categories seem to contain the characters that are hard to spot, but those are not totally "not visible" either, and I think introducing such big tables into the compiler code base may be too heavy.
@rustbot author
You'll need to use the Unicode table generator to generate a table of the characters to warn on, i.e. those with General_Category=Nonspacing_Mark and/or Bidi_Class=Nonspacing_Mark. Adding the table to rustc is probably fine; adding it to std is undesirable. The UI test is unrelated.
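The exact output format of the table generator isn't shown in this thread, so the following is a hypothetical sketch of how such a table could be stored and queried: a sorted slice of non-overlapping inclusive ranges, searched with `slice::binary_search_by`. The three ranges are a hand-copied excerpt of General_Category=Nonspacing_Mark (Mn), not a complete table, and the names are illustrative. A range table stores one pair per contiguous run of code points rather than one entry per character, which bears on the concern about adding big tables to the compiler.

```rust
use std::cmp::Ordering;

/// Hypothetical excerpt of a generated table: sorted, non-overlapping,
/// inclusive ranges of General_Category=Nonspacing_Mark (Mn). Incomplete.
const NONSPACING_MARK_EXCERPT: &[(char, char)] = &[
    ('\u{0300}', '\u{036F}'), // Combining Diacritical Marks
    ('\u{20D0}', '\u{20DC}'), // Combining marks for symbols (Mn part only)
    ('\u{FE20}', '\u{FE2F}'), // Combining Half Marks
];

/// Binary search over the ranges: O(log n) in the number of ranges.
fn is_nonspacing_mark(c: char) -> bool {
    NONSPACING_MARK_EXCERPT
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less // range lies entirely below `c`
            } else if lo > c {
                Ordering::Greater // range lies entirely above `c`
            } else {
                Ordering::Equal // `c` falls inside this range
            }
        })
        .is_ok()
}

fn main() {
    for c in ['\u{0301}', '\u{20D1}', '\u{20DD}', 'a'] {
        println!("U+{:04X} is Mn (per excerpt): {}", c as u32, is_nonspacing_mark(c));
    }
}
```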
☔ The latest upstream changes (presumably #102302) made this pull request unmergeable. Please resolve the merge conflicts.
ping from triage: Can you please address the merge conflicts and post your status on this PR? FYI: when a PR is ready for review, send a message containing
Fixes #100388