src: do not ignore IDNA conversion error #11549
TimothyGu wants to merge 3 commits into nodejs:master
Hopefully the issue with the legacy URL parser is fixed. /cc @nodejs/intl @nodejs/url New CI: https://ci.nodejs.org/job/node-test-pull-request/6586/
doc/api/url.md
Btw, should this be deserialization, and mention that it is the inverse of domainToASCII?
It is serialization, since the domain is fully parsed and subsequently serialized from the parsed form. It's just that it uses a different algorithm for deserialization.
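The relationship between the two user-facing functions can be illustrated with a minimal round trip. This is a sketch against the public `url` module as documented in current Node.js releases, not the code under review; the sample domain is taken from the Node.js documentation.

```javascript
const { domainToASCII, domainToUnicode } = require('url');

// Unicode -> Punycode (ASCII) serialization of a domain.
const ascii = domainToASCII('español.com');

// Punycode -> Unicode serialization, the opposite direction.
const unicode = domainToUnicode(ascii);

console.log(ascii);   // xn--espaol-zwa.com
console.log(unicode); // español.com
```

Both calls parse the domain first and then serialize it, which is why the documentation describes each of them as a serialization.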
src/node_i18n.cc
Can you update the args.Length() check above to use 2? Also, you probably want to add a CHECK(args[1]->IsBoolean()); or do args[1]->BooleanValue() instead.
I didn't update the check for argument length, since (as the comment is trying to say) it is an optional argument, so that existing usage of toUnicode(str) would still work. V8 automatically returns an Undefined for out-of-range args[] dereference.
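The optional-argument behavior described here has a direct analogue on the JavaScript side. The following is only an illustration: `readLenientFlag` is a hypothetical stand-in, not the actual binding, and it only shows how a missing second argument ends up as `false` after boolean coercion.

```javascript
// Hypothetical stand-in for a function taking an optional `lenient` flag.
// A missing argument is `undefined`, which coerces to `false`, so existing
// single-argument callers keep the strict behavior.
function readLenientFlag(input, lenient) {
  return Boolean(lenient);
}

console.log(readLenientFlag('example.com'));       // false
console.log(readLenientFlag('example.com', true)); // true
```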
Wasn't aware of BooleanValue(). Will use that instead.
src/node_i18n.cc
Is this error part of any non-experimental API? Could we change it to Cannot encode name to ASCII as Punycode?
Yes, for toASCII:
```
> url.parse(`http://${'é'.repeat(230)}.com/`)
Error: Cannot convert name to ASCII
```
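For comparison, the public `url.domainToASCII()` does not throw on invalid input; the Node.js documentation states that it returns the empty string instead. The sketch below wraps that convention in a hypothetical helper, `toASCIIOrNull`, which is an assumption for illustration and not part of this PR.

```javascript
const { domainToASCII } = require('url');

// Hypothetical helper: map the documented '' failure value to null so that
// callers cannot mistake a failed conversion for a usable host name.
// Note that an empty input string is also reported as null.
function toASCIIOrNull(domain) {
  const result = domainToASCII(domain);
  return result === '' ? null : result;
}

console.log(toASCIIOrNull('中文.com'));        // xn--fiq228c.com
console.log(toASCIIOrNull('xn--iñvalid.com')); // null
```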
src/node_i18n.cc
Is this error part of any non-experimental API? Could we change it to Cannot decode name as Punycode? (basically the same question I also posted below).
No; in fact the toUnicode JS function isn't used in the code base at all. Maybe we should just remove this method?
/cc @jasnell
If it's not used, it can be removed.
Remove which function specifically? The `i18n::ToUnicode` function is definitely used.
@jasnell, the exposed process.binding('icu').toUnicode() JS function.
src/node_i18n.cc
Does this compile? Seems like the env->context() argument is missing
@addaleax, you are right. Forgot to push fde77b3
joyeecheung left a comment:
This should also fix the missing errors when parsing percent-encoded disallowed characters in hosts (https://github.com/nodejs/node/blob/master/test/fixtures/url-tests.js#L4499), since we are no longer ignoring UIDNA_ERROR_DISALLOWED. You can turn them on in this PR if you like.
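The kind of case those fixtures cover can be sketched with the WHATWG `URL` constructor. The input below is one of the failure cases from the web-platform-tests URL data (percent-encoded fullwidth characters that turn into a disallowed `%` in the host); `hostParses` is a hypothetical helper, and the rejection is the behavior the WPT expectation calls for, not something verified against the build in this PR.

```javascript
const { URL } = require('url');

// Hypothetical helper: report whether the WHATWG parser accepts the input.
function hostParses(input) {
  try {
    new URL(input);
    return true;
  } catch (err) {
    // The parser throws a TypeError for inputs it rejects.
    return false;
  }
}

console.log(hostParses('http://example.com'));
// Percent-encoded fullwidth "％４１": after decoding and IDNA mapping the
// host contains "%", which is not allowed in a host.
console.log(hostParses('http://%ef%bc%85%ef%bc%94%ef%bc%91.com'));
```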
@jasnell, did you see #11549 (comment)?
Old behavior can be restored using a special `lenient` mode.
- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests
Test re-enabled per @joyeecheung. Will land tomorrow.
Landed in a520508...7ceea2a.
Old behavior can be restored using a special `lenient` mode, as used in the legacy URL parser.

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Currently, the ICU-based IDNA conversion methods only report errors that are passed along through a `UErrorCode`. However, according to ICU's documentation for `uidna_nameToASCII()`, the `UErrorCode` indicates an error only in exceptional cases, such as `U_MEMORY_ALLOCATION_ERROR`. In other words, when non-catastrophically invalid domains are passed, `ToASCII()` and `ToUnicode()` (and their downstream `url.domainToASCII()` and `url.domainToUnicode()`) currently return garbled domain names instead of errors.

This PR makes the C++ binding methods report errors in `pInfo->errors` in addition to `UErrorCode`, thereby fixing the aforementioned problems.

Also included in this PR are additional tests for invalid situations, as well as documentation clarifications for the user-facing `url.domainToASCII()` and `url.domainToUnicode()`.

Before vs. after

Checklist
- `make -j4 test` (UNIX), or `vcbuild test` (Windows) passes

Affected core subsystem(s)