✨ Parse UTF-8 encoded strings, for UTF8=ACCEPT and IMAP4rev2#111
✨ Parse UTF-8 encoded strings, for UTF8=ACCEPT and IMAP4rev2#111
Conversation
In RFC3501 (IMAP4rev1):
resp-text = ["[" resp-text-code "]" SP] text
In RFC9051 (IMAP4rev2):
resp-text = ["[" resp-text-code "]" SP] [text]
And in RFC9051 Appendix E:
23. resp-text ABNF non-terminal was updated to allow for empty text.
In the spirit of Appendix E. 23 (and based on some actual server
responses I've seen over the years), I've leniently re-interpreted this
as also allowing us to drop the trailing `SP` char after
`[resp-text-code parsable code data]`, like so:
resp-text = "[" resp-text-code "]" [SP [text]] / [text]
Actually, the original parser already _mostly_ behaved this way, because
the original regexps for `T_TEXT` used `*` and not `+`. But, as I
updated the parser in many other places to more closely match the RFCs,
that broke this behavior. This commit originally came _after_ many many
other changes. While rebasing, I moved this commit first because that
simplified later commits.
Also:
* ♻️ Add `Patterns` module, to organize regexps.
* ♻️ Use `Patterns::CharClassSubtraction` refinement to simplify
exceptions.
* ♻️ Add `ParserUtils::Generator#def_char_matchers` to define `SP`,
`LBRA`, `RBRA`.
* ♻️ Add `ParserUtils#{match,accept}_re` to replace `TEXT`, `CTEXT` lex
states.
* ♻️ Remove unused `lex_state` kwarg from match
The parser update supports both RFC6855 (UTF8=ALLOW, UTF8=ONLY) and the UTF8 requirements of IMAP4rev2 (resp-text). Also updated #enable documentation and method signature: * document `UTF8=ACCEPT` as "supported" * use `*rest` args => flatten => map(aliases) => uniq * add `:utf8` as an alias for `UTF8=ACCEPT`
|
Wow, that's nice. I wrote the same thing while I was away being teambuilt, and discovered yours while making a pull request. A joyous kind of disappointment, if you see what I mean. Mine has a couple of details I like better (if I've understood the diff correctly), I may make a tiny PR for those. But as it stands, great work, I'm so happy. Once ruby/net-smtp#49 is merged Ruby will have fine support for EAI. |
|
@arnt Sorry about that! It's a good problem to have, and honestly I might still be sitting on my code in private branch if you hadn't beaten me to the And, I actually just came here to send you a note and ask if you had any thoughts on this PR! In addition to your note on the other PR, I knew from looking up your github profile that you had experience with email UTF-8 encoding. And later when I was searching extra-wg mailing list archives I saw your name in there too! So I figured you had experience that was well worth consulting. 🙂 I look forward to any thoughts you have to offer, and... I'm gonna race you to finish my |
|
@arnt BTW, if you want to simply make a fake "PR" against the earlier version of the code, I'd be happy to take a look at that, too. |
|
I'll be glad to review it, preferably next week, when I have two eleven-hour flights. This week I have lots to do. I'll push the branch again, though. |
|
Looks good to me. Better than my code, which was perhaps overly optimised for being a small diff. I see 'git push' didn't recreate the branch I had deleted. No matter. I made a PR with the bits I had that weren't in your code. Really just details. Thanks for doing this. I think I'll check mikelmail+arabic next. |
|
One other thing. I can't race you on qresync. I can do favours, and will ;) but my job description says I should make EAI/IDN work better, find and fix interop problems, all that kind of thing. There's enough of it to keep me busy for a long time. |
Split off from #104, this PR updates how
textandquotedare handled (without the other bugfixes and updates in that PR).RFC6855(2013):UTF8=ACCEPT(recommended byIMAP4rev2for backward compatibility) #38