✨ Parse UTF-8 encoded strings, for UTF8=ACCEPT and IMAP4rev2 by nevans · Pull Request #111 · ruby/net-imap

nevans · 2023-02-12T06:00:37Z

Split off from #104, this PR updates how `text` and `quoted` are handled (without the other bug fixes and updates in that PR).

* Fixes #110: `resp-text` ABNF non-terminal was updated to allow for empty text.
* Fixes #38: RFC6855 (2013): UTF8=ACCEPT (recommended by IMAP4rev2 for backward compatibility).
* Fixes #109: After ENABLE, IMAP4rev2 human-readable response text can include non-ASCII encoded in UTF-8.
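In IMAP4rev2 (and under UTF8=ACCEPT), a `quoted` string may contain UTF-8 multi-byte characters in addition to ASCII TEXT-CHARs and backslash-escaped quoted-specials. The following is a minimal, hypothetical sketch of that idea, not net-imap's parser: `QUOTED` and `parse_quoted` are illustrative names. It interprets the raw bytes as UTF-8, rejects invalid encodings, then removes the escapes.

```ruby
# Hypothetical sketch, not net-imap's actual implementation.
# Approximates `quoted`: DQUOTE, then any run of characters that are not
# quoted-specials, CR, LF, or NUL (non-ASCII UTF-8 characters are allowed),
# or a backslash-escaped quoted-special, then DQUOTE.
QUOTED = /\A"((?:[^"\\\r\n\x00]|\\["\\])*)"\z/

def parse_quoted(input)
  # Socket reads usually arrive as binary; treat the bytes as UTF-8.
  str = input.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "invalid UTF-8 in quoted string" unless str.valid_encoding?

  m = QUOTED.match(str) or raise ArgumentError, "not a quoted string: #{str.inspect}"
  # Unescape \" and \\ to their literal characters.
  m[1].gsub(/\\(["\\])/) { $1 }
end
```

For example, `parse_quoted("\"caf\xC3\xA9\"".b)` yields a UTF-8 `"café"`, while a byte sequence that is not valid UTF-8 raises `ArgumentError` instead of producing a mis-encoded string.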

In RFC3501 (IMAP4rev1): `resp-text = ["[" resp-text-code "]" SP] text`

In RFC9051 (IMAP4rev2): `resp-text = ["[" resp-text-code "]" SP] [text]`

And in RFC9051 Appendix E:

> 23. resp-text ABNF non-terminal was updated to allow for empty text.

In the spirit of Appendix E.23 (and based on some actual server responses I've seen over the years), I've leniently re-interpreted this as also allowing us to drop the trailing `SP` char after `[resp-text-code parsable code data]`, like so:

`resp-text = "[" resp-text-code "]" [SP [text]] / [text]`

Actually, the original parser already _mostly_ behaved this way, because the original regexps for `T_TEXT` used `*` and not `+`. But, as I updated the parser in many other places to more closely match the RFCs, that broke this behavior. This commit originally came _after_ many, many other changes. While rebasing, I moved this commit first because that simplified later commits.

Also:
* ♻️ Add `Patterns` module, to organize regexps.
* ♻️ Use `Patterns::CharClassSubtraction` refinement to simplify exceptions.
* ♻️ Add `ParserUtils::Generator#def_char_matchers` to define `SP`, `LBRA`, `RBRA`.
* ♻️ Add `ParserUtils#{match,accept}_re` to replace `TEXT`, `CTEXT` lex states.
* ♻️ Remove unused `lex_state` kwarg from match
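The lenient `resp-text` grammar above can be sketched as a single regexp. This is a hypothetical illustration, not the `Patterns` module from the PR: `TEXT_CHAR`, `RESP_TEXT`, and `parse_resp_text` are made-up names, and `resp-text-code` is simplified to "anything except `]`, CR, LF, or NUL". The PR mentions a `Patterns::CharClassSubtraction` refinement; the sketch instead uses Onigmo's built-in `&&[^...]` class intersection directly to subtract CR and LF from CHAR. Note that `text` is matched with `*` rather than `+`, which is what permits empty text.

```ruby
# Hypothetical sketch of the lenient grammar:
#   resp-text = "[" resp-text-code "]" [SP [text]] / [text]

# TEXT-CHAR: CHAR (%x01-7F) minus CR and LF, via class intersection,
# or any non-ASCII character (UTF-8 allowed, per IMAP4rev2 / UTF8=ACCEPT).
TEXT_CHAR = /[\x01-\x7f&&[^\r\n]]|[^\x00-\x7f]/

# Simplification (assumption): resp-text-code is anything but "]", CR, LF, NUL.
RESP_TEXT = /\A(?:
  \[(?<code>[^\]\r\n\x00]+)\](?:\ (?<after_code>#{TEXT_CHAR}*))?  # code, optional SP [text]
  |
  (?<plain>#{TEXT_CHAR}*)                                         # [text] alone, possibly empty
)\z/x

# Returns [code_or_nil, text]; text is "" when absent.
def parse_resp_text(line)
  m = RESP_TEXT.match(line) or
    raise ArgumentError, "invalid resp-text: #{line.inspect}"
  [m[:code], m[:after_code] || m[:plain] || ""]
end
```

With this grammar, `"[ALERT]"` (no trailing `SP`) parses as code `"ALERT"` with empty text, and an entirely empty line parses as `[nil, ""]`, while CR or LF inside the text is rejected.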

The parser update supports both RFC6855 (UTF8=ACCEPT, UTF8=ONLY) and the UTF-8 requirements of IMAP4rev2 (`resp-text`).

Also updated `#enable` documentation and method signature:
* document `UTF8=ACCEPT` as "supported"
* use `*rest` args => flatten => map(aliases) => uniq
* add `:utf8` as an alias for `UTF8=ACCEPT`
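The `#enable` argument handling described above (`*rest` args, then flatten, map aliases, uniq) might look roughly like the following. This is a hypothetical standalone sketch: `ENABLE_ALIASES` and `normalize_enable_args` are illustrative names, not necessarily what net-imap defines internally.

```ruby
# Hypothetical alias table: :utf8 stands for "UTF8=ACCEPT", as described in the PR.
ENABLE_ALIASES = { utf8: "UTF8=ACCEPT" }.freeze

# Accepts any mix of strings, symbols, and (nested) arrays.
def normalize_enable_args(*capabilities)
  capabilities
    .flatten                                        # allow enable(["A", "B"]) and enable("A", "B")
    .map { |cap| ENABLE_ALIASES.fetch(cap, cap) }   # expand aliases such as :utf8
    .uniq                                           # don't send the same capability twice
end

# The normalized list could then be joined into the command arguments:
#   "ENABLE " + normalize_enable_args(...).join(" ")
```

Because aliases are expanded before `uniq`, a call such as `enable(:utf8, "UTF8=ACCEPT")` sends `UTF8=ACCEPT` only once.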

arnt · 2023-02-13T11:23:04Z

Wow, that's nice. I wrote the same thing while I was away being teambuilt, and discovered yours while making a pull request. A joyous kind of disappointment, if you see what I mean.

Mine has a couple of details I like better (if I've understood the diff correctly), I may make a tiny PR for those. But as it stands, great work, I'm so happy. Once ruby/net-smtp#49 is merged Ruby will have fine support for EAI.

nevans · 2023-02-13T14:07:19Z

@arnt Sorry about that! It's a good problem to have, and honestly I might still be sitting on my code in private branch if you hadn't beaten me to the ENABLE PR. 😉 I understand this joyous disappointment well!

And, I actually just came here to send you a note and ask if you had any thoughts on this PR! In addition to your note on the other PR, I knew from looking up your github profile that you had experience with email UTF-8 encoding. And later when I was searching extra-wg mailing list archives I saw your name in there too! So I figured you had experience that was well worth consulting. 🙂

I look forward to any thoughts you have to offer, and... I'm gonna race you to finish my QRESYNC PRs first, too. 😜

nevans · 2023-02-13T14:12:47Z

@arnt BTW, if you want to simply make a fake "PR" against the earlier version of the code, I'd be happy to take a look at that, too.

arnt · 2023-02-13T20:23:50Z

I'll be glad to review it, preferably next week, when I have two eleven-hour flights. This week I have lots to do. I'll push the branch again, though.

arnt · 2023-02-22T12:08:42Z

Looks good to me. Better than my code, which was perhaps overly optimised for being a small diff.

I see 'git push' didn't recreate the branch I had deleted. No matter. I made a PR with the bits I had that weren't in your code. Really just details.

Thanks for doing this. I think I'll check mikelmail+arabic next.

arnt · 2023-02-22T12:16:41Z

One other thing. I can't race you on qresync. I can do favours, and will ;) but my job description says I should make EAI/IDN work better, find and fix interop problems, all that kind of thing. There's enough of it to keep me busy for a long time.

nevans added 2 commits February 12, 2023 00:23

nevans added the IMAP4rev2 Requirement for IMAP4rev2, RFC9051 label Feb 12, 2023

nevans merged commit b557d51 into master Feb 12, 2023

nevans deleted the UTF8-for-IMAP4rev2 branch February 12, 2023 06:02

nevans mentioned this pull request Feb 12, 2023

✨♻️🐛⚡ Support for UTF8, optional text, fix namespace and bodystructure bugs #104

Closed

nirvdrum mentioned this pull request Oct 6, 2023

net-imap 0.4.0 Regexp Error truffleruby/truffleruby#3287

Closed
