|
1 | 1 | ## Grammar for raw string literals |
2 | 2 |
|
| 3 | +##### Table of contents |
| 4 | +<!-- toc --> |
| 5 | + |
3 | 6 | I believe the PEG formalism can't naturally describe Rust's rule for matching the number of `#` characters in raw string literals. |
4 | 7 |
|
5 | 8 | (The same limitations apply to matching the number of `-` characters in frontmatter fences.) |
6 | 9 |
|
7 | 10 | I can think of the following ways to handle this: |
8 | 11 |
|
9 | 12 |
|
10 | | -### Ad-hoc extension |
| 13 | +### Corresponding nonterminal extension { #corresponding-nonterminal } |
11 | 14 |
|
12 | 15 | This writeup uses an [ad-hoc extension][rdql-token] to the formalism, |
13 | 16 | along similar lines to the stack extension described below |
14 | 17 | (but without needing a full stack). |
15 | 18 |
|
16 | | -This extension isn't formalised in the [appendix on PEGs](pegs.md). |
17 | | -I don't think a formalisation would be simpler than formalising the stack extension described below. |
| 19 | +It's described as follows: |
| 20 | + |
| 21 | + > an attempt to match one of the parsing expressions marked as HASHES² fails unless the characters it consumes are the same as the characters consumed by the (only) match of the expression marked as HASHES¹ under the same match attempt of a token-kind nonterminal. |
| 22 | +
|
| 23 | +This extension isn't formalised in the [appendix on PEGs]. |
| 24 | + |
| 25 | +It could be formalised in a similar way to the [mark/check] extension below, |
| 26 | +with the addition of some notion of a _scoping nonterminal_ which uses an empty context for its sub-attempt. |
18 | 27 |
|
19 | 28 |
|
20 | 29 | ### Pest's stack extension |
@@ -60,6 +69,73 @@ All other parsing expressions leave the stack unmodified. |
60 | 69 |
|
61 | 70 | </div> |
62 | 71 |
|
| 72 | + |
| 73 | +### Mark/check extension { #mark-check } |
| 74 | + |
| 75 | +This extension uses the same notation as the [corresponding nonterminal] extension. |
| 76 | +It might be described along the following lines: |
| 77 | + |
| 78 | +<div class=pegs-description> |
| 79 | + |
| 80 | +<div class=sketch> |
| 81 | +An attempt to match a parsing expression marked with ² fails |
| 82 | +unless the characters it consumes are the same as the characters consumed by the previous match of an expression marked as ¹. |
| 83 | +</div> |
| 84 | + |
| 85 | +A formalisation of this extension in the style used in the [appendix on PEGs] is sketched below. |
| 86 | + |
| 87 | +Treat ¹ and ² as operators, defining a _mark expression_ and a _check expression_ respectively. |
| 88 | + |
| 89 | +Extend the characterisation of a match attempt to include a _context_, which is a sequence of matches |
| 90 | +(this formalises a notion of the matches preceding the attempt). |
| 91 | + |
| 92 | +Alter the description of most kinds of expression to consider a context and use the same context for each sub-attempt, |
| 93 | +for example: |
| 94 | + |
| 95 | +<div class=sketch> |
| 96 | +An attempt <var>A</var> to match a nonterminal against <var>s</var> in context <var>c</var> succeeds if and only if |
| 97 | +an attempt <var>A′</var> to match the nonterminal's expression against <var>s</var> in context <var>c</var> succeeds. |
| 98 | +</div> |
| 99 | + |
| 100 | +Alter the description of sequencing expressions to use an updated context when attempting the right-hand side: |
| 101 | + |
| 102 | +<div class=sketch> |
| 103 | +The outcome of an attempt <var>A</var> to match a <dfn>sequencing expression</dfn> <code><var>e₁</var> ~ <var>e₂</var></code> against <var>s</var> in context <var>c</var> is as follows: |
| 104 | + |
| 105 | + - If an attempt <var>A₁</var> to match the expression <var>e₁</var> against <var>s</var> in context <var>c</var> fails, |
| 106 | + <var>A</var> fails. |
| 107 | + - Otherwise, <var>A</var> succeeds if and only if |
| 108 | + an attempt <var>A₂</var> to match <var>e₂</var> against <var>s′</var> in context <var>c′</var> succeeds, |
| 109 | + where <var>s′</var> is the sequence of characters obtained by removing the prefix consumed by <var>A₁</var> from <var>s</var>, |
| 110 | + and <var>c′</var> is <var>c</var> followed by the elaboration of <var>A₁</var>. |
| 111 | +</div> |
| 112 | + |
| 113 | +Include mark expressions in the elaboration: |
| 114 | + |
| 115 | +<div class=sketch> |
| 116 | +An attempt <var>A</var> to match a <dfn>mark expression</dfn> <code><var>e¹</var></code> against <var>s</var> in context <var>c</var> succeeds |
| 117 | +if and only if an attempt <var>A′</var> to match <var>e</var> against <var>s</var> in context <var>c</var> succeeds. |
| 118 | + |
| 119 | +If <var>A</var> is successful, |
| 120 | +it consumes the characters consumed by <var>A′</var> |
| 121 | +and its elaboration is <var>A</var> followed by the elaboration of <var>A′</var>. |
| 122 | +</div> |
| 123 | + |
| 124 | +Describe a check expression as failing unless the characters its subexpression consumes are the same as the characters consumed by the last mark expression in its context: |
| 125 | + |
| 126 | +<div class=sketch> |
| 127 | +An attempt <var>A</var> to match a <dfn>check expression</dfn> <code><var>e²</var></code> against <var>s</var> in context <var>c</var> succeeds if all of the following hold: |
| 128 | + |
| 129 | + - an attempt <var>A′</var> to match <var>e</var> against <var>s</var> in context <var>c</var> succeeds; and |
| 130 | + - <var>c</var> includes at least one mark expression; and |
| 131 | + - the characters consumed by <var>A′</var> are the same as the characters consumed by the last mark expression in <var>c</var>. |
| 132 | + |
| 133 | +Otherwise <var>A</var> fails. |
| 134 | +</div> |
| 135 | + |
| 136 | +</div> |
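
As an informal illustration of the mark/check idea applied to raw string literals, the following Rust sketch records the characters consumed by the opening run of `#` (the "mark") and accepts a closing `"` only when it is followed by exactly those characters (the "check"). The function name `match_raw_string` and its return convention are hypothetical and chosen for this example. The sketch is hand-written rather than derived from the grammar in this writeup, it assumes the 255-`#` limit is enforced by rejecting the input outright, and it ignores other lexical rules.

```rust
/// Hypothetical helper: returns the byte length of the raw string literal
/// at the start of `input`, or `None` if there is no such literal.
fn match_raw_string(input: &str) -> Option<usize> {
    let rest = input.strip_prefix('r')?;

    // "Mark": remember the characters consumed by the opening run of `#`.
    let hash_count = rest.bytes().take_while(|&b| b == b'#').count();
    if hash_count > 255 {
        // Assumption: treat too many `#` characters as a failed match.
        return None;
    }
    let marked = &rest[..hash_count];

    // The opening double quote must follow the hashes.
    let body = rest[hash_count..].strip_prefix('"')?;

    // Try each `"` in turn as a candidate closing quote.
    for (i, _) in body.match_indices('"') {
        let after_quote = &body[i + 1..];
        // "Check": the closing hashes must be the same characters
        // as the ones consumed by the marked expression.
        if after_quote.starts_with(marked) {
            // r + hashes + quote + content + quote + hashes
            return Some(1 + hash_count + 1 + i + 1 + hash_count);
        }
    }
    None
}

fn main() {
    println!("{:?}", match_raw_string("r#\"a\"b\"#"));
    println!("{:?}", match_raw_string("r##\"x\"#"));
}
```

The only state carried from the opening delimiter to the closing one is the marked run of `#` characters, which is the role the context plays in the formalisation sketched above.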
| 137 | + |
| 138 | + |
63 | 139 | ### Scheme of definitions |
64 | 140 |
|
65 | 141 | Because raw string literals have a limit of 255 `#` characters, |
@@ -106,6 +182,9 @@ RDQ_255_CONTENT = { |
106 | 182 |
|
107 | 183 | ``` |
108 | 184 |
|
| 185 | +[appendix on PEGs]: pegs.md |
| 186 | +[mark/check]: #mark-check |
| 187 | +[corresponding nonterminal]: #corresponding-nonterminal |
109 | 188 |
|
110 | 189 | [rdql-token]: quoted_literal_tokens.html#rdql |
111 | 190 | [pest-stack]: https://docs.rs/pest/2.8.0/pest/#special-rules |