Commit 7401cf2

Describe the mark/check extension, with a sketch of formalising it
1 parent c9cb65f commit 7401cf2

2 files changed

+92
-3
lines changed

css/writeup.css

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -120,3 +120,13 @@ kbd {
120120
.pegs-description code {
121121
white-space: preserve nowrap;
122122
}
123+
124+
.pegs-description .sketch {
125+
padding: 10px 10px;
126+
background-color: var(--quote-bg);
127+
}
128+
129+
.pegs-description .sketch p {
130+
margin-bottom: 0;
131+
}
132+

writeup/raw_strings.md

Lines changed: 82 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,29 @@
11
## Grammar for raw string literals
22

3+
##### Table of contents
4+
<!-- toc -->
5+
36
I believe the PEG formalism can't naturally describe Rust's rule for matching the number of `#` characters in raw string literals.
47

58
(The same limitation applies to matching the number of `-` characters in frontmatter fences.)
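
To make the rule concrete, here is a minimal hand-written sketch (not a PEG, and not taken from the writeup) of the matching that the formalism struggles to express: the closing delimiter must carry exactly as many `#` characters as the opening one. The function name `match_raw_string` is hypothetical, and the sketch ignores prefixes other than `r`, suffixes, and any restrictions on the literal's content.

```python
def match_raw_string(s: str):
    """Return the length of the raw string literal at the start of s, or None.

    Illustrative only: handles the plain `r` prefix and nothing else.
    """
    if not s.startswith("r"):
        return None
    i = 1
    # Count the '#' characters that follow the prefix.
    while i < len(s) and s[i] == "#":
        i += 1
    hashes = i - 1
    # The writeup notes a limit of 255 '#' characters.
    if hashes > 255 or i >= len(s) or s[i] != '"':
        return None
    # The literal ends at the first '"' followed by the same number of '#'.
    closer = '"' + "#" * hashes
    end = s.find(closer, i + 1)
    if end == -1:
        return None
    return end + len(closer)
```

The point of the sketch is that `closer` depends on what was consumed earlier in the same match, which is the piece of state that a plain parsing expression has no way to refer to.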
69

710
I can think of the following ways to handle this:
811

912

10-
### Ad-hoc extension
13+
### Corresponding nonterminal extension { #corresponding-nonterminal }
1114

1215
This writeup uses an [ad-hoc extension][rdql-token] to the formalism,
1316
along similar lines to the stack extension described below
1417
(but without needing a full stack).
1518

16-
This extension isn't formalised in the [appendix on PEGs](pegs.md).
17-
I don't think a formalisation would be simpler than formalising the stack extension described below.
19+
It's described as follows:
20+
21+
> an attempt to match one of the parsing expressions marked as HASHES² fails unless the characters it consumes are the same as the characters consumed by the (only) match of the expression marked as HASHES¹ under the same match attempt of a token-kind nonterminal.
22+
23+
This extension isn't formalised in the [appendix on PEGs].
24+
25+
It could be formalised in a similar way to the [mark/check] extension below,
26+
with the addition of some notion of a _scoping nonterminal_ which uses an empty context for its sub-attempt.
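
As a rough illustration of what a scoping nonterminal might mean, the sketch below models a context as a list of the strings consumed by earlier marked matches, which is an assumption made for brevity. Both `check_hashes` and `attempt_nonterminal` are hypothetical names, not part of the writeup.

```python
def check_hashes(s, context):
    """Hypothetical check: consume leading '#'s and compare with the last mark."""
    n = len(s) - len(s.lstrip("#"))
    consumed = s[:n]
    if not context or context[-1] != consumed:
        return None  # no earlier mark, or a different number of '#'
    return consumed


def attempt_nonterminal(body, s, context, scoping):
    """Match a nonterminal's body; a scoping nonterminal starts from an empty context."""
    return body(s, [] if scoping else context)
```

With an outer context of `["##"]`, an ordinary nonterminal lets the check see the earlier mark, while a scoping nonterminal hides it, so the check fails. That matches the "under the same match attempt of a token-kind nonterminal" condition quoted above.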
1827

1928

2029
### Pest's stack extension
@@ -60,6 +69,73 @@ All other parsing expressions leave the stack unmodified.
6069

6170
</div>
6271

72+
73+
### Mark/check extension { #mark-check }
74+
75+
This extension uses the same notation as the [corresponding nonterminal] extension.
76+
It might be described along the following lines:
77+
78+
<div class=pegs-description>
79+
80+
<div class=sketch>
81+
An attempt to match a parsing expression marked with ² fails
82+
unless the characters it consumes are the same as the characters consumed by the most recent preceding match of an expression marked with ¹.
83+
</div>
84+
85+
A formalisation of this extension in the style used in the [appendix on PEGs] is sketched below.
86+
87+
Treat ¹ and ² as operators, defining a _mark expression_ and a _check expression_ respectively.
88+
89+
Extend the characterisation of a match attempt to include a _context_, which is a sequence of matches
90+
(this formalises a notion of the matches preceding the attempt).
91+
92+
Alter the description of most kinds of expression to consider a context and use the same context for each sub-attempt,
93+
for example:
94+
95+
<div class=sketch>
96+
An attempt <var>A</var> to match a nonterminal against <var>s</var> in context <var>c</var> succeeds if and only if
97+
an attempt <var>A′</var> to match the nonterminal's expression against <var>s</var> in context <var>c</var> succeeds.
98+
</div>
99+
100+
Alter the description of sequencing expressions to use an updated context when attempting the right-hand side:
101+
102+
<div class=sketch>
103+
The outcome of an attempt <var>A</var> to match a <dfn>sequencing expression</dfn> <code><var>e₁</var> ~ <var>e₂</var></code> against <var>s</var> in context <var>c</var> is as follows:
104+
105+
- If an attempt <var>A₁</var> to match the expression <var>e₁</var> against <var>s</var> in context <var>c</var> fails,
106+
<var>A</var> fails.
107+
- Otherwise, <var>A</var> succeeds if and only if
108+
an attempt <var>A₂</var> to match <var>e₂</var> against <var>s′</var> in context <var>c′</var> succeeds,
109+
where <var>s′</var> is the sequence of characters obtained by removing the prefix consumed by <var>A₁</var> from <var>s</var>,
110+
and <var>c′</var> is <var>c</var> followed by the elaboration of <var>A₁</var>.
111+
</div>
112+
113+
Include mark expressions in the elaboration:
114+
115+
<div class=sketch>
116+
An attempt <var>A</var> to match a <dfn>mark expression</dfn> <code><var>e¹</var></code> against <var>s</var> in context <var>c</var> succeeds
117+
if and only if an attempt <var>A′</var> to match <var>e</var> against <var>s</var> in context <var>c</var> succeeds.
118+
119+
If <var>A</var> is successful,
120+
it consumes the characters consumed by <var>A′</var>
121+
and its elaboration is <var>A</var> followed by the elaboration of <var>A′</var>.
122+
</div>
123+
124+
Describe a check expression as failing unless the characters its subexpression consumes are the same as the characters consumed by the last mark expression in its context:
125+
126+
<div class=sketch>
127+
An attempt <var>A</var> to match a <dfn>check expression</dfn> <code><var>e²</var></code> against <var>s</var> in context <var>c</var> succeeds if
128+
129+
- an attempt <var>A′</var> to match <var>e</var> against <var>s</var> in context <var>c</var> succeeds; and
130+
- <var>c</var> includes at least one match of a mark expression; and
131+
- the characters consumed by <var>A′</var> are the same as the characters consumed by the last match of a mark expression in <var>c</var>.
132+
133+
Otherwise <var>A</var> fails.
134+
</div>
135+
136+
</div>
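
The sketch above can be exercised with a small interpreter. The following is a minimal illustration under several assumptions, not the writeup's formalisation: expressions are tuples, the context is reduced to a list of the strings consumed by mark matches (rather than a sequence of all matches), and the treatment of repetition and negative lookahead is a guess at how the rules would extend. The grammar `RAW` is a hypothetical simplification of a raw string literal.

```python
def attempt(e, s, c):
    """Try to match expression e against s in context c.

    Returns None on failure, otherwise (consumed, marks), where marks lists
    the strings consumed by mark expressions matched during the attempt.
    """
    kind = e[0]
    if kind == "lit":
        return (e[1], []) if s.startswith(e[1]) else None
    if kind == "any":
        return (s[0], []) if s else None
    if kind == "seq":
        consumed, marks = "", []
        for sub in e[1:]:
            # Later items see the context extended by earlier items' marks.
            r = attempt(sub, s[len(consumed):], c + marks)
            if r is None:
                return None
            consumed += r[0]
            marks += r[1]
        return consumed, marks
    if kind == "star":
        consumed, marks = "", []
        while True:
            r = attempt(e[1], s[len(consumed):], c + marks)
            if r is None or r[0] == "":  # stop on failure or an empty match
                return consumed, marks
            consumed += r[0]
            marks += r[1]
    if kind == "not":
        # Assumption: lookahead uses the same context and records nothing.
        return ("", []) if attempt(e[1], s, c) is None else None
    if kind == "mark":
        r = attempt(e[1], s, c)
        if r is None:
            return None
        # The mark itself comes first, then anything marked inside it.
        return r[0], [r[0]] + r[1]
    if kind == "check":
        r = attempt(e[1], s, c)
        if r is None or not c or r[0] != c[-1]:
            return None
        return r
    raise ValueError(f"unknown expression kind: {kind}")


# Hypothetical grammar: "r" HASHES¹ '"' ( !('"' HASHES²) ANY )* '"' HASHES²
HASHES = ("star", ("lit", "#"))
CLOSER = ("seq", ("lit", '"'), ("check", HASHES))
RAW = ("seq", ("lit", "r"), ("mark", HASHES), ("lit", '"'),
       ("star", ("seq", ("not", CLOSER), ("any",))), CLOSER)
```

Here `attempt(RAW, 'r#"a"b"#', [])` consumes the whole input, because the inner `"b` is not followed by a run of `#` equal to the marked one, while the same grammar fails on `r##"a"#`. A check with an empty context always fails, mirroring the "at least one mark" condition.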
137+
138+
63139
### Scheme of definitions
64140

65141
Because raw string literals have a limit of 255 `#` characters,
@@ -106,6 +182,9 @@ RDQ_255_CONTENT = {
106182
107183
```
108184

185+
[appendix on PEGs]: pegs.md
186+
[mark/check]: #mark-check
187+
[corresponding nonterminal]: #corresponding-nonterminal
109188

110189
[rdql-token]: quoted_literal_tokens.html#rdql
111190
[pest-stack]: https://docs.rs/pest/2.8.0/pest/#special-rules
