Commit 08b9a84

Update Syntax Guide and Add Limitations Subsection
1 parent 4592e58 commit 08b9a84

1 file changed

+34
-10
lines changed

README.md

Lines changed: 34 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -123,19 +123,43 @@ npm run zkapp
123123

124124
## Raw Regex Syntax Guide
125125

126-
- **Alteration:** The | character can be used to denote alternation between two expressions. For example: A|B.
127-
- **Concatenation:** Expressions can be concatenated together to form a new expression. For example: ABC.
128-
- **One or More:** The + character can be used to indicate that the preceding expression must occur one or more times. For example: A+.
129-
- **Zero or More:** The * character can be used to indicate that the preceding expression can occur zero or more times. For example: A*.
130-
- **Optional:** The ? character can be used to indicate that the preceding expression is optional. For example: A?.
131-
- Note: The optional character is not accepted, as the compiler throws an error stating 'Accept nodes length must be exactly 1'.
132-
- **ORing (Character/Number Classes):** Expressions can be ORed together using brackets and the | character to form character or number classes. For example: [ABC] or [345].
133-
- **Ranges:** Ranges of characters or numbers can be defined using brackets and the - character. For example: [0-9] or [a-z].
134-
- **Grouping:** Allows treating multiple characters or patterns as a single unit. This is useful for applying quantifiers or operators to multiple characters or patterns at once. For example, (ab)+ would match one or more occurrences of the sequence "ab".
135-
- **Negation**: The ^ character can be used to negate characters or ranges within character classes. For example, [^aeiou] matches any character that is not a vowel.
126+
- **Alternation:** The `|` character can be used to denote alternation between two expressions. For example: `A|B`.
127+
- **Concatenation:** Expressions can be concatenated together to form a new expression. For example: `ABC`.
128+
- **One or More:** The `+` character can be used to indicate that the preceding expression must occur one or more times. For example: `A+`.
129+
- **Grouping:** Allows treating multiple characters or patterns as a single unit. This is useful for applying quantifiers or operators to multiple characters or patterns at once. For example, `(ab)+` would match one or more occurrences of the sequence `ab`.
130+
- **ORing (Character/Number Classes):** Alternatives can be ORed together by grouping them in parentheses and separating them with the `|` character. For example: `(four|4)`.
131+
- **Ranges:** Ranges of characters or numbers can be defined using brackets and the `-` character. For example: `[0-9]` or `[a-z]`.
132+
- Partial ranges of digits or letters are supported, such as `[D-S]` or `[4-8]`.
133+
- It is also possible to combine ranges within the same brackets, for example, `[f-sA-N6-8]`.
134+
- **Negation**: The `^` character can be used to negate characters or ranges within character classes. For example, `[^aeiou]` matches any character that is not a vowel.
135+
- **Repetition:** The `{m}` syntax allows you to specify that a character or group must appear exactly `m` times.
136+
137+
- For example, `a{3}` matches exactly three `a` characters in a row, so it would match `aaa` but not `aa` or `aaaa`.
138+
- `\d{3}` matches exactly three digits, such as `123` or `456`.
139+
140+
- **Meta Character Support**:
141+
- `\w`: ANY ONE word character. For ASCII, word characters are `[a-zA-Z0-9_]`
142+
- `\W`: ANY ONE **non**-word character. For ASCII, word characters are `[a-zA-Z0-9_]`
143+
- `\d`: ANY ONE digit character. Digits are `[0-9]`
144+
- `\D`: ANY ONE **non**-digit character. Digits are `[0-9]`
145+
- `\s`: ANY ONE space character. For ASCII, whitespace characters are `[\n\r\t\v\f]`
146+
- `\S`: ANY ONE **non**-space character. For ASCII, whitespace characters are `[\n\r\t\v\f]`
136147

137148
For more details, you can visit this amazing [ZK Regex Tools](https://zkregex.com/min_dfa) website.
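
The constructs listed above follow conventional regex semantics, so one quick way to sanity-check a pattern before handing it to the compiler is to try it against JavaScript's built-in `RegExp`. The following is only a sketch under that assumption: `RegExp` is not the zk-regex compiler and accepts many patterns the compiler rejects, and `fullMatch` is a hypothetical helper name, not part of this project.

```typescript
// Hypothetical helper (not part of zk-regex): returns true when `input`
// matches `pattern` in full. The ^...$ anchors are added here only for the
// built-in RegExp engine; they are not part of the pattern being checked.
function fullMatch(pattern: string, input: string): boolean {
  return new RegExp(`^(?:${pattern})$`).test(input);
}

console.log(fullMatch("A|B", "B"));            // alternation      -> true
console.log(fullMatch("(ab)+", "abab"));       // grouping and +   -> true
console.log(fullMatch("(four|4)", "four"));    // ORing            -> true
console.log(fullMatch("[f-sA-N6-8]", "g"));    // combined ranges  -> true
console.log(fullMatch("[f-sA-N6-8]", "z"));    // outside ranges   -> false
console.log(fullMatch("[^aeiou]", "a"));       // negation         -> false
console.log(fullMatch("a{3}", "aaa"));         // exact repetition -> true
console.log(fullMatch("a{3}", "aaaa"));        // too many         -> false
console.log(fullMatch("\\w\\d\\s", "a1\t"));   // meta characters  -> true
```

A pattern that passes here may still be rejected by the compiler, so treat this as a first check on the regex semantics only.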
138149

150+
### Limitations
151+
152+
The regular expressions supported by the zk-regex compiler have the following limitations:
153+
154+
- Regular expressions that, when converted to a DFA (Deterministic Finite Automaton), have multiple accepting states are **not** supported.
155+
- Regular expressions that, when converted to a DFA, include transitions
156+
back to the initial state are **not** supported, for example:
157+
- `*`: zero or more, e.g., `[0-9]*` matches zero or more digits. It accepts everything `[0-9]+` accepts, plus the empty string.
158+
- `?`: zero or one (optional), e.g., `[+-]?` matches `+`, `-`, or the empty string.
159+
- Lazy (non-greedy) repetition operators, which _curb the greediness_ of repetition, are **not** supported:
160+
- `*?`, `+?`, `??`, `{m,n}?`, `{m,}?`
161+
- Position anchors, which match a position (such as start-of-line, end-of-line, start-of-word, or end-of-word) rather than a character, are **not** supported.
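
The first two limitations are structural properties of the DFA. As a rough illustration, assuming a simplified textbook DFA representation (the `Dfa` type and `checkDfa` function below are hypothetical and are not the compiler's own data structures), they could be checked like this:

```typescript
// Hypothetical, simplified DFA shape used only for illustration.
interface Dfa {
  initial: number;
  accepting: number[];
  transitions: { from: number; symbol: string; to: number }[];
}

// Returns the limitations the DFA violates (an empty list when none).
function checkDfa(dfa: Dfa): string[] {
  const problems: string[] = [];
  if (dfa.accepting.length !== 1) {
    problems.push("must have exactly one accepting state");
  }
  if (dfa.transitions.some((t) => t.to === dfa.initial)) {
    problems.push("has a transition to the initial state");
  }
  return problems;
}

// Textbook DFA for `a+`: 0 --a--> 1 and 1 --a--> 1, accepting state 1.
const aPlus: Dfa = {
  initial: 0,
  accepting: [1],
  transitions: [
    { from: 0, symbol: "a", to: 1 },
    { from: 1, symbol: "a", to: 1 },
  ],
};

// Textbook minimal DFA for `a*`: a single state that loops on itself.
const aStar: Dfa = {
  initial: 0,
  accepting: [0],
  transitions: [{ from: 0, symbol: "a", to: 0 }],
};

console.log(checkDfa(aPlus)); // []
console.log(checkDfa(aStar)); // ["has a transition to the initial state"]
```

The DFA the compiler actually builds for a given pattern may differ from these hand-written ones, so the sketch only shows what the two conditions mean.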
162+
139163
## ZK Regex Workflow
140164

141165
- **Raw Regex:** Begin with the raw regular expression provided by the user. This expression may contain shorthand notations, special characters, and other syntactic elements.
