README.md: 34 additions & 10 deletions
@@ -123,19 +123,43 @@ npm run zkapp
123
123
124
124
## Raw Regex Syntax Guide
125
125
126
-
-**Alteration:** The | character can be used to denote alternation between two expressions. For example: A|B.
127
-
-**Concatenation:** Expressions can be concatenated together to form a new expression. For example: ABC.
128
-
-**One or More:** The + character can be used to indicate that the preceding expression must occur one or more times. For example: A+.
129
-
-**Zero or More:** The * character can be used to indicate that the preceding expression can occur zero or more times. For example: A*.
130
-
-**Optional:** The ? character can be used to indicate that the preceding expression is optional. For example: A?.
131
-
- Note: The optional character is not accepted, as the compiler throws an error stating 'Accept nodes length must be exactly 1'.
132
-
-**ORing (Character/Number Classes):** Expressions can be ORed together using brackets and the | character to form character or number classes. For example: [ABC] or [345].
133
-
-**Ranges:** Ranges of characters or numbers can be defined using brackets and the - character. For example: [0-9] or [a-z].
134
-
-**Grouping:** Allows treating multiple characters or patterns as a single unit. This is useful for applying quantifiers or operators to multiple characters or patterns at once. For example, (ab)+ would match one or more occurrences of the sequence "ab".
135
-
-**Negation**: The ^ character can be used to negate characters or ranges within character classes. For example, [^aeiou] matches any character that is not a vowel.
126
+
-**Alternation:** The `|` character can be used to denote alternation between two expressions. For example: `A|B`.
127
+
-**Concatenation:** Expressions can be concatenated together to form a new expression. For example: `ABC`.
128
+
-**One or More:** The `+` character can be used to indicate that the preceding expression must occur one or more times. For example: `A+`.
129
+
-**Grouping:** Allows treating multiple characters or patterns as a single unit. This is useful for applying quantifiers or operators to multiple characters or patterns at once. For example, `(ab)+` would match one or more occurrences of the sequence `ab`.
130
+
-**ORing (Character/Number Classes):** Expressions can be ORed together inside parentheses using the `|` character, so that any one of the alternatives matches. For example: `(four|4)`.
131
+
-**Ranges:** Ranges of characters or numbers can be defined using brackets and the `-` character. For example: `[0-9]` or `[a-z]`.
132
+
- Specific ranges of digits or letters are supported, such as `[D-S]` or `[4-8]`.
133
+
- It is also possible to combine ranges within the same brackets, for example, `[f-sA-N6-8]`.
134
+
-**Negation**: The `^` character can be used to negate characters or ranges within character classes. For example, `[^aeiou]` matches any character that is not a vowel.
135
+
-**Repetition:** The `{m}` syntax allows you to specify that a character or group must appear exactly `m` times.
136
+
137
+
- For example, `a{3}` matches exactly three `a` characters in a row, so it would match `aaa` but not `aa` or `aaaa`.
138
+
-`\d{3}` matches exactly three digits, such as `123` or `456`.
139
+
140
+
-**Meta Character Support**:
141
+
-`\w`: ANY ONE word character. For ASCII, word characters are `[a-zA-Z0-9_]`
142
+
-`\W`: ANY ONE **non**-word character. For ASCII, word characters are `[a-zA-Z0-9_]`
143
+
-`\d`: ANY ONE digit character. Digits are `[0-9]`
144
+
-`\D`: ANY ONE **non**-digit character. Digits are `[0-9]`
145
+
-`\s`: ANY ONE whitespace character. For ASCII, whitespace characters are `[ \n\r\t\v\f]`
146
+
-`\S`: ANY ONE **non**-whitespace character. For ASCII, whitespace characters are `[ \n\r\t\v\f]`
136
147
137
148
For more details, you can visit this amazing [ZK Regex Tools](https://zkregex.com/min_dfa) website.
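
The syntax rules above can be sanity-checked against an ordinary regex engine before compiling. The sketch below uses JavaScript's built-in `RegExp` as a stand-in; it is not the zk-regex compiler, and `fullMatch` and `cases` are hypothetical names introduced here. The `^(?:...)$` wrapper only forces a whole-string match in this test harness and is not part of the pattern you would hand to the compiler.

```typescript
// Hypothetical helper: true when `input` matches `pattern` in its entirety.
// Uses the built-in RegExp engine, NOT the zk-regex compiler.
function fullMatch(pattern: string, input: string): boolean {
  return new RegExp(`^(?:${pattern})$`).test(input);
}

// Each entry: [pattern, input, expected result]
const cases: Array<[string, string, boolean]> = [
  ["A|B", "B", true],            // alternation
  ["(ab)+", "ababab", true],     // grouping with one-or-more
  ["(four|4)", "four", true],    // ORing inside a group
  ["[f-sA-N6-8]", "g", true],    // combined ranges
  ["[f-sA-N6-8]", "Z", false],
  ["[^aeiou]", "e", false],      // negated class
  ["a{3}", "aaa", true],         // exact repetition
  ["a{3}", "aaaa", false],
  ["\\d{3}", "456", true],       // meta character with repetition
  ["\\w\\W\\s", "a- ", true],    // word, non-word, whitespace
];

for (const [pattern, input, expected] of cases) {
  const actual = fullMatch(pattern, input);
  console.log(`${pattern} vs "${input}": ${actual} (expected ${expected})`);
}
```

A pattern that behaves as expected here can still be rejected by the compiler, since the limitations on the resulting DFA are not checked by `RegExp`.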
138
149
150
+
### Limitations
151
+
152
+
The regular expressions supported by the zk-regex compiler have the following limitations:
153
+
154
+
- Regular expressions that, when converted to DFA, have multiple accepting states are **not** supported.
155
+
- Regular expressions that, when converted to DFA (Deterministic Finite
156
+
Automaton), include transitions to the initial state are **not** supported. This rules out:
157
+
-`*`: zero or more (0+), e.g., [0-9]\* matches zero or more digits. It accepts all those in [0-9]+ plus the empty string.
158
+
-`?`: zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an empty string.
159
+
- Lazy (non-greedy) repetition operators are **not** supported:
160
+
-`*?`, `+?`, `??`, `{m,n}?`, `{m,}?`
161
+
- Position anchors, which match a position rather than a character (such as start-of-line, end-of-line, start-of-word, and end-of-word), are **not** supported.
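
The operator-level limitations above can be caught with a quick scan before a pattern reaches the compiler. The following is a heuristic sketch only: `findUnsupported` is a hypothetical helper, not part of the zk-regex compiler, and it cannot detect the DFA-level restrictions (multiple accepting states, transitions back to the initial state) that are not visible in the syntax. It also assumes `(?` introduces a group modifier and that every unescaped `}` closes a `{m}` repetition.

```typescript
// Heuristic pre-check (illustration only): scans a raw pattern for the
// operators listed as unsupported and returns a description of each hit.
function findUnsupported(pattern: string): string[] {
  const issues: string[] = [];
  let inClass = false;         // true while scanning inside [...]
  let afterQuantifier = false; // true when the previous token was *, +, ? or {m}

  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    const wasQuantifier = afterQuantifier;
    afterQuantifier = false;

    if (ch === "\\") {
      const next = pattern[i + 1];
      // \b and \B are word-boundary anchors when used outside a class
      if (!inClass && (next === "b" || next === "B")) {
        issues.push(`word-boundary anchor \\${next} at index ${i}`);
      }
      i++; // skip the escaped character
    } else if (inClass) {
      if (ch === "]") inClass = false; // ^ and - inside a class are allowed
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "*") {
      issues.push(`zero-or-more '*' at index ${i}`);
      afterQuantifier = true;
    } else if (ch === "?") {
      if (wasQuantifier) {
        issues.push(`lazy quantifier at index ${i}`);
      } else if (pattern[i - 1] !== "(") {
        issues.push(`optional '?' at index ${i}`);
        afterQuantifier = true;
      }
    } else if (ch === "+" || ch === "}") {
      afterQuantifier = true;
    } else if (ch === "^" || ch === "$") {
      issues.push(`position anchor '${ch}' at index ${i}`);
    }
  }
  return issues;
}

console.log(findUnsupported("[0-9]+"));   // no issues reported
console.log(findUnsupported("[0-9]*"));   // reports the '*' at index 5
console.log(findUnsupported("[+-]?"));    // reports the '?' at index 4
console.log(findUnsupported("^[a-z]+$")); // reports both anchors
```

An empty result means only that none of the listed operators were found; the compiler remains the authority on whether a pattern is accepted.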
162
+
139
163
## ZK Regex Workflow
140
164
141
165
-**Raw Regex:** Begin with the raw regular expression provided by the user. This expression may contain shorthand notations, special characters, and other syntactic elements.