
Commit 29d6568

docs: update CLAUDE.md with parser development patterns

- Add corpus testing commands for iterative improvement
- Document AST change workflow (pattern match updates)
- Add common parser patterns (lookahead, reserved keywords, dialect conflicts)

1 parent 587579e commit 29d6568


1 file changed: +51 -0 lines changed


CLAUDE.md

Lines changed: 51 additions & 0 deletions
@@ -46,6 +46,19 @@ Tests are organized by SQL dialect in the `tests/` directory:

Each test file contains comprehensive parsing tests for dialect-specific syntax.

### Corpus Testing

Corpus tests in `tests/sqlparser_corpus.rs` parse real SQL files from the `tests/corpus/{dialect}/` directories:

```bash
# Count passing corpus tests to track progress between changes
cargo nextest run --test sqlparser_corpus --no-fail-fast 2>&1 | grep "PASS" | wc -l

# Group parser errors by message to find the most common failure patterns
cargo nextest run --test sqlparser_corpus --no-fail-fast 2>&1 | grep "sql parser error:" | sort | uniq -c | sort -rn

# Debug a specific corpus file (the argument filters tests by name)
cargo test --test sqlparser_corpus -- --nocapture dialect/category/hash
```
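
If error messages include a source position, the `sort | uniq -c` pipeline counts each position separately. A minimal sketch of a normalizing tally follows, assuming (not verified against every message) that a position suffix has the form ` at Line: N, Column: M`; `tally_errors` is a hypothetical helper, not part of the crate:

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts `sql parser error:` lines by message, most
/// frequent first, like `grep | sort | uniq -c | sort -rn`.
/// ASSUMPTION: a message may end in a position suffix such as
/// " at Line: 3, Column: 14"; it is stripped so identical errors group together.
pub fn tally_errors(output: &str) -> Vec<(usize, String)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in output.lines() {
        // Skip lines without the marker; keep the text from the marker onward.
        let Some(start) = line.find("sql parser error:") else {
            continue;
        };
        let msg = &line[start..];
        // Cut the (assumed) position suffix, if present.
        let msg = match msg.find(" at Line:") {
            Some(end) => &msg[..end],
            None => msg,
        };
        *counts.entry(msg.trim_end().to_string()).or_insert(0) += 1;
    }
    let mut tally: Vec<(usize, String)> = counts.into_iter().map(|(m, c)| (c, m)).collect();
    // Highest count first; ties ordered alphabetically for stable output.
    tally.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(&b.1)));
    tally
}
```

Feed it the captured `cargo nextest` output as a `&str`; breaking ties alphabetically keeps the ranking identical across runs.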
## Architecture

### Core Components
@@ -107,6 +120,36 @@ Expression parsing uses operator precedence climbing:
- `parse_infix()` - Handles binary operators based on precedence
- Precedence levels defined in `get_precedence()`
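
The prefix/infix split can be illustrated with a toy precedence-climbing evaluator. Everything below (`Tok`, `precedence`, `parse_expr`) is hypothetical and evaluates integers instead of building `Expr` nodes; it sketches the technique, not the crate's implementation:

```rust
/// Toy token type for the sketch (hypothetical; the real parser uses `Token`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    Star,
}

/// Hypothetical precedence table: higher binds tighter, 0 means "not an infix operator".
pub fn precedence(tok: Option<&Tok>) -> u8 {
    match tok {
        Some(Tok::Plus) => 10,
        Some(Tok::Star) => 20,
        _ => 0,
    }
}

/// Parses and evaluates `tokens[*pos..]`, folding only operators whose
/// precedence is strictly greater than `min_prec`.
pub fn parse_expr(tokens: &[Tok], pos: &mut usize, min_prec: u8) -> i64 {
    // Prefix step: this toy grammar only has number literals.
    let mut lhs = match tokens[*pos] {
        Tok::Num(n) => n,
        other => panic!("expected a number, found {:?}", other),
    };
    *pos += 1;
    // Infix step: keep folding operators while they outrank `min_prec`.
    loop {
        let prec = precedence(tokens.get(*pos));
        if prec <= min_prec {
            break;
        }
        let op = tokens[*pos];
        *pos += 1;
        // The right-hand side only absorbs operators that bind tighter than `op`.
        let rhs = parse_expr(tokens, pos, prec);
        lhs = match op {
            Tok::Plus => lhs + rhs,
            Tok::Star => lhs * rhs,
            Tok::Num(_) => unreachable!(),
        };
    }
    lhs
}
```

For the tokens of `1 + 2 * 3`, the `*` (precedence 20) is folded inside the recursive call for the right-hand side of `+` (precedence 10), giving 7; an operator of equal precedence stops the inner loop, which makes `+` left-associative.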

#### Common Parser Patterns

**Lookahead and backtracking:**
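```rust
// Lookahead: peek_token() doesn't consume; keywords arrive as Token::Word (FOO is a placeholder)
if matches!(&self.peek_token().token, Token::Word(w) if w.keyword == Keyword::FOO) {
    self.next_token(); // consume FOO
    // ... parse FOO syntax
}
// Shorthand: self.parse_keyword(Keyword::FOO) checks and consumes in one call

// Backtracking: consume a token, then undo with prev_token() if it doesn't fit
let tok = self.next_token();
match &tok.token {
    Token::Word(w) if w.keyword == Keyword::FOO => { /* ... parse FOO syntax */ }
    _ => self.prev_token(), // un-consume the token
}
```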
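
The peek/consume/undo semantics can be checked in isolation with a toy cursor. `Cursor` and its methods are hypothetical stand-ins that only mirror the names `peek_token()`, `next_token()`, `prev_token()` and `parse_keyword()`; the sketch splits on whitespace instead of using the real tokenizer:

```rust
/// Hypothetical toy cursor mirroring the parser's method names; it splits on
/// whitespace instead of using the real tokenizer.
pub struct Cursor<'a> {
    tokens: Vec<&'a str>,
    index: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(sql: &'a str) -> Self {
        Cursor { tokens: sql.split_whitespace().collect(), index: 0 }
    }

    /// Look at the next token without consuming it.
    pub fn peek_token(&self) -> Option<&'a str> {
        self.tokens.get(self.index).copied()
    }

    /// Consume the next token. Always advances (even past the end) so that
    /// one prev_token() call exactly undoes it.
    pub fn next_token(&mut self) -> Option<&'a str> {
        let tok = self.peek_token();
        self.index += 1;
        tok
    }

    /// Undo the most recent next_token().
    pub fn prev_token(&mut self) {
        self.index = self.index.saturating_sub(1);
    }

    /// Consume the next token only if it equals `kw` (ASCII case-insensitive).
    pub fn parse_keyword(&mut self, kw: &str) -> bool {
        match self.peek_token() {
            Some(tok) if tok.eq_ignore_ascii_case(kw) => {
                self.index += 1;
                true
            }
            _ => false,
        }
    }
}
```

In this sketch a failed `parse_keyword()` leaves the position untouched, so only code that consumed with `next_token()` ever needs `prev_token()`.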

**Negative lookahead** (distinguish between similar patterns):
```rust
// Treat the next word as an identifier only if it is NOT the PARTITION keyword
if !matches!(self.peek_token().token, Token::Word(w) if w.keyword == Keyword::PARTITION) {
    // Parse as an identifier, not as the PARTITION keyword
}
```
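
A self-contained sketch of the same negative-lookahead decision, using a hypothetical `Word` struct whose `keyword` field stands in for the real `Word::keyword`; the helper `parse_ident_unless_partition` is illustrative only:

```rust
/// Hypothetical word token: `keyword` stands in for the real `Word::keyword`
/// field (`None` here plays the role of a non-keyword word).
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub value: String,
    pub keyword: Option<&'static str>,
}

/// Hypothetical helper: consume the next word as an identifier unless it is
/// the PARTITION keyword; on PARTITION (or end of input) nothing is consumed.
pub fn parse_ident_unless_partition(words: &[Word], pos: &mut usize) -> Option<String> {
    match words.get(*pos) {
        // Negative lookahead: proceed only when the word is NOT PARTITION.
        Some(w) if w.keyword != Some("PARTITION") => {
            *pos += 1;
            Some(w.value.clone())
        }
        _ => None,
    }
}
```

When the check fails nothing is consumed, so the caller can go on to parse the `PARTITION` clause from the same position.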

**Reserved keyword lists** (`src/keywords.rs`):
- `RESERVED_FOR_COLUMN_ALIAS` - Keywords that can't be implicit (no `AS`) column aliases in SELECT
- `RESERVED_FOR_TABLE_ALIAS` - Keywords that can't be implicit (no `AS`) table aliases in FROM/JOIN
- Add clause-level keywords (e.g., FORMAT, SETTINGS, SAMPLE) to BOTH lists; otherwise the parser can consume them as aliases (in `SELECT * FROM t SETTINGS ...`, `SETTINGS` would be read as an alias for `t`)
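
The effect of the reserved lists can be reproduced with a toy alias parser. `TOY_RESERVED_FOR_TABLE_ALIAS` and `parse_optional_alias` are hypothetical; the real lists in `src/keywords.rs` hold `Keyword` values and are consulted by the parser's own alias logic:

```rust
/// Hypothetical reserved list for the sketch (the real lists hold `Keyword` values).
pub const TOY_RESERVED_FOR_TABLE_ALIAS: &[&str] =
    &["WHERE", "JOIN", "ON", "FORMAT", "SETTINGS", "SAMPLE"];

/// Hypothetical helper: parse an implicit alias (no AS) after a table name.
/// The next word becomes the alias unless it is in `reserved`, in which case
/// it is left unconsumed for the clause parser.
pub fn parse_optional_alias<'a>(
    words: &[&'a str],
    pos: &mut usize,
    reserved: &[&str],
) -> Option<&'a str> {
    let word = *words.get(*pos)?;
    if reserved.iter().any(|kw| kw.eq_ignore_ascii_case(word)) {
        return None;
    }
    *pos += 1;
    Some(word)
}
```

Dropping `SETTINGS` from the list makes the same call return `Some("SETTINGS")`, which is exactly the mis-parse the rule above prevents.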

**Dialect conflicts** (same keyword, different syntax):
- Problem: the same keyword is handled in more than one parse location (e.g., SAMPLE as part of a table factor vs. as a SELECT-level clause)
- Solution: use `dialect_of!` to exclude the conflicting dialect from one of those locations
- Example: ClickHouse `SAMPLE n` (clause) vs. Snowflake `SAMPLE (n)` (table factor): exclude ClickHouse from table-factor parsing so its clause parser sees SAMPLE
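
A toy model of the SAMPLE conflict, assuming the table-factor location commits to the parenthesized form once it sees the keyword. `Dialect`, `table_factor_sample` and `clause_sample` are hypothetical; the `d == Dialect::ClickHouse` check plays the role of `dialect_of!`:

```rust
/// Hypothetical dialect enum; the real parser checks the dialect with `dialect_of!`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Dialect {
    ClickHouse,
    Snowflake,
}

/// Table-factor location (hypothetical): once it sees SAMPLE it commits to the
/// Snowflake-style `SAMPLE ( n )` form. ClickHouse is excluded up front, like a
/// `dialect_of!` check, so its SAMPLE is left for the clause parser.
pub fn table_factor_sample(d: Dialect, w: &[&str], pos: &mut usize) -> Result<Option<String>, String> {
    if d == Dialect::ClickHouse || w.get(*pos) != Some(&"SAMPLE") {
        return Ok(None);
    }
    match w.get(*pos + 1..*pos + 4) {
        Some(&["(", n, ")"]) => {
            *pos += 4;
            Ok(Some(n.to_string()))
        }
        _ => Err(format!("expected `( n )` after SAMPLE at token {}", *pos)),
    }
}

/// Clause location (hypothetical): ClickHouse-style `SAMPLE n`.
pub fn clause_sample(w: &[&str], pos: &mut usize) -> Option<String> {
    match w.get(*pos..*pos + 2) {
        Some(&["SAMPLE", n]) => {
            *pos += 2;
            Some(n.to_string())
        }
        _ => None,
    }
}
```

Without the ClickHouse exclusion, `SAMPLE 10` would reach the table-factor branch and fail there (as calling it with `Dialect::Snowflake` on that input shows) before the clause parser ever saw it.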

## Development Guidelines

### Syntax vs Semantics
@@ -125,6 +168,14 @@ Semantic analysis varies drastically between SQL dialects and is left to consume
3. **Add Tests**: Write dialect-specific tests in the appropriate test file
4. **Consider Dialect**: Use `dialect_of!` if syntax is dialect-specific

#### AST Change Workflow

When adding fields to AST structs, you must update every struct literal and exhaustive pattern match:

1. Add the field to the struct definition (e.g., `src/ast/mod.rs`, `src/ast/query.rs`)
2. Update the `Display` implementation to output the new field
3. Update the parser to initialize the new field
4. Fix all test files: add `new_field: _` to struct patterns that list every field (error E0027) and set the field in struct literals (error E0063)
5. Run `cargo check --all-targets` to find every location requiring updates; plain `cargo check` does not build the test targets
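
The workflow can be seen on a toy node. `Select`, `settings`, `parse_select_stub` and `from_table` below are hypothetical (the real structs in `src/ast/` have many more fields); the comments map each item to a step in the list:

```rust
use std::fmt;

/// Hypothetical AST node after adding a new field (`settings`); not the real struct.
#[derive(Debug, Clone, PartialEq)]
pub struct Select {
    pub projection: Vec<String>,
    pub from: String,
    /// Step 1: the new field.
    pub settings: Option<Vec<String>>,
}

/// Step 2: Display must emit the new field, otherwise it is lost when the
/// AST is printed back to SQL.
impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SELECT {} FROM {}", self.projection.join(", "), self.from)?;
        if let Some(settings) = &self.settings {
            write!(f, " SETTINGS {}", settings.join(", "))?;
        }
        Ok(())
    }
}

/// Step 3: every struct literal (parser and tests) must set the field (else E0063).
pub fn parse_select_stub(settings: Option<Vec<String>>) -> Select {
    Select {
        projection: vec!["a".to_string()],
        from: "t".to_string(),
        settings,
    }
}

/// Step 4: struct patterns that list every field must mention it (else E0027).
pub fn from_table(select: &Select) -> &str {
    let Select { projection: _, from, settings: _ } = select;
    from
}
```

Using `..` in a pattern would also satisfy E0027, but listing `settings: _` explicitly means the compiler flags the pattern again when yet another field is added.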

#### Upstream Compatibility
Since this is a fork of apache/datafusion-sqlparser-rs:
- Avoid creating new AST node types when possible
