Each test file contains comprehensive parsing tests for dialect-specific syntax.

### Corpus Testing
Corpus tests in `tests/sqlparser_corpus.rs` parse real SQL from the `tests/corpus/{dialect}/` directories:
```bash
# Run corpus tests and count passing tests
cargo nextest run --test sqlparser_corpus --no-fail-fast 2>&1 | grep "PASS" | wc -l

# Group parse errors by message to find the most common failures
cargo nextest run --test sqlparser_corpus --no-fail-fast 2>&1 | grep "sql parser error:" | sort | uniq -c | sort -rn

# Debug a specific corpus file
cargo test --test sqlparser_corpus -- --nocapture dialect/category/hash
```

## Architecture

### Core Components
- `parse_infix()` - Handles binary operators based on precedence
- Precedence levels defined in `get_precedence()`

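
The loop below is a toy sketch of precedence climbing. It assumes a hypothetical two-operator grammar in which `+` binds at 10 and `*` at 20. `Tok`, `Expr`, `Toy` and the precedence values are illustrative only. The method names echo the parser's `parse_prefix()`, `parse_infix()` and `get_precedence()`, but this is not the crate's implementation. `parse_infix()` parses its right operand at the operator's own precedence. As a result, `*` nests under `+`, and operators of equal precedence associate to the left.

```rust
// Hypothetical toy tokens and AST; the real parser uses sqlparser's `Token` and `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    Star,
    Eof,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    BinOp(Box<Expr>, Tok, Box<Expr>),
}

struct Toy {
    tokens: Vec<Tok>,
    index: usize,
}

impl Toy {
    fn peek(&self) -> Tok {
        self.tokens.get(self.index).cloned().unwrap_or(Tok::Eof)
    }

    fn next(&mut self) -> Tok {
        let tok = self.peek();
        self.index += 1;
        tok
    }

    // Precedence of the next token; 0 means "not an infix operator".
    fn get_precedence(&self) -> u8 {
        match self.peek() {
            Tok::Plus => 10,
            Tok::Star => 20,
            _ => 0,
        }
    }

    // Parse a prefix expression, then keep folding in infix operators
    // that bind more tightly than `precedence`.
    fn parse_subexpr(&mut self, precedence: u8) -> Expr {
        let mut expr = self.parse_prefix();
        loop {
            let next_precedence = self.get_precedence();
            if next_precedence <= precedence {
                break;
            }
            expr = self.parse_infix(expr, next_precedence);
        }
        expr
    }

    // Only integer literals are prefix expressions in this toy grammar.
    fn parse_prefix(&mut self) -> Expr {
        match self.next() {
            Tok::Num(n) => Expr::Num(n),
            other => panic!("unexpected token {other:?}"),
        }
    }

    // Consume the operator, then parse the right operand at the operator's
    // own precedence (this makes equal-precedence operators left-associative).
    fn parse_infix(&mut self, left: Expr, precedence: u8) -> Expr {
        let op = self.next();
        let right = self.parse_subexpr(precedence);
        Expr::BinOp(Box::new(left), op, Box::new(right))
    }
}
```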
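#### Common Parser Patterns

**Lookahead and backtracking:**
```rust
// Lookahead: peek_token() inspects the next token without consuming it
// (self.parse_keyword(Keyword::FOO) does this check-and-consume in one call)
if matches!(self.peek_token().token, Token::Word(w) if w.keyword == Keyword::FOO) {
    self.next_token(); // consume FOO
    // ... parse FOO syntax
}

// Backtracking: rewind with prev_token() only after a next_token()
let tok = self.next_token();
if !matches!(tok.token, Token::Word(w) if w.keyword == Keyword::FOO) {
    self.prev_token(); // un-consume the token read above
}
```
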
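
The cursor semantics behind these patterns fit in a few lines. This is a self-contained sketch using hypothetical `Tok` and `Cursor` types, not sqlparser's `Parser`. In it:

- `peek_token()` never moves the cursor.
- `prev_token()` undoes exactly one `next_token()`.
- `parse_keyword()` consumes a token only when it matches.

It follows that calling `prev_token()` after a bare `peek_token()` would rewind one token too far.

```rust
// Hypothetical token and cursor types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Eof,
}

struct Cursor {
    tokens: Vec<Tok>,
    index: usize,
}

impl Cursor {
    // Lookahead: return the next token without moving the cursor.
    fn peek_token(&self) -> Tok {
        self.tokens.get(self.index).cloned().unwrap_or(Tok::Eof)
    }

    // Consume and return the next token (Eof once the input is exhausted).
    fn next_token(&mut self) -> Tok {
        let tok = self.peek_token();
        self.index += 1;
        tok
    }

    // Backtrack: undo exactly one `next_token()`.
    fn prev_token(&mut self) {
        assert!(self.index > 0, "prev_token() called before any next_token()");
        self.index -= 1;
    }

    // Consume the next token only if it is the expected keyword
    // (case-insensitive); otherwise leave the cursor where it was.
    fn parse_keyword(&mut self, expected: &str) -> bool {
        match self.next_token() {
            Tok::Word(w) if w.eq_ignore_ascii_case(expected) => true,
            _ => {
                self.prev_token();
                false
            }
        }
    }
}
```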
**Negative lookahead** (distinguish between similar patterns):
```rust
// Check that the next token is NOT the PARTITION keyword before
// treating it as an ordinary identifier
if !matches!(self.peek_token().token, Token::Word(w) if w.keyword == Keyword::PARTITION) {
    // Parse as identifier, not as PARTITION keyword
}
```

**Reserved keyword lists** (`src/keywords.rs`):
- `RESERVED_FOR_COLUMN_ALIAS` - Keywords that can't be used as an implicit column alias (without `AS`) in SELECT
- `RESERVED_FOR_TABLE_ALIAS` - Keywords that can't be used as an implicit table alias (without `AS`) in FROM/JOIN
- Add clause-level keywords (e.g., FORMAT, SETTINGS, SAMPLE) to BOTH lists; otherwise `SELECT * FROM t SETTINGS ...` can be misparsed with `SETTINGS` taken as an alias for `t`

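
The sketch below shows why a keyword's membership in a reserved list matters. It assumes cut-down, hypothetical `Keyword` and `Word` types and is loosely modeled on the parser's optional-alias rule. After an explicit `AS`, any word is accepted. Without `AS`, a reserved keyword is rejected so the caller can parse it as the start of a clause instead.

```rust
// Hypothetical, cut-down keyword set; the real lists are the
// RESERVED_FOR_* slices in src/keywords.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Keyword {
    Format,
    Settings,
    Sample,
    NoKeyword,
}

const RESERVED_FOR_TABLE_ALIAS: &[Keyword] = &[Keyword::Format, Keyword::Settings, Keyword::Sample];

// A word token: its raw text plus the keyword it matches, if any.
struct Word {
    value: String,
    keyword: Keyword,
}

// Decide whether `next` may be taken as an alias. An explicit AS accepts any
// word; an implicit alias must not be a reserved keyword.
fn parse_optional_alias(after_as: bool, next: &Word, reserved: &[Keyword]) -> Option<String> {
    if after_as || !reserved.contains(&next.keyword) {
        Some(next.value.clone())
    } else {
        None
    }
}
```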
**Dialect conflicts** (same keyword, different syntax):
- Problem: A keyword is parsed in multiple locations (e.g., SAMPLE as part of a table factor vs. as a SELECT clause)
- Solution: Use `dialect_of!` to exclude the conflicting dialect from one of the parsing locations
- Example: ClickHouse `SAMPLE n` (clause) vs. Snowflake `SAMPLE (n)` (table factor): exclude ClickHouse from table factor parsing

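
Below is a simplified, self-contained model of the exclusion approach. It assumes a hypothetical one-method `Dialect` trait and a local `dialect_of!` macro that uses the same `self is A | B` call style. Neither is the crate's actual trait or macro. Each parsing location asks which dialect is active. Excluding ClickHouse from the table-factor path leaves its `SAMPLE n` for the clause parser.

```rust
use std::any::Any;

// Hypothetical, cut-down dialect trait (the real `Dialect` trait is much larger).
trait Dialect: Any {
    fn as_any(&self) -> &dyn Any;
}

struct ClickHouseDialect;
struct SnowflakeDialect;

impl Dialect for ClickHouseDialect {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Dialect for SnowflakeDialect {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Simplified stand-in for `dialect_of!`: true if the active dialect is any of the listed types.
macro_rules! dialect_of {
    ($parser:ident is $($t:ty)|+) => {
        ($($parser.dialect.as_any().is::<$t>())||+)
    };
}

struct Parser<'a> {
    dialect: &'a dyn Dialect,
}

impl Parser<'_> {
    // Table-factor position, e.g. Snowflake `FROM t SAMPLE (10)`.
    // ClickHouse is excluded so its `SAMPLE n` is left for the clause parser.
    fn table_factor_accepts_sample(&self) -> bool {
        !dialect_of!(self is ClickHouseDialect)
    }

    // Clause position, e.g. ClickHouse `... SAMPLE 0.1`.
    fn clause_accepts_sample(&self) -> bool {
        dialect_of!(self is ClickHouseDialect)
    }
}
```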
## Development Guidelines

### Syntax vs Semantics
3. **Add Tests**: Write dialect-specific tests in the appropriate test file
4. **Consider Dialect**: Use `dialect_of!` if the syntax is dialect-specific

#### AST Change Workflow
When adding a field to an AST struct, you must update ALL pattern matches and struct literals that use it:
1. Add the field to the struct definition (e.g., `src/ast/mod.rs`, `src/ast/query.rs`)
2. Update the `Display` implementation to output the new field
3. Update the parser to initialize the new field
4. Fix all test files: add `new_field: _` to exhaustive struct patterns (error E0027) and set the field in struct literals (error E0063)
5. Run `cargo check --all-targets` to find every location that still needs updating; plain `cargo check` does not check test targets

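
Steps 1 to 4 are illustrated below on a hypothetical `TableRef` struct, where `sample` plays the role of `new_field`. Removing `sample` from the struct literal in `parse_table_ref` would trigger E0063. Dropping `sample: _` from the pattern in `table_name` would trigger E0027. A `..` rest pattern would also compile, but it would hide future field additions from the compiler.

```rust
use std::fmt;

// Hypothetical AST node after adding the `sample` field (step 1).
#[derive(Debug, Clone, PartialEq)]
struct TableRef {
    name: String,
    alias: Option<String>,
    sample: Option<String>,
}

// Step 2: Display must emit the new field so printed SQL keeps it.
impl fmt::Display for TableRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)?;
        if let Some(alias) = &self.alias {
            write!(f, " AS {alias}")?;
        }
        if let Some(sample) = &self.sample {
            write!(f, " SAMPLE {sample}")?;
        }
        Ok(())
    }
}

// Step 3: every struct literal must set the new field (otherwise E0063).
fn parse_table_ref(name: &str, alias: Option<&str>, sample: Option<&str>) -> TableRef {
    TableRef {
        name: name.to_string(),
        alias: alias.map(str::to_string),
        sample: sample.map(str::to_string),
    }
}

// Step 4: exhaustive patterns must mention the new field (otherwise E0027).
fn table_name(t: &TableRef) -> &str {
    let TableRef { name, alias: _, sample: _ } = t;
    name.as_str()
}
```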
#### Upstream Compatibility
Since this is a fork of apache/datafusion-sqlparser-rs:
- Avoid creating new AST node types when possible