sql: paranthesize printing fixes#37145
Draft
def- wants to merge 10 commits into
Draft
Conversation
`AstDisplay for Expr::Cast` printed `<operand>::<type>` unconditionally, relying on the parser having wrapped a low-precedence operand in `Expr::Nested`. That makes the printer non-self-sufficient: any AST whose protective `Nested` is absent — e.g. one produced by an AST transform, or a round-trip oracle that strips cosmetic parens — prints ambiguously. For `Cast(- 0)` (the AST of `CAST(-0 AS int4)` once the `Nested` is gone) it emitted `- 0::int4`, which reparses as `- (0::int4)`: a structurally and semantically different expression (`CAST(-0 AS int4)` vs `-(0::int4)`). Fix: parenthesize the operand when it is not `prints_self_delimiting`, mirroring the existing `Expr::Collate` arm. `Expr::Nested` is itself self-delimiting, so parser-produced ASTs (which already carry the wrapper) are unchanged — no output difference and no test churn; only the otherwise-mis-printed transformed/stripped ASTs are corrected. Found by the sql-parser grammar fuzzer. Tests: adds `cast_operand_reparenthesized_after_nested_stripped`, which strips the `Nested` (as a transform would) and asserts the display still round-trips. The full datadriven suite confirms no existing cast output changes. Release note: Fix a SQL-printing bug where a cast applied to certain parenthesized expressions could be displayed without the parentheses, changing the meaning if the output was re-parsed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`AstDisplay for Expr::Between` wrote both bounds bare. The parser parses each bound at `Precedence::Like`, so a bound whose top operator binds at or below `Like` — a comparison (`=`/`<`/…), a quantified `ANY`/`ALL`, or `AND`/`OR`/`NOT`/`IS`/`LIKE`/`IN`/`BETWEEN` — re-associates out of the `BETWEEN` if printed without parens: `x BETWEEN a AND b = c` reparses as `(x BETWEEN a AND b) = c`. The parser wraps such bounds in `Expr::Nested`, so this only mis-prints ASTs whose wrapper is absent (e.g. produced by an AST transform or a paren- stripping round-trip). Fix: a new `write_between_bound` parenthesizes a bound whose top operator binds at or below `Like` (the precedence boundary mirrors `Parser::get_next_precedence`); `Expr::Nested` is not in that set, so parser-produced ASTs keep their existing single layer of parens — no output change and no datadriven churn. Found by the sql-parser grammar fuzzer. Tests: adds `between_bound_reparenthesized_after_nested_stripped`; the full datadriven suite confirms no existing BETWEEN output changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The `AstDisplay` arms for binary `Op`, `And`, `Or`, `Not`, `IsExpr`, `InList`, `InSubquery`, and `Like` wrote their operands bare, relying on the parser having wrapped any looser-binding sub-expression in `Expr::Nested`. That makes the printer non-self-sufficient: an AST whose protective `Nested` is absent (produced by an AST transform, or a paren-stripping round-trip) prints ambiguously and reparses to a different tree — e.g. `Op(*, a+b, c)` printed `a + b * c`, `And(a OR b, c)` printed `a OR b AND c`, and `Op(+, NOT x IS DISTINCT FROM y, c)` printed `NOT x IS DISTINCT FROM y + c`. Make the printer precedence-aware: classify each expr's top operator (`expr_precedence`, ranked like `Parser::Precedence`) and parenthesize an operand that binds strictly looser than the surrounding operator (and, on the right of a left-associative binary op, at equal precedence). `Expr::Nested` ranks as an atom, so parser-produced ASTs — which already carry the wrapper — are unchanged: no output difference and no datadriven churn. Only the otherwise-mis-printed transformed/stripped ASTs are corrected. This retires the whole operator-precedence drift class the grammar fuzzer was finding. Found by the sql-parser grammar fuzzer. Tests: adds `binary_op_operand_reparenthesized_after_nested_stripped` (strips `Nested`, asserts the display round-trips for arithmetic, logical, and mixed-precedence expressions). The full datadriven suite confirms no existing expression output changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`write_quantified_left` parenthesizes a low-precedence left operand of `<left> op ANY/ALL (...)` so the comparison operator doesn't reach into it on reparse. But it only matched the operand's *top* node, missing a low-precedence expression hidden under a right-transparent prefix operator: `Op(-, Not(InList))` (the AST of `- NOT a IN (b)`) has `Op` (unary minus) on top, so it wasn't parenthesized, and `- NOT a IN (b) = ANY (...)` reparsed with `= ANY` pulled inside the `NOT` (`-(NOT((a IN b) = ANY ...))`). Peel leading prefix operators (`-`/`+`/`~`) before the set check, so the operand the comparison would actually attach to is what's tested. As elsewhere, parser-produced ASTs wrap such lefts in `Expr::Nested`, so this only affects transformed/stripped ASTs — no datadriven churn. Found by the sql-parser grammar fuzzer (surfaced only after ~355k executions, once the earlier precedence fixes were in place). Tests: extends `binary_op_operand_reparenthesized_after_nested_stripped` with the prefix-over-quantified case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The expression printer decided whether an operand needed parentheses from the operand's *top* operator alone (`expr_precedence`). That is correct only for precedence-consistent ASTs — ones a precedence-climbing parser would produce, where the top operator is the loosest in the whole subtree. For ASTs that lose their protective `Expr::Nested` (an implicit cast the planner inserts, a rewrite, a fuzz-stripped round trip), the top operator can be *tighter* than an operator buried down a spine, and the bare print re-associates on reparse. Concretely, `Op(<>, a, InSubquery(AnyExpr(b, =, ARRAY[c]), q))` printed as `a <> b = ANY (ARRAY[c]) IN (...)`: the right operand's top is `IN` (binds tighter than `<>`), so the old check added no parens, but its left spine exposes `= ANY` at the same precedence as `<>`, which steals the `<>` into its own left on reparse. Replace the per-construct spot-checks with a single precedence model built from two mirrored helpers: * `right_edge` — the loosest precedence on an expression's right spine, following right-transparent prefixes and the right operands of binary / `BETWEEN` / `LIKE` / `IS DISTINCT FROM`. Used for left operands and the subjects of constructs that print to their right (`IS`, `IN`, `LIKE`, `BETWEEN`, `AND`/`OR` left, quantified-comparison left, binary left). * `left_edge` — the mirror, the loosest precedence on the left spine. Used for right operands (binary right, `AND`/`OR` right, `LIKE` pattern/escape, `NOT` operand). Both rank the self-delimiting forms (`(…)`, `ARRAY[…]`, a literal, a `COLLATE`/`::`/`[…]` whose own operand is already parenthesized, an `Expr::Nested`) as `ATOM`, so parser-produced ASTs — which wrap every re-associating operand in `Nested` — print exactly as before. The new parens only appear for ASTs that lost their `Nested`. The parser and pretty-printer datadriven suites are unchanged, confirming no churn. This subsumes the earlier spot-fixes for cast operands, `BETWEEN` bounds, binary operands, and quantified-comparison lefts into one model. Tests: extends `binary_op_operand_reparenthesized_after_nested_stripped` in `sqlparser_common.rs` with the left-spine right-operand case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`Parser::parse_between` parses both bounds with `parse_subexpr(Precedence::Like)`, starting fresh with nothing to the bound's left. It therefore walks the bound's *left* spine and stops at the first operator binding at or below `Like`, leaving that operator outside the bound: `x BETWEEN 1 IS NULL AND y` parses `1` as the low bound and then expects `AND` but finds `IS`. `write_between_bound` decided parenthesization with `right_edge`, which ranks the right-closing forms (`IS NULL`, `= ANY (…)`, `IN (…)`, `a <op> ANY/ALL`) as `ATOM` because nothing binds into them *from the right* — exactly the forms whose looseness lives on the left spine. So the printer emitted them bare and the round trip broke (or silently re-associated, e.g. a comparison bound rebinding the whole `BETWEEN` as its left operand). Switch the bound to `left_edge`, the mirror that walks the left spine, which is the edge the parser actually consumes here. The `BETWEEN` subject keeps `right_edge` (the `BETWEEN` keyword prints to its right). As with the rest of the precedence model, parser-produced bounds are wrapped in `Expr::Nested` (ranked `ATOM` on both edges), so the parser and pretty-printer datadriven suites are unchanged — the new parens only appear for ASTs that lost their `Nested`. Tests: extends `between_bound_reparenthesized_after_nested_stripped` in `sqlparser_common.rs` with the right-closing bound forms (`IS NULL`, `= ANY`, `IN`, a boolean bound, and a left-spine operator buried under a tighter top). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`Parser::parse_is` parses the right-hand side of `IS DISTINCT FROM` with `parse_subexpr(<IS precedence>)`, so a RHS whose left spine binds at or below `IS` (`OR`/`AND`/another `IS`) re-associates out of the `IS` on reparse: `a IS DISTINCT FROM b OR c` parses as `(a IS DISTINCT FROM b) OR c`. The printer emitted the RHS bare through `IsExprConstruct`'s display, with no precedence handling, so a `Nested`-stripped AST drifted. This drift is invisible to a stable-string round trip — `IsExpr(a, DistinctFrom(Or(b,c)))` and `Or(IsExpr(a, DistinctFrom(b)), c)` print to the same string `a IS DISTINCT FROM b OR c` — which is why the round-trip fuzzer never flagged it despite generating the construct. The bug is real: the two ASTs are distinct and the first does not reparse to itself. Handle `DistinctFrom` in the `IsExpr` print arm (where the precedence helpers are in scope) and parenthesize its RHS with `left_edge` (the RHS is parsed fresh at `IS`, walking its left spine), matching the rest of the model. The bare-keyword constructs (`NULL`/`TRUE`/`FALSE`/`UNKNOWN`) still go through `IsExprConstruct`'s display. Parser-produced RHSs are wrapped in `Expr::Nested` (`ATOM`), so the datadriven suites are unchanged. Tests: adds `is_distinct_from_rhs_reparenthesized_after_nested_stripped` to `sqlparser_common.rs`, which checks the round trip structurally (the stable-string collision makes a string check insufficient). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A `[I]LIKE` pattern parses at `Like` precedence and, when an `ESCAPE` follows, sits immediately to the left of the `ESCAPE` keyword. Only `[I]LIKE` consumes `ESCAPE`, so a `[I]LIKE` exposed on the pattern's *right* spine steals the outer escape as its own: `x LIKE NOT (a LIKE b) ESCAPE c`, printed bare as `x LIKE NOT a LIKE b ESCAPE c`, binds the escape onto the inner `a LIKE b` instead of the outer `LIKE`. The pattern's left edge was already guarded (so its left spine doesn't re-associate out of the `LIKE`), but the escape-stealing hazard is on the right spine and only exists when an escape is present. Add a right-edge guard conditioned on `escape.is_some()`. The parser never builds a pattern that is itself a trailing escapeless `[I]LIKE` under an outer escape (it left-nests instead), so parser-produced ASTs are unaffected and the datadriven suites are unchanged. This drift collides under stable-string printing (both ASTs print to the same text), so the round-trip fuzz oracle was strengthened to compare reparsed ASTs structurally rather than by re-printed string. Tests: adds `like_pattern_reparenthesized_after_nested_stripped` to `sqlparser_common.rs`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…imitation The receiver helpers for the postfix `.field`/`.*` and `[…]` operators whitelisted variants that are only ever produced by the parser wrapped in `Expr::Nested`, so a `Nested`-stripped AST (an implicit rewrite, a fuzzed round trip) printed them bare and the trailing token re-bound: * `write_dot_receiver` treated a bare `Identifier`/`QualifiedWildcard` receiver as safe, but `a` followed by `.b`/`.*` prints `a.b`/`a.*`, which reparses as the qualified identifier `Identifier([a, b])` / `QualifiedWildcard([a])` instead of a field/wildcard access. The parser only builds those accesses over a parenthesized receiver, so it always wraps the name in `Nested`. Drop both from the safe set. `FieldAccess` / `WildcardAccess` receivers stay safe: their own printing parenthesizes a bare-name base (`(a).b.c`), so the chain remains self-delimiting and parser-shaped `(x).a.b` does not gain spurious parens. * `write_subscript_receiver` only parenthesized keyword identifiers. A `Cast` receiver is unsafe because the type parser eats the following `[…]` as an array suffix (`(a::int4)[1]` -> `a::int4[1]` is `a` cast to `int4[]`), and a nested `Subscript` receiver is unsafe because consecutive `[…]` flatten into one node (`(a[1])[2]` -> `a[1][2]` is a single two-position subscript). Replace the keyword check with an explicit safe whitelist that parenthesizes `Cast`, nested `Subscript`, and the operator/comparison constructs by default. Both are churn-free: the parser never emits these receivers bare (it eats the `[…]` into the cast type, flattens subscripts, and wraps dotted names in `Nested`), so the parser and pretty-printer datadriven suites are unchanged. Found by extending the round-trip grammar fuzzer to cover subscripts, field access, `COLLATE`, `AT TIME ZONE`, and the `position`/`extract`/`substring` special forms. Tests: adds `postfix_access_receiver_reparenthesized_after_nested_stripped` to `sqlparser_common.rs`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`write_dot_receiver` listed `AnySubquery`/`AllSubquery` as safe `.` receivers, but they print as `<expr> <op> ANY/ALL (<query>)` — the trailing `(<query>)` is only a sub-part of the expression, so a following `.x`/`.*` binds to that inner subquery rather than the whole quantified comparison. Printed bare, a `Nested`-stripped `FieldAccess`/`WildcardAccess` over one of these re-associated on reparse (`(a = ANY (SELECT b FROM t)).c` -> `a = ANY (SELECT b FROM t).c`), which also surfaced inside `BETWEEN` bounds as an `Expected AND` parse error. Drop both from the safe set so they are parenthesized. The single-primary subquery forms (`Subquery`, `ArraySubquery`, `ListSubquery`, `MapSubquery`) stay safe: they are one `(…)`/`ARRAY(…)` token sequence, so a trailing dot attaches to the whole thing. The parser only builds these accesses over a parenthesized receiver (wrapped in `Expr::Nested`), so the change is churn-free and the datadriven suites are unchanged. Tests: extends `postfix_access_receiver_reparenthesized_after_nested_stripped` with the quantified-subquery receiver cases (incl. one inside a `BETWEEN` bound). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
All found via #36982