Skip to content

sql: paranthesize printing fixes#37145

Draft
def- wants to merge 10 commits into
MaterializeInc:mainfrom
def-:pr-sql-paranthesize-fixes
Draft

sql: paranthesize printing fixes#37145
def- wants to merge 10 commits into
MaterializeInc:mainfrom
def-:pr-sql-paranthesize-fixes

Conversation

@def-

@def- def- commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

All found via #36982

def- and others added 10 commits June 18, 2026 21:44
`AstDisplay for Expr::Cast` printed `<operand>::<type>` unconditionally,
relying on the parser having wrapped a low-precedence operand in
`Expr::Nested`. That makes the printer non-self-sufficient: any AST whose
protective `Nested` is absent — e.g. one produced by an AST transform, or a
round-trip oracle that strips cosmetic parens — prints ambiguously. For
`Cast(- 0)` (the AST of `CAST(-0 AS int4)` once the `Nested` is gone) it
emitted `- 0::int4`, which reparses as `- (0::int4)`: a structurally and
semantically different expression (`CAST(-0 AS int4)` vs `-(0::int4)`).

Fix: parenthesize the operand when it is not `prints_self_delimiting`,
mirroring the existing `Expr::Collate` arm. `Expr::Nested` is itself
self-delimiting, so parser-produced ASTs (which already carry the wrapper)
are unchanged — no output difference and no test churn; only the
otherwise-mis-printed transformed/stripped ASTs are corrected.

Found by the sql-parser grammar fuzzer.

Tests: adds `cast_operand_reparenthesized_after_nested_stripped`, which
strips the `Nested` (as a transform would) and asserts the display still
round-trips. The full datadriven suite confirms no existing cast output
changes.

Release note: Fix a SQL-printing bug where a cast applied to certain
parenthesized expressions could be displayed without the parentheses,
changing the meaning if the output was re-parsed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`AstDisplay for Expr::Between` wrote both bounds bare. The parser parses
each bound at `Precedence::Like`, so a bound whose top operator binds at or
below `Like` — a comparison (`=`/`<`/…), a quantified `ANY`/`ALL`, or
`AND`/`OR`/`NOT`/`IS`/`LIKE`/`IN`/`BETWEEN` — re-associates out of the
`BETWEEN` if printed without parens: `x BETWEEN a AND b = c` reparses as
`(x BETWEEN a AND b) = c`.

The parser wraps such bounds in `Expr::Nested`, so this only mis-prints ASTs
whose wrapper is absent (e.g. produced by an AST transform or a paren-
stripping round-trip). Fix: a new `write_between_bound` parenthesizes a bound
whose top operator binds at or below `Like` (the precedence boundary mirrors
`Parser::get_next_precedence`); `Expr::Nested` is not in that set, so
parser-produced ASTs keep their existing single layer of parens — no output
change and no datadriven churn.

Found by the sql-parser grammar fuzzer.

Tests: adds `between_bound_reparenthesized_after_nested_stripped`; the full
datadriven suite confirms no existing BETWEEN output changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The `AstDisplay` arms for binary `Op`, `And`, `Or`, `Not`, `IsExpr`,
`InList`, `InSubquery`, and `Like` wrote their operands bare, relying on the
parser having wrapped any looser-binding sub-expression in `Expr::Nested`.
That makes the printer non-self-sufficient: an AST whose protective `Nested`
is absent (produced by an AST transform, or a paren-stripping round-trip)
prints ambiguously and reparses to a different tree — e.g. `Op(*, a+b, c)`
printed `a + b * c`, `And(a OR b, c)` printed `a OR b AND c`, and
`Op(+, NOT x IS DISTINCT FROM y, c)` printed `NOT x IS DISTINCT FROM y + c`.

Make the printer precedence-aware: classify each expr's top operator
(`expr_precedence`, ranked like `Parser::Precedence`) and parenthesize an
operand that binds strictly looser than the surrounding operator (and, on the
right of a left-associative binary op, at equal precedence). `Expr::Nested`
ranks as an atom, so parser-produced ASTs — which already carry the wrapper —
are unchanged: no output difference and no datadriven churn. Only the
otherwise-mis-printed transformed/stripped ASTs are corrected. This retires
the whole operator-precedence drift class the grammar fuzzer was finding.

Found by the sql-parser grammar fuzzer.

Tests: adds `binary_op_operand_reparenthesized_after_nested_stripped`
(strips `Nested`, asserts the display round-trips for arithmetic, logical, and
mixed-precedence expressions). The full datadriven suite confirms no existing
expression output changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`write_quantified_left` parenthesizes a low-precedence left operand of
`<left> op ANY/ALL (...)` so the comparison operator doesn't reach into it on
reparse. But it only matched the operand's *top* node, missing a
low-precedence expression hidden under a right-transparent prefix operator:
`Op(-, Not(InList))` (the AST of `- NOT a IN (b)`) has `Op` (unary minus) on
top, so it wasn't parenthesized, and `- NOT a IN (b) = ANY (...)` reparsed
with `= ANY` pulled inside the `NOT` (`-(NOT((a IN b) = ANY ...))`).

Peel leading prefix operators (`-`/`+`/`~`) before the set check, so the
operand the comparison would actually attach to is what's tested. As
elsewhere, parser-produced ASTs wrap such lefts in `Expr::Nested`, so this
only affects transformed/stripped ASTs — no datadriven churn.

Found by the sql-parser grammar fuzzer (surfaced only after ~355k executions,
once the earlier precedence fixes were in place).

Tests: extends `binary_op_operand_reparenthesized_after_nested_stripped` with
the prefix-over-quantified case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The expression printer decided whether an operand needed parentheses from
the operand's *top* operator alone (`expr_precedence`). That is correct
only for precedence-consistent ASTs — ones a precedence-climbing parser
would produce, where the top operator is the loosest in the whole subtree.
For ASTs that lose their protective `Expr::Nested` (an implicit cast the
planner inserts, a rewrite, a fuzz-stripped round trip), the top operator
can be *tighter* than an operator buried down a spine, and the bare print
re-associates on reparse.

Concretely, `Op(<>, a, InSubquery(AnyExpr(b, =, ARRAY[c]), q))` printed as
`a <> b = ANY (ARRAY[c]) IN (...)`: the right operand's top is `IN` (binds
tighter than `<>`), so the old check added no parens, but its left spine
exposes `= ANY` at the same precedence as `<>`, which steals the `<>` into
its own left on reparse.

Replace the per-construct spot-checks with a single precedence model built
from two mirrored helpers:

* `right_edge` — the loosest precedence on an expression's right spine,
  following right-transparent prefixes and the right operands of binary /
  `BETWEEN` / `LIKE` / `IS DISTINCT FROM`. Used for left operands and the
  subjects of constructs that print to their right (`IS`, `IN`, `LIKE`,
  `BETWEEN`, `AND`/`OR` left, quantified-comparison left, binary left).
* `left_edge` — the mirror, the loosest precedence on the left spine.
  Used for right operands (binary right, `AND`/`OR` right, `LIKE`
  pattern/escape, `NOT` operand).

Both rank the self-delimiting forms (`(…)`, `ARRAY[…]`, a literal, a
`COLLATE`/`::`/`[…]` whose own operand is already parenthesized, an
`Expr::Nested`) as `ATOM`, so parser-produced ASTs — which wrap every
re-associating operand in `Nested` — print exactly as before. The new
parens only appear for ASTs that lost their `Nested`. The parser and
pretty-printer datadriven suites are unchanged, confirming no churn.

This subsumes the earlier spot-fixes for cast operands, `BETWEEN` bounds,
binary operands, and quantified-comparison lefts into one model.

Tests: extends `binary_op_operand_reparenthesized_after_nested_stripped`
in `sqlparser_common.rs` with the left-spine right-operand case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`Parser::parse_between` parses both bounds with
`parse_subexpr(Precedence::Like)`, starting fresh with nothing to the
bound's left. It therefore walks the bound's *left* spine and stops at the
first operator binding at or below `Like`, leaving that operator outside the
bound: `x BETWEEN 1 IS NULL AND y` parses `1` as the low bound and then
expects `AND` but finds `IS`.

`write_between_bound` decided parenthesization with `right_edge`, which
ranks the right-closing forms (`IS NULL`, `= ANY (…)`, `IN (…)`,
`a <op> ANY/ALL`) as `ATOM` because nothing binds into them *from the
right* — exactly the forms whose looseness lives on the left spine. So the
printer emitted them bare and the round trip broke (or silently
re-associated, e.g. a comparison bound rebinding the whole `BETWEEN` as its
left operand).

Switch the bound to `left_edge`, the mirror that walks the left spine, which
is the edge the parser actually consumes here. The `BETWEEN` subject keeps
`right_edge` (the `BETWEEN` keyword prints to its right). As with the rest of
the precedence model, parser-produced bounds are wrapped in `Expr::Nested`
(ranked `ATOM` on both edges), so the parser and pretty-printer datadriven
suites are unchanged — the new parens only appear for ASTs that lost their
`Nested`.

Tests: extends `between_bound_reparenthesized_after_nested_stripped` in
`sqlparser_common.rs` with the right-closing bound forms (`IS NULL`,
`= ANY`, `IN`, a boolean bound, and a left-spine operator buried under a
tighter top).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`Parser::parse_is` parses the right-hand side of `IS DISTINCT FROM` with
`parse_subexpr(<IS precedence>)`, so a RHS whose left spine binds at or
below `IS` (`OR`/`AND`/another `IS`) re-associates out of the `IS` on
reparse: `a IS DISTINCT FROM b OR c` parses as `(a IS DISTINCT FROM b) OR c`.
The printer emitted the RHS bare through `IsExprConstruct`'s display, with no
precedence handling, so a `Nested`-stripped AST drifted.

This drift is invisible to a stable-string round trip — `IsExpr(a,
DistinctFrom(Or(b,c)))` and `Or(IsExpr(a, DistinctFrom(b)), c)` print to the
same string `a IS DISTINCT FROM b OR c` — which is why the round-trip fuzzer
never flagged it despite generating the construct. The bug is real: the two
ASTs are distinct and the first does not reparse to itself.

Handle `DistinctFrom` in the `IsExpr` print arm (where the precedence
helpers are in scope) and parenthesize its RHS with `left_edge` (the RHS is
parsed fresh at `IS`, walking its left spine), matching the rest of the
model. The bare-keyword constructs (`NULL`/`TRUE`/`FALSE`/`UNKNOWN`) still
go through `IsExprConstruct`'s display. Parser-produced RHSs are wrapped in
`Expr::Nested` (`ATOM`), so the datadriven suites are unchanged.

Tests: adds `is_distinct_from_rhs_reparenthesized_after_nested_stripped` to
`sqlparser_common.rs`, which checks the round trip structurally (the
stable-string collision makes a string check insufficient).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A `[I]LIKE` pattern parses at `Like` precedence and, when an `ESCAPE`
follows, sits immediately to the left of the `ESCAPE` keyword. Only
`[I]LIKE` consumes `ESCAPE`, so a `[I]LIKE` exposed on the pattern's *right*
spine steals the outer escape as its own: `x LIKE NOT (a LIKE b) ESCAPE c`,
printed bare as `x LIKE NOT a LIKE b ESCAPE c`, binds the escape onto the
inner `a LIKE b` instead of the outer `LIKE`.

The pattern's left edge was already guarded (so its left spine doesn't
re-associate out of the `LIKE`), but the escape-stealing hazard is on the
right spine and only exists when an escape is present. Add a right-edge
guard conditioned on `escape.is_some()`. The parser never builds a pattern
that is itself a trailing escapeless `[I]LIKE` under an outer escape (it
left-nests instead), so parser-produced ASTs are unaffected and the
datadriven suites are unchanged.

This drift collides under stable-string printing (both ASTs print to the
same text), so the round-trip fuzz oracle was strengthened to compare
reparsed ASTs structurally rather than by re-printed string.

Tests: adds `like_pattern_reparenthesized_after_nested_stripped` to
`sqlparser_common.rs`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…imitation

The receiver helpers for the postfix `.field`/`.*` and `[…]` operators
whitelisted variants that are only ever produced by the parser wrapped in
`Expr::Nested`, so a `Nested`-stripped AST (an implicit rewrite, a fuzzed
round trip) printed them bare and the trailing token re-bound:

* `write_dot_receiver` treated a bare `Identifier`/`QualifiedWildcard`
  receiver as safe, but `a` followed by `.b`/`.*` prints `a.b`/`a.*`, which
  reparses as the qualified identifier `Identifier([a, b])` /
  `QualifiedWildcard([a])` instead of a field/wildcard access. The parser
  only builds those accesses over a parenthesized receiver, so it always
  wraps the name in `Nested`. Drop both from the safe set. `FieldAccess` /
  `WildcardAccess` receivers stay safe: their own printing parenthesizes a
  bare-name base (`(a).b.c`), so the chain remains self-delimiting and
  parser-shaped `(x).a.b` does not gain spurious parens.

* `write_subscript_receiver` only parenthesized keyword identifiers. A `Cast`
  receiver is unsafe because the type parser eats the following `[…]` as an
  array suffix (`(a::int4)[1]` -> `a::int4[1]` is `a` cast to `int4[]`), and a
  nested `Subscript` receiver is unsafe because consecutive `[…]` flatten into
  one node (`(a[1])[2]` -> `a[1][2]` is a single two-position subscript).
  Replace the keyword check with an explicit safe whitelist that
  parenthesizes `Cast`, nested `Subscript`, and the operator/comparison
  constructs by default.

Both are churn-free: the parser never emits these receivers bare (it eats the
`[…]` into the cast type, flattens subscripts, and wraps dotted names in
`Nested`), so the parser and pretty-printer datadriven suites are unchanged.
Found by extending the round-trip grammar fuzzer to cover subscripts, field
access, `COLLATE`, `AT TIME ZONE`, and the `position`/`extract`/`substring`
special forms.

Tests: adds `postfix_access_receiver_reparenthesized_after_nested_stripped`
to `sqlparser_common.rs`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`write_dot_receiver` listed `AnySubquery`/`AllSubquery` as safe `.` receivers,
but they print as `<expr> <op> ANY/ALL (<query>)` — the trailing `(<query>)`
is only a sub-part of the expression, so a following `.x`/`.*` binds to that
inner subquery rather than the whole quantified comparison. Printed bare, a
`Nested`-stripped `FieldAccess`/`WildcardAccess` over one of these re-associated
on reparse (`(a = ANY (SELECT b FROM t)).c` -> `a = ANY (SELECT b FROM t).c`),
which also surfaced inside `BETWEEN` bounds as an `Expected AND` parse error.

Drop both from the safe set so they are parenthesized. The single-primary
subquery forms (`Subquery`, `ArraySubquery`, `ListSubquery`, `MapSubquery`)
stay safe: they are one `(…)`/`ARRAY(…)` token sequence, so a trailing dot
attaches to the whole thing. The parser only builds these accesses over a
parenthesized receiver (wrapped in `Expr::Nested`), so the change is
churn-free and the datadriven suites are unchanged.

Tests: extends `postfix_access_receiver_reparenthesized_after_nested_stripped`
with the quantified-subquery receiver cases (incl. one inside a `BETWEEN`
bound).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant