sql: paranthesize printing fixes by def- · Pull Request #37145 · MaterializeInc/materialize

def- · 2026-06-18T21:46:16Z

All found via #36982

`AstDisplay for Expr::Cast` printed `<operand>::<type>` unconditionally, relying on the parser having wrapped a low-precedence operand in `Expr::Nested`. That makes the printer non-self-sufficient: any AST whose protective `Nested` is absent — e.g. one produced by an AST transform, or a round-trip oracle that strips cosmetic parens — prints ambiguously. For `Cast(- 0)` (the AST of `CAST(-0 AS int4)` once the `Nested` is gone) it emitted `- 0::int4`, which reparses as `- (0::int4)`: a structurally and semantically different expression (`CAST(-0 AS int4)` vs `-(0::int4)`). Fix: parenthesize the operand when it is not `prints_self_delimiting`, mirroring the existing `Expr::Collate` arm. `Expr::Nested` is itself self-delimiting, so parser-produced ASTs (which already carry the wrapper) are unchanged — no output difference and no test churn; only the otherwise-mis-printed transformed/stripped ASTs are corrected. Found by the sql-parser grammar fuzzer. Tests: adds `cast_operand_reparenthesized_after_nested_stripped`, which strips the `Nested` (as a transform would) and asserts the display still round-trips. The full datadriven suite confirms no existing cast output changes. Release note: Fix a SQL-printing bug where a cast applied to certain parenthesized expressions could be displayed without the parentheses, changing the meaning if the output was re-parsed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`AstDisplay for Expr::Between` wrote both bounds bare. The parser parses each bound at `Precedence::Like`, so a bound whose top operator binds at or below `Like` — a comparison (`=`/`<`/…), a quantified `ANY`/`ALL`, or `AND`/`OR`/`NOT`/`IS`/`LIKE`/`IN`/`BETWEEN` — re-associates out of the `BETWEEN` if printed without parens: `x BETWEEN a AND b = c` reparses as `(x BETWEEN a AND b) = c`. The parser wraps such bounds in `Expr::Nested`, so this only mis-prints ASTs whose wrapper is absent (e.g. produced by an AST transform or a paren- stripping round-trip). Fix: a new `write_between_bound` parenthesizes a bound whose top operator binds at or below `Like` (the precedence boundary mirrors `Parser::get_next_precedence`); `Expr::Nested` is not in that set, so parser-produced ASTs keep their existing single layer of parens — no output change and no datadriven churn. Found by the sql-parser grammar fuzzer. Tests: adds `between_bound_reparenthesized_after_nested_stripped`; the full datadriven suite confirms no existing BETWEEN output changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The `AstDisplay` arms for binary `Op`, `And`, `Or`, `Not`, `IsExpr`, `InList`, `InSubquery`, and `Like` wrote their operands bare, relying on the parser having wrapped any looser-binding sub-expression in `Expr::Nested`. That makes the printer non-self-sufficient: an AST whose protective `Nested` is absent (produced by an AST transform, or a paren-stripping round-trip) prints ambiguously and reparses to a different tree — e.g. `Op(*, a+b, c)` printed `a + b * c`, `And(a OR b, c)` printed `a OR b AND c`, and `Op(+, NOT x IS DISTINCT FROM y, c)` printed `NOT x IS DISTINCT FROM y + c`. Make the printer precedence-aware: classify each expr's top operator (`expr_precedence`, ranked like `Parser::Precedence`) and parenthesize an operand that binds strictly looser than the surrounding operator (and, on the right of a left-associative binary op, at equal precedence). `Expr::Nested` ranks as an atom, so parser-produced ASTs — which already carry the wrapper — are unchanged: no output difference and no datadriven churn. Only the otherwise-mis-printed transformed/stripped ASTs are corrected. This retires the whole operator-precedence drift class the grammar fuzzer was finding. Found by the sql-parser grammar fuzzer. Tests: adds `binary_op_operand_reparenthesized_after_nested_stripped` (strips `Nested`, asserts the display round-trips for arithmetic, logical, and mixed-precedence expressions). The full datadriven suite confirms no existing expression output changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`write_quantified_left` parenthesizes a low-precedence left operand of `<left> op ANY/ALL (...)` so the comparison operator doesn't reach into it on reparse. But it only matched the operand's *top* node, missing a low-precedence expression hidden under a right-transparent prefix operator: `Op(-, Not(InList))` (the AST of `- NOT a IN (b)`) has `Op` (unary minus) on top, so it wasn't parenthesized, and `- NOT a IN (b) = ANY (...)` reparsed with `= ANY` pulled inside the `NOT` (`-(NOT((a IN b) = ANY ...))`). Peel leading prefix operators (`-`/`+`/`~`) before the set check, so the operand the comparison would actually attach to is what's tested. As elsewhere, parser-produced ASTs wrap such lefts in `Expr::Nested`, so this only affects transformed/stripped ASTs — no datadriven churn. Found by the sql-parser grammar fuzzer (surfaced only after ~355k executions, once the earlier precedence fixes were in place). Tests: extends `binary_op_operand_reparenthesized_after_nested_stripped` with the prefix-over-quantified case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The expression printer decided whether an operand needed parentheses from the operand's *top* operator alone (`expr_precedence`). That is correct only for precedence-consistent ASTs — ones a precedence-climbing parser would produce, where the top operator is the loosest in the whole subtree. For ASTs that lose their protective `Expr::Nested` (an implicit cast the planner inserts, a rewrite, a fuzz-stripped round trip), the top operator can be *tighter* than an operator buried down a spine, and the bare print re-associates on reparse. Concretely, `Op(<>, a, InSubquery(AnyExpr(b, =, ARRAY[c]), q))` printed as `a <> b = ANY (ARRAY[c]) IN (...)`: the right operand's top is `IN` (binds tighter than `<>`), so the old check added no parens, but its left spine exposes `= ANY` at the same precedence as `<>`, which steals the `<>` into its own left on reparse. Replace the per-construct spot-checks with a single precedence model built from two mirrored helpers: * `right_edge` — the loosest precedence on an expression's right spine, following right-transparent prefixes and the right operands of binary / `BETWEEN` / `LIKE` / `IS DISTINCT FROM`. Used for left operands and the subjects of constructs that print to their right (`IS`, `IN`, `LIKE`, `BETWEEN`, `AND`/`OR` left, quantified-comparison left, binary left). * `left_edge` — the mirror, the loosest precedence on the left spine. Used for right operands (binary right, `AND`/`OR` right, `LIKE` pattern/escape, `NOT` operand). Both rank the self-delimiting forms (`(…)`, `ARRAY[…]`, a literal, a `COLLATE`/`::`/`[…]` whose own operand is already parenthesized, an `Expr::Nested`) as `ATOM`, so parser-produced ASTs — which wrap every re-associating operand in `Nested` — print exactly as before. The new parens only appear for ASTs that lost their `Nested`. The parser and pretty-printer datadriven suites are unchanged, confirming no churn. This subsumes the earlier spot-fixes for cast operands, `BETWEEN` bounds, binary operands, and quantified-comparison lefts into one model. Tests: extends `binary_op_operand_reparenthesized_after_nested_stripped` in `sqlparser_common.rs` with the left-spine right-operand case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`Parser::parse_between` parses both bounds with `parse_subexpr(Precedence::Like)`, starting fresh with nothing to the bound's left. It therefore walks the bound's *left* spine and stops at the first operator binding at or below `Like`, leaving that operator outside the bound: `x BETWEEN 1 IS NULL AND y` parses `1` as the low bound and then expects `AND` but finds `IS`. `write_between_bound` decided parenthesization with `right_edge`, which ranks the right-closing forms (`IS NULL`, `= ANY (…)`, `IN (…)`, `a <op> ANY/ALL`) as `ATOM` because nothing binds into them *from the right* — exactly the forms whose looseness lives on the left spine. So the printer emitted them bare and the round trip broke (or silently re-associated, e.g. a comparison bound rebinding the whole `BETWEEN` as its left operand). Switch the bound to `left_edge`, the mirror that walks the left spine, which is the edge the parser actually consumes here. The `BETWEEN` subject keeps `right_edge` (the `BETWEEN` keyword prints to its right). As with the rest of the precedence model, parser-produced bounds are wrapped in `Expr::Nested` (ranked `ATOM` on both edges), so the parser and pretty-printer datadriven suites are unchanged — the new parens only appear for ASTs that lost their `Nested`. Tests: extends `between_bound_reparenthesized_after_nested_stripped` in `sqlparser_common.rs` with the right-closing bound forms (`IS NULL`, `= ANY`, `IN`, a boolean bound, and a left-spine operator buried under a tighter top). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`Parser::parse_is` parses the right-hand side of `IS DISTINCT FROM` with `parse_subexpr(<IS precedence>)`, so a RHS whose left spine binds at or below `IS` (`OR`/`AND`/another `IS`) re-associates out of the `IS` on reparse: `a IS DISTINCT FROM b OR c` parses as `(a IS DISTINCT FROM b) OR c`. The printer emitted the RHS bare through `IsExprConstruct`'s display, with no precedence handling, so a `Nested`-stripped AST drifted. This drift is invisible to a stable-string round trip — `IsExpr(a, DistinctFrom(Or(b,c)))` and `Or(IsExpr(a, DistinctFrom(b)), c)` print to the same string `a IS DISTINCT FROM b OR c` — which is why the round-trip fuzzer never flagged it despite generating the construct. The bug is real: the two ASTs are distinct and the first does not reparse to itself. Handle `DistinctFrom` in the `IsExpr` print arm (where the precedence helpers are in scope) and parenthesize its RHS with `left_edge` (the RHS is parsed fresh at `IS`, walking its left spine), matching the rest of the model. The bare-keyword constructs (`NULL`/`TRUE`/`FALSE`/`UNKNOWN`) still go through `IsExprConstruct`'s display. Parser-produced RHSs are wrapped in `Expr::Nested` (`ATOM`), so the datadriven suites are unchanged. Tests: adds `is_distinct_from_rhs_reparenthesized_after_nested_stripped` to `sqlparser_common.rs`, which checks the round trip structurally (the stable-string collision makes a string check insufficient). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A `[I]LIKE` pattern parses at `Like` precedence and, when an `ESCAPE` follows, sits immediately to the left of the `ESCAPE` keyword. Only `[I]LIKE` consumes `ESCAPE`, so a `[I]LIKE` exposed on the pattern's *right* spine steals the outer escape as its own: `x LIKE NOT (a LIKE b) ESCAPE c`, printed bare as `x LIKE NOT a LIKE b ESCAPE c`, binds the escape onto the inner `a LIKE b` instead of the outer `LIKE`. The pattern's left edge was already guarded (so its left spine doesn't re-associate out of the `LIKE`), but the escape-stealing hazard is on the right spine and only exists when an escape is present. Add a right-edge guard conditioned on `escape.is_some()`. The parser never builds a pattern that is itself a trailing escapeless `[I]LIKE` under an outer escape (it left-nests instead), so parser-produced ASTs are unaffected and the datadriven suites are unchanged. This drift collides under stable-string printing (both ASTs print to the same text), so the round-trip fuzz oracle was strengthened to compare reparsed ASTs structurally rather than by re-printed string. Tests: adds `like_pattern_reparenthesized_after_nested_stripped` to `sqlparser_common.rs`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…imitation The receiver helpers for the postfix `.field`/`.*` and `[…]` operators whitelisted variants that are only ever produced by the parser wrapped in `Expr::Nested`, so a `Nested`-stripped AST (an implicit rewrite, a fuzzed round trip) printed them bare and the trailing token re-bound: * `write_dot_receiver` treated a bare `Identifier`/`QualifiedWildcard` receiver as safe, but `a` followed by `.b`/`.*` prints `a.b`/`a.*`, which reparses as the qualified identifier `Identifier([a, b])` / `QualifiedWildcard([a])` instead of a field/wildcard access. The parser only builds those accesses over a parenthesized receiver, so it always wraps the name in `Nested`. Drop both from the safe set. `FieldAccess` / `WildcardAccess` receivers stay safe: their own printing parenthesizes a bare-name base (`(a).b.c`), so the chain remains self-delimiting and parser-shaped `(x).a.b` does not gain spurious parens. * `write_subscript_receiver` only parenthesized keyword identifiers. A `Cast` receiver is unsafe because the type parser eats the following `[…]` as an array suffix (`(a::int4)[1]` -> `a::int4[1]` is `a` cast to `int4[]`), and a nested `Subscript` receiver is unsafe because consecutive `[…]` flatten into one node (`(a[1])[2]` -> `a[1][2]` is a single two-position subscript). Replace the keyword check with an explicit safe whitelist that parenthesizes `Cast`, nested `Subscript`, and the operator/comparison constructs by default. Both are churn-free: the parser never emits these receivers bare (it eats the `[…]` into the cast type, flattens subscripts, and wraps dotted names in `Nested`), so the parser and pretty-printer datadriven suites are unchanged. Found by extending the round-trip grammar fuzzer to cover subscripts, field access, `COLLATE`, `AT TIME ZONE`, and the `position`/`extract`/`substring` special forms. Tests: adds `postfix_access_receiver_reparenthesized_after_nested_stripped` to `sqlparser_common.rs`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`write_dot_receiver` listed `AnySubquery`/`AllSubquery` as safe `.` receivers, but they print as `<expr> <op> ANY/ALL (<query>)` — the trailing `(<query>)` is only a sub-part of the expression, so a following `.x`/`.*` binds to that inner subquery rather than the whole quantified comparison. Printed bare, a `Nested`-stripped `FieldAccess`/`WildcardAccess` over one of these re-associated on reparse (`(a = ANY (SELECT b FROM t)).c` -> `a = ANY (SELECT b FROM t).c`), which also surfaced inside `BETWEEN` bounds as an `Expected AND` parse error. Drop both from the safe set so they are parenthesized. The single-primary subquery forms (`Subquery`, `ArraySubquery`, `ListSubquery`, `MapSubquery`) stay safe: they are one `(…)`/`ARRAY(…)` token sequence, so a trailing dot attaches to the whole thing. The parser only builds these accesses over a parenthesized receiver (wrapped in `Expr::Nested`), so the change is churn-free and the datadriven suites are unchanged. Tests: extends `postfix_access_receiver_reparenthesized_after_nested_stripped` with the quantified-subquery receiver cases (incl. one inside a `BETWEEN` bound). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

def- and others added 10 commits June 18, 2026 21:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: paranthesize printing fixes#37145

sql: paranthesize printing fixes#37145
def- wants to merge 10 commits into
MaterializeInc:mainfrom
def-:pr-sql-paranthesize-fixes

def- commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

def- commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant