Describe the bug
The unwrap_cast optimization (in both the logical ExprSimplifier and the physical simplifier) still produces incorrect query results for narrowing casts even after #22837. #22837 fixed the specific case of timestamp precision narrowing, but the same class of bug remains for other many-to-one casts, most notably decimal scale narrowing.
This ticket tracks the general problem and the remaining decimal fix
Reproducer (decimal, still broken after #22837)
CREATE TABLE t(d DECIMAL(20,4)) AS VALUES (1.2345), (1.2399), (1.2500);
-- The scale-narrowing cast rounds (half-up):
-- 1.2345 -> 1.23
-- 1.2399 -> 1.24
-- 1.2500 -> 1.25
SELECT d, arrow_cast(d, 'Decimal128(20, 2)') AS d2 FROM t;
+--------+------+
| d | d2 |
+--------+------+
| 1.2345 | 1.23 |
| 1.2399 | 1.24 |
| 1.2500 | 1.25 |
+--------+------+
-- Only 1.2345 rounds to 1.23, so this MUST return 1 row:
SELECT * FROM t
WHERE arrow_cast(d, 'Decimal128(20, 2)') = arrow_cast(1.23, 'Decimal128(20, 2)');
-- predicate is rewritten to `d = 1.2300` (exact) -> returns 0 rows. WRONG (should be 1 row).
+---+
| d |
+---+
+---+
0 row(s) fetched.
The original predicate CAST(d AS Decimal128(20,2)) = 1.23 means "any d that rounds to 1.23 at scale 2" — a range of source values, the half-open interval [1.225, 1.235). That range contains 1.2345 but not the exact value 1.2300, so the rewrite to d = 1.2300 wrongly returns 0 rows instead of 1.
Background / summary
The unwrap_cast optimization rewrites predicates of the form:
CAST(col AS target_type) <op> literal_target
--> col <op> try_cast_literal_to_type(literal_target, col_type)
This is only valid when the cast applied to the column is one-to-one. When the column cast is many-to-one, multiple distinct source values map to the same target value, so an exact-equality rewrite on the source column is semantically wrong — the correct source-domain preimage is a range, not a singleton.
This matches Spark's UnwrapCastInBinaryComparison, which has a two-layer check (see this comment for details):
- Layer 1 — a check on the data types (is this cast direction safe to unwrap at all?)
- Layer 2 — a check that the literal can be cast losslessly
DataFusion currently only has the equivalent of Layer 2 (try_cast_literal_to_type) and is missing the general Layer 1 type-direction check. #22837 added a narrow Layer-1-style guard for timestamp precision narrowing only; this ticket tracks the general fix.
Why this doesn't blow up more often
The optimization mostly stays correct today by accident, because it leans on try_cast_literal_to_type being conservative. That helper is a restricted, value-preserving cast of the literal that returns None when the value can't be represented exactly in the source type (out of range, decimal digits lost, etc.). For many lossy casts the literal simply fails to round-trip, try_cast_literal_to_type returns None, and the unwrap is silently skipped — so no wrong answer.
The bug surfaces precisely in the cases where:
- the literal does round-trip exactly into the source type (so Layer 2 passes), but
- the column-side cast is many-to-one, so exact equality on the source column is still wrong.
Decimal scale narrowing is exactly this trap: casting the literal 1.23 (scale 2) up to scale 4 → 1.2300 is exact and value-preserving, so Layer 2 happily passes — but CAST(d AS Decimal(20,2)) on the column is a rounding, many-to-one cast, so the rewrite is wrong. Timestamp precision narrowing (now fixed) had the same shape.
So the right fix is a Layer-1 type-direction check (or a preimage-based rewrite) rather than relying on try_cast_literal_to_type alone.
Proposed approaches (PRs from @discord9)
Acceptance criteria
Describe the bug
The
unwrap_castoptimization (in both the logicalExprSimplifierand the physical simplifier) still produces incorrect query results for narrowing casts even after #22837. #22837 fixed the specific case of timestamp precision narrowing, but the same class of bug remains for other many-to-one casts, most notably decimal scale narrowing.This ticket tracks the general problem and the remaining decimal fix
Reproducer (decimal, still broken after #22837)
The original predicate
CAST(d AS Decimal128(20,2)) = 1.23means "anydthat rounds to 1.23 at scale 2" — a range of source values, the half-open interval[1.225, 1.235). That range contains1.2345but not the exact value1.2300, so the rewrite tod = 1.2300wrongly returns 0 rows instead of 1.Background / summary
The
unwrap_castoptimization rewrites predicates of the form:This is only valid when the cast applied to the column is one-to-one. When the column cast is many-to-one, multiple distinct source values map to the same target value, so an exact-equality rewrite on the source column is semantically wrong — the correct source-domain preimage is a range, not a singleton.
This matches Spark's
UnwrapCastInBinaryComparison, which has a two-layer check (see this comment for details):DataFusion currently only has the equivalent of Layer 2 (
try_cast_literal_to_type) and is missing the general Layer 1 type-direction check. #22837 added a narrow Layer-1-style guard for timestamp precision narrowing only; this ticket tracks the general fix.Why this doesn't blow up more often
The optimization mostly stays correct today by accident, because it leans on
try_cast_literal_to_typebeing conservative. That helper is a restricted, value-preserving cast of the literal that returnsNonewhen the value can't be represented exactly in the source type (out of range, decimal digits lost, etc.). For many lossy casts the literal simply fails to round-trip,try_cast_literal_to_typereturnsNone, and the unwrap is silently skipped — so no wrong answer.The bug surfaces precisely in the cases where:
Decimal scale narrowing is exactly this trap: casting the literal
1.23(scale 2) up to scale 4 →1.2300is exact and value-preserving, so Layer 2 happily passes — butCAST(d AS Decimal(20,2))on the column is a rounding, many-to-one cast, so the rewrite is wrong. Timestamp precision narrowing (now fixed) had the same shape.So the right fix is a Layer-1 type-direction check (or a preimage-based rewrite) rather than relying on
try_cast_literal_to_typealone.Proposed approaches (PRs from @discord9)
udf_preimage): represents the source-domain preimage asExact(ScalarValue)orRange(Interval), so many-to-one casts (timestamp/decimal narrowing) can be rewritten correctly into a half-open range (col >= lo AND col < hi) instead of being either blocked or wrongly collapsed to a singleton. Makes exact rewrites closed-by-default.Acceptance criteria