Skip to content

fix: coerce filter expressions before simplifying#6935

Open
ddupg wants to merge 3 commits into
lance-format:mainfrom
ddupg:bug/filter-infer-type
Open

fix: coerce filter expressions before simplifying#6935
ddupg wants to merge 3 commits into
lance-format:mainfrom
ddupg:bug/filter-infer-type

Conversation

@ddupg
Copy link
Copy Markdown
Contributor

@ddupg ddupg commented May 25, 2026

Summary

Fix Planner::optimize_expr so expression type coercion runs before simplification. This matches DataFusion's normal plan pipeline where analyzer type coercion runs before optimizer simplification.

This matters for immutable UDFs with literal arguments. With the previous order, simplification could evaluate a constant UDF call before Int64 literals were coerced to the UDF's expected Float64 inputs. For geospatial filters, st_point(0, 0) could reach the geo UDF as integer arrays and panic during the UDF's strict float downcast. st_point(0.0, 0.0) worked because the literals were already Float64.

Repro Demo

This is a standalone Python repro script. It is not added to the repo.

import tempfile

import lance
import numpy as np
import pyarrow as pa
from geoarrow.rust.core import point, points


def run_filter(ds, expr):
    table = ds.to_table(filter=expr)
    values = table.to_pydict()
    print(f"{expr} -> {table.num_rows} rows: {values}")
    assert table.num_rows == 1


with tempfile.TemporaryDirectory() as tmpdir:
    point_array = points([np.array([1.0, 10.0]), np.array([2.0, 10.0])])
    schema = pa.schema([pa.field(point("xy")).with_name("point")])
    table = pa.Table.from_arrays([point_array], schema=schema)
    ds = lance.write_dataset(table, f"{tmpdir}/geo_filter.lance")

    run_filter(ds, "st_distance(point, st_point(0.0, 0.0)) < 5")
    run_filter(ds, "st_distance(point, st_point(0, 0)) < 5")

Observed before the fix:

st_distance(point, st_point(0.0, 0.0)) < 5 -> 1 rows: {'point': [{'x': 1.0, 'y': 2.0}]}
thread 'lance_background_thread' panicked at .../arrow-array-58.3.0/src/cast.rs:849:33:
primitive array
RuntimeError: Task was aborted

Observed after the fix:

st_distance(point, st_point(0.0, 0.0)) < 5 -> 1 rows: {'point': [{'x': 1.0, 'y': 2.0}]}
st_distance(point, st_point(0, 0)) < 5 -> 1 rows: {'point': [{'x': 1.0, 'y': 2.0}]}

@github-actions github-actions Bot added the bug Something isn't working label May 25, 2026
@ddupg ddupg force-pushed the bug/filter-infer-type branch from 5497c5f to 7c61117 Compare May 25, 2026 13:04
@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-datafusion/src/planner.rs 87.09% 3 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@ddupg ddupg marked this pull request as ready for review May 25, 2026 14:11
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@yanghua yanghua self-requested a review May 26, 2026 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant