Skip to content

fix(datafusion): coerce filter literals for dictionary-encoded columns#7003

Open
valkum wants to merge 2 commits into
lance-format:mainfrom
valkum:dict_expr
Open

fix(datafusion): coerce filter literals for dictionary-encoded columns#7003
valkum wants to merge 2 commits into
lance-format:mainfrom
valkum:dict_expr

Conversation

@valkum
Copy link
Copy Markdown
Contributor

@valkum valkum commented May 29, 2026

Closes #7002

SQL string filters on a dictionary-encoded column (e.g. Dictionary(Int16, Utf8)) failed to plan with "could not convert to literal of type 'Dictionary(...)'" because safe_coerce_scalar had no arm for a dictionary target and no ScalarValue::Dictionary input arm. This also silently lost scalar-index pushdown for =/IN predicates on dictionary columns.

Add two generic guard clauses to safe_coerce_scalar: unwrap a dictionary literal and coerce its inner value, and coerce a value to a dictionary target by recursing on the value type and re-wrapping. Nulls keep their untyped form, matching the existing behavior for all targets.

Enabling pushdown exposed a todo!() in OrderableScalarValue::cmp for Dictionary vs Dictionary (scalar indexes store dictionary columns as Dictionary scalar keys in a BTreeMap), which would have turned a previously-working full-scan query into a panic. Implement the comparison by delegating to the inner values.

This was created by Claude Code using Opus 4.8.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label May 29, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7df6035e4c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread rust/lance-datafusion/src/expr.rs Outdated
SQL string filters on a dictionary-encoded column (e.g.
`Dictionary(Int16, Utf8)`) failed to plan with "could not convert to
literal of type 'Dictionary(...)'" because `safe_coerce_scalar` had no
arm for a dictionary target and no `ScalarValue::Dictionary` input arm.
This also silently lost scalar-index pushdown for `=`/`IN` predicates on
dictionary columns.

Add two generic guard clauses to `safe_coerce_scalar`: unwrap a
dictionary literal and coerce its inner value, and coerce a value to a
dictionary target by recursing on the value type and re-wrapping. Nulls
keep their untyped form, matching the existing behavior for all targets.

Enabling pushdown exposed a `todo!()` in `OrderableScalarValue::cmp` for
`Dictionary` vs `Dictionary` (scalar indexes store dictionary columns as
`Dictionary` scalar keys in a BTreeMap), which would have turned a
previously-working full-scan query into a panic. Implement the
comparison by delegating to the inner values.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@valkum
Copy link
Copy Markdown
Contributor Author

valkum commented May 29, 2026

Rebased on top of the recent BinaryView PR

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 98.78788% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-datafusion/src/logical_expr.rs 97.05% 1 Missing ⚠️
rust/lance/src/dataset/scanner.rs 98.21% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@valkum
Copy link
Copy Markdown
Contributor Author

valkum commented May 29, 2026

Currently working on the feedback from codex.

`safe_coerce_scalar` wraps a filter literal in the dictionary type so it matches a dictionary column, but the guard returned early for any `is_null()` value, leaving typed nulls like `Utf8(None)` unwrapped instead of producing `Dictionary(Int16, Utf8(None))` (the form DataFusion itself uses for a null of dictionary type via `try_new_null`). Narrow the guard to only the untyped `ScalarValue::Null` so typed nulls flow through the normal coerce-and-wrap path. This is representation hygiene at the resolution layer: execution and scalar-index pushdown behave identically with or without it (a `= NULL` branch is non-sargable, so a typed null never reaches the dictionary comparison), so it is covered by unit tests on `safe_coerce_scalar` and `resolve_value` rather than an end-to-end scan test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LanceDB's only_if("col IN ('x', 'y') fails if col is of type dictionary

1 participant