fix(datafusion): coerce filter literals for dictionary-encoded columns#7003
Open
valkum wants to merge 2 commits into
Open
fix(datafusion): coerce filter literals for dictionary-encoded columns#7003valkum wants to merge 2 commits into
valkum wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7df6035e4c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
SQL string filters on a dictionary-encoded column (e.g. `Dictionary(Int16, Utf8)`) failed to plan with "could not convert to literal of type 'Dictionary(...)'" because `safe_coerce_scalar` had no arm for a dictionary target and no `ScalarValue::Dictionary` input arm. This also silently lost scalar-index pushdown for `=`/`IN` predicates on dictionary columns. Add two generic guard clauses to `safe_coerce_scalar`: unwrap a dictionary literal and coerce its inner value, and coerce a value to a dictionary target by recursing on the value type and re-wrapping. Nulls keep their untyped form, matching the existing behavior for all targets. Enabling pushdown exposed a `todo!()` in `OrderableScalarValue::cmp` for `Dictionary` vs `Dictionary` (scalar indexes store dictionary columns as `Dictionary` scalar keys in a BTreeMap), which would have turned a previously-working full-scan query into a panic. Implement the comparison by delegating to the inner values. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
|
Rebased on top of the recent BinaryView PR |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Contributor
Author
|
Currently working on the feedback from codex. |
`safe_coerce_scalar` wraps a filter literal in the dictionary type so it matches a dictionary column, but the guard returned early for any `is_null()` value, leaving typed nulls like `Utf8(None)` unwrapped instead of producing `Dictionary(Int16, Utf8(None))` (the form DataFusion itself uses for a null of dictionary type via `try_new_null`). Narrow the guard to only the untyped `ScalarValue::Null` so typed nulls flow through the normal coerce-and-wrap path. This is representation hygiene at the resolution layer: execution and scalar-index pushdown behave identically with or without it (a `= NULL` branch is non-sargable, so a typed null never reaches the dictionary comparison), so it is covered by unit tests on `safe_coerce_scalar` and `resolve_value` rather than an end-to-end scan test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #7002
SQL string filters on a dictionary-encoded column (e.g.
Dictionary(Int16, Utf8)) failed to plan with "could not convert to literal of type 'Dictionary(...)'" becausesafe_coerce_scalarhad no arm for a dictionary target and noScalarValue::Dictionaryinput arm. This also silently lost scalar-index pushdown for=/INpredicates on dictionary columns.Add two generic guard clauses to
safe_coerce_scalar: unwrap a dictionary literal and coerce its inner value, and coerce a value to a dictionary target by recursing on the value type and re-wrapping. Nulls keep their untyped form, matching the existing behavior for all targets.Enabling pushdown exposed a
todo!()inOrderableScalarValue::cmpforDictionaryvsDictionary(scalar indexes store dictionary columns asDictionaryscalar keys in a BTreeMap), which would have turned a previously-working full-scan query into a panic. Implement the comparison by delegating to the inner values.This was created by Claude Code using Opus 4.8.