Group scan time expression rewrite functionality for UDFs in new module in datafusion-physical-expr-adapter#23125
Conversation
Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
| //! Rewrites that lower scan-metadata scalar UDFs into concrete physical | ||
| //! expressions for the specific file being scanned. |
There was a problem hiding this comment.
It might be good to make this more general "rewrite expressions in preparation for files being scanned, such as scan-metadata scalar UDFs"
| //! Rewrites that lower scan-metadata scalar UDFs into concrete physical | ||
| //! expressions for the specific file being scanned. | ||
| //! | ||
| //! Functions like `file_row_index()` and `input_file_name()` are placeholders |
There was a problem hiding this comment.
I recommend making these actual links so the compiler can make sure the do not get stale
//! Functions like [`file_row_index()`] and [`input_file_name()`] are placeholders| datafusion-common-runtime = { workspace = true } | ||
| datafusion-execution = { workspace = true } | ||
| datafusion-expr = { workspace = true } | ||
| datafusion-functions = { workspace = true } |
There was a problem hiding this comment.
I think we are trying very hard to avoid this dependency -- the data sources should not depend on the pre-packaged functions to keep them as general as possible
There was a problem hiding this comment.
Maybe
datafusion-physical-expr-adatper
That would be a better choice
There was a problem hiding this comment.
The issue here is that datafusion-physical-expr-adapter already depends on datafusion-functions, and datafusion-datasource depends on it. I was also wondering about this tree, I was considering introducing another crate here but that seems like a big change for something this small.
There was a problem hiding this comment.
I'll move it there for now, if this pattern and module keep expanding at some point it'll make more sense to maybe rethink some of the crate structure around it.
There was a problem hiding this comment.
done, I've also updated the PR's description and title.
datafusion-datasourcedatafusion-physical-expr-adapter
fafcff4 to
a015ecd
Compare
Which issue does this PR close?
Rationale for this change
Instead of continuously adding more and more UDF-specific rewrite utilities to
datafusion/physical-expr-adapter/src/schema_rewriter.rs, move all of them to a newrewritemodule.This issue was raised in recent PRs that added
input_file_name()andfile_row_index().What changes are included in this PR?
rewritemodule indatafusion-physical-expr-adapter, and move some functionality to it.Are these changes tested?
Existing testing suites.
Are there any user-facing changes?
I don't think any of this code was released.