Add ExecutionPlan::apply_expressions() #20337
adriangb
left a comment
This makes sense to me. It mirrors the APIs for Logical expressions, is clean and a relatively small change.
But since this is an API change let's leave this open for a couple of days and get at least 1 more approval from a committer before moving forward with it.
// Check expressions from this node
let exprs = plan.expressions();
for expr in exprs.iter() {
    if let Some(_df) = expr.as_any().downcast_ref::<DynamicFilterPhysicalExpr>() {
Should this call expr.apply() to visit nested expressions? Should it deduplicate Arc'ed copies?
> Should this expr.apply() for nested expressions?
iiuc the LogicalPlan counterpart returns just the top level expressions.
> Should it deduplicate Arc'ed copies?
yeah deduping is a good idea
I was referring to this helper function, not the general API. The general API should only expose top level expressions and do no deduplication.
About deduping: the objective of this test is to show how many times the dynamic filter appears in the plan, and whether each node is able to count how many dynamic filters it contains. If we dedup, we would count it only once.
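To make the difference between counting occurrences and deduplicating concrete, here is a minimal sketch. `Expr`, `count_occurrences`, `count_distinct`, and `sample_exprs` are hypothetical stand-ins invented for illustration and are not part of DataFusion; the only assumption is that "the same filter" means "the same `Arc` allocation".

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Hypothetical stand-in for a physical expression (not the DataFusion type).
struct Expr {
    is_dynamic_filter: bool,
}

/// Counts every dynamic-filter occurrence, including shared `Arc` copies.
fn count_occurrences(exprs: &[Arc<Expr>]) -> usize {
    exprs.iter().filter(|e| e.is_dynamic_filter).count()
}

/// Counts distinct dynamic filters by comparing `Arc` pointer identity.
fn count_distinct(exprs: &[Arc<Expr>]) -> usize {
    let mut seen = HashSet::new();
    exprs
        .iter()
        .filter(|e| e.is_dynamic_filter)
        // `insert` returns false when the pointer was already recorded.
        .filter(|e| seen.insert(Arc::as_ptr(*e)))
        .count()
}

/// Two handles to one shared dynamic filter plus one unrelated expression.
fn sample_exprs() -> Vec<Arc<Expr>> {
    let shared = Arc::new(Expr { is_dynamic_filter: true });
    vec![
        Arc::clone(&shared),
        shared,
        Arc::new(Expr { is_dynamic_filter: false }),
    ]
}
```

With `sample_exprs()`, the occurrence count sees the shared filter twice while the pointer-based count sees it once, which is why a test that wants per-node occurrence counts would avoid deduplication.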
/// joins).
fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>;

/// Returns all expressions (non-recursively) evaluated by the current
This API forces an allocation and also cloning all the PhysicalExprs -- what would you think about adding apply_expressions and map_expressions methods to parallel the ones on LogicalPlan instead?
- https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#method.apply_expressions
- https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#method.map_expressions
Maybe you can start with just the apply_expressions one in this PR
I think we should probably also not provide a default implementation to force all implementations to properly visit the expressions
If we provide this default implementation, then downstream implementors will likely not implement the API and if something in the datafusion core depends on the API in the future it will be hard to debug what is going on
> I think we should probably also not provide a default implementation to force all implementations to properly visit the expressions
> If we provide this default implementation, then downstream implementors will likely not implement the API and if something in the datafusion core depends on the API in the future it will be hard to debug what is going on
Makes sense. I included a default implementation because I didn't want to introduce a breaking change, but it is better to be safe and force the implementation 👍
> what would you think about adding apply_expressions and map_expressions methods to parallel the ones on LogicalPlan instead?
Nice catch, I missed the allocation issue. I will give it a try.
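The trade-off discussed in this thread can be sketched roughly as follows. All types here (`TreeNodeRecursion`, `PhysicalExpr`, `Column`, `ExecutionPlan`, `FilterExec`, `collect_names`) are simplified, hypothetical stand-ins and not the real DataFusion definitions; errors are plain `String`s to keep the example self-contained.

```rust
use std::sync::Arc;

// Simplified stand-in; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
}

// Hypothetical minimal expression trait.
trait PhysicalExpr {
    fn name(&self) -> String;
}

struct Column(String);

impl PhysicalExpr for Column {
    fn name(&self) -> String {
        self.0.clone()
    }
}

type Visit<'a> = &'a mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion, String>;

trait ExecutionPlan {
    // Allocating shape: builds a Vec and clones an Arc per expression.
    fn expressions(&self) -> Vec<Arc<dyn PhysicalExpr>>;

    // Visitor shape: borrows each expression, and has no default body,
    // so every implementor must decide which expressions to expose.
    fn apply_expressions(&self, f: Visit<'_>) -> Result<TreeNodeRecursion, String>;
}

struct FilterExec {
    predicate: Arc<dyn PhysicalExpr>,
}

impl ExecutionPlan for FilterExec {
    fn expressions(&self) -> Vec<Arc<dyn PhysicalExpr>> {
        vec![Arc::clone(&self.predicate)]
    }

    fn apply_expressions(&self, f: Visit<'_>) -> Result<TreeNodeRecursion, String> {
        f(self.predicate.as_ref())
    }
}

/// Collects expression names through the visitor without cloning any Arc.
fn collect_names(plan: &dyn ExecutionPlan) -> Vec<String> {
    let mut names = Vec::new();
    plan.apply_expressions(&mut |e: &dyn PhysicalExpr| {
        names.push(e.name());
        Ok(TreeNodeRecursion::Continue)
    })
    .unwrap();
    names
}
```

Because `apply_expressions` has no default body in this sketch, a new plan node that forgets to implement it fails to compile instead of silently exposing no expressions.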
alamb
left a comment
Thanks @LiaCastaneda and @adriangb
I am a little worried about the default implementation here --
I also think a slightly different API might be worth considering
Thanks for reviewing Andrew - that's very good feedback that I missed in my review. I agree that
Thanks both for the reviews! I will work on your suggestion @alamb
10c7c28 to 51dd8d0
938297d to bd5b02f
bd5b02f to 88730b0
adriangb
left a comment
Some minor comments but I think we can merge this whenever you think it's ready Lía
There are some conflicts again, will fix them...
Thank you and sorry for the delays causing conflicts and bump to v54 |
…ns-function-physical-plan
no worries, they were not too complex to solve, I added if so, I think the PR is good to go
Hi! There is a patch #20009 that adds a more expressive API by splitting responsibilities into:
This approach not only helps to check for specific types of expressions in the plan but also enables replacing them, which extends the number of contexts where the API can be used. It looks a bit confusing to have all these methods together (
pub fn visit_expressions(
    plan: &dyn ExecutionPlan,
    f: &mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>,
) -> Result<TreeNodeRecursion> {
    let mut tnr = TreeNodeRecursion::Continue;
    for expr in plan.physical_expressions() {
        tnr = tnr.visit_sibling(|| f(expr.as_ref()))?;
    }
    Ok(tnr)
}
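The sibling-visiting loop in the snippet above can be illustrated with a small self-contained sketch. It assumes a simplified `TreeNodeRecursion` with only `Continue` and `Stop`, whose `visit_sibling` runs the closure on `Continue` and skips it on `Stop`. The names `visit_all` and `visited_until`, and the use of `&str` in place of expressions, are hypothetical.

```rust
// Simplified stand-in for the recursion-control enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

impl TreeNodeRecursion {
    /// Runs `f` for the next sibling unless a previous visit asked to stop.
    fn visit_sibling<F>(self, f: F) -> Result<TreeNodeRecursion, String>
    where
        F: FnOnce() -> Result<TreeNodeRecursion, String>,
    {
        match self {
            TreeNodeRecursion::Continue => f(),
            TreeNodeRecursion::Stop => Ok(self),
        }
    }
}

/// Visits each item in order; once `f` returns `Stop`, later items are skipped.
fn visit_all(
    exprs: &[&str],
    f: &mut dyn FnMut(&str) -> Result<TreeNodeRecursion, String>,
) -> Result<TreeNodeRecursion, String> {
    let mut tnr = TreeNodeRecursion::Continue;
    for &expr in exprs {
        tnr = tnr.visit_sibling(|| f(expr))?;
    }
    Ok(tnr)
}

/// Records which items the callback actually saw before stopping at `stop_at`.
fn visited_until(exprs: &[&str], stop_at: &str) -> Vec<String> {
    let mut seen = Vec::new();
    visit_all(exprs, &mut |e: &str| {
        seen.push(e.to_string());
        if e == stop_at {
            Ok(TreeNodeRecursion::Stop)
        } else {
            Ok(TreeNodeRecursion::Continue)
        }
    })
    .unwrap();
    seen
}
```

Calling `visited_until(&["a", "b", "c"], "b")` visits `a` and `b` only, since the `Stop` returned for `b` is carried through the remaining loop iterations without invoking the callback again.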
👋 Hey, I was not aware there was already an initiative to build a similar API. This PR implements
Yes, it would be nice to have a writing API. The important property we need is that
I think we can reuse the properties of the rest of the plan (avoiding
I created this issue #20899. I haven't started working on it yet and probably won't have much time this week, so I'll likely give it a try next week, but feel free to take it if you'd like.
Actually, now that I think about it, there are some cases where we would need to recompute properties, right? For example, if a user changes an expression from a > something to a < something. How do we specify in this API whether we want to recompute properties or not? should
Yes, it may be useful to explicitly ask for properties re-computation. It seems to me that the safest default is to force properties to be re-computed. Another way to satisfy this is to introduce an "args struct" like:
struct MapExpressionsArgs<'a> {
    // `&mut` is required so the FnMut callback can actually be called
    f: &'a mut dyn FnMut(&Arc<dyn PhysicalExpr>) -> Result<Arc<dyn PhysicalExpr>>,
    preserve_properties: bool,
}
As is done here: datafusion/datafusion/catalog/src/table.rs Lines 366 to 372 in 8d9b080, so that a bool argument does not have to be added each time the method semantics are extended. But maybe this is overkill here and a bool parameter will be enough.
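As a rough illustration of the "args struct" idea, the following sketch shows how such a struct might be consumed. Everything here is hypothetical: `Expr` and `Node` are stand-ins, `map_expressions` is not an existing DataFusion method, and the `properties_recomputed` flag merely records what a real implementation would do with cached plan properties.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a physical expression.
struct Expr(String);

// Bundles the callback with options so new options do not change the signature.
struct MapExpressionsArgs<'a> {
    f: &'a mut dyn FnMut(&Arc<Expr>) -> Result<Arc<Expr>, String>,
    preserve_properties: bool,
}

// Hypothetical plan node; the flag stands in for recomputing plan properties.
struct Node {
    exprs: Vec<Arc<Expr>>,
    properties_recomputed: bool,
}

impl Node {
    /// Rewrites every expression and recomputes properties unless told not to.
    fn map_expressions(&self, args: MapExpressionsArgs<'_>) -> Result<Node, String> {
        let MapExpressionsArgs { f, preserve_properties } = args;
        let exprs = self
            .exprs
            .iter()
            .map(|e| f(e))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Node {
            exprs,
            properties_recomputed: !preserve_properties,
        })
    }
}

/// Example rewrite: wraps each expression in a textual NOT.
fn negate_all(node: &Node, preserve_properties: bool) -> Result<Node, String> {
    let mut f = |e: &Arc<Expr>| -> Result<Arc<Expr>, String> {
        Ok(Arc::new(Expr(format!("NOT {}", e.0))))
    };
    node.map_expressions(MapExpressionsArgs {
        f: &mut f,
        preserve_properties,
    })
}
```

In this sketch the caller opts out of recomputation explicitly with `preserve_properties: true`, which matches the suggestion that recomputing should be the safe default.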
Let's continue this discussion in the issue.
## Which issue does this PR close?

- Closes apache#18296

Needed for datafusion-contrib/datafusion-distributed#180

## Rationale for this change

Right now, there is no easy way to know if a given node in the plan holds Dynamic Filters or to traverse all physical expressions in an ExecutionPlan. This PR implements `apply_expressions()` that visits all `PhysicalExpr`s inside an `ExecutionPlan` using a callback pattern, including `DynamicFilterPhysicalExpr`. This is similar to the existing `apply_expressions()` API for `LogicalPlan`.

## What changes are included in this PR?

- Added `apply_expressions()` method to the `ExecutionPlan` trait with no default implementation, forcing all implementors to explicitly handle their expressions
- Uses a visitor pattern with `FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>` to avoid allocations
- Implemented `apply_expressions()` for all `ExecutionPlan` implementations
- Also added `apply_expressions()` to `FileSource` and `DataSource` traits (required, no default)

## Are these changes tested?

Yes, added a test that traverses the plan and discovers dynamic filters using `apply_expressions()`.

## Are there any user-facing changes?

Yes, the new API `ExecutionPlan::apply_expressions()`, `FileSource::apply_expressions()`, and `DataSource::apply_expressions()`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…che#22437)

## Which issue does this PR close?

- Reverts apache#20337
- Addresses concerns raised in apache#22415
- Closes apache#22415

## Rationale for this change

`ExecutionPlan::apply_expressions()` was added in apache#20337 with no default implementation, forcing every custom `ExecutionPlan`, `FileSource`, and `DataSource` implementor to add the method as part of upgrading to DataFusion 54.

As discussed on apache#22415, per @LiaCastaneda and @adriangb, the method is not yet called from anywhere in DataFusion, and the originally intended use (dynamic-filter discovery/serialization for distributed scenarios) is blocked on other in-progress work (apache#20009, apache#21350). The combined effect on downstream users is a required code change with no immediate benefit, and ambiguity about what a "correct" implementation even means today (e.g. returning `Ok(TreeNodeRecursion::Continue)` is safe right now but becomes incorrect as soon as the method starts being used by an optimizer pass).

The plan agreed in the discussion is to remove the API from the 54.0 release and re-add it together with the concrete consumer that needs it. cc @adriangb @LiaCastaneda @milenkovicm.

## What changes are included in this PR?

`git revert -m 1` of the merge commit, with the following manual conflict resolutions and follow-ups:

## Are these changes tested?

By CI

## Are there any user-facing changes?

Yes -- this removes the new public API:

- `ExecutionPlan::apply_expressions`
- `FileSource::apply_expressions`
- `DataSource::apply_expressions`

These were only added in 54 and are not yet released. Custom implementors no longer need to implement these methods.
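The ambiguity around a stub implementation can be sketched as follows. This is an illustration with hypothetical stand-ins (`Plan`, `Correct`, `Stub`, `count_expressions`, and a one-variant `TreeNodeRecursion`), not DataFusion code; expressions are represented as plain strings.

```rust
// Simplified stand-in; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
}

type Visitor<'a> = &'a mut dyn FnMut(&str) -> Result<TreeNodeRecursion, String>;

// Hypothetical trait with a required visitor method.
trait Plan {
    fn apply_expressions(&self, f: Visitor<'_>) -> Result<TreeNodeRecursion, String>;
}

// Visits the expression it holds.
struct Correct {
    predicate: String,
}

// Holds an expression but never shows it to the visitor.
struct Stub {
    #[allow(dead_code)] // intentionally never visited in this sketch
    predicate: String,
}

impl Plan for Correct {
    fn apply_expressions(&self, f: Visitor<'_>) -> Result<TreeNodeRecursion, String> {
        f(self.predicate.as_str())
    }
}

impl Plan for Stub {
    fn apply_expressions(&self, _f: Visitor<'_>) -> Result<TreeNodeRecursion, String> {
        // Compiles and looks harmless while nothing calls the method.
        Ok(TreeNodeRecursion::Continue)
    }
}

/// Counts how many expressions a plan node exposes through the visitor.
fn count_expressions(plan: &dyn Plan) -> usize {
    let mut n = 0;
    plan.apply_expressions(&mut |_e: &str| {
        n += 1;
        Ok(TreeNodeRecursion::Continue)
    })
    .unwrap();
    n
}
```

Both nodes hold one predicate, yet a consumer built on `count_expressions` sees one expression for `Correct` and none for `Stub`, so a stub written only to satisfy the compiler would turn into a silent bug once a real consumer exists.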
… trait method

The two MockReqExec impls in this test file override ExecutionPlan::apply_expressions, added when apache#20337 introduced the trait method. Upstream apache#22437 reverted that addition, so the overrides now reference a trait method that no longer exists and the test crate fails to compile after rebasing onto main. Removing both override blocks restores the trait-default behavior (no-op) used before apache#20337.
Which issue does this PR close?
- DynamicFilterPhysicalExpr expressions from outside the plan #18296

Needed for datafusion-contrib/datafusion-distributed#180

Rationale for this change
Right now, there is no easy way to know if a given node in the plan holds Dynamic Filters or to traverse all physical expressions in an ExecutionPlan. This PR implements apply_expressions() that visits all PhysicalExprs inside an ExecutionPlan using a callback pattern, including DynamicFilterPhysicalExpr. This is similar to the existing apply_expressions() API for LogicalPlan.

What changes are included in this PR?
- Added apply_expressions() method to the ExecutionPlan trait with no default implementation, forcing all implementors to explicitly handle their expressions
- Uses a visitor pattern with FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion> to avoid allocations
- Implemented apply_expressions() for all ExecutionPlan implementations
- Also added apply_expressions() to FileSource and DataSource traits (required, no default)

Are these changes tested?
Yes, added a test that traverses the plan and discovers dynamic filters using apply_expressions().

Are there any user-facing changes?
Yes, the new API ExecutionPlan::apply_expressions(), FileSource::apply_expressions(), and DataSource::apply_expressions().