Add ExecutionPlan::apply_expressions()#20337

Merged
adriangb merged 21 commits into apache:main from LiaCastaneda:lia/add-expressions-function-physical-plan
Mar 2, 2026
Conversation

@LiaCastaneda
Contributor

@LiaCastaneda LiaCastaneda commented Feb 13, 2026

Which issue does this PR close?

Needed for datafusion-contrib/datafusion-distributed#180

Rationale for this change

Right now, there is no easy way to know if a given node in the plan holds Dynamic Filters or to traverse all physical expressions in an ExecutionPlan. This PR implements apply_expressions() that visits all PhysicalExprs inside an ExecutionPlan using a callback pattern, including DynamicFilterPhysicalExpr. This is similar to the existing apply_expressions() API for LogicalPlan.

What changes are included in this PR?

  • Added apply_expressions() method to the ExecutionPlan trait with no default implementation, forcing all implementors to explicitly handle their expressions
  • Uses a visitor pattern with FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion> to avoid allocations
  • Implemented apply_expressions() for all ExecutionPlan implementations
  • Also added apply_expressions() to FileSource and DataSource traits (required, no default)
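
The callback shape described in the list above can be sketched without DataFusion itself. This is a minimal, self-contained illustration and not the code from this PR: `TreeNodeRecursion`, `PhysicalExpr`, and `ExecutionPlan` are simplified stand-ins that only share names with the real types, and `PlanResult`, `Column`, `DynamicFilter`, `FilterLikeExec`, `sample_plan`, and `count_dynamic_filters` are hypothetical names introduced for the example.

```rust
use std::any::Any;
use std::sync::Arc;

// Simplified stand-in for DataFusion's `TreeNodeRecursion` (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TreeNodeRecursion {
    Continue,
    Stop,
}

// Hypothetical result alias; the real API uses DataFusion's own `Result`.
pub type PlanResult<T> = std::result::Result<T, String>;

// Stand-in for `PhysicalExpr`: only the downcasting hook is modeled.
pub trait PhysicalExpr {
    fn as_any(&self) -> &dyn Any;
}

pub struct Column;
pub struct DynamicFilter;

impl PhysicalExpr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl PhysicalExpr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Stand-in for `ExecutionPlan`. There is no default body, so every
// implementor has to state which expressions it holds.
pub trait ExecutionPlan {
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&dyn PhysicalExpr) -> PlanResult<TreeNodeRecursion>,
    ) -> PlanResult<TreeNodeRecursion>;
}

pub struct FilterLikeExec {
    pub exprs: Vec<Arc<dyn PhysicalExpr>>,
}

impl ExecutionPlan for FilterLikeExec {
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&dyn PhysicalExpr) -> PlanResult<TreeNodeRecursion>,
    ) -> PlanResult<TreeNodeRecursion> {
        // Expressions are passed by reference: no Vec is built, no Arc is cloned.
        for expr in &self.exprs {
            if f(expr.as_ref())? == TreeNodeRecursion::Stop {
                return Ok(TreeNodeRecursion::Stop);
            }
        }
        Ok(TreeNodeRecursion::Continue)
    }
}

// Counts the top-level dynamic filters held by one node.
pub fn count_dynamic_filters(plan: &dyn ExecutionPlan) -> PlanResult<usize> {
    let mut count = 0;
    plan.apply_expressions(
        &mut |expr: &dyn PhysicalExpr| -> PlanResult<TreeNodeRecursion> {
            if expr.as_any().is::<DynamicFilter>() {
                count += 1;
            }
            Ok(TreeNodeRecursion::Continue)
        },
    )?;
    Ok(count)
}

// A node holding one column reference and two dynamic filters.
pub fn sample_plan() -> FilterLikeExec {
    let exprs: Vec<Arc<dyn PhysicalExpr>> = vec![
        Arc::new(Column),
        Arc::new(DynamicFilter),
        Arc::new(DynamicFilter),
    ];
    FilterLikeExec { exprs }
}
```

Under these assumptions, `count_dynamic_filters(&sample_plan())` visits three expressions and reports two dynamic filters. The sketch only looks at top-level expressions; nested expressions would need a separate walk over each expression tree.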

Are these changes tested?

Yes, added a test that traverses the plan and discovers dynamic filters using apply_expressions().

Are there any user-facing changes?

Yes, the new API ExecutionPlan::apply_expressions(), FileSource::apply_expressions(), and DataSource::apply_expressions().

@github-actions github-actions Bot added core Core DataFusion crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Feb 13, 2026
@LiaCastaneda LiaCastaneda changed the title from "Implement expressions()" to "Implement ExecutionPlan::expressions()" Feb 13, 2026
Comment thread datafusion/core/tests/physical_optimizer/filter_pushdown.rs
Contributor

@adriangb adriangb left a comment

This makes sense to me. It mirrors the APIs for Logical expressions, is clean and a relatively small change.

But since this is an API change let's leave this open for a couple of days and get at least 1 more approval from a committer before moving forward with it.

Comment thread datafusion/core/tests/physical_optimizer/filter_pushdown.rs
// Check expressions from this node
let exprs = plan.expressions();
for expr in exprs.iter() {
if let Some(_df) = expr.as_any().downcast_ref::<DynamicFilterPhysicalExpr>() {
Contributor

Should this call expr.apply() for nested expressions? Should it deduplicate Arc'ed copies?

Contributor Author

Should this call expr.apply() for nested expressions?

iiuc the LogicalPlan counterpart returns just the top level expressions.

Should it deduplicate Arc'ed copies?

yeah deduping is a good idea

Contributor

I was referring to this helper function, not the general API. The general API should only expose top level expressions and do no deduplication.

Contributor Author

About deduping: the objective of this test is to show how many times the dynamic filter appears in the plan and that each node is able to count how many dynamic filters it contains. If we dedup, we would count it only once.
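
To make the deduplication trade-off in this thread concrete, here is a small stand-alone sketch. It assumes a hypothetical `DynamicFilter` struct rather than the real `DynamicFilterPhysicalExpr`, and `count_occurrences`, `count_distinct`, and `shared_filter_counts` are illustrative names only. It compares counting every occurrence with counting distinct allocations behind the `Arc`s.

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Hypothetical stand-in for a dynamic filter expression shared between nodes.
pub struct DynamicFilter(pub u32);

// Counts every occurrence, which is what the test discussed here wants.
pub fn count_occurrences(exprs: &[Arc<DynamicFilter>]) -> usize {
    exprs.len()
}

// Counts distinct allocations by comparing the pointers behind each `Arc`.
pub fn count_distinct(exprs: &[Arc<DynamicFilter>]) -> usize {
    let mut seen: HashSet<*const DynamicFilter> = HashSet::new();
    for expr in exprs {
        seen.insert(Arc::as_ptr(expr));
    }
    seen.len()
}

// One filter referenced from two places plus one unrelated filter.
pub fn shared_filter_counts() -> (usize, usize) {
    let shared = Arc::new(DynamicFilter(1));
    let exprs = vec![
        Arc::clone(&shared),
        Arc::clone(&shared),
        Arc::new(DynamicFilter(2)),
    ];
    (count_occurrences(&exprs), count_distinct(&exprs))
}
```

With the shared filter appearing twice, the occurrence count is 3 while the distinct count is 2, which is why deduplicating inside the test helper would hide how often the filter appears in the plan.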

/// joins).
fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>;

/// Returns all expressions (non-recursively) evaluated by the current
Contributor

This API forces an allocation and also cloning all the PhysicalExprs -- what would you think about adding apply_expressions and map_expressions methods to parallel the ones on LogicalPlan instead?

Maybe you can start with just the apply_expressions one in this PR

I think we should probably also not provide a default implementation to force all implementations to properly visit the expressions

If we provide this default implementation, then downstream implementors will likely not implement the API and if something in the datafusion core depends on the API in the future it will be hard to debug what is going on
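
The concern about a default implementation can be illustrated with a stand-alone sketch. Everything here is hypothetical (`Expr`, `NamedExpr`, `PlanWithDefault`, `PlanRequired`, `CustomExec`, and the helper functions); it is not DataFusion code, only a small model of the two trait designs being compared.

```rust
pub type VisitResult = Result<(), String>;

// Hypothetical expression trait; only a display name is modeled.
pub trait Expr {
    fn name(&self) -> &str;
}

pub struct NamedExpr(pub &'static str);

impl Expr for NamedExpr {
    fn name(&self) -> &str {
        self.0
    }
}

// Variant A: a default body makes the method optional for implementors.
pub trait PlanWithDefault {
    fn apply_expressions(&self, _f: &mut dyn FnMut(&dyn Expr) -> VisitResult) -> VisitResult {
        Ok(())
    }
}

// Variant B: no default body, so forgetting the method is a compile error.
pub trait PlanRequired {
    fn apply_expressions(&self, f: &mut dyn FnMut(&dyn Expr) -> VisitResult) -> VisitResult;
}

// A downstream node that holds one predicate.
pub struct CustomExec {
    pub predicate: NamedExpr,
}

// Compiles fine, yet silently reports that the node has no expressions.
impl PlanWithDefault for CustomExec {}

impl PlanRequired for CustomExec {
    fn apply_expressions(&self, f: &mut dyn FnMut(&dyn Expr) -> VisitResult) -> VisitResult {
        f(&self.predicate)
    }
}

pub fn names_via_default(node: &CustomExec) -> Vec<String> {
    let mut names = Vec::new();
    PlanWithDefault::apply_expressions(node, &mut |e: &dyn Expr| -> VisitResult {
        names.push(e.name().to_string());
        Ok(())
    })
    .unwrap();
    names
}

pub fn names_via_required(node: &CustomExec) -> Vec<String> {
    let mut names = Vec::new();
    PlanRequired::apply_expressions(node, &mut |e: &dyn Expr| -> VisitResult {
        names.push(e.name().to_string());
        Ok(())
    })
    .unwrap();
    names
}

pub fn sample_node() -> CustomExec {
    CustomExec { predicate: NamedExpr("a > 1") }
}
```

In this model the defaulted trait returns an empty list for a node that clearly holds a predicate, while the required trait surfaces it. That silent gap is the debugging hazard described above.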

Contributor Author

@LiaCastaneda LiaCastaneda Feb 13, 2026

I think we should probably also not provide a default implementation to force all implementations to properly visit the expressions
If we provide this default implementation, then downstream implementors will likely not implement the API and if something in the datafusion core depends on the API in the future it will be hard to debug what is going on

makes sense, I included a default implementation because I didn't want to introduce a breaking change, but it is better to be safe and force the implementation 👍

what would you think about adding apply_expressions and map_expressions methods to parallel the ones on LogicalPlan instead?

nice catch, I missed the allocation fact, I will give it a try

Contributor

@alamb alamb left a comment

Thanks @LiaCastaneda and @adriangb

I am a little worried about the default implementation here --

I also think a slightly different API might be worth considering

@adriangb
Contributor

adriangb commented Feb 13, 2026

Thanks for reviewing Andrew - that's very good feedback that I missed in my review. I agree that apply_expressions(|expr: &Arc<dyn PhysicalExpr>| ...) would be a better API.

@LiaCastaneda
Contributor Author

Thanks both for the reviews! I will work on your suggestion @alamb

@LiaCastaneda LiaCastaneda marked this pull request as draft February 16, 2026 07:45
@github-actions github-actions Bot added optimizer Optimizer rules catalog Related to the catalog crate labels Feb 16, 2026
@LiaCastaneda LiaCastaneda force-pushed the lia/add-expressions-function-physical-plan branch from 10c7c28 to 51dd8d0 Compare February 16, 2026 08:40
@LiaCastaneda LiaCastaneda changed the title from "Implement ExecutionPlan::expressions()" to "Implement ExecutionPlan::apply_expressions()" Feb 16, 2026
@github-actions github-actions Bot added the ffi Changes to the ffi crate label Feb 16, 2026
@LiaCastaneda LiaCastaneda force-pushed the lia/add-expressions-function-physical-plan branch from 938297d to bd5b02f Compare February 16, 2026 09:19
@LiaCastaneda LiaCastaneda force-pushed the lia/add-expressions-function-physical-plan branch from bd5b02f to 88730b0 Compare February 16, 2026 09:21
@LiaCastaneda LiaCastaneda marked this pull request as ready for review February 16, 2026 09:27
Contributor

@adriangb adriangb left a comment

Some minor comments but I think we can merge this whenever you think it's ready Lía

@LiaCastaneda
Contributor Author

There are some conflicts again, will fix them...

@adriangb
Contributor

adriangb commented Mar 2, 2026

There are some conflicts again, will fix them...

Thank you and sorry for the delays causing conflicts and bump to v54

@LiaCastaneda
Contributor Author

no worries, they were not too complex to solve. I added docs/source/library-user-guide/upgrading/54.0.0.md, does it look ok? (it's the first time I've added a file from scratch in DF)

if so, I think the PR is good to go

@adriangb adriangb added this pull request to the merge queue Mar 2, 2026
Merged via the queue into apache:main with commit a5f490e Mar 2, 2026
33 checks passed
@askalt
Contributor

askalt commented Mar 11, 2026

Hi! There is a patch #20009 that adds a more expressive API by splitting responsibilities into:

  1. reading expressions
  2. writing expressions

This approach not only helps to check for specific types of expressions in the plan but also enables replacing them, which extends the number of contexts where the API can be used. It looks a bit confusing to have all these methods together (apply_expressions, physical_expressions and with_physical_expressions), so with this, we can implement apply_expressions as a simple helper, like:

pub fn visit_expressions(
    plan: &dyn ExecutionPlan,
    f: &mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>,
) -> Result<TreeNodeRecursion> {
    let mut tnr = TreeNodeRecursion::Continue;
    for expr in plan.physical_expressions() {
        tnr = tnr.visit_sibling(|| f(expr.as_ref()))?;
    }
    Ok(tnr)
}
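
The early-exit behavior of a `visit_sibling`-style loop like the one above can be seen in a stand-alone sketch. `Recursion` is a simplified stand-in modeled on DataFusion's `TreeNodeRecursion` (the real enum also has a `Jump` variant, omitted here), string slices play the role of expressions, and `visit_all` and `visited_until_stop` are hypothetical names.

```rust
// Simplified stand-in for `TreeNodeRecursion` (assumption: no `Jump` variant).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Recursion {
    Continue,
    Stop,
}

impl Recursion {
    // Runs `f` only if no earlier sibling asked to stop.
    pub fn visit_sibling<F>(self, f: F) -> Result<Recursion, String>
    where
        F: FnOnce() -> Result<Recursion, String>,
    {
        match self {
            Recursion::Continue => f(),
            Recursion::Stop => Ok(self),
        }
    }
}

// Same loop shape as the helper above, over plain string slices.
pub fn visit_all(
    exprs: &[&str],
    f: &mut dyn FnMut(&str) -> Result<Recursion, String>,
) -> Result<Recursion, String> {
    let mut tnr = Recursion::Continue;
    for expr in exprs {
        tnr = tnr.visit_sibling(|| f(*expr))?;
    }
    Ok(tnr)
}

// Visits three names and stops once the dynamic filter is found.
pub fn visited_until_stop() -> (Vec<String>, Recursion) {
    let mut visited = Vec::new();
    let outcome = visit_all(
        &["a", "dynamic_filter", "b"],
        &mut |name: &str| -> Result<Recursion, String> {
            visited.push(name.to_string());
            if name == "dynamic_filter" {
                Ok(Recursion::Stop)
            } else {
                Ok(Recursion::Continue)
            }
        },
    )
    .unwrap();
    (visited, outcome)
}
```

In this sketch the callback never sees `"b"`: once `Stop` is returned, the remaining iterations skip the callback and the final result stays `Stop`.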

@LiaCastaneda
Contributor Author

👋 Hey, I was not aware there was already an initiative to build a similar API. This PR implements apply_expressions, which mirrors LogicalPlan::apply_expressions and is intended to be read only and allocation free. Ideally, we should also implement map_expressions (mirroring LogicalPlan::map_expressions) to support modifying PhysicalExprs and rebuilding the node at the same time. Would both of these APIs cover your use case?

@askalt
Contributor

askalt commented Mar 12, 2026

Would both of these APIs cover your use case?

Yes, it would be nice to have a writing API. The important property we need is that map_expressions should not recompute plan properties, assuming that they are not changed (user responsibility), i.e. we avoid a typical plan ::new() call in this case. Is there an issue or branch to track the implementation?

@LiaCastaneda
Contributor Author

I think we can reuse the properties of the rest of the plan (avoiding ::new()), similar to how LogicalPlan::map_expressions does it.

I created this issue #20899. I haven't started working on it yet and probably won't have much time this week, so I'll likely give it a try next week, but feel free to take it if you'd like

@LiaCastaneda
Contributor Author

Actually, now that I think about it, there are some cases where we would need to recompute properties right? for example, if a user changes an expression from a > something to a < something. How do we specify in this API whether we want to recompute properties or not? should map_expressions have a recompute_properties: bool argument? 🤔

@askalt
Contributor

askalt commented Mar 12, 2026

Actually, now that I think about it, there are some cases where we would need to recompute properties right? for example, if a user changes an expression from a > something to a < something. How do we specify in this API whether we want to recompute properties or not? should map_expressions have a recompute_properties: bool argument? 🤔

Yes, it may be useful to explicitly ask for properties re-computation. And it seems to me that by default the safest option is to force properties to be re-computed.

Another way to satisfy it is to introduce "args struct" like:

struct MapExpressionsArgs<'a> {
    // `&mut` is required so that the `FnMut` callback can be invoked.
    f: &'a mut dyn FnMut(&Arc<dyn PhysicalExpr>) -> Result<Arc<dyn PhysicalExpr>>,
    preserve_properties: bool,
}

Like is done here:

/// Arguments for scanning a table with [`TableProvider::scan_with_args`].
#[derive(Debug, Clone, Default)]
pub struct ScanArgs<'a> {
    filters: Option<&'a [Expr]>,
    projection: Option<&'a [usize]>,
    limit: Option<usize>,
}

so that we don't have to add a bool argument each time the method's semantics are extended. But maybe this is overkill here and a bool parameter will be enough.
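
A rough, self-contained sketch of the args-struct idea, under loud assumptions: `Expr`, `Pred`, `Properties`, `FilterLike`, and `flip_comparison` are hypothetical stand-ins, `MapExpressionsArgs` follows the shape proposed above but is not an existing DataFusion type, and the "properties" are reduced to a description string. It shows what `preserve_properties` would trade off when the mapped expression changes meaning.

```rust
use std::sync::Arc;

// Hypothetical expression trait; only a textual form is modeled.
pub trait Expr {
    fn text(&self) -> String;
}

pub struct Pred(pub String);

impl Expr for Pred {
    fn text(&self) -> String {
        self.0.clone()
    }
}

// Stand-in for plan properties that are derived from the expressions.
#[derive(Debug, Clone, PartialEq)]
pub struct Properties {
    pub description: String,
}

// Args struct in the spirit of the proposal above (not an existing API).
pub struct MapExpressionsArgs<'a> {
    pub f: &'a mut dyn FnMut(&Arc<dyn Expr>) -> Result<Arc<dyn Expr>, String>,
    pub preserve_properties: bool,
}

pub struct FilterLike {
    pub predicate: Arc<dyn Expr>,
    pub properties: Properties,
}

impl FilterLike {
    // The usual constructor recomputes properties from the predicate.
    pub fn new(predicate: Arc<dyn Expr>) -> Self {
        let properties = Properties {
            description: format!("derived from {}", predicate.text()),
        };
        Self { predicate, properties }
    }

    pub fn map_expressions(&self, args: MapExpressionsArgs<'_>) -> Result<Self, String> {
        let MapExpressionsArgs { f, preserve_properties } = args;
        let predicate = f(&self.predicate)?;
        if preserve_properties {
            // Cheap path: reuse the old properties; correctness is on the caller.
            Ok(Self { predicate, properties: self.properties.clone() })
        } else {
            // Safe path: go through `new()` and recompute.
            Ok(Self::new(predicate))
        }
    }
}

// Rewrites `a > 5` into `a < 5` and returns (new predicate, resulting properties).
pub fn flip_comparison(preserve_properties: bool) -> (String, String) {
    let original = FilterLike::new(Arc::new(Pred("a > 5".to_string())));
    let mut flip = |e: &Arc<dyn Expr>| -> Result<Arc<dyn Expr>, String> {
        Ok(Arc::new(Pred(e.text().replace('>', "<"))))
    };
    let mapped = original
        .map_expressions(MapExpressionsArgs { f: &mut flip, preserve_properties })
        .unwrap();
    (mapped.predicate.text(), mapped.properties.description)
}
```

In this model, preserving properties after flipping the comparison leaves a description that still refers to `a > 5`, whereas recomputing yields one that matches `a < 5`. That is the case raised earlier in the thread for making recomputation the default.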

@LiaCastaneda
Contributor Author

Let's continue this discussion in the issue.

de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
## Which issue does this PR close?

- Closes apache#18296

Needed for
datafusion-contrib/datafusion-distributed#180

## Rationale for this change

Right now, there is no easy way to know if a given node in the plan
holds Dynamic Filters or to traverse all physical expressions in an
ExecutionPlan. This PR implements `apply_expressions()` that visits all
`PhysicalExpr`s inside an `ExecutionPlan` using a callback pattern,
including `DynamicFilterPhysicalExpr`. This is similar to the existing
`apply_expressions()` API for `LogicalPlan`.

## What changes are included in this PR?

- Added `apply_expressions()` method to the `ExecutionPlan` trait with
no default implementation, forcing all implementors to explicitly handle
their expressions
- Uses a visitor pattern with `FnMut(&dyn PhysicalExpr) ->
Result<TreeNodeRecursion>` to avoid allocations
- Implemented `apply_expressions()` for all `ExecutionPlan`
implementations
- Also added `apply_expressions()` to `FileSource` and `DataSource`
traits (required, no default)

## Are these changes tested?

Yes, added a test that traverses the plan and discovers dynamic filters
using `apply_expressions()`.

## Are there any user-facing changes?

Yes, the new API `ExecutionPlan::apply_expressions()`,
`FileSource::apply_expressions()`, and
`DataSource::apply_expressions()`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit to TCeason/arrow-datafusion that referenced this pull request May 21, 2026
…che#22437)

## Which issue does this PR close?

- Reverts apache#20337
- Addresses concerns raised in
apache#22415
- Closes apache#22415

## Rationale for this change

`ExecutionPlan::apply_expressions()` was added in apache#20337 with no default
implementation, forcing every custom `ExecutionPlan`, `FileSource`, and
`DataSource` implementor to add the method as part of upgrading to
DataFusion 54.

As discussed on apache#22415, per @LiaCastaneda and @adriangb the method is
not yet called from anywhere in DataFusion and the originally intended
use (dynamic-filter discovery/serialization for distributed scenarios)
is blocked on other in-progress work (apache#20009, apache#21350).

The combined effect on downstream users is a required code change with
no immediate benefit, and ambiguity about what a "correct"
implementation even means today (e.g. is returning
`Ok(TreeNodeRecursion::Continue)` safe right now but incorrect as soon
as the method starts being used by an optimizer pass?).

The plan agreed in the discussion is to remove the API from the 54.0
release and re-add it together with the concrete consumer that needs it.
cc @adriangb @LiaCastaneda @milenkovicm.

## What changes are included in this PR?

`git revert -m 1` of the merge commit, with the following manual
conflict resolutions and follow-ups:

## Are these changes tested?

By CI

## Are there any user-facing changes?

Yes -- this removes the new public API:

- `ExecutionPlan::apply_expressions`
- `FileSource::apply_expressions`
- `DataSource::apply_expressions`

These were only added in 54 and are not yet released. Custom
implementors no longer need to implement these methods.
alamb added a commit that referenced this pull request May 22, 2026
#22437) (#22445)

- Backports #22437 from @alamb
to the branch-54 line

This PR cherry-picks the revert of `ExecutionPlan::apply_expressions()`
(#20337) onto `branch-54` so that DataFusion 54.0 does not ship the new
public API.
zhuqi-lucas added a commit to zhuqi-lucas/arrow-datafusion that referenced this pull request May 23, 2026
… trait method

The two MockReqExec impls in this test file override
ExecutionPlan::apply_expressions, added when apache#20337 introduced the
trait method. Upstream apache#22437 reverted that addition, so the
overrides now reference a trait method that no longer exists and the
test crate fails to compile after rebasing onto main. Removing both
override blocks restores the trait-default behavior (no-op) used
before apache#20337.

Labels

  • api change: Changes the API exposed to users of the crate
  • catalog: Related to the catalog crate
  • core: Core DataFusion crate
  • datasource: Changes to the datasource crate
  • documentation: Improvements or additions to documentation
  • ffi: Changes to the ffi crate
  • optimizer: Optimizer rules
  • physical-plan: Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Access DynamicFilterPhysicalExpr expressions from outside the plan

4 participants