Skip to content

feat: expose dynamic filter bloom helper APIs#9

Draft
discord9 wants to merge 54 commits into
GreptimeTeam:mainfrom
discord9:fix/dynamic-filter-bloom-api
Draft

feat: expose dynamic filter bloom helper APIs#9
discord9 wants to merge 54 commits into
GreptimeTeam:mainfrom
discord9:fix/dynamic-filter-bloom-api

Conversation

@discord9

Copy link
Copy Markdown

What's changed

This PR exposes small internal helper APIs needed by GreptimeDB remote dynamic filter Bloom encoding:

  • expose DynamicFilterPhysicalExpr::is_complete() so callers can check completion state without awaiting;
  • expose HashTableLookupExpr::on_columns() and seeds();
  • expose HashTableLookupExpr::distinct_join_hash_count() and visit_distinct_join_hashes(...) for Map::HashMap;
  • add JoinHashMapType::visit_hashes(...) implementations for hash-map based join maps.

Map::ArrayMap returns Ok(false) for distinct-hash visitation so callers can safely fall back instead of approximating.

Why

GreptimeDB RDF Bloom remote dynamic filters need to build a compact Bloom payload from the existing join hash lookup state without collecting all hashes into a temporary vector or depending on private fields.

Validation

  • cargo fmt --all -- --check
  • cargo test -p datafusion-physical-plan dynamic_filter --lib
  • cargo test -p datafusion-physical-plan visit_hashes --lib
  • cargo test -p datafusion-physical-plan hash_table_lookup_expr --lib

comphead and others added 30 commits March 1, 2026 17:43
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->
Preparation for DataFusion release 53.0.0
- apache#19692


## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…apache#20680)

## Which issue does this PR close?

N/A

## Rationale for this change

Backport for apache#20625

When enabling the `recursive_protection` feature for the `datafusion`
crate, the `sql` feature is enabled. This is undesirable if the
downstream project would like the `sql` feature to be off.

## What changes are included in this PR?

Use the `?` syntax for features of dependencies for
`recursive_protection`. This was already correctly done for other
features such as `unicode_expressions`.

<https://doc.rust-lang.org/cargo/reference/features.html>

## Are these changes tested?

N/A

## Are there any user-facing changes?

This makes dependency management better for downstream projects and is
not a breaking change.

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: Heran Lin <heran@lakesail.com>
…) queries (#… (apache#20726)

…20710)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123. -->

- Closes apache#20669 .

## Rationale for this change
Return probe_side.len() for count(*) queries
<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes. -->

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
slt tests
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
)

## Which issue does this PR close?

- Closes apache#20704

## Rationale for this change

FFI_TableOptions fails with a warning that is getting swallowed in the
unit tests.

## What changes are included in this PR?

Correctly check format for table options.

## Are these changes tested?

Unit tests updated.

## Are there any user-facing changes?

None, internal only.

## Context

Related to apache#20705 but
targetting `branch-53`.
…tor::Colon` (apache#20717)

## Which issue does this PR close?

- part of apache#19692  
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Needed in
datafusion-contrib/datafusion-variant#26

- Backport of apache#20628 on v53
branch.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

- `sqlparser-rs` currently exposes the colon operator (`:`) as a special
`JsonAccess` expression. So it fails in datafusion's parsing before an
`ExprPlanner` is even invoked.
- Add `Operator::Colon`. Currently it's not used/implemented in
datafusion.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Fixes the above problem by converting `JsonAccess` to a normal binary
expr, on which the `ExprPlanner` is invoked and custom parsing can be
done.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added tests.

Also did a prototype of a custom `ExprPlanner` in datafusion-variant
using this to convert colon operator to `variant_get` function -
datafusion-contrib/datafusion-variant#31

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Add `Operator::Colon`
…size() to reduce memory pool interactions (apache#20733)

Backport apache#20729 to `branch-53`.
…20672) (apache#20792)

- Part of apache#19692
- Closes apache#20683 on branch-53

**This PR:**
- Backports apache#20672 from
@xanderbailey to the [branch-53]

Co-authored-by: Xander <zander181@googlemail.com>
…flatten keys) (apache#20505) (apache#20791)

- Part of apache#19692
- Closes apache#20696 on branch-53

**This PR:**
- Backports apache#20505 from @alamb
to the [branch-53]

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#20887)

- Part of apache#19692

This PR:
- Backports apache#20609 from
@comphead to the branch-53 line

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
…ing (apache#20635) (apache#20885)

- Part of apache#19692
- Closes apache#20634 on branch-53

This PR:
- Backports apache#20635 from
@neilconway to the branch-53 line

Co-authored-by: Neil Conway <neil.conway@gmail.com>
…t inner filter proves zero selectivity (apache#20743) (apache#20882)

- Part of apache#19692
- Closes apache#20742 on branch-53

This PR:
- Backports apache#20743 from
@haohuaijin to the branch-53 line

Co-authored-by: Huaijin <haohuaijin@gmail.com>
…truct cols (apache#20698) (apache#20884)

- Part of apache#19692
- Closes apache#20695 on branch-53

This PR:
- Backports apache#20698 from
@friendlymatthew to the branch-53 line

Co-authored-by: Matthew Kim <38759997+friendlymatthew@users.noreply.github.com>
- Part of apache#19692
- Closes apache#20737 on branch-53

This PR:
- Backports apache#20738 from
@haohuaijin to the branch-53 line

Co-authored-by: Huaijin <haohuaijin@gmail.com>
…he#20424) (apache#20890)

- Part of apache#19692

This PR:
- Backports apache#20424 from
@ariel-miculas to the branch-53 line

Co-authored-by: Ariel Miculas-Trif <ariel.miculas@gmail.com>
…tafusion-proto (apache#20574) (apache#20891)

- Part of apache#19692
- Closes apache#20575 on branch-53

This PR:
- Backports apache#20574 from
@nathanb9 to the branch-53 line

Co-authored-by: nathan <56370526+nathanb9@users.noreply.github.com>
…quired (apache#20900) (apache#20903)

- Part of apache#19692
- Backports apache#20900 from @askalt
to the branch-53 line

This PR:
- Backports apache#20900 to branch-53

Co-authored-by: Albert Skalt <133099191+askalt@users.noreply.github.com>
…ith ANSI mode support (apache#20461) (apache#20896)

- Part of apache#19692

This PR:
- Backports apache#20461 from
@davidlghellin to the branch-53 line

Co-authored-by: David López <hola@devel0pez.com>
…ing (apache#20372) (apache#20893)

- Part of apache#19692
- Closes apache#20371 on branch-53

This PR:
- Backports apache#20372 from
@erenavsarogullari to the branch-53 line

Co-authored-by: Eren Avsarogullari <eren@apache.org>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
…pache#19739) (apache#20897)

- Part of apache#19692
- Closes apache#16281 on branch-53

This PR:
- Backports apache#19739 from
@kumarUjjawal to the branch-53 line

Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>
…ushed down into TableScan (apache#19884) (apache#20898)

- Part of apache#19692
- Closes apache#19840 on branch-53

This PR:
- Backports apache#19884 from @kosiew
to the branch-53 line

Co-authored-by: kosiew <kosiew@gmail.com>
…ry in ForeignTableProvider::scan (apache#20393) (apache#20895)

- Part of apache#19692

This PR:
- Backports apache#20393 from
@Kontinuation to the branch-53 line

Co-authored-by: Kristin Cowalcijk <bo@wherobots.com>
…LL) (apache#20391) (apache#20892)

- Part of apache#19692
- Closes apache#20388 on branch-53

This PR:
- Backports apache#20391 from
@fwojciec to the branch-53 line

Co-authored-by: Filip Wojciechowski <fwojciec@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…apache#20685) (apache#20914)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123. -->

- Closes apache#20611 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes. -->

## What changes are included in this PR?

The Spark function is actual wrapper on top of `array_has` function.
After result is being produced the nulls mask is set respectively for
the output indices which correspond to input rows having nulls

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…ache#20724) (apache#20858) (apache#20918)

- Part of apache#20724
- Closes apache#20724 on branch-53

This PR:
- Backports apache#20858 from
@gboucher90 to the branch-53 line

Co-authored-by: gboucher90 <gboucher90@users.noreply.github.com>
…filter (apache#20231) (apache#20932)

- Part of apache#19692
- Closes apache#20194 on branch-53

This PR:
- Backports apache#20231 from
@EeshanBembi to the branch-53 line

Co-authored-by: EeshanBembi <33062610+EeshanBembi@users.noreply.github.com>
## Which issue does this PR close?

- Part of apache#19692

## Rationale for this change

We have backported a bunch of bug fixes to branch-53, so let's make sure
the release notes reflect that

## What changes are included in this PR?

I ran
```shell
uv run ./dev/release/generate-changelog.py 52.3.0 branch-53 53.0.0 > dev/changelog/53.0.0.md
```

And then had codex review via

```
› Please review dev/changelog/53.0.0.md to ensure it reflects all commits between where `apache/branch-53` and `main` diverged
```

Then I updated the change log to reflect the original authors not the
backport authors


## Are these changes tested?

By CI
alamb and others added 24 commits March 17, 2026 08:25
…0987) (apache#20991)

- Part of apache#19692
- Closes apache#20992 on branch-53

This PR:
- Backports apache#20987 from @alamb
to the branch-53 line
…e#20978) (apache#21000)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123. -->

Part of: apache#19692

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes. -->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: Daniël Heres <danielheres@gmail.com>
…to prevent overflows (apache#20998) (apache#21008)

(cherry picked from commit e74e58f)

I confirmed that this fixed CometFuzzTestSuite "join" which found the
original issue while testing `branch-53`.

## Which issue does this PR close?
- Closes apache#20995.

## Rationale for this change
apache#20995 has details but it is very straightforward. `dense_ratio`
calculation overflows since overflow guard is after not before

## What changes are included in this PR?
Prevent hash join overflow and unit test for it
- backports apache#20998 from
@buraksenn

## Are these changes tested?
Added a test case for both min and max scenario

Co-authored-by: Burak Şen <buraksenb@gmail.com>
…1016)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123. -->

- Closes apache#21011 .

## Rationale for this change

Handle correctly `array_remove_*` functions if NULL is a value to delete

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes. -->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

(cherry picked from commit 6ab16cc)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…che#20962) (apache#20996)

- Part of apache#19692
- Closes apache#20996 on branch-53

This PR:
- Backports apache#20962 from
@erratic-pattern to the branch-53 line
- Backports the related tests from
apache#20960

Co-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
- Port of apache#21084 to 53

- Related to apache#21079

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…elds (apache#21057) (apache#21142)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Related to apache#21063
- Related to apache#21079

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Currently if we see a filter with a limit underneath, we don't push the
filter past the limit. However, sort nodes and table scan nodes can have
fetch fields which do essentially the same thing, and we don't stop
filters being pushed past them. This is a correctness bug that can lead
to undefined behaviour.

I added checks for exactly this condition so we don't push the filter
down. I think the prior expectation was that there would be a limit node
between any of these nodes, but this is also not true. In
`push_down_limit.rs`, there's code that does this optimisation when a
limit has a sort under it:


```rust
LogicalPlan::Sort(mut sort) => {
    let new_fetch = {
        let sort_fetch = skip + fetch;
        Some(sort.fetch.map(|f| f.min(sort_fetch)).unwrap_or(sort_fetch))
    };
    if new_fetch == sort.fetch {
        if skip > 0 {
            original_limit(skip, fetch, LogicalPlan::Sort(sort))
        } else {
            Ok(Transformed::yes(LogicalPlan::Sort(sort)))
        }
    } else {
        sort.fetch = new_fetch;
        limit.input = Arc::new(LogicalPlan::Sort(sort));
        Ok(Transformed::yes(LogicalPlan::Limit(limit)))
    }
}
```

The first time this runs, it sets the internal fetch of the sort to
new_fetch, and on the second optimisation pass it hits the branch where
we just get rid of the limit node altogether, leaving the sort node
exposed to potential filters which can now push down into it.

There is also a related fix in `gather_filters_for_pushdown` in
`SortExec`, which does the same thing for physical plan nodes. If we see
that a given execution plan has non-empty fetch, it should not allow any
parent filters to be pushed down.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Added checks in the optimisation rule to avoid pushing filters past
children with built-in limits.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes:
- Unit tests in `push_down_filter.rs`
- Fixed an existing test in `window.slt`
- Unit tests for the physical plan change in `sort.rs`
- New slt test in `push_down_filter_sort_fetch.slt` for this exact
behaviour

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
No

Co-authored-by: Shiv Bhatia <shivbhatia10@gmail.com>
Co-authored-by: Shiv Bhatia <sbhatia@palantir.com>
…oin keys (apache#21121) (apache#21162)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Related to apache#21124 
- Related to apache#21079

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

When a Substrait join expression contains both equal and
is_not_distinct_from predicates (e.g. Spark pushes a null-safe filter
into a join that already has a regular equality key), the
`split_eq_and_noneq_join_predicate_with_nulls_equality` function uses a
single `nulls_equal_nulls` boolean that gets overwritten per-predicate.
Whichever operator is processed last determines the `NullEquality` for
all keys, silently dropping NULL-matching rows.

Since NullEquality is a join-level setting (not per-key) across all
physical join implementations (HashJoinExec, SortMergeJoinExec,
SymmetricHashJoinExec), the correct fix is to match DataFusion's own SQL
planner behavior: demote IS NOT DISTINCT FROM keys to the join filter
when mixed with Eq keys. This is already correctly handled for SQL as
shown in

[join_is_not_distinct_from.slt:L188](https://sourcegraph.com/r/github.com/apache/datafusion@2b7d4f9a5b005905b23128274ad37c3306ffcd15/-/blob/datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt?L188)
```
# Test mixed equal and IS NOT DISTINCT FROM conditions
# The `IS NOT DISTINCT FROM` expr should NOT in HashJoin's `on` predicate
query TT
EXPLAIN SELECT t1.id AS t1_id, t2.id AS t2_id, t1.val, t2.val
FROM t1
JOIN t2 ON t1.id = t2.id AND t1.val IS NOT DISTINCT FROM t2.val
----
logical_plan
01)Projection: t1.id AS t1_id, t2.id AS t2_id, t1.val, t2.val
02)--Inner Join: t1.id = t2.id Filter: t1.val IS NOT DISTINCT FROM t2.val
03)----TableScan: t1 projection=[id, val]
04)----TableScan: t2 projection=[id, val]
```

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

`datafusion/substrait/src/logical_plan/consumer/rel/join_rel.rs`:
- Collect eq_keys and indistinct_keys separately instead of using a
single vec with an overwritable boolean
- When both are present (mixed case), use eq_keys as equijoin keys with
NullEqualsNothing and reconstruct the IsNotDistinctFrom expressions into
the join filter
  - Return NullEquality directly instead of converting from bool

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Yes, three levels of coverage:

1. Unit tests (join_rel.rs) — directly assert the output of
split_eq_and_noneq_join_predicate_with_nulls_equality for eq-only,
indistinct-only, mixed, and non-column-operand cases
2. Integration test (consumer_integration.rs) — loads a JSON-encoded
Substrait plan with a JoinRel containing both operators through
from_substrait_plan, executes it, and asserts 6 rows (including
NULL=NULL matches)
3. Existing SLT (join_is_not_distinct_from.slt:179-205) — confirms the
SQL planner already exhibits the same demotion behavior that this PR
adds to the Substrait consumer

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
No API changes. Substrait plans with mixed equal/is_not_distinct_from
join predicates now correctly preserve null-safe semantics instead of
silently dropping NULL-matching rows.
…ache#21183)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123. -->

- Closes #.

## Rationale for this change

The rewriter actually has 3 responsibilities:
1. Index remapping — column indices in expressions may not match the
file schema
  2. Type casting — when logical and physical field types differ
3. Missing column handling — replacing references to absent columns with
nulls

Do not use cycles for schema rewrite if predicate is not set or logic
schema equal to physical schema

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes. -->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
… schema for … (apache#21451)

…spill files

(cherry picked from commit e133dd3)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: Marko Grujic <markoog@gmail.com>
…park bitmap/… (apache#21452)

…math modules

(cherry picked from commit 39fb9cc)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: David López <hola@devel0pez.com>
…ion pushdown (apache#21492)

(cherry picked from commit 330d57f)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: Huaijin <haohuaijin@gmail.com>
- Part of apache#21079
- Closes apache#21155 on branch-53

This PR:
- Backports apache#21439 from
@timsaucer to the branch-53 line

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
apache#20658) (apache#21523)

- Part of apache#21079
- Closes apache#20905 on branch-53

This PR:
- Backports apache#21358 from @alamb
to the branch-53 line

Co-authored-by: Viktor Yershov <viktor@spice.ai>
…#21587)

- Part of apache#21079

# Rationale
@comphead notes that `cargo audit` is failing on
apache#21559 (comment)

I previously fixed something similar on `branch-52`: 
- apache#21415

# Changes 
- Backports the cargo audit dependency updates from apache#21415 to branch-53
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Prefer the finer timestamp unit when coercing mixed timestamp units, and
check overflow when casting timestamps to finer precision for both scalar
and array values.
…coercion-precision

fix: preserve timestamp precision during coercion
* Allow dynamic filter pushdown for left join

* Backport join dynamic filter gate

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.