feat: expose dynamic filter bloom helper APIs#9
Draft
discord9 wants to merge 54 commits into
Draft
Conversation
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> Preparation for DataFusion release 53.0.0 - apache#19692 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…apache#20680) ## Which issue does this PR close? N/A ## Rationale for this change Backport for apache#20625 When enabling the `recursive_protection` feature for the `datafusion` crate, the `sql` feature is enabled. This is undesirable if the downstream project would like the `sql` feature to be off. ## What changes are included in this PR? Use the `?` syntax for features of dependencies for `recursive_protection`. This was already correctly done for other features such as `unicode_expressions`. <https://doc.rust-lang.org/cargo/reference/features.html> ## Are these changes tested? N/A ## Are there any user-facing changes? This makes dependency management better for downstream projects and is not a breaking change. ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Heran Lin <heran@lakesail.com>
…he#20677) (apache#20700) Backport apache#20677 to the 53 release branch.
…) queries (#… (apache#20726) …20710) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#20669 . ## Rationale for this change Return probe_side.len() for count(*) queries <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? slt tests <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
) ## Which issue does this PR close? - Closes apache#20704 ## Rationale for this change FFI_TableOptions fails with a warning that is getting swallowed in the unit tests. ## What changes are included in this PR? Correctly check format for table options. ## Are these changes tested? Unit tests updated. ## Are there any user-facing changes? None, internal only. ## Context Related to apache#20705 but targetting `branch-53`.
…tor::Colon` (apache#20717) ## Which issue does this PR close? - part of apache#19692 <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Needed in datafusion-contrib/datafusion-variant#26 - Backport of apache#20628 on v53 branch. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> - `sqlparser-rs` currently exposes the colon operator (`:`) as a special `JsonAccess` expression. So it fails in datafusion's parsing before an `ExprPlanner` is even invoked. - Add `Operator::Colon`. Currently it's not used/implemented in datafusion. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - Fixes the above problem by converting `JsonAccess` to a normal binary expr, on which the `ExprPlanner` is invoked and custom parsing can be done. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Added tests. Also did a prototype of a custom `ExprPlanner` in datafusion-variant using this to convert colon operator to `variant_get` function - datafusion-contrib/datafusion-variant#31 ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Add `Operator::Colon`
…size() to reduce memory pool interactions (apache#20733) Backport apache#20729 to `branch-53`.
…20672) (apache#20792) - Part of apache#19692 - Closes apache#20683 on branch-53 **This PR:** - Backports apache#20672 from @xanderbailey to the [branch-53] Co-authored-by: Xander <zander181@googlemail.com>
…flatten keys) (apache#20505) (apache#20791) - Part of apache#19692 - Closes apache#20696 on branch-53 **This PR:** - Backports apache#20505 from @alamb to the [branch-53] Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#20887) - Part of apache#19692 This PR: - Backports apache#20609 from @comphead to the branch-53 line Co-authored-by: Oleks V <comphead@users.noreply.github.com>
…ing (apache#20635) (apache#20885) - Part of apache#19692 - Closes apache#20634 on branch-53 This PR: - Backports apache#20635 from @neilconway to the branch-53 line Co-authored-by: Neil Conway <neil.conway@gmail.com>
…t inner filter proves zero selectivity (apache#20743) (apache#20882) - Part of apache#19692 - Closes apache#20742 on branch-53 This PR: - Backports apache#20743 from @haohuaijin to the branch-53 line Co-authored-by: Huaijin <haohuaijin@gmail.com>
…truct cols (apache#20698) (apache#20884) - Part of apache#19692 - Closes apache#20695 on branch-53 This PR: - Backports apache#20698 from @friendlymatthew to the branch-53 line Co-authored-by: Matthew Kim <38759997+friendlymatthew@users.noreply.github.com>
- Part of apache#19692 - Closes apache#20737 on branch-53 This PR: - Backports apache#20738 from @haohuaijin to the branch-53 line Co-authored-by: Huaijin <haohuaijin@gmail.com>
…he#20424) (apache#20890) - Part of apache#19692 This PR: - Backports apache#20424 from @ariel-miculas to the branch-53 line Co-authored-by: Ariel Miculas-Trif <ariel.miculas@gmail.com>
…tafusion-proto (apache#20574) (apache#20891) - Part of apache#19692 - Closes apache#20575 on branch-53 This PR: - Backports apache#20574 from @nathanb9 to the branch-53 line Co-authored-by: nathan <56370526+nathanb9@users.noreply.github.com>
…quired (apache#20900) (apache#20903) - Part of apache#19692 - Backports apache#20900 from @askalt to the branch-53 line This PR: - Backports apache#20900 to branch-53 Co-authored-by: Albert Skalt <133099191+askalt@users.noreply.github.com>
…ith ANSI mode support (apache#20461) (apache#20896) - Part of apache#19692 This PR: - Backports apache#20461 from @davidlghellin to the branch-53 line Co-authored-by: David López <hola@devel0pez.com>
…ing (apache#20372) (apache#20893) - Part of apache#19692 - Closes apache#20371 on branch-53 This PR: - Backports apache#20372 from @erenavsarogullari to the branch-53 line Co-authored-by: Eren Avsarogullari <eren@apache.org> Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
…pache#19739) (apache#20897) - Part of apache#19692 - Closes apache#16281 on branch-53 This PR: - Backports apache#19739 from @kumarUjjawal to the branch-53 line Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>
…ushed down into TableScan (apache#19884) (apache#20898) - Part of apache#19692 - Closes apache#19840 on branch-53 This PR: - Backports apache#19884 from @kosiew to the branch-53 line Co-authored-by: kosiew <kosiew@gmail.com>
…ry in ForeignTableProvider::scan (apache#20393) (apache#20895) - Part of apache#19692 This PR: - Backports apache#20393 from @Kontinuation to the branch-53 line Co-authored-by: Kristin Cowalcijk <bo@wherobots.com>
…LL) (apache#20391) (apache#20892) - Part of apache#19692 - Closes apache#20388 on branch-53 This PR: - Backports apache#20391 from @fwojciec to the branch-53 line Co-authored-by: Filip Wojciechowski <fwojciec@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…apache#20685) (apache#20914) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#20611 . ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? The Spark function is actual wrapper on top of `array_has` function. After result is being produced the nulls mask is set respectively for the output indices which correspond to input rows having nulls <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…ache#20724) (apache#20858) (apache#20918) - Part of apache#20724 - Closes apache#20724 on branch-53 This PR: - Backports apache#20858 from @gboucher90 to the branch-53 line Co-authored-by: gboucher90 <gboucher90@users.noreply.github.com>
…filter (apache#20231) (apache#20932) - Part of apache#19692 - Closes apache#20194 on branch-53 This PR: - Backports apache#20231 from @EeshanBembi to the branch-53 line Co-authored-by: EeshanBembi <33062610+EeshanBembi@users.noreply.github.com>
## Which issue does this PR close? - Part of apache#19692 ## Rationale for this change We have backported a bunch of bug fixes to branch-53, so let's make sure the release notes reflect that ## What changes are included in this PR? I ran ```shell uv run ./dev/release/generate-changelog.py 52.3.0 branch-53 53.0.0 > dev/changelog/53.0.0.md ``` And then had codex review via ``` › Please review dev/changelog/53.0.0.md to ensure it reflects all commits between where `apache/branch-53` and `main` diverged ``` Then I updated the change log to reflect the original authors not the backport authors ## Are these changes tested? By CI
…0987) (apache#20991) - Part of apache#19692 - Closes apache#20992 on branch-53 This PR: - Backports apache#20987 from @alamb to the branch-53 line
…e#20978) (apache#21000) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> Part of: apache#19692 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Daniël Heres <danielheres@gmail.com>
…to prevent overflows (apache#20998) (apache#21008) (cherry picked from commit e74e58f) I confirmed that this fixed CometFuzzTestSuite "join" which found the original issue while testing `branch-53`. ## Which issue does this PR close? - Closes apache#20995. ## Rationale for this change apache#20995 has details but it is very straightforward. `dense_ratio` calculation overflows since overflow guard is after not before ## What changes are included in this PR? Prevent hash join overflow and unit test for it - backports apache#20998 from @buraksenn ## Are these changes tested? Added a test case for both min and max scenario Co-authored-by: Burak Şen <buraksenb@gmail.com>
…1016) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#21011 . ## Rationale for this change Handle correctly `array_remove_*` functions if NULL is a value to delete <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> (cherry picked from commit 6ab16cc) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…che#20962) (apache#20996) - Part of apache#19692 - Closes apache#20996 on branch-53 This PR: - Backports apache#20962 from @erratic-pattern to the branch-53 line - Backports the related tests from apache#20960 Co-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
- Port of apache#21084 to 53 - Related to apache#21079 Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…elds (apache#21057) (apache#21142) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Related to apache#21063 - Related to apache#21079 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Currently if we see a filter with a limit underneath, we don't push the filter past the limit. However, sort nodes and table scan nodes can have fetch fields which do essentially the same thing, and we don't stop filters being pushed past them. This is a correctness bug that can lead to undefined behaviour. I added checks for exactly this condition so we don't push the filter down. I think the prior expectation was that there would be a limit node between any of these nodes, but this is also not true. In `push_down_limit.rs`, there's code that does this optimisation when a limit has a sort under it: ```rust LogicalPlan::Sort(mut sort) => { let new_fetch = { let sort_fetch = skip + fetch; Some(sort.fetch.map(|f| f.min(sort_fetch)).unwrap_or(sort_fetch)) }; if new_fetch == sort.fetch { if skip > 0 { original_limit(skip, fetch, LogicalPlan::Sort(sort)) } else { Ok(Transformed::yes(LogicalPlan::Sort(sort))) } } else { sort.fetch = new_fetch; limit.input = Arc::new(LogicalPlan::Sort(sort)); Ok(Transformed::yes(LogicalPlan::Limit(limit))) } } ``` The first time this runs, it sets the internal fetch of the sort to new_fetch, and on the second optimisation pass it hits the branch where we just get rid of the limit node altogether, leaving the sort node exposed to potential filters which can now push down into it. There is also a related fix in `gather_filters_for_pushdown` in `SortExec`, which does the same thing for physical plan nodes. If we see that a given execution plan has non-empty fetch, it should not allow any parent filters to be pushed down. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Added checks in the optimisation rule to avoid pushing filters past children with built-in limits. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes: - Unit tests in `push_down_filter.rs` - Fixed an existing test in `window.slt` - Unit tests for the physical plan change in `sort.rs` - New slt test in `push_down_filter_sort_fetch.slt` for this exact behaviour ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> No Co-authored-by: Shiv Bhatia <shivbhatia10@gmail.com> Co-authored-by: Shiv Bhatia <sbhatia@palantir.com>
…oin keys (apache#21121) (apache#21162) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Related to apache#21124 - Related to apache#21079 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> When a Substrait join expression contains both equal and is_not_distinct_from predicates (e.g. Spark pushes a null-safe filter into a join that already has a regular equality key), the `split_eq_and_noneq_join_predicate_with_nulls_equality` function uses a single `nulls_equal_nulls` boolean that gets overwritten per-predicate. Whichever operator is processed last determines the `NullEquality` for all keys, silently dropping NULL-matching rows. Since NullEquality is a join-level setting (not per-key) across all physical join implementations (HashJoinExec, SortMergeJoinExec, SymmetricHashJoinExec), the correct fix is to match DataFusion's own SQL planner behavior: demote IS NOT DISTINCT FROM keys to the join filter when mixed with Eq keys. This is already correctly handled for SQL as shown in [join_is_not_distinct_from.slt:L188](https://sourcegraph.com/r/github.com/apache/datafusion@2b7d4f9a5b005905b23128274ad37c3306ffcd15/-/blob/datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt?L188) ``` # Test mixed equal and IS NOT DISTINCT FROM conditions # The `IS NOT DISTINCT FROM` expr should NOT in HashJoin's `on` predicate query TT EXPLAIN SELECT t1.id AS t1_id, t2.id AS t2_id, t1.val, t2.val FROM t1 JOIN t2 ON t1.id = t2.id AND t1.val IS NOT DISTINCT FROM t2.val ---- logical_plan 01)Projection: t1.id AS t1_id, t2.id AS t2_id, t1.val, t2.val 02)--Inner Join: t1.id = t2.id Filter: t1.val IS NOT DISTINCT FROM t2.val 03)----TableScan: t1 projection=[id, val] 04)----TableScan: t2 projection=[id, val] ``` ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> `datafusion/substrait/src/logical_plan/consumer/rel/join_rel.rs`: - Collect eq_keys and indistinct_keys separately instead of using a single vec with an overwritable boolean - When both are present (mixed case), use eq_keys as equijoin keys with NullEqualsNothing and reconstruct the IsNotDistinctFrom expressions into the join filter - Return NullEquality directly instead of converting from bool ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, three levels of coverage: 1. Unit tests (join_rel.rs) — directly assert the output of split_eq_and_noneq_join_predicate_with_nulls_equality for eq-only, indistinct-only, mixed, and non-column-operand cases 2. Integration test (consumer_integration.rs) — loads a JSON-encoded Substrait plan with a JoinRel containing both operators through from_substrait_plan, executes it, and asserts 6 rows (including NULL=NULL matches) 3. Existing SLT (join_is_not_distinct_from.slt:179-205) — confirms the SQL planner already exhibits the same demotion behavior that this PR adds to the Substrait consumer ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> No API changes. Substrait plans with mixed equal/is_not_distinct_from join predicates now correctly preserve null-safe semantics instead of silently dropping NULL-matching rows.
…ache#21183) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change The rewriter actually has 3 responsibilities: 1. Index remapping — column indices in expressions may not match the file schema 2. Type casting — when logical and physical field types differ 3. Missing column handling — replacing references to absent columns with nulls Do not use cycles for schema rewrite if predicate is not set or logic schema equal to physical schema <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
… schema for … (apache#21451) …spill files (cherry picked from commit e133dd3) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Marko Grujic <markoog@gmail.com>
…park bitmap/… (apache#21452) …math modules (cherry picked from commit 39fb9cc) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: David López <hola@devel0pez.com>
…ion pushdown (apache#21492) (cherry picked from commit 330d57f) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Huaijin <haohuaijin@gmail.com>
- Part of apache#21079 - Closes apache#21155 on branch-53 This PR: - Backports apache#21439 from @timsaucer to the branch-53 line Co-authored-by: Tim Saucer <timsaucer@gmail.com>
apache#20658) (apache#21523) - Part of apache#21079 - Closes apache#20905 on branch-53 This PR: - Backports apache#21358 from @alamb to the branch-53 line Co-authored-by: Viktor Yershov <viktor@spice.ai>
…#21587) - Part of apache#21079 # Rationale @comphead notes that `cargo audit` is failing on apache#21559 (comment) I previously fixed something similar on `branch-52`: - apache#21415 # Changes - Backports the cargo audit dependency updates from apache#21415 to branch-53
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Prefer the finer timestamp unit when coercing mixed timestamp units, and check overflow when casting timestamps to finer precision for both scalar and array values.
…coercion-precision fix: preserve timestamp precision during coercion
* Allow dynamic filter pushdown for left join * Backport join dynamic filter gate Signed-off-by: discord9 <discord9@163.com> --------- Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
This was referenced Jun 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What's changed
This PR exposes small internal helper APIs needed by GreptimeDB remote dynamic filter Bloom encoding:
DynamicFilterPhysicalExpr::is_complete()so callers can check completion state without awaiting;HashTableLookupExpr::on_columns()andseeds();HashTableLookupExpr::distinct_join_hash_count()andvisit_distinct_join_hashes(...)forMap::HashMap;JoinHashMapType::visit_hashes(...)implementations for hash-map based join maps.Map::ArrayMapreturnsOk(false)for distinct-hash visitation so callers can safely fall back instead of approximating.Why
GreptimeDB RDF Bloom remote dynamic filters need to build a compact Bloom payload from the existing join hash lookup state without collecting all hashes into a temporary vector or depending on private fields.
Validation
cargo fmt --all -- --checkcargo test -p datafusion-physical-plan dynamic_filter --libcargo test -p datafusion-physical-plan visit_hashes --libcargo test -p datafusion-physical-plan hash_table_lookup_expr --lib