
Gate new ScalarSubqueryExec node behind session property #22530

Open
LiaCastaneda wants to merge 7 commits into apache:main from LiaCastaneda:scalar-subquery-physical-exec-flag

Conversation

@LiaCastaneda
Contributor

@LiaCastaneda LiaCastaneda commented May 26, 2026

Which issue does this PR close?

Related to discussion on #21240 and #21080 (comment).

PR #21240 introduced ScalarSubqueryExec / ScalarSubqueryExpr to execute uncorrelated scalar subqueries during physical execution. The two communicate via shared in-process state (a slot in ExecutionProps), which breaks distributed execution engines that may split a plan across a network boundary between the producer (ScalarSubqueryExec) and the consumer expression (ScalarSubqueryExpr). See datafusion-contrib/datafusion-distributed#460 for a more detailed explanation.
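To make the failure mode concrete, here is a std-only Rust toy. `ToyScalarSubqueryExec`, `ToyScalarSubqueryExpr` and the `Arc<OnceLock<i64>>` slot are hypothetical stand-ins, not DataFusion's actual types; the sketch only assumes what the description states, namely that producer and consumer share in-process state. A consumer in the same process sees the published value, while one that was serialized and rebuilt on another worker ends up with a fresh, empty slot.

```rust
use std::sync::{Arc, OnceLock};

// Hypothetical stand-in for the shared slot; the real one lives in
// ExecutionProps and has a different type.
type SharedSlot = Arc<OnceLock<i64>>;

// Toy producer standing in for ScalarSubqueryExec.
struct ToyScalarSubqueryExec {
    slot: SharedSlot,
}

impl ToyScalarSubqueryExec {
    // "Runs" the subquery and publishes its scalar result into the slot.
    fn execute(&self, subquery_result: i64) {
        // OnceLock only accepts the first write; later writes are ignored here.
        let _ = self.slot.set(subquery_result);
    }
}

// Toy consumer standing in for ScalarSubqueryExpr.
struct ToyScalarSubqueryExpr {
    slot: SharedSlot,
}

impl ToyScalarSubqueryExpr {
    // Reads whatever the producer published, if anything.
    fn evaluate(&self) -> Option<i64> {
        self.slot.get().copied()
    }

    // Simulates serializing the expression and sending it to another process:
    // the Arc cannot cross the wire, so the receiver gets a fresh, empty slot.
    fn ship_to_remote_worker(&self) -> ToyScalarSubqueryExpr {
        ToyScalarSubqueryExpr {
            slot: Arc::new(OnceLock::new()),
        }
    }
}

fn main() {
    let slot: SharedSlot = Arc::new(OnceLock::new());
    let producer = ToyScalarSubqueryExec { slot: Arc::clone(&slot) };
    let local_consumer = ToyScalarSubqueryExpr { slot: Arc::clone(&slot) };
    let remote_consumer = local_consumer.ship_to_remote_worker();

    producer.execute(42);

    // Same process: the consumer sees the value through the shared Arc.
    assert_eq!(local_consumer.evaluate(), Some(42));
    // Different process: the value never arrives.
    assert_eq!(remote_consumer.evaluate(), None);
    println!(
        "local = {:?}, remote = {:?}",
        local_consumer.evaluate(),
        remote_consumer.evaluate()
    );
}
```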

What changes are included in this PR?

Adds a new optimizer config option, datafusion.optimizer.enable_physical_uncorrelated_scalar_subquery, which defaults to true. When true, behavior is unchanged from current main. When false, all scalar subqueries (including uncorrelated ones) are rewritten to left joins by ScalarSubqueryToJoin and ScalarSubqueryExec is never constructed, which restores the previous behavior.
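A minimal sketch of the decision the new option controls, assuming that correlated scalar subqueries keep going through ScalarSubqueryToJoin regardless of the flag. `ScalarSubqueryStrategy`, `ToyOptimizerOptions` and `choose_strategy` are hypothetical names for illustration only; the actual change is in the PR's optimizer code.

```rust
// Hypothetical model of the decision the flag controls. These names are
// illustrative only; they are not DataFusion's real API.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ScalarSubqueryStrategy {
    // Keep the subquery and evaluate it at execution time with the
    // ScalarSubqueryExec / ScalarSubqueryExpr pair.
    PhysicalExec,
    // Rewrite the subquery to a left join, as ScalarSubqueryToJoin does.
    LeftJoin,
}

// Mirrors datafusion.optimizer.enable_physical_uncorrelated_scalar_subquery.
struct ToyOptimizerOptions {
    enable_physical_uncorrelated_scalar_subquery: bool,
}

impl Default for ToyOptimizerOptions {
    fn default() -> Self {
        // The PR keeps the new physical path on by default.
        Self {
            enable_physical_uncorrelated_scalar_subquery: true,
        }
    }
}

// Assumption: correlated subqueries always take the join path, independent of the flag.
fn choose_strategy(opts: &ToyOptimizerOptions, is_correlated: bool) -> ScalarSubqueryStrategy {
    if !is_correlated && opts.enable_physical_uncorrelated_scalar_subquery {
        ScalarSubqueryStrategy::PhysicalExec
    } else {
        ScalarSubqueryStrategy::LeftJoin
    }
}

fn main() {
    let default_opts = ToyOptimizerOptions::default();
    let flag_off = ToyOptimizerOptions {
        enable_physical_uncorrelated_scalar_subquery: false,
    };

    // Default: uncorrelated subqueries use the physical operator.
    assert_eq!(choose_strategy(&default_opts, false), ScalarSubqueryStrategy::PhysicalExec);
    // Flag off: everything is rewritten to a left join (the previous behavior).
    assert_eq!(choose_strategy(&flag_off, false), ScalarSubqueryStrategy::LeftJoin);
    assert_eq!(choose_strategy(&flag_off, true), ScalarSubqueryStrategy::LeftJoin);
}
```

In a running session the real option should be settable like other `datafusion.*` settings, for example with `SET datafusion.optimizer.enable_physical_uncorrelated_scalar_subquery = false`.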

Are these changes tested?

Yes. All existing tests pass, and I added uncorrelated_scalar_subquery_rewritten_when_flag_off to cover the case where the flag is off.

Are there any user-facing changes?

Yes, a new config option, datafusion.optimizer.enable_physical_uncorrelated_scalar_subquery. It only changes how the query is executed, not its results.

@github-actions github-actions Bot added documentation Improvements or additions to documentation optimizer Optimizer rules core Core DataFusion crate common Related to common crate labels May 26, 2026
@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from 1c7af79 to 4f23ed0 May 26, 2026 14:11
@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from 4f23ed0 to 8416100 May 26, 2026 14:16
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 26, 2026
@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from 6e88a1c to ddc20cd May 26, 2026 14:54
@LiaCastaneda LiaCastaneda marked this pull request as ready for review May 26, 2026 15:16
Contributor

@gabotechs gabotechs left a comment

Nice, thanks @LiaCastaneda. Is there a chance you could take a look at this one @neilconway?

Comment thread datafusion/sqllogictest/test_files/subquery.slt
@LiaCastaneda
Contributor Author

LiaCastaneda commented May 26, 2026

I also ran the TPC-H queries locally with the flag turned off (the old path), and all results match. Maybe it's worth adding this to the regular checks.

Edit: added in d1b9dad.

Contributor

@neilconway neilconway left a comment

I think this makes sense as an interim measure if it will be too difficult to adapt df-distributed and/or ballista in the short-term, but long-term I'd prefer not to have a config option that silently produces incorrect query results. Can we add a note that disabling this is not recommended, and that we plan to remove the config option in the future -- say in a few DF releases from now?

Comment thread datafusion/sqllogictest/test_files/subquery.slt Outdated
Comment thread datafusion/common/src/config.rs Outdated
Comment thread datafusion/optimizer/src/scalar_subquery_to_join.rs Outdated
@milenkovicm
Contributor

thank you @LiaCastaneda , @gabotechs & @neilconway for driving this

@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from dbb8450 to 4523e07 May 27, 2026 09:30
@LiaCastaneda
Contributor Author

LiaCastaneda commented May 27, 2026

Can we add a note that disabling this is not recommended, and that we plan to remove the config option in the future -- say in a few DF releases from now?

Makes sense. I will also create an issue to keep track of this so we don't forget.

@alamb
Contributor

alamb commented May 28, 2026

Can we add a note that disabling this is not recommended, and that we plan to remove the config option in the future -- say in a few DF releases from now?

Makes sense. I will also create an issue to keep track of this so we don't forget.

Looks like it is

Contributor

@alamb alamb left a comment

I went over this PR carefully and it looks good to me. Thank you @LiaCastaneda and @neilconway

Comment thread datafusion/common/src/config.rs
/// physical execution. When set to false, all scalar subqueries
/// (including uncorrelated ones) are rewritten to left joins by the
/// `ScalarSubqueryToJoin` optimizer rule.
///
Contributor

For my understanding, if this flag is enabled does it

  1. restore DataFusion 53 behavior (which can be wrong in some cases)
  2. Introduce some new ways for incorrect results?

Contributor Author

restore DataFusion 53 behavior (which can be wrong in some cases)

yes

Introduce some new ways for incorrect results?

It only reintroduces the incorrect results and limitations that DataFusion 53 already had:

  • no support for scalar subqueries in ORDER BY and JOIN ON expressions.
  • when the subquery returns more than one row, DF 53 does not raise an error and instead returns incorrect results.
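To illustrate the second bullet, here is a toy comparison of the two evaluation strategies for an uncorrelated subquery. It assumes the join-based rewrite behaves like a left join on an always-true condition; that is an assumption made for this sketch, not a description of the exact plan ScalarSubqueryToJoin produces. With one subquery row both strategies agree; with two rows the physical path raises an error, while the join path silently duplicates outer rows.

```rust
// Toy evaluation of `SELECT x, (SELECT v FROM t2) FROM t1` for an
// uncorrelated scalar subquery. Purely illustrative; not DataFusion code.

// Physical path: the subquery result must have at most one row.
fn physical_scalar_subquery(
    outer: &[i32],
    sub: &[i32],
) -> Result<Vec<(i32, Option<i32>)>, String> {
    let value = match sub {
        [] => None,
        [v] => Some(*v),
        rows => {
            return Err(format!(
                "scalar subquery returned {} rows, expected at most 1",
                rows.len()
            ));
        }
    };
    Ok(outer.iter().map(|&x| (x, value)).collect())
}

// Assumed join-based path: a left join with an always-true condition.
// Every subquery row matches every outer row, so nothing enforces "one row".
fn left_join_rewrite(outer: &[i32], sub: &[i32]) -> Vec<(i32, Option<i32>)> {
    let mut out = Vec::new();
    for &x in outer {
        if sub.is_empty() {
            // No match: the left join keeps the outer row with a NULL.
            out.push((x, None));
        } else {
            for &v in sub {
                out.push((x, Some(v)));
            }
        }
    }
    out
}

fn main() {
    let t1 = [1, 2];
    let one_row = [10];
    let two_rows = [10, 20];

    // With a single-row subquery both paths agree.
    assert_eq!(
        physical_scalar_subquery(&t1, &one_row).unwrap(),
        left_join_rewrite(&t1, &one_row)
    );

    // With two rows the physical path errors out...
    assert!(physical_scalar_subquery(&t1, &two_rows).is_err());
    // ...while the join path returns 4 rows instead of 2.
    assert_eq!(left_join_rewrite(&t1, &two_rows).len(), 4);
}
```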

Comment thread datafusion/optimizer/src/scalar_subquery_to_join.rs Outdated
@github-actions

github-actions Bot commented May 29, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [ 105.005s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.034s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  96.508s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.034s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.566s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 204.187s] datafusion
    Building datafusion-common v53.1.0 (current)
       Built [  33.629s] (current)
     Parsing datafusion-common v53.1.0 (current)
      Parsed [   0.057s] (current)
    Building datafusion-common v53.1.0 (baseline)
       Built [  34.137s] (baseline)
     Parsing datafusion-common v53.1.0 (baseline)
      Parsed [   0.058s] (baseline)
    Checking datafusion-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.653s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field OptimizerOptions.enable_physical_uncorrelated_scalar_subquery in /home/runner/work/datafusion/datafusion/datafusion/common/src/config.rs:1085

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  69.910s] datafusion-common
    Building datafusion-optimizer v53.1.0 (current)
       Built [  27.452s] (current)
     Parsing datafusion-optimizer v53.1.0 (current)
      Parsed [   0.031s] (current)
    Building datafusion-optimizer v53.1.0 (baseline)
       Built [  26.699s] (baseline)
     Parsing datafusion-optimizer v53.1.0 (baseline)
      Parsed [   0.031s] (baseline)
    Checking datafusion-optimizer v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.158s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  55.443s] datafusion-optimizer
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 167.074s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 164.764s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.021s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.083s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 335.050s] datafusion-sqllogictest
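The constructible_struct_adds_field failure above concerns source compatibility for downstream code that builds the struct with a struct literal. A minimal illustration, using a hypothetical `ToyOptions` struct rather than DataFusion's actual `OptimizerOptions` (whose construction conventions may differ):

```rust
// Hypothetical options struct with only public fields, so downstream crates
// can build it with a struct literal (the situation the lint describes).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyOptions {
    pub existing_option: bool,
    // Newly added public field. A downstream literal such as
    // `ToyOptions { existing_option: true }` with no `..base`
    // stops compiling, which is why the lint reports a major change.
    pub enable_physical_uncorrelated_scalar_subquery: bool,
}

impl Default for ToyOptions {
    fn default() -> Self {
        Self {
            existing_option: false,
            enable_physical_uncorrelated_scalar_subquery: true,
        }
    }
}

fn main() {
    // Literals that fill the remaining fields from Default keep compiling
    // when new fields are added, and pick up the new field's default value.
    let opts = ToyOptions {
        existing_option: true,
        ..Default::default()
    };
    assert!(opts.existing_option);
    assert!(opts.enable_physical_uncorrelated_scalar_subquery);
}
```

Marking a struct `#[non_exhaustive]` is the other common way to avoid this, at the cost of forbidding struct literals outside the defining crate altogether.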

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 29, 2026
Contributor

@gabotechs gabotechs left a comment

+1 from me, unless @neilconway has any other concern, I think we can pull this in.


Labels

auto detected api change Auto detected API change common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)


5 participants