Gate new ScalarSubqueryExec node behind session property #22530

Open
LiaCastaneda wants to merge 5 commits into apache:main from LiaCastaneda:scalar-subquery-physical-exec-flag

@LiaCastaneda
Contributor

@LiaCastaneda LiaCastaneda commented May 26, 2026

Which issue does this PR close?

Related to discussion on #21240 and #21080 (comment).

PR #21240 introduced ScalarSubqueryExec / ScalarSubqueryExpr to execute uncorrelated scalar subqueries during physical execution. The two communicate through shared in-process state (a slot in ExecutionProps). This breaks distributed execution, which may place a network boundary between the producer (ScalarSubqueryExec) and the consumer expression (ScalarSubqueryExpr). See datafusion-contrib/datafusion-distributed#460 for more details.
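
A minimal sketch of why a shared in-process slot does not survive a network boundary. All type and function names here (SubquerySlot, Producer, Consumer, ship_to_remote) are hypothetical stand-ins for illustration, not the actual DataFusion types; the real slot lives in ExecutionProps and its layout may differ.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the shared slot (the real one is in `ExecutionProps`).
#[derive(Clone, Default)]
pub struct SubquerySlot(Arc<OnceLock<i64>>);

/// Hypothetical producer, playing the role of `ScalarSubqueryExec`.
pub struct Producer {
    slot: SubquerySlot,
}

impl Producer {
    pub fn new(slot: SubquerySlot) -> Self {
        Self { slot }
    }

    /// Publishes the subquery result into the shared slot.
    /// The first write wins; later writes are ignored.
    pub fn run(&self, value: i64) {
        let _ = self.slot.0.set(value);
    }
}

/// Hypothetical consumer, playing the role of `ScalarSubqueryExpr`.
pub struct Consumer {
    slot: SubquerySlot,
}

impl Consumer {
    pub fn new(slot: SubquerySlot) -> Self {
        Self { slot }
    }

    /// Reads the subquery result, or `None` if nothing was published to this slot.
    pub fn evaluate(&self) -> Option<i64> {
        self.slot.0.get().copied()
    }
}

/// Models shipping the consumer to another process. An `Arc` pointer cannot
/// cross the network, so the remote side is rebuilt with a fresh, empty slot
/// that the original producer never writes to.
pub fn ship_to_remote(_local: &Consumer) -> Consumer {
    Consumer::new(SubquerySlot::default())
}
```

In this model, a consumer that shares the producer's slot sees the published value, while a consumer rebuilt on the far side of the boundary keeps reading an empty slot.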

What changes are included in this PR?

Adds a new optimizer config option, datafusion.optimizer.physical_uncorrelated_scalar_subquery (default true). When true, behavior is unchanged from current main. When false, all scalar subqueries are rewritten to left joins by ScalarSubqueryToJoin and ScalarSubqueryExec is never constructed, which was the previous behavior.
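
The gating described above can be sketched as a small decision function. This is an illustrative model only: SubqueryStrategy and choose_strategy are hypothetical names, not part of the DataFusion API, and the sketch assumes that correlated scalar subqueries are always rewritten to joins regardless of the flag.

```rust
/// Hypothetical summary of how a scalar subquery ends up being executed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubqueryStrategy {
    /// Evaluated during physical execution (the `ScalarSubqueryExec` path).
    PhysicalExec,
    /// Rewritten to a left join (the `ScalarSubqueryToJoin` path).
    LeftJoin,
}

/// Picks a strategy from the config flag and the subquery's correlation.
/// Assumption: only uncorrelated subqueries are eligible for the physical
/// path, and only while the flag is enabled.
pub fn choose_strategy(
    physical_uncorrelated_scalar_subquery: bool,
    is_correlated: bool,
) -> SubqueryStrategy {
    if physical_uncorrelated_scalar_subquery && !is_correlated {
        SubqueryStrategy::PhysicalExec
    } else {
        SubqueryStrategy::LeftJoin
    }
}
```

With the flag off, every input maps to LeftJoin, which matches the statement that ScalarSubqueryExec is never constructed in that mode.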

Are these changes tested?

Yes, all existing tests pass, and I added uncorrelated_scalar_subquery_rewritten_when_flag_off to test the negative case (flag off).

Are there any user-facing changes?

Yes, a new config option, datafusion.optimizer.physical_uncorrelated_scalar_subquery. It changes how the query is executed, not the results.

@github-actions github-actions Bot added the documentation, optimizer, core, and common labels May 26, 2026
@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from 1c7af79 to 4f23ed0 Compare May 26, 2026 14:11
@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from 4f23ed0 to 8416100 Compare May 26, 2026 14:16
@github-actions github-actions Bot added the sqllogictest label May 26, 2026
@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from 6e88a1c to ddc20cd Compare May 26, 2026 14:54
@LiaCastaneda LiaCastaneda marked this pull request as ready for review May 26, 2026 15:16
Contributor

@gabotechs gabotechs left a comment

Nice, thanks @LiaCastaneda. Is there a chance you could take a look at this one @neilconway?

Comment thread datafusion/sqllogictest/test_files/subquery.slt
@LiaCastaneda
Contributor Author

LiaCastaneda commented May 26, 2026

I also ran the TPC-H queries locally with the flag turned off (old path), and all results match. It may be worth adding this to the regular checks.

edit: added here d1b9dad

Contributor

@neilconway neilconway left a comment

I think this makes sense as an interim measure if it will be too difficult to adapt df-distributed and/or ballista in the short-term, but long-term I'd prefer not to have a config option that silently produces incorrect query results. Can we add a note that disabling this is not recommended, and that we plan to remove the config option in the future -- say in a few DF releases from now?

Comment thread datafusion/sqllogictest/test_files/subquery.slt Outdated
Comment thread datafusion/common/src/config.rs Outdated
/// execution. When set to false, all scalar subqueries (including
/// uncorrelated ones) are rewritten to left joins by the
/// `ScalarSubqueryToJoin` optimizer rule.
pub physical_uncorrelated_scalar_subquery: bool, default = true
Contributor

The other similar config options use the phrasing enable_...; we should probably adopt that for consistency.

physical_uncorrelated_scalar_subquery is also a mouthful, although I can't immediately think of a more concise name that is also accurate.

Comment thread datafusion/optimizer/src/scalar_subquery_to_join.rs Outdated
@milenkovicm
Contributor

Thank you @LiaCastaneda, @gabotechs & @neilconway for driving this.

@LiaCastaneda LiaCastaneda force-pushed the scalar-subquery-physical-exec-flag branch from dbb8450 to 4523e07 Compare May 27, 2026 09:30
@LiaCastaneda
Contributor Author

LiaCastaneda commented May 27, 2026

> Can we add a note that disabling this is not recommended, and that we plan to remove the config option in the future -- say in a few DF releases from now?

Makes sense. I will also create an issue to keep track of this so we don't forget.

Labels

common: Related to common crate
core: Core DataFusion crate
documentation: Improvements or additions to documentation
optimizer: Optimizer rules
sqllogictest: SQL Logic Tests (.slt)

4 participants