fix: clear handled OFFSET before child recursion in LimitPushdown#22525
Conversation
| @@ -989,3 +989,34 @@ c-4 | |||
|
|
|||
| statement ok | |||
| DROP TABLE t21176; | |||
|
|
|||
| # Regression test for https://github.com/apache/datafusion/issues/22489 | |||
There was a problem hiding this comment.
I don't think this guards against the issue:
I reverted the code change locally
diff --git a/datafusion/physical-optimizer/src/limit_pushdown.rs b/datafusion/physical-optimizer/src/limit_pushdown.rs
index 63c4f21bd9..6164d86e53 100644
--- a/datafusion/physical-optimizer/src/limit_pushdown.rs
+++ b/datafusion/physical-optimizer/src/limit_pushdown.rs
@@ -375,14 +375,6 @@ pub(crate) fn pushdown_limits(
(new_node, global_state) = pushdown_limit_helper(new_node.data, global_state)?;
}
- // Once a limit has been materialized above the current node, child
- // subtrees should not inherit its `skip`. Keep `fetch`, but clear
- // `skip` before recursing so child-local limits are not merged with
- // an `OFFSET` that has already been applied.
- if global_state.satisfied {
- global_state.skip = 0;
- }
-
// Apply pushdown limits in children
let children = new_node.data.children();
let mut changed = false;And then I ran the tests:
cargo test --profile=ci --test sqllogictests
...
Running with 16 test threads (available parallelism: 16)
Completed 472 test files in 9 secondsThere was a problem hiding this comment.
Here is a proposed fix:
There was a problem hiding this comment.
Thanks you for this
alamb
left a comment
There was a problem hiding this comment.
Thank you @kumarUjjawal -- I don't fully understand the fix, but it does seem to fix the bug and I think the test coverage looks good
| expr: col("c1", &schema)?, | ||
| options: SortOptions { | ||
| descending: true, | ||
| nulls_first: false, |
There was a problem hiding this comment.
maybe worth calling out that this is a different sort order
There was a problem hiding this comment.
I think we should also add a test (or confirm one already exists) where the two sorts DO have the same sort and ensure the limit is still pushed
| // Once a limit has been materialized above the current node, child | ||
| // subtrees should not inherit its `skip`. Keep `fetch`, but clear | ||
| // `skip` before recursing so child-local limits are not merged with | ||
| // an `OFFSET` that has already been applied. |
There was a problem hiding this comment.
I am surprised this doesn't need to check for the actual sort keys too 🤔
Which issue does this PR close?
Rationale for this change
LimitPushdowncarriesGlobalRequirementswhile walking the physical plan.In the bad plan shape from #22489, an outer
OFFSETwas already handled abovea sort barrier, but its
skipstill remained in the state when recursioncontinued into the child subtree. That stale
skipthen merged with an innerLIMITand reduced its fetch incorrectly, which caused a grouped row to bedropped.
The fix is to clear
skiponce the limit requirement has already been handled,while keeping
fetchso valid limit pushdown into child sorts still happens.What changes are included in this PR?
skipbefore recursing into children when the limit requirement is already handled.fetchunchanged so valid TopK-style pushdown still works.OFFSET/ sort barrier / innerLIMITshape.Are these changes tested?
Yes
Are there any user-facing changes?
No API Change