Skip to content

PushDownFilter claims to be preserving predicate order but in fact reverses it #21642

@joroKr21

Description

@joroKr21

Describe the bug

The PushDownFilter logical optimizer rule has ApplyOrder::TopDown and it claims to preserve filter oder:
"use IndexSet to remove dupes while preserving predicate order" but this code is actually reversing the order of multiple filter plan nodes (as opposed to a single filter with a conjunction expression):

                let parents_predicates = split_conjunction_owned(filter.predicate);

                // remove duplicated filters
                let child_predicates = split_conjunction_owned(child_filter.predicate);
                let new_predicates = parents_predicates
                    .into_iter()
                    .chain(child_predicates)
                    // use IndexSet to remove dupes while preserving predicate order
                    .collect::<IndexSet<_>>()
                    .into_iter()
                    .collect::<Vec<_>>();

Note that the execution order of the unoptimized plan would be - child filters first (as inner nodes are the input of outer nodes) and then parent filters. But the optimized filter predicates are parent filters first, then child filters.

To Reproduce

If I make a small test change:

diff --git a/datafusion/optimizer/src/push_down_filter.rs b/datafusion/optimizer/src/push_down_filter.rs
index a1a636cfef..1f8dbc91ce 100644
--- a/datafusion/optimizer/src/push_down_filter.rs
+++ b/datafusion/optimizer/src/push_down_filter.rs
@@ -3239,7 +3239,8 @@ mod tests {
             vec![col("a").eq(lit(10i64)), col("b").gt(lit(11i64))],
             Some(vec![0]),
         )?
-        .filter(and(col("a").eq(lit(10i64)), col("b").gt(lit(11i64))))?
+        .filter(col("a").eq(lit(10i64)))?
+        .filter(col("b").gt(lit(11i64)))?
         .project(vec![col("a"), col("b")])?
         .build()?;

the predicates are reversed:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: multi_combined_filter
Source: datafusion/optimizer/src/push_down_filter.rs:3247
────────────────────────────────────────────────────────────────────────────────
Expression: optimized_plan
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
    1     1 │ Projection: a, b
    2       │-  Filter: a = Int64(10) AND b > Int64(11)
          2 │+  Filter: b > Int64(11) AND a = Int64(10)
    3     3 │     TableScan: test projection=[a], partial_filters=[a = Int64(10), b > Int64(11)]
────────────┴───────────────────────────────────────────────────────────────────

Expected behavior

I expect PushDownFilter to preserve the execution order of predicates as in the unoptimized plan.

Additional context

In environments where we don't have information about the filter selectivity, the user might have domain knowledge and put the more selective filters first, expecting a boost in query performance. But counterintuitively, using two filters instead of one filter with a conjunction, reverses the execution order.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions