proto: serialize and dedupe dynamic filters#20416

Draft
jayshrivastava wants to merge 1 commit into apache:main from jayshrivastava:js/dedupe-dynamic-filter-inner-state

@jayshrivastava jayshrivastava commented Feb 17, 2026

Which issue does this PR close?

Informs: datafusion-contrib/datafusion-distributed#180
Closes: #20418

Rationale for this change

Consider this scenario

  1. You have a plan with a HashJoinExec and DataSourceExec
  2. You run the physical optimizer and the DataSourceExec accepts DynamicFilterPhysicalExpr pushdown from the HashJoinExec
  3. You serialize the plan, deserialize it, and execute it

The expected behavior is that the dynamic filter still "works" after the round trip, meaning:

  1. When you deserialize the plan, both the HashJoinExec and DataSourceExec should have pointers to the same DynamicFilterPhysicalExpr
  2. The DynamicFilterPhysicalExpr should be updated during execution by the HashJoinExec and the DataSourceExec should filter out rows

This does not happen today for a few reasons, a couple of which this PR aims to address:

  1. DynamicFilterPhysicalExpr does not survive round-tripping. Its internal exprs get inlined (e.g. it may be serialized as a Literal).
  2. Even if DynamicFilterPhysicalExpr survived round-tripping, pushdown often rewrites the DynamicFilterPhysicalExpr. In that case, you end up with two DynamicFilterPhysicalExpr instances that are different Arcs but share the same Inner dynamic filter state. The current DeduplicatingProtoConverter does not handle this specific form of deduping.
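To make the second point concrete, here is a minimal sketch using only the standard library. The `DynamicFilter` and `Inner` types, their fields, and the `identity` helper are hypothetical stand-ins invented for illustration; they are not DataFusion's actual definitions. The sketch shows that a rewritten filter is a different outer `Arc` even though it shares the same inner state, so deduplication keyed on the outer pointer alone would treat the two as unrelated.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the shared, mutable state of a dynamic filter.
struct Inner {
    // The current predicate, modelled as a plain string for illustration.
    expr: String,
}

// Hypothetical stand-in for a dynamic filter expression: per-instance data
// plus a handle to state shared with other instances.
struct DynamicFilter {
    children: Vec<String>,
    inner: Arc<RwLock<Inner>>,
}

impl DynamicFilter {
    fn new(children: Vec<String>) -> Arc<Self> {
        let inner = Arc::new(RwLock::new(Inner { expr: "true".to_string() }));
        Arc::new(Self { children, inner })
    }

    // Models a rewrite during pushdown: a new outer value with different
    // children that still points at the same inner state.
    fn with_new_children(&self, children: Vec<String>) -> Arc<Self> {
        Arc::new(Self { children, inner: Arc::clone(&self.inner) })
    }

    fn children(&self) -> &[String] {
        &self.children
    }

    // The producer side (the join, in the scenario above) updates the filter.
    fn update(&self, expr: &str) {
        self.inner.write().unwrap().expr = expr.to_string();
    }

    // The consumer side (the scan) reads the current filter.
    fn current(&self) -> String {
        self.inner.read().unwrap().expr.clone()
    }
}

// Returns (same outer allocation?, same inner allocation?).
fn identity(a: &Arc<DynamicFilter>, b: &Arc<DynamicFilter>) -> (bool, bool) {
    (Arc::ptr_eq(a, b), Arc::ptr_eq(&a.inner, &b.inner))
}
```

Calling `identity` on a filter and its rewritten copy reports distinct outer allocations but a shared inner allocation, and an `update` through one is visible via `current` on the other. That sharing is what has to be preserved across serialization.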

This PR aims to fix those problems by adding serde for DynamicFilterPhysicalExpr and deduping logic for the inner state of dynamic filters.
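The general shape of inner-state deduping can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `FilterProto`, `encode`, `Decoder`, and the choice of the inner allocation's address as the id are all hypothetical. The idea is that every serialized filter carries an id for its inner state, and the decoder reuses one inner allocation per id.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins, not DataFusion's real types.
struct Inner {
    expr: String,
}

struct DynamicFilter {
    children: Vec<String>,
    inner: Arc<RwLock<Inner>>,
}

// Hypothetical wire form of one filter.
#[derive(Clone, Debug, PartialEq)]
struct FilterProto {
    // Identifies the shared inner state, not the outer expression.
    inner_id: u64,
    children: Vec<String>,
    expr: String,
}

fn encode(filter: &DynamicFilter) -> FilterProto {
    FilterProto {
        // Assumption: the address of the inner allocation is used as the id.
        // It is only meaningful while the original plan is alive.
        inner_id: Arc::as_ptr(&filter.inner) as usize as u64,
        children: filter.children.clone(),
        expr: filter.inner.read().unwrap().expr.clone(),
    }
}

// Decoder that hands out one inner allocation per id.
#[derive(Default)]
struct Decoder {
    seen: HashMap<u64, Arc<RwLock<Inner>>>,
}

impl Decoder {
    fn decode(&mut self, proto: &FilterProto) -> DynamicFilter {
        // The first occurrence of an id creates the state; later ones reuse it.
        let inner = self.seen.entry(proto.inner_id).or_insert_with(|| {
            Arc::new(RwLock::new(Inner { expr: proto.expr.clone() }))
        });
        DynamicFilter {
            children: proto.children.clone(),
            inner: Arc::clone(&*inner),
        }
    }
}
```

Decoding two protos with the same `inner_id` through one `Decoder` yields two filters with their own `children` but a single shared `Inner`, so an update made through one is observed through the other.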

It does not yet add a test for the HashJoinExec and DataSourceExec filter pushdown case; that is relevant follow-up work. I tried to keep the PR small for reviewers.

Are these changes tested?

Yes, via unit tests.

Are there any user-facing changes?

DynamicFilterPhysicalExpr is now serialized by the default codec.

@github-actions github-actions bot added the physical-expr (Changes to the physical-expr crates) and proto (Related to proto crate) labels Feb 17, 2026
@jayshrivastava jayshrivastava changed the title from "wip" to "proto: serialize dynamic filters" Feb 18, 2026
@jayshrivastava jayshrivastava changed the title from "proto: serialize dynamic filters" to "proto: serialize and dedupe dynamic filters" Feb 18, 2026
@jayshrivastava jayshrivastava force-pushed the js/dedupe-dynamic-filter-inner-state branch from 158f2cf to e0c6be3 Compare February 18, 2026 00:18
Development

Successfully merging this pull request may close these issues.

Serialize dynamic filters across network boundaries
