bench: Add IN list benchmarks for non-constant list expressions by zhangxffff · Pull Request #20444 · apache/datafusion

zhangxffff · 2026-02-20T04:24:19Z

Which issue does this PR close?

Relates to Optimize IN list with columns evaluation with vectorized Arrow eq kernel #20427.

Rationale for this change

The existing in_list benchmarks only cover the static filter path (constant literal lists), which uses a HashSet lookup. There are no benchmarks for the dynamic evaluation path, which is triggered when the IN list contains non-constant expressions such as column references (e.g., a IN (b, c, d)). Adding these benchmarks establishes a baseline for measuring the impact of upcoming optimizations to the dynamic path (see #20428).
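To make the difference between the two paths concrete, here is a minimal sketch in plain Rust. It is a simplified model, not DataFusion's implementation: the function names `in_list_static` and `in_list_with_columns` are hypothetical, columns are modelled as plain `i32` slices, and nulls are ignored.

```rust
use std::collections::HashSet;

/// Static path (simplified): the list is constant, so it can be hashed
/// once and every row is answered with a set lookup.
fn in_list_static(values: &[i32], list: &[i32]) -> Vec<bool> {
    let set: HashSet<i32> = list.iter().copied().collect();
    values.iter().map(|v| set.contains(v)).collect()
}

/// Column path (simplified): every list entry is itself a column, so the
/// candidate values differ per row and no set can be built up front.
/// Assumes each list column has the same length as `values`.
fn in_list_with_columns(values: &[i32], list_columns: &[Vec<i32>]) -> Vec<bool> {
    values
        .iter()
        .enumerate()
        .map(|(row, v)| list_columns.iter().any(|col| col[row] == *v))
        .collect()
}
```

In this model the static path pays the hashing cost once per batch, while the column path compares each row against every list column, which is why the two paths call for separate benchmarks.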

What changes are included in this PR?

Add criterion benchmarks for the dynamic IN list evaluation path:

bench_dynamic_int32: Int32 column references, list sizes [3, 8, 28] × match rates [0%, 50%, 100%] × null rates [0%, 20%]
bench_dynamic_utf8: Utf8 column references, list sizes [3, 8, 28] × match rates [0%, 50%, 100%]
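As a rough illustration of how many cases the parameter grid above produces, the sketch below enumerates it with the standard library only. The constant names, helper functions, and label format are hypothetical and are not taken from the benchmark file; the real benchmark IDs may look different.

```rust
// Parameter values as listed in the PR description; rates are in percent.
const LIST_SIZES: [usize; 3] = [3, 8, 28];
const MATCH_RATES_PCT: [u32; 3] = [0, 50, 100];
const NULL_RATES_PCT: [u32; 2] = [0, 20];

/// Hypothetical helper: one label per (list size, match rate, null rate).
fn int32_cases() -> Vec<String> {
    let mut cases = Vec::new();
    for size in LIST_SIZES.iter() {
        for m in MATCH_RATES_PCT.iter() {
            for n in NULL_RATES_PCT.iter() {
                cases.push(format!("int32/list={}/match={}%/nulls={}%", size, m, n));
            }
        }
    }
    cases
}

/// Hypothetical helper: the Utf8 grid has no null-rate dimension.
fn utf8_cases() -> Vec<String> {
    let mut cases = Vec::new();
    for size in LIST_SIZES.iter() {
        for m in MATCH_RATES_PCT.iter() {
            cases.push(format!("utf8/list={}/match={}%", size, m));
        }
    }
    cases
}
```

Under these assumptions the Int32 grid has 3 × 3 × 2 = 18 cases and the Utf8 grid has 3 × 3 = 9.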

Are these changes tested?

Yes. The benchmarks compile and run correctly. No implementation code is changed.

Are there any user-facing changes?

adriangb

Some minor docstring improvements

adriangb · 2026-02-20T09:35:50Z

datafusion/physical-expr/benches/in_list.rs

 }

 const IN_LIST_LENGTHS: [usize; 4] = [3, 8, 28, 100];
+const DYNAMIC_LIST_LENGTHS: [usize; 3] = [3, 8, 28];


Does ('a', 1, 123.24) also force this "dynamic" path? If so, I would use the term "heterogeneous" for that. If not, and only columns trigger this code path, I would use the name LIST_WITH_COLUMNS_LENGTHS.

No, only column references trigger this code path. Heterogeneous literals like 1 IN ('a', 1, 123.24) are type-coerced and still go through the static (HashSet) path. Renamed to LIST_WITH_COLUMNS_LENGTHS, and also renamed all related functions/benchmark names to remove the "dynamic" terminology.
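The rule described in this reply can be sketched as a small classifier. This is an illustrative model only: `ListItem`, `EvalPath`, and `choose_path` are hypothetical names rather than DataFusion types, and the sketch distinguishes only literals from column references.

```rust
/// Hypothetical, simplified stand-in for an IN list element.
#[derive(Debug, Clone, PartialEq)]
enum ListItem {
    /// A constant such as 'a', 1 or 123.24 (kept as text for simplicity).
    Literal(String),
    /// A reference to another column of the same row.
    Column(String),
}

#[derive(Debug, PartialEq)]
enum EvalPath {
    /// All items are constants: a HashSet-style filter can be prebuilt.
    StaticFilter,
    /// At least one item is a column: the list must be evaluated per row.
    ListWithColumns,
}

/// Mixed-type literals still count as constants here, mirroring the
/// statement that they are type-coerced and stay on the static path.
fn choose_path(list: &[ListItem]) -> EvalPath {
    if list.iter().all(|item| matches!(item, ListItem::Literal(_))) {
        EvalPath::StaticFilter
    } else {
        EvalPath::ListWithColumns
    }
}
```

With this model, a list like ('a', 1, 123.24) maps to `StaticFilter`, while any list containing a column reference maps to `ListWithColumns`.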

adriangb · 2026-02-20T09:41:02Z

datafusion/physical-expr/benches/in_list.rs

+/// Benchmarks the dynamic evaluation path (no static filter) by including
+/// a column reference in the IN list, which prevents static filter creation.


It would be nice to show an example of how the arguments to this function map to the equivalent SQL being benchmarked.

Thanks for the suggestion. I have added equivalent SQL examples to the docstring.

adriangb · 2026-02-20T09:43:59Z

datafusion/physical-expr/benches/in_list.rs

+    });
+}
+
+/// Benchmarks the dynamic IN list path for Int32 arrays with column references.


It would be nice to see examples in this docstring of what the SQL being benchmarked is, e.g.:

// select 1 in x from t;
// where t:
// create table t ...

Added equivalent SQL examples to both bench_with_columns_int32 and bench_with_columns_utf8:

/// Equivalent SQL:
/// ```sql
/// CREATE TABLE t (a INT, b0 INT, b1 INT, ...);
/// SELECT * FROM t WHERE a IN (b0, b1, ...);
/// ```
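To show how a list length maps onto that SQL shape, here is a small sketch that expands the `b0, b1, ...` pattern for a concrete length. The helper `equivalent_sql` and its parameters are hypothetical and are not part of the benchmark file; it assumes `list_len` is at least 1.

```rust
/// Hypothetical helper: builds the CREATE TABLE and SELECT statements that
/// correspond to an IN list made of `list_len` column references.
/// Assumes `list_len >= 1`.
fn equivalent_sql(list_len: usize, sql_type: &str) -> (String, String) {
    // Column names follow the b0, b1, ... pattern from the docstring.
    let cols: Vec<String> = (0..list_len).map(|i| format!("b{}", i)).collect();
    let defs: Vec<String> = cols
        .iter()
        .map(|c| format!("{} {}", c, sql_type))
        .collect();
    let create = format!("CREATE TABLE t (a {}, {});", sql_type, defs.join(", "));
    let select = format!("SELECT * FROM t WHERE a IN ({});", cols.join(", "));
    (create, select)
}
```

For example, `equivalent_sql(3, "INT")` yields a table with columns a, b0, b1, b2 and the query `SELECT * FROM t WHERE a IN (b0, b1, b2);`.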

adriangb · 2026-02-20T14:42:54Z

run benchmark in_list

bench: Add dynamic IN list benchmarks for non-constant list expressions

cb13f93

github-actions bot added the physical-expr Changes to the physical-expr crates label Feb 20, 2026

zhangxffff mentioned this pull request Feb 20, 2026

perf: Optimize dynamic IN list evaluation with vectorized Arrow eq kernel #20428

Open

adriangb approved these changes Feb 20, 2026


rename dynamic to list_with_columns and add equivalent sql

99faf91

adriangb approved these changes Feb 20, 2026


Merge branch 'main' into bench-dynamic-in-list

e071da7

zhangxffff changed the title ~~bench: Add dynamic IN list benchmarks for non-constant list expressions~~ bench: Add IN list benchmarks for non-constant list expressions Feb 20, 2026

zhangxffff wants to merge 3 commits into apache:main from zhangxffff:bench-dynamic-in-list


2 participants
