bench: Add IN list benchmarks for non-constant list expressions#20444
zhangxffff wants to merge 3 commits into apache:main
Conversation
adriangb
left a comment
Some minor docstring improvements
}

const IN_LIST_LENGTHS: [usize; 4] = [3, 8, 28, 100];
const DYNAMIC_LIST_LENGTHS: [usize; 3] = [3, 8, 28];
Does ('a', 1, 123.24) also force this "dynamic" path? If so, I would use the term "heterogeneous" for that. If not, and only columns trigger this code path, I would use the name "LIST_WITH_COLUMNS_LENGTHS".
No, only column references trigger this code path. Heterogeneous literals like 1 IN ('a', 1, 123.24) are type-coerced and still go through the static (HashSet) path. Renamed to LIST_WITH_COLUMNS_LENGTHS, and also renamed all related functions/benchmark names to remove the "dynamic" terminology.
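The distinction described in this reply can be sketched as a small check over the list items. This is a toy model only: `ListItem` and `is_constant_list` are hypothetical names for illustration, not DataFusion types or functions, and literals are represented by their SQL text rather than by coerced values.

```rust
/// Toy list item (hypothetical; not DataFusion's expression type).
#[derive(Debug, Clone, PartialEq)]
enum ListItem {
    /// A constant known at planning time, stored here as its SQL text.
    Literal(&'static str),
    /// A reference to a column by index; its value changes from row to row.
    Column(usize),
}

/// Returns true when every item is a literal, so a lookup set could be
/// built once and reused for all rows (the "static" path in the discussion).
fn is_constant_list(list: &[ListItem]) -> bool {
    list.iter().all(|item| matches!(item, ListItem::Literal(_)))
}

fn main() {
    // Heterogeneous literals are still constants.
    let heterogeneous = [
        ListItem::Literal("'a'"),
        ListItem::Literal("1"),
        ListItem::Literal("123.24"),
    ];
    // A single column reference makes the list non-constant.
    let with_column = [ListItem::Literal("1"), ListItem::Column(0)];

    println!("{:?} constant: {}", heterogeneous, is_constant_list(&heterogeneous));
    println!("{:?} constant: {}", with_column, is_constant_list(&with_column));
}
```

In this model only the presence of a `Column` item changes the answer, which matches the reason given for preferring "with columns" over "dynamic" or "heterogeneous" in the names.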
/// Benchmarks the dynamic evaluation path (no static filter) by including
/// a column reference in the IN list, which prevents static filter creation.
It would be nice to show an example of how the arguments to this function map to the equivalent SQL being benchmarked.
Thanks for your advice, I have added equivalent SQL examples in docstring.
});
}

/// Benchmarks the dynamic IN list path for Int32 arrays with column references.
It would be nice to see examples in this docstring of what the SQL being benchmarked is, e.g.:
// select 1 in x from t;
// where t:
// create table t ...
Added equivalent SQL examples to both bench_with_columns_int32 and bench_with_columns_utf8:
/// Equivalent SQL:
/// ```sql
/// CREATE TABLE t (a INT, b0 INT, b1 INT, ...);
/// SELECT * FROM t WHERE a IN (b0, b1, ...);
/// ```
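For readers less familiar with what `a IN (b0, b1, ...)` means per row, here is a minimal sketch of the semantics using standard SQL three-valued logic, with `None` standing in for NULL. The function names `in_list_row` and `in_list_with_columns` are hypothetical; this is a plain row-by-row model, not the kernel the benchmarks exercise.

```rust
/// Evaluates `a IN (list...)` for a single row. `None` models SQL NULL.
fn in_list_row(a: Option<i32>, list: &[Option<i32>]) -> Option<bool> {
    // NULL IN (...) is NULL.
    let a = a?;
    let mut saw_null = false;
    for item in list {
        match item {
            // Any equal item makes the predicate TRUE.
            Some(v) if *v == a => return Some(true),
            // Remember NULL items: without a match the result is then NULL.
            None => saw_null = true,
            _ => {}
        }
    }
    if saw_null { None } else { Some(false) }
}

/// Evaluates the predicate for every row; `list_cols[j][i]` is column b_j at row i.
fn in_list_with_columns(a: &[Option<i32>], list_cols: &[Vec<Option<i32>>]) -> Vec<Option<bool>> {
    (0..a.len())
        .map(|row| {
            // Gather this row's value from each list column.
            let items: Vec<Option<i32>> = list_cols.iter().map(|col| col[row]).collect();
            in_list_row(a[row], &items)
        })
        .collect()
}

fn main() {
    let a = vec![Some(1), Some(2), None, Some(4)];
    let b0 = vec![Some(1), Some(9), Some(3), None];
    let b1 = vec![Some(7), Some(8), Some(3), Some(5)];
    let result = in_list_with_columns(&a, &[b0, b1]);
    // Row 0 matches b0; row 1 matches nothing; row 2 has a NULL `a`;
    // row 3 has no match but a NULL in the list.
    assert_eq!(result, vec![Some(true), Some(false), None, None]);
    println!("{:?}", result);
}
```

A `WHERE` clause keeps only rows where the predicate is TRUE, so rows evaluating to FALSE or NULL are both filtered out. Because the list values differ on every row, no lookup set can be built once up front, which is why this case is benchmarked separately.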
run benchmark in_list
Which issue does this PR close?
Rationale for this change
The existing in_list benchmarks only cover the static filter path (constant literal lists), which uses HashSet lookup. There are no benchmarks for the dynamic evaluation path, which is triggered when the IN list contains non-constant expressions such as column references (e.g., a IN (b, c, d)). Adding these benchmarks establishes a baseline for measuring the impact of upcoming optimizations to the dynamic path (see #20428).
What changes are included in this PR?
Add criterion benchmarks for the dynamic IN list evaluation path:
bench_dynamic_int32: Int32 column references, list sizes [3, 8, 28] × match rates [0%, 50%, 100%] × null rates [0%, 20%]
bench_dynamic_utf8: Utf8 column references, list sizes [3, 8, 28] × match rates [0%, 50%, 100%]
Are these changes tested?
Yes. The benchmarks compile and run correctly. No implementation code is changed.
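To give a rough sense of how many cases the parameter grid in the PR description produces, the sketch below enumerates the combinations. The helper `case_labels` and the label format are hypothetical and chosen for illustration; they are not the benchmark IDs used in the PR. The Utf8 grid is modelled with a single 0% null rate because the description lists no null rates for it.

```rust
/// Builds one label per (list length, match rate, null rate) combination.
/// Rates are given in percent.
fn case_labels(list_lengths: &[usize], match_rates: &[u32], null_rates: &[u32]) -> Vec<String> {
    let mut labels = Vec::new();
    for &len in list_lengths {
        for &m in match_rates {
            for &n in null_rates {
                labels.push(format!("list={len}/match={m}%/nulls={n}%"));
            }
        }
    }
    labels
}

fn main() {
    // Int32 grid from the description: 3 sizes x 3 match rates x 2 null rates.
    let int32 = case_labels(&[3, 8, 28], &[0, 50, 100], &[0, 20]);
    // Utf8 grid: 3 sizes x 3 match rates, modelled here with one null rate.
    let utf8 = case_labels(&[3, 8, 28], &[0, 50, 100], &[0]);

    assert_eq!(int32.len(), 18);
    assert_eq!(utf8.len(), 9);
    for label in int32.iter().chain(utf8.iter()) {
        println!("{label}");
    }
}
```

Under these assumptions the Int32 benchmark covers 18 cases and the Utf8 benchmark covers 9.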
Are there any user-facing changes?