bench: Add IN list benchmarks for non-constant list expressions #20444

Open
zhangxffff wants to merge 3 commits into apache:main from zhangxffff:bench-dynamic-in-list
Conversation

@zhangxffff
Contributor

Which issue does this PR close?

Rationale for this change

The existing in_list benchmarks only cover the static filter path (constant literal lists), which uses HashSet lookup. There are no benchmarks for the dynamic evaluation path, which is triggered when the IN list contains non-constant expressions such as column references (e.g., a IN (b, c, d)). Adding these benchmarks establishes a baseline for measuring the impact of upcoming optimizations to the dynamic path (see #20428).
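
To make the difference between the two paths concrete, here is a minimal, std-only sketch. It is not DataFusion's implementation: the function names `in_list_static` and `in_list_with_columns` are hypothetical, plain slices stand in for Arrow arrays, and nulls are ignored. It only illustrates why a constant list can be turned into a set once, while a list of column references has to be compared row by row.

```rust
use std::collections::HashSet;

/// Static path sketch: the list is constant, so the set is built once
/// and probed for every row of the column.
fn in_list_static(column: &[i32], list: &[i32]) -> Vec<bool> {
    let set: HashSet<i32> = list.iter().copied().collect();
    column.iter().map(|v| set.contains(v)).collect()
}

/// Column-reference path sketch: every row has its own list values
/// (one per list column), so no set can be precomputed. Each row is
/// compared against the value of every list column at that row.
fn in_list_with_columns(column: &[i32], list_columns: &[Vec<i32>]) -> Vec<bool> {
    column
        .iter()
        .enumerate()
        .map(|(row, v)| list_columns.iter().any(|c| c[row] == *v))
        .collect()
}
```

In this sketch the second function does work proportional to rows × list columns on every call, which is the kind of cost the new benchmarks are meant to expose.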

What changes are included in this PR?

Add criterion benchmarks for the dynamic IN list evaluation path:

  • bench_dynamic_int32: Int32 column references, list sizes [3, 8, 28] × match rates [0%, 50%, 100%] × null rates [0%, 20%]
  • bench_dynamic_utf8: Utf8 column references, list sizes [3, 8, 28] × match rates [0%, 50%, 100%]
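
For a sense of how many cases this grid produces, the following sketch enumerates the combinations listed above. The constant and function names are hypothetical and do not come from the benchmark file; only the values (list sizes, match rates, null rates) are taken from the description.

```rust
// Values taken from the PR description; names are illustrative only.
const LIST_SIZES: [usize; 3] = [3, 8, 28];
const MATCH_RATES: [f64; 3] = [0.0, 0.5, 1.0];
const NULL_RATES: [f64; 2] = [0.0, 0.2];

/// Int32 cases: list size x match rate x null rate.
fn int32_cases() -> Vec<(usize, f64, f64)> {
    let mut cases = Vec::new();
    for &size in &LIST_SIZES {
        for &match_rate in &MATCH_RATES {
            for &null_rate in &NULL_RATES {
                cases.push((size, match_rate, null_rate));
            }
        }
    }
    cases
}

/// Utf8 cases: list size x match rate (no null-rate dimension).
fn utf8_cases() -> Vec<(usize, f64)> {
    let mut cases = Vec::new();
    for &size in &LIST_SIZES {
        for &match_rate in &MATCH_RATES {
            cases.push((size, match_rate));
        }
    }
    cases
}
```

Under these assumptions the Int32 benchmark covers 3 × 3 × 2 = 18 cases and the Utf8 benchmark covers 3 × 3 = 9.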

Are these changes tested?

Yes. The benchmarks compile and run correctly. No implementation code is changed.

Are there any user-facing changes?

Contributor

@adriangb left a comment

Some minor docstring improvements

}

const IN_LIST_LENGTHS: [usize; 4] = [3, 8, 28, 100];
const DYNAMIC_LIST_LENGTHS: [usize; 3] = [3, 8, 28];
Contributor

Does ('a', 1, 123.24) also force this "dynamic" path? If so, I would use the term "heterogeneous" for that. If not, and it's only columns that trigger this code path, I would use the name LIST_WITH_COLUMNS_LENGTHS.

Contributor Author

No, only column references trigger this code path. Heterogeneous literals like 1 IN ('a', 1, 123.24) are type-coerced and still go through the static (HashSet) path. Renamed to LIST_WITH_COLUMNS_LENGTHS, and also renamed all related functions/benchmark names to remove the "dynamic" terminology.
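
The rule described in this reply can be sketched as a simple predicate. This is an illustration under stated assumptions, not the actual check in the physical-expr crate: `ListItem` and `can_use_static_filter` are hypothetical names, and literals are represented as plain strings after type coercion.

```rust
/// Hypothetical, simplified list item used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum ListItem {
    /// A constant value, e.g. 'a', 1, or 123.24 (after type coercion).
    Literal(String),
    /// A reference to another column, e.g. b in `a IN (b, c, d)`.
    Column(String),
}

/// Assumption drawn from the discussion: the static (HashSet) path is
/// available only when every list item is a literal. A single column
/// reference forces the per-row evaluation path.
fn can_use_static_filter(list: &[ListItem]) -> bool {
    list.iter().all(|item| matches!(item, ListItem::Literal(_)))
}
```

With this predicate, a heterogeneous literal list such as ('a', 1, 123.24) still qualifies for the static path, while (b, c, d) or a mix of literals and columns does not.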

Comment on lines 225 to 226
/// Benchmarks the dynamic evaluation path (no static filter) by including
/// a column reference in the IN list, which prevents static filter creation.
Contributor

It would be nice to show an example of how the arguments to this function map to the equivalent SQL being benchmarked.

Contributor Author

Thanks for the suggestion. I have added equivalent SQL examples to the docstring.

});
}

/// Benchmarks the dynamic IN list path for Int32 arrays with column references.
Contributor

It would be nice to see examples in this docstring of what the SQL being benchmarked is, e.g.:

// select 1 in x from t;
// where t:
// create table t ...

Contributor Author

Added equivalent SQL examples to both bench_with_columns_int32 and bench_with_columns_utf8:

  /// Equivalent SQL:
  /// ```sql
  /// CREATE TABLE t (a INT, b0 INT, b1 INT, ...);
  /// SELECT * FROM t WHERE a IN (b0, b1, ...);
  /// ```
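
Because the Int32 benchmarks also vary the null rate, it may help to spell out what `a IN (b0, b1, ...)` returns for a single row under standard SQL three-valued logic. The sketch below assumes those standard semantics and uses a hypothetical function name, `in_list_row`; it models one row with `Option<i32>` values rather than Arrow arrays.

```rust
/// Evaluates `a IN (list...)` for one row under SQL three-valued logic:
/// - NULL if `a` is NULL,
/// - TRUE if any list value equals `a`,
/// - NULL if there is no match but some list value is NULL,
/// - FALSE otherwise.
fn in_list_row(a: Option<i32>, list: &[Option<i32>]) -> Option<bool> {
    let a = a?; // NULL IN (...) is NULL
    let mut saw_null = false;
    for item in list {
        match item {
            Some(v) if *v == a => return Some(true),
            Some(_) => {}
            None => saw_null = true,
        }
    }
    if saw_null {
        None
    } else {
        Some(false)
    }
}
```

A match short-circuits the row, so in a row-by-row evaluation like this one the 0% match rate case is the one that always scans every list column.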

@adriangb
Contributor

run benchmark in_list

@zhangxffff zhangxffff changed the title bench: Add dynamic IN list benchmarks for non-constant list expressions bench: Add IN list benchmarks for non-constant list expressions Feb 20, 2026

Labels

physical-expr Changes to the physical-expr crates
