fix: LIKE 'prefix%' pruning fails on Utf8View and LargeUtf8 columns #22562

Open
lyne7-sc wants to merge 2 commits into apache:main from lyne7-sc:fix/pruning_predicate_like

@lyne7-sc
Contributor

Which issue does this PR close?

Rationale for this change

LIKE 'prefix%' predicates on Utf8View and LargeUtf8 columns produce predicate_evaluation_errors, causing row group and page index pruning to be skipped entirely.

The cause is that build_like_match always synthesizes its bound literals as ScalarValue::Utf8, regardless of the actual column type. When the column is Utf8View or LargeUtf8, the subsequent comparison between the Utf8-typed bound and the min/max statistics (which use the column's native type) fails with a type mismatch error.
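
To make the failure mode concrete, here is a minimal, self-contained sketch using toy stand-ins. `Scalar`, `StrType`, `stat_ge_bound`, and `bound_as` are hypothetical names invented for illustration; they are not DataFusion's types or functions, and the real comparison path is more involved.

```rust
// Toy stand-ins for illustration only; NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(String),
    LargeUtf8(String),
    Utf8View(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

impl Scalar {
    fn data_type(&self) -> StrType {
        match self {
            Scalar::Utf8(_) => StrType::Utf8,
            Scalar::LargeUtf8(_) => StrType::LargeUtf8,
            Scalar::Utf8View(_) => StrType::Utf8View,
        }
    }

    fn as_str(&self) -> &str {
        match self {
            Scalar::Utf8(s) | Scalar::LargeUtf8(s) | Scalar::Utf8View(s) => s,
        }
    }
}

/// Evaluate `stat >= bound`. Mismatched string types are reported as an
/// error, mirroring the type mismatch described above.
fn stat_ge_bound(stat: &Scalar, bound: &Scalar) -> Result<bool, String> {
    if stat.data_type() != bound.data_type() {
        return Err(format!(
            "type mismatch: {:?} vs {:?}",
            stat.data_type(),
            bound.data_type()
        ));
    }
    Ok(stat.as_str() >= bound.as_str())
}

/// Build the bound literal in the column's own type (a hypothetical
/// analogue of the fix) instead of always producing `Scalar::Utf8`.
fn bound_as(value: &str, target: StrType) -> Scalar {
    match target {
        StrType::Utf8 => Scalar::Utf8(value.to_string()),
        StrType::LargeUtf8 => Scalar::LargeUtf8(value.to_string()),
        StrType::Utf8View => Scalar::Utf8View(value.to_string()),
    }
}
```

In this model, comparing a `Utf8View` statistic against a hardcoded `Utf8` bound returns an error, while a bound built with `bound_as(..., StrType::Utf8View)` compares cleanly.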

What changes are included in this PR?

  • Updated build_like_match to use string_literal_as with the column's data_type() instead of hardcoding ScalarValue::Utf8 for the lower/upper bound literals.
  • Added a regression test (prune_like_prefix) that verifies LIKE prefix pruning works correctly on UTF8 columns with expected row group statistics pruning and zero predicate errors.
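
For readers unfamiliar with how a prefix LIKE turns into lower/upper bounds, the sketch below shows one common approach. It assumes the upper bound is formed by incrementing the last character of the prefix; whether build_like_match does exactly this is an assumption, and `prefix_upper_bound` and `may_contain_prefix` are hypothetical helper names, not DataFusion APIs.

```rust
/// Return a string that is greater than every string starting with
/// `prefix`, by bumping the last character that can be bumped.
/// Returns `None` when no such bound exists (e.g. an empty prefix).
fn prefix_upper_bound(prefix: &str) -> Option<String> {
    let mut chars: Vec<char> = prefix.chars().collect();
    while let Some(last) = chars.pop() {
        // Look for the next valid Unicode scalar value after `last`.
        let mut code = last as u32 + 1;
        while code <= char::MAX as u32 {
            if let Some(next) = char::from_u32(code) {
                chars.push(next);
                return Some(chars.into_iter().collect());
            }
            code += 1; // skip the surrogate gap
        }
        // `last` was char::MAX: drop it and try the previous character.
    }
    None
}

/// Can a row group with statistics [min, max] contain a value that
/// matches LIKE 'prefix%'? Returning `false` means it is safe to prune.
fn may_contain_prefix(min: &str, max: &str, prefix: &str) -> bool {
    // A match requires max >= prefix and min < upper bound.
    let below_upper = match prefix_upper_bound(prefix) {
        Some(upper) => min < upper.as_str(),
        None => true, // no upper bound available, so stay conservative
    };
    max >= prefix && below_upper
}
```

Both bounds are compared against the column's min/max statistics, which is why their literal type has to match the column's type.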

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

github-actions bot added the core (Core DataFusion crate) label on May 27, 2026
Contributor

@alamb left a comment

Makes sense to me -- thank you @lyne7-sc

}

#[tokio::test]
async fn prune_like_prefix() {

I verified that this test fails like this without the code change:

thread 'parquet::row_group_pruning::prune_like_prefix' (83571793) panicked at datafusion/core/tests/parquet/row_group_pruning.rs:138:9:
assertion `left == right` failed: mismatched predicate_evaluation error
  left: Some(5)
 right: Some(0)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

/// Wrap a string in a `Literal` whose `ScalarValue` matches `target_type`
fn string_literal_as(value: String, target_type: &DataType) -> Arc<dyn PhysicalExpr> {
    let utf8 = ScalarValue::Utf8(Some(value));
    let scalar = try_cast_literal_to_type(&utf8, target_type).unwrap_or(utf8);

👍

It is sad that this potentially results in a new allocation -- maybe as a follow on PR we can avoid the allocation in try_cast_literal_to_type

@alamb added the performance (Make DataFusion faster) label on May 27, 2026

Labels

core (Core DataFusion crate), performance (Make DataFusion faster)

Development

Successfully merging this pull request may close these issues.

LIKE 'prefix%' pruning fails on Utf8View and LargeUtf8 columns

2 participants