fix/perf: Optimize !~ '.*' case to False instead of Eq "" #20702

Open
petern48 wants to merge 3 commits into apache:main from petern48:bug_regexp_optim

@petern48 petern48 commented Mar 4, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

A pre-existing optimization rule for the !~ .* (regexp not match) case rewrote the plan to Eq "", which returns rows containing empty strings. This is incorrect and doesn't match the output produced without the optimization rule.

Instead, this PR rewrites the plan to simply lit(false). The predicate can never evaluate to true, for the following reasons:

  • The .* pattern matches all strings, including empty strings (""), so empty strings should not be returned.
  • In the null case, NULL !~ .* evaluates to NULL (hence not true), so the query also doesn't return null rows.
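
The two bullets above can be checked with a small model of SQL three-valued logic. This is only a sketch, not DataFusion code: `Option<bool>` stands in for a nullable boolean, and the function names (`not_match_dot_star`, `old_rewrite_eq_empty`, `filter_keeps`) are hypothetical helpers invented for illustration. It assumes `.*` matches every non-NULL string, as described above.

```rust
/// Reference semantics of `s !~ '.*'`: NULL in gives NULL out; any non-NULL
/// string (including "") is matched by `.*`, so "not match" is false.
fn not_match_dot_star(s: Option<&str>) -> Option<bool> {
    s.map(|_| false)
}

/// The previous rewrite, modelled as `s = ''` (hypothetical helper).
fn old_rewrite_eq_empty(s: Option<&str>) -> Option<bool> {
    s.map(|v| v.is_empty())
}

/// A filter keeps a row only when the predicate is exactly TRUE;
/// both FALSE and NULL drop the row.
fn filter_keeps(pred: Option<bool>) -> bool {
    pred == Some(true)
}

fn main() {
    let rows = [Some("foo"), Some(""), None];
    for s in rows {
        println!(
            "{:?}: reference keeps = {}, old rewrite keeps = {}",
            s,
            filter_keeps(not_match_dot_star(s)),
            filter_keeps(old_rewrite_eq_empty(s)),
        );
    }
}
```

In this model the reference predicate keeps no rows, while the old rewrite keeps the empty-string row, which is the discrepancy described above.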

I've confirmed this behavior matches the result of running queries manually with the optimization rule turned off.

Are these changes tested?

Fixed expected output in tests.

Are there any user-facing changes?

Yes, a minor bug fix. When querying s !~ .*, empty strings will no longer be included in the result, which is consistent with the behavior without the optimization rule.

@petern48 petern48 changed the title Fix: Optimize ~! '.*' and '.*' cases to False instead of Eq "" fix/perf: Optimize ~! '.*' case to False instead of Eq "" Mar 4, 2026
@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Mar 4, 2026
@petern48 petern48 marked this pull request as ready for review March 4, 2026 18:28

@alamb alamb left a comment

Thanks @petern48

/// - combinations (alternatives) of the above, will be concatenated with `OR` or `AND`
/// - `EQ .*` to NotNull
/// - `NE .*` means IS EMPTY
/// - `NE .*` to false (.* matches non-empty and empty strings, and NULL !~ '.*' results in NULL so this can never be true)

I think it is only false in the context of filters (and this code can currently be used for both filters and ordinary, non-filter expressions)

I think null !~ '.*' is actually null (not false)

I think a correct rewrite is

CASE 
  WHEN col IS NOT NULL THEN FALSE 
  ELSE NULL
END
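
To make the difference concrete, here is a minimal sketch in plain Rust (not DataFusion's API) that models a nullable boolean as `Option<bool>`. The names `lit_false`, `case_rewrite`, `reference` and `sql_not` are hypothetical, chosen only for this illustration. It compares a bare `false` literal with the `CASE` rewrite above when the value is observed directly rather than consumed by a filter.

```rust
/// Reference semantics of `col !~ '.*'`: NULL for NULL input, FALSE otherwise.
fn reference(col: Option<&str>) -> Option<bool> {
    col.map(|_| false)
}

/// Rewrite to a bare `false` literal: ignores the input entirely.
fn lit_false(_col: Option<&str>) -> Option<bool> {
    Some(false)
}

/// Rewrite to `CASE WHEN col IS NOT NULL THEN FALSE ELSE NULL END`.
fn case_rewrite(col: Option<&str>) -> Option<bool> {
    match col {
        Some(_) => Some(false),
        None => None,
    }
}

/// SQL NOT under three-valued logic: NOT NULL is NULL.
fn sql_not(v: Option<bool>) -> Option<bool> {
    v.map(|b| !b)
}

fn main() {
    for col in [Some("foo"), Some(""), None] {
        println!(
            "{:?}: reference = {:?}, lit(false) = {:?}, CASE = {:?}",
            col,
            reference(col),
            lit_false(col),
            case_rewrite(col),
        );
    }
    // Wrapping the predicate in NOT exposes the difference for NULL input.
    println!("NOT reference(NULL)  = {:?}", sql_not(reference(None)));
    println!("NOT lit_false(NULL) = {:?}", sql_not(lit_false(None)));
}
```

In this model the `CASE` form agrees with the reference for every input, while the bare literal differs only on NULL input. That difference is invisible to a top-level filter but shows up in a projection or under `NOT`.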

)?;

// Test `!= ".*"` transforms to checking if the column is empty
// Test `!~ ".*"` transforms to false

can we please also add a test explicitly for a null input?


Development

Successfully merging this pull request may close these issues.

bug: regexp simplify optimization incorrect simplifies .* pattern to Eq "" operation
