Skip to content

fix(btree): include null pages in non-IsNull queries for correct thre…#6043

Merged
Xuanwo merged 1 commit intolance-format:mainfrom
wkalt:fix/nullable-or-mask
Mar 4, 2026
Merged

fix(btree): include null pages in non-IsNull queries for correct thre…#6043
Xuanwo merged 1 commit intolance-format:mainfrom
wkalt:fix/nullable-or-mask

Conversation

@wkalt
Copy link
Copy Markdown
Contributor

@wkalt wkalt commented Feb 27, 2026

…e-valued logic

BTree index search only consulted pages whose value range matched the query. When a queried value (e.g. 0) didn't exist in any page, no pages were searched and the null row set was empty. This caused NOT(x = 0) to produce a BlockList with no nulls tracked, incorrectly allowing NULL rows to pass the filter.

The fix adds null pages (both partially-null and all-null) to every non-IsNull query's page list as Matches::Some, so FlatIndex::search() evaluates the predicate and correctly marks those rows as NULL. This preserves three-valued logic for compound expressions like (x != 0) OR (x < 5).

…e-valued logic

BTree index search only consulted pages whose value range matched the
query. When a queried value (e.g. 0) didn't exist in any page, no pages
were searched and the null row set was empty. This caused `NOT(x = 0)`
to produce a BlockList with no nulls tracked, incorrectly allowing NULL
rows to pass the filter.

The fix adds null pages (both partially-null and all-null) to every
non-IsNull query's page list as Matches::Some, so FlatIndex::search()
evaluates the predicate and correctly marks those rows as NULL. This
preserves three-valued logic for compound expressions like
`(x != 0) OR (x < 5)`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

PR Review

Solid bug fix with good analysis. The three-valued logic gap when queried values don't appear in any BTree page is a real correctness issue, and the fix is well-targeted.

No blocking issues found

The approach of adding null pages as Matches::Some (not Matches::All) is correct — it forces FlatIndex::search() to evaluate the predicate, which properly classifies null rows as NULL rather than TRUE. The HashSet deduplication avoids double-counting pages that already appear in the results. Both unit and integration tests cover the regression path well.

Minor note (non-blocking)

The project guidelines in CLAUDE.md recommend RoaringBitmap over HashSet<u32>. In this case, page counts are small enough that HashSet is perfectly fine (and possibly faster), so this is not actionable — just flagging for awareness.

LGTM

@fenfeng9
Copy link
Copy Markdown
Contributor

This PR appears to address the issue #5918.

@wkalt
Copy link
Copy Markdown
Contributor Author

wkalt commented Feb 27, 2026

@fenfeng9 awesome! Thanks for the pointer. This bug was found with a generative testing framework I have been hacking on.

@github-actions github-actions Bot added the bug Something isn't working label Feb 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 94.28571% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-index/src/scalar/btree.rs 94.28% 3 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

@Xuanwo Xuanwo merged commit fbdf1ce into lance-format:main Mar 4, 2026
31 of 32 checks passed
wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 4, 2026
lance-format#6043)

…e-valued logic

BTree index search only consulted pages whose value range matched the
query. When a queried value (e.g. 0) didn't exist in any page, no pages
were searched and the null row set was empty. This caused `NOT(x = 0)`
to produce a BlockList with no nulls tracked, incorrectly allowing NULL
rows to pass the filter.

The fix adds null pages (both partially-null and all-null) to every
non-IsNull query's page list as Matches::Some, so FlatIndex::search()
evaluates the predicate and correctly marks those rows as NULL. This
preserves three-valued logic for compound expressions like `(x != 0) OR
(x < 5)`.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants