Skip to content

GH-50043: [C++][Python] Fix hash_any/hash_all on sliced boolean arrays#50094

Merged
rok merged 3 commits into
apache:mainfrom
fenfeng9:fix-50043-hash-bool-slice-offset
Jun 4, 2026
Merged

GH-50043: [C++][Python] Fix hash_any/hash_all on sliced boolean arrays#50094
rok merged 3 commits into
apache:mainfrom
fenfeng9:fix-50043-hash-bool-slice-offset

Conversation

@fenfeng9
Copy link
Copy Markdown
Contributor

@fenfeng9 fenfeng9 commented Jun 4, 2026

Rationale for this change

hash_any and hash_all could return incorrect results for sliced nullable boolean arrays.

The validity bitmap used the slice offset, but the boolean values bitmap did not.

What changes are included in this PR?

Apply the slice offset when reading boolean values in hash_any / hash_all.

Add C++ and Python regression tests.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

⚠️ GitHub issue #50043 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this @fenfeng9 !
A minor C++ nit from my side.

Comment thread cpp/src/arrow/acero/hash_aggregate_test.cc Outdated
Comment thread cpp/src/arrow/acero/hash_aggregate_test.cc Outdated
@github-actions github-actions Bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Jun 4, 2026
Comment thread python/pyarrow/tests/test_table.py Outdated
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Jun 4, 2026
@rok
Copy link
Copy Markdown
Member

rok commented Jun 4, 2026

I wonder if we have the same bug elsewhere in aggregate kernels (e.g. for different types). It's out of scope here, but do you have any input @fenfeng9 ?

@fenfeng9 fenfeng9 force-pushed the fix-50043-hash-bool-slice-offset branch from 3f7ceae to 22dd4b3 Compare June 4, 2026 12:05
@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 4, 2026
Comment thread cpp/src/arrow/acero/hash_aggregate_test.cc Outdated
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 4, 2026
@fenfeng9
Copy link
Copy Markdown
Contributor Author

fenfeng9 commented Jun 4, 2026

I wonder if we have the same bug elsewhere in aggregate kernels (e.g. for different types). It's out of scope here, but do you have any input @fenfeng9 ?

I don't see the same kind of offset issue in nearby aggregate kernels for other data types.

I'll take a closer look separately, and if I find another case, I'll file a new issue.

@rok
Copy link
Copy Markdown
Member

rok commented Jun 4, 2026

I wonder if we have the same bug elsewhere in aggregate kernels (e.g. for different types). It's out of scope here, but do you have any input @fenfeng9 ?

I don't see the same kind of offset issue in nearby aggregate kernels for other data types.

I'll take a closer look separately, and if I find another case, I'll file a new issue.

Perfect, thank you!

This looks good to me now. Will merge if green.

@fenfeng9
Copy link
Copy Markdown
Contributor Author

fenfeng9 commented Jun 4, 2026

I wonder if we have the same bug elsewhere in aggregate kernels (e.g. for different types). It's out of scope here, but do you have any input @fenfeng9 ?

I don't see the same kind of offset issue in nearby aggregate kernels for other data types.
I'll take a closer look separately, and if I find another case, I'll file a new issue.

Perfect, thank you!

This looks good to me now. Will merge if green.

Thank you for your quick and patient review.

@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 4, 2026
@github-actions github-actions Bot added awaiting changes Awaiting changes awaiting merge Awaiting merge and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Jun 4, 2026
@rok rok merged commit 95cb5d0 into apache:main Jun 4, 2026
57 checks passed
@rok rok removed the awaiting merge Awaiting merge label Jun 4, 2026
@rok
Copy link
Copy Markdown
Member

rok commented Jun 4, 2026

Thanks again @fenfeng9!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants