HBASE-29995 Improve existing hash implementations by reading 4 bytes at once #7934
jinhyukify wants to merge 5 commits into apache:master
Conversation
Please add a dummy change in pom.xml so we can run all the tests in HBase and see whether the changes cause any problems. Once we confirm there is no problem, you can force-push to remove the dummy commit and then we can merge the PR. And why not read 8 bytes at once?
@Apache9 Thank you for checking!
The existing hashes (Jenkins, Murmur, Murmur3) are all 4-byte based, so they operate on 4-byte chunks. There is currently no place in the algorithms where 8-byte reads are used. In theory we could read 8 bytes and split them into two ints, but given the current structure it would not provide much benefit. If we add hash functions that operate on 8-byte words in the future, we could revisit this.
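Since the discussion centers on assembling 4-byte words, here is a rough illustration of the difference between per-byte assembly and a single 4-byte little-endian read. The class and method names are hypothetical, not the HBase code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WordRead {
  // Byte-by-byte assembly: four loads, three shifts, three ORs.
  static int readIntLEByByte(byte[] b, int off) {
    return (b[off] & 0xff)
        | (b[off + 1] & 0xff) << 8
        | (b[off + 2] & 0xff) << 16
        | (b[off + 3] & 0xff) << 24;
  }

  // Bulk read via ByteBuffer; the JIT can turn this into a single
  // (bounds-checked) 4-byte load on common platforms.
  static int readIntLEBulk(byte[] b, int off) {
    return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(off);
  }

  public static void main(String[] args) {
    byte[] data = {0x78, 0x56, 0x34, 0x12};
    System.out.println(Integer.toHexString(readIntLEByByte(data, 0))); // 12345678
    System.out.println(Integer.toHexString(readIntLEBulk(data, 0)));   // 12345678
  }
}
```

Both forms produce the same little-endian word, which is the chunk shape that 4-byte-based hashes such as Murmur3 consume.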
  return (int) assembleCrossingLE(offset, Bytes.SIZEOF_INT);
}

private long assembleCrossingLE(int offset, int wordBytes) {
We're adding a non-trivial amount of code here, and it overlaps to some degree with the existing get(int offset) implementation. Would it make sense to refactor get to call this new method instead, so we can reduce code duplication? That approach seems worth taking as long as the performance overhead is negligible.
Thank you for checking.
I changed get to call the assembleCrossingLE method.
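A minimal standalone sketch of the pattern discussed above: a little-endian assembler that can cross a boundary between two backing arrays, with both the multi-byte and single-byte accessors delegating to it so the boundary logic lives in one place. The class name, the two-array backing, and the method shapes are assumptions for illustration, not the actual HBase implementation:

```java
public class CrossingReader {
  private final byte[] first;
  private final byte[] second;

  CrossingReader(byte[] first, byte[] second) {
    this.first = first;
    this.second = second;
  }

  // Assemble wordBytes bytes starting at offset in little-endian order,
  // spilling from `first` into `second` when the word crosses the seam.
  private long assembleCrossingLE(int offset, int wordBytes) {
    long word = 0;
    for (int i = 0; i < wordBytes; i++) {
      int pos = offset + i;
      byte b = pos < first.length ? first[pos] : second[pos - first.length];
      word |= (b & 0xffL) << (8 * i);
    }
    return word;
  }

  int getInt(int offset) {
    return (int) assembleCrossingLE(offset, Integer.BYTES);
  }

  // Single-byte read expressed through the same helper, as suggested in
  // the review, so get and getInt share one boundary-handling path.
  byte get(int offset) {
    return (byte) assembleCrossingLE(offset, 1);
  }
}
```

Whether the single-byte case should really route through the general helper depends on the benchmark results; the review only asks for it if the overhead is negligible.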
Jira https://issues.apache.org/jira/browse/HBASE-29995