HBASE-29995 Improve existing hash implementations by reading 4 bytes at once #7934
jinhyukify wants to merge 5 commits into apache:master
Conversation
Please add a dummy change in pom.xml so we can run all the tests in HBase and see whether the changes cause any problems. Once we confirm there is no problem, you can force-push to remove the dummy commit and then we can merge the PR. And why not read 8 bytes at once?
@Apache9 Thank you for checking!
The existing hashes (Jenkins, Murmur, Murmur3) are all 4-byte based, so they operate on 4-byte chunks. There is currently no place in the algorithms where 8-byte reads are used. In theory we could read 8 bytes and split them into two ints, but given the current structure it would not provide much benefit. If we add hash functions that operate on 8-byte words in the future, we could revisit this.
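Since the discussion centers on assembling 4-byte words, here is a rough illustration of the difference between per-byte assembly and a single 4-byte little-endian read. The class and method names are hypothetical, not the HBase code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WordRead {
  // Byte-by-byte assembly: four loads, three shifts, three ORs.
  static int readIntLEByByte(byte[] b, int off) {
    return (b[off] & 0xff)
        | (b[off + 1] & 0xff) << 8
        | (b[off + 2] & 0xff) << 16
        | (b[off + 3] & 0xff) << 24;
  }

  // Bulk read via ByteBuffer; the JIT can turn this into a single
  // (bounds-checked) 4-byte load on common platforms.
  static int readIntLEBulk(byte[] b, int off) {
    return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(off);
  }

  public static void main(String[] args) {
    byte[] data = {0x78, 0x56, 0x34, 0x12};
    System.out.println(Integer.toHexString(readIntLEByByte(data, 0))); // 12345678
    System.out.println(Integer.toHexString(readIntLEBulk(data, 0)));   // 12345678
  }
}
```

Both forms produce the same little-endian word, which is the chunk shape that 4-byte-based hashes such as Murmur3 consume.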
  return (int) assembleCrossingLE(offset, Bytes.SIZEOF_INT);
}

private long assembleCrossingLE(int offset, int wordBytes) {
We're adding a non-trivial amount of code here, and it overlaps to some degree with the existing get(int offset) implementation. Would it make sense to refactor get to call this new method instead, so we can reduce code duplication? That approach seems worth taking as long as the performance overhead is negligible.
Thank you for checking.
I changed get to call the assembleCrossingLE method.
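A minimal standalone sketch of the pattern discussed above: a little-endian assembler that can cross a boundary between two backing arrays, with both the multi-byte and single-byte accessors delegating to it so the boundary logic lives in one place. The class name, the two-array backing, and the method shapes are assumptions for illustration, not the actual HBase implementation:

```java
public class CrossingReader {
  private final byte[] first;
  private final byte[] second;

  CrossingReader(byte[] first, byte[] second) {
    this.first = first;
    this.second = second;
  }

  // Assemble wordBytes bytes starting at offset in little-endian order,
  // spilling from `first` into `second` when the word crosses the seam.
  private long assembleCrossingLE(int offset, int wordBytes) {
    long word = 0;
    for (int i = 0; i < wordBytes; i++) {
      int pos = offset + i;
      byte b = pos < first.length ? first[pos] : second[pos - first.length];
      word |= (b & 0xffL) << (8 * i);
    }
    return word;
  }

  int getInt(int offset) {
    return (int) assembleCrossingLE(offset, Integer.BYTES);
  }

  // Single-byte read expressed through the same helper, as suggested in
  // the review, so get and getInt share one boundary-handling path.
  byte get(int offset) {
    return (byte) assembleCrossingLE(offset, 1);
  }
}
```

Whether the single-byte case should really route through the general helper depends on the benchmark results; the review only asks for it if the overhead is negligible.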
Jira https://issues.apache.org/jira/browse/HBASE-29995