Inspired by https://users.rust-lang.org/t/hash-prefix-collisions/71823/10?u=scottmcm
`Hash::hash_slice` has a bunch of text clarifying that `h.hash_slice(&[a, b]); h.hash_slice(&[c]);` is not guaranteed to be the same as `h.hash_slice(&[a]); h.hash_slice(&[b, c]);`.
However, `Hasher::write` is unclear about whether that same rule applies to it. It's very clear that `.write(&[a])` is not the same as `.write_u8(a)`, but not whether the same sequence of bytes passed to `write` is supposed to produce the same result even when the bytes arrive in different groupings, like `h.write(&[a, b]); h.write(&[c]);` vs `h.write(&[a]); h.write(&[b, c]);`.
This is important for the same kinds of things as the `VecDeque` example mentioned on `hash_slice`. If I have a circular byte buffer, is it legal for its `Hash` to just `.write` the two parts? Or does it need to `write_u8` all the individual bytes, since two circular buffers should compare equal regardless of where the split happens to be?
Given that `Hash for str` and `Hash for [T]` are doing prefix-freedom already, it feels to me like `write` should not be doing it again.
Also, our `SipHasher` implementation (`library/core/src/hash/sip.rs`, lines 264 to 308 at 6bf3008) is going out of its way to maintain the "different chunking of `write`s is fine" property:
```rust
fn write(&mut self, msg: &[u8]) {
    let length = msg.len();
    self.length += length;

    let mut needed = 0;

    if self.ntail != 0 {
        needed = 8 - self.ntail;
        // SAFETY: `cmp::min(length, needed)` is guaranteed to not be over `length`
        self.tail |= unsafe { u8to64_le(msg, 0, cmp::min(length, needed)) } << (8 * self.ntail);
        if length < needed {
            self.ntail += length;
            return;
        } else {
            self.state.v3 ^= self.tail;
            S::c_rounds(&mut self.state);
            self.state.v0 ^= self.tail;
            self.ntail = 0;
        }
    }

    // Buffered tail is now flushed, process new input.
    let len = length - needed;
    let left = len & 0x7; // len % 8

    let mut i = needed;
    while i < len - left {
        // SAFETY: because `len - left` is the biggest multiple of 8 under
        // `len`, and because `i` starts at `needed` where `len` is `length - needed`,
        // `i + 8` is guaranteed to be less than or equal to `length`.
        let mi = unsafe { load_int_le!(msg, i, u64) };

        self.state.v3 ^= mi;
        S::c_rounds(&mut self.state);
        self.state.v0 ^= mi;

        i += 8;
    }

    // SAFETY: `i` is now `needed + len.div_euclid(8) * 8`,
    // so `i + left` = `needed + len` = `length`, which is by
    // definition equal to `msg.len()`.
    self.tail = unsafe { u8to64_le(msg, i, left) };
    self.ntail = left;
}
```
So it seems to me like this has been the expected behaviour the whole time. And if not, we should optimize `SipHasher` to be faster.
cc #80303, which led to this text in `hash_slice`.