Inspired by https://users.rust-lang.org/t/hash-prefix-collisions/71823/10?u=scottmcm
`Hash::hash_slice` has a bunch of text clarifying that `h.hash_slice(&[a, b]); h.hash_slice(&[c]);` is not guaranteed to be the same as `h.hash_slice(&[a]); h.hash_slice(&[b, c]);`.
However, `Hasher::write` is unclear about whether that same rule applies to it. It's very clear that `.write(&[a])` is not the same as `.write_u8(a)`, but not whether the same sequence of bytes passed to `write` is supposed to produce the same result even when the bytes arrive in different groupings, like `h.write(&[a, b]); h.write(&[c]);` vs `h.write(&[a]); h.write(&[b, c]);`.
This is important for the same kinds of things as the `VecDeque` example mentioned on `hash_slice`. If I have a circular byte buffer, is it legal for its `Hash` to just `.write` the two parts? Or does it need to `write_u8` all the individual bytes, since two circular buffers should compare equal regardless of where the split happens to be?
Given that `Hash for str` and `Hash for [T]` are doing prefix-freedom already, it feels to me like `write` should not be doing it again.
Also, our `SipHasher` implementation (`library/core/src/hash/sip.rs`, lines 264 to 308 at 6bf3008) is going out of its way to maintain the "different chunking of `write`s is fine" property:
```rust
fn write(&mut self, msg: &[u8]) {
    let length = msg.len();
    self.length += length;

    let mut needed = 0;

    if self.ntail != 0 {
        needed = 8 - self.ntail;
        // SAFETY: `cmp::min(length, needed)` is guaranteed to not be over `length`
        self.tail |= unsafe { u8to64_le(msg, 0, cmp::min(length, needed)) } << (8 * self.ntail);
        if length < needed {
            self.ntail += length;
            return;
        } else {
            self.state.v3 ^= self.tail;
            S::c_rounds(&mut self.state);
            self.state.v0 ^= self.tail;
            self.ntail = 0;
        }
    }

    // Buffered tail is now flushed, process new input.
    let len = length - needed;
    let left = len & 0x7; // len % 8

    let mut i = needed;
    while i < len - left {
        // SAFETY: because `len - left` is the biggest multiple of 8 under
        // `len`, and because `i` starts at `needed` where `len` is `length - needed`,
        // `i + 8` is guaranteed to be less than or equal to `length`.
        let mi = unsafe { load_int_le!(msg, i, u64) };

        self.state.v3 ^= mi;
        S::c_rounds(&mut self.state);
        self.state.v0 ^= mi;

        i += 8;
    }

    // SAFETY: `i` is now `needed + len.div_euclid(8) * 8`,
    // so `i + left` = `needed + len` = `length`, which is by
    // definition equal to `msg.len()`.
    self.tail = unsafe { u8to64_le(msg, i, left) };
    self.ntail = left;
}
```
So it seems to me like this has been the expected behaviour the whole time. And if not, we should optimize `SipHasher` to be faster.
cc #80303, which led to this text in `hash_slice`.