Proposal
Problem statement
std::io::Read::take currently returns a new instance of Read but not Seek regardless of whether self implements Seek or not. This results in code that may work with self but not with the instance returned by .take because the former implements Seek while the latter doesn't. Moreover there is not reason for this to be the case; Take is just a wrapper around self that can read at most limit bytes. If self implements Seek then Take can also naturally implement Seek.
This proposal is to add a Seek instance for Take.
Motivating examples or use cases
For example, binary files parsers could benefits from this feature. Let's say that we have file format that is a list of records where each record is composed of the length of the record followed by the bytes of the record. It could be useful to read the length of the record and then use .take to delimit the bytes that the record parser should process. This removes the need to deal with trailing bytes from the record parser resulting in simpler code.
Now let's say that the parser needs to skip certain parts of the record because e.g. obsolete or not needed. seek is a natural fit for this task but currently Take doesn't support it even if File supports seek. The parser would have to read the bytes that it wants to skip and write them to a noop sink.
There is also a reusability reason: any generic code that works with bytes sequences should work with the result of .take too. For instance, let's write the interface of the parser above in a generic way:
fn record_parser<I: Read + Seek>(i: I) -> Result<Record> { ... }
this function should work for Take if it wraps an instance of Seek.
Solution sketch
seek can be expensive and therefore a requirement for the solution is that a call to Take.seek results in a single call to seek on the inner object of Take.
Take currently keeps track only of the remaining limit bytes that can be read from the inner object. This info alone is not enough to implement seek because e.g. we don't know when the Take "started" so seek(SeekFrom::Start) cannot be computed.
This solution proposes to keep track of the initial limit of the Take in a new field called len. This can then be used to compute the position of the cursor inside the Takeas len - limit. The reason to not track position directly is to avoid adding one more mutating variable that changes together with limit. len is constant (unless set_limit is called).
Once we know the position of the cursor inside Take, we can compute the offset from the current position based on the argument of seek. If the argument is SeekFrom::Start then we seek from -position, if the argument is SeekFrom::End then we seek from position+limit, if the argument is SeekFrom::Current then we seek from position.
API
Add a position method that returns the position within the Take
impl Take {
fn position(&self) -> u64 {
self.len - self.limit
}
}
Add the Seekimplementation for Take
impl<T: Seek> Seek for Take<T> {
...
}
Alternatives
The solution proposed in 97227 keeps track of the position instead of the original len. It is equivalent to the solution proposed here. The reason why I prefer the above solution is because it's simpler to maintain given that len never changes while cursor in 97227 must be updated together with limit leading to code like:
self.cursor += amt as u64;
self.limit -= amt as u64;
Links and related work
- 97227
- My implementation of the solution proposed here PR #138023
Proposal
Problem statement
std::io::Read::takecurrently returns a new instance ofReadbut notSeekregardless of whetherselfimplementsSeekor not. This results in code that may work withselfbut not with the instance returned by.takebecause the former implementsSeekwhile the latter doesn't. Moreover there is not reason for this to be the case;Takeis just a wrapper aroundselfthat can read at mostlimitbytes. IfselfimplementsSeekthenTakecan also naturally implementSeek.This proposal is to add a
Seekinstance forTake.Motivating examples or use cases
For example, binary files parsers could benefits from this feature. Let's say that we have file format that is a list of records where each record is composed of the length of the record followed by the bytes of the record. It could be useful to read the length of the record and then use
.taketo delimit the bytes that the record parser should process. This removes the need to deal with trailing bytes from the record parser resulting in simpler code.Now let's say that the parser needs to skip certain parts of the record because e.g. obsolete or not needed.
seekis a natural fit for this task but currentlyTakedoesn't support it even ifFilesupports seek. The parser would have to read the bytes that it wants to skip and write them to a noop sink.There is also a reusability reason: any generic code that works with bytes sequences should work with the result of
.taketoo. For instance, let's write the interface of the parser above in a generic way:this function should work for
Takeif it wraps an instance ofSeek.Solution sketch
seekcan be expensive and therefore a requirement for the solution is that a call toTake.seekresults in a single call toseekon theinnerobject ofTake.Takecurrently keeps track only of the remaininglimitbytes that can be read from theinnerobject. This info alone is not enough to implementseekbecause e.g. we don't know when theTake"started" soseek(SeekFrom::Start)cannot be computed.This solution proposes to keep track of the initial
limitof theTakein a new field calledlen. This can then be used to compute the position of the cursor inside theTakeaslen - limit. The reason to not trackpositiondirectly is to avoid adding one more mutating variable that changes together withlimit.lenis constant (unlessset_limitis called).Once we know the position of the cursor inside
Take, we can compute the offset from the current position based on the argument ofseek. If the argument isSeekFrom::Startthen we seek from-position, if the argument isSeekFrom::Endthen we seek fromposition+limit, if the argument isSeekFrom::Currentthen we seek fromposition.API
Add a
positionmethod that returns the position within theTakeAdd the
Seekimplementation forTakeAlternatives
The solution proposed in 97227 keeps track of the position instead of the original
len. It is equivalent to the solution proposed here. The reason why I prefer the above solution is because it's simpler to maintain given thatlennever changes whilecursorin 97227 must be updated together withlimitleading to code like:Links and related work