Conversation
|
Sorry for the delay, I'll do my best to find some time this weekend 🤞 |
elshize
left a comment
There was a problem hiding this comment.
Looks good, just some minor comments.
| // Map from an internal integer docid to a string "collection" docid. | ||
| let mut docid_map: HashMap<u32, String> = HashMap::new(); | ||
|
|
||
| // 1. Map the postings back into a forward index |
There was a problem hiding this comment.
Can we have each of 1-3 be a separate function? I think 1 could return forward index, 2 returns docid_map and 3 takes both to write, something like that?
There was a problem hiding this comment.
Good idea, agreed. I'll fix this up.
| let progress = ProgressBar::new(u64::from(header.num_postings_lists)); | ||
| progress.set_style(pb_style()); | ||
| progress.set_draw_delta(10); | ||
| for _ in 0..header.num_postings_lists { |
There was a problem hiding this comment.
When you have an iterator and by-one increment in the loop, you can import indicatif::ProgressIterator and do this:
for _ in (0..header.num_postings_lists).progress(progress);This means you skip progress.inc(1) and progress.finish() -- it's done for you.
There was a problem hiding this comment.
Note you can do this for all three loops.
| let mut docid_map: HashMap<u32, String> = HashMap::new(); | ||
|
|
||
| // 1. Map the postings back into a forward index | ||
| let progress = ProgressBar::new(u64::from(header.num_postings_lists)); |
There was a problem hiding this comment.
I think we could write a module-private function fn progress_bar(count: u64) -> ProgressBar because we use the same kind of progress 3 times.
This adds a program that can convert a CIFF blob into a jsonl file that follows the standard "learned sparse" format.
We probably should write some tests before merging. Also, not too sure how canonical my rust is these days, so it might be good to review carefully.