fix(scan): pre-arm scanning flag before publishing reindexed picker#587
fix(scan): pre-arm scanning flag before publishing reindexed picker#587gustav-fff wants to merge 15 commits into
Conversation
`new_with_shared_state` published the new `FilePicker` to `SharedFilePicker`
with `scanning=false` (the default) and only flipped it to `true` later
inside `ScanJob::spawn`. Between those two events, any consumer that
cloned the `signals.scanning` Arc — e.g. lua `wait_for_initial_scan`
called right after `restart_index_in_path` reports the base_path swap —
observed scanning=false and returned immediately, then ran a search on
the freshly-initialized empty index.
The race window is wide on Windows CI where filesystem walker startup
takes longer; surfaced as `tests/programmatic_search_spec.lua:212`
("content_search switches indexed root before grepping") flaking on
Windows e2e.
Set `scanning=true` on the cloned signals Arc *before* publishing the
picker, so any wait that observes the swap is guaranteed to also see
scanning=true. `ScanJob::spawn` re-stores `true` so semantics are
unchanged on the happy path.
|
@gustav-fff why did you make this PR draft? can you actually rerun the CI like 15 times and give me a summary |
|
[triage-bot] DIRECTED: Draft: PR is currently not draft ( Reruns: kicked off 14 additional e2e runs by pushing empty commits to
Will follow up once all runs complete with a pass/fail tally for windows-latest e2e. If any fail, I'll link the failing run + log excerpt. If all 15 pass I'll squash the rerun commits back down to the single fix commit before merge. Honk-Honk 🪿 |
|
@gustav-fff I cancelled the jobs run them again but sequentially make sure to rerun a specific windows job ONLY make sure that it passes 5 times in a row |
|
[triage-bot] DIRECTED: Switched to sequential single-job reruns of
The 14 prior empty-commit reruns are already cancelled on your end, so no parallel windows jobs will compete. I'll post the final tally once the 5-in-a-row sequence completes (~50 min wall-clock if all green). Honk-Honk 🪿 |
Refs #552 (CI flake follow-up requested by @dmtrKovalenko).
Root cause
FilePicker::new_with_shared_state(crates/fff-core/src/file_picker.rs:736) publishes the freshly builtFilePickertoSharedFilePickerwithsignals.scanning = false(theScanSignals::default()value at construction). The scanning flag is only flipped totruelater, on the new OS thread insideScanJob::spawn(crates/fff-core/src/scan.rs:140).Between the publish and the spawn, any consumer that grabs a clone of
signals.scanningfrom the picker — e.g. luawait_for_initial_scancalled right afterrestart_index_in_pathreports the newbase_pathviahealth_check— observesscanning=falseand returns immediately. A search issued right after then runs on the freshly-initialized, empty index (walker hasn't committed yet).tests/programmatic_search_spec.lua:212("content_search switches indexed root before grepping") hits exactly this sequence: write file, switch cwd, immediately grep. On Linux/macOS the walker is fast enough that the race rarely closes; on Windows CI the walker startup is slower and the test flakes.Fix
Set
scanning=trueon the clonedsignalsArc before publishing the new picker into the shared lock.ScanJob::spawnstill re-storestrueso happy-path semantics are unchanged; the only new guarantee is that any wait obtaining the signal Arc post-publish observes scanning=true and waits for the walker.Diff is 10 lines, no behavior change off the race path.
Steps to reproduce
Manual: not easily on macOS/Linux — race is timing-dependent. CI: Windows e2e job consistently failed with the assertion below on PR #552 and on
main(masked bycontinue-on-erroron push).How verified
cargo check --workspacepasses.Automated triage via Gustav. Honk-Honk 🪿