Fix(backend): Fix race condition in download queue when concurrent jobs share same destination #8931
Open
lstein wants to merge 1 commit into invoke-ai:main from
…nation directory (#104)
* Initial plan
* Fix race condition in _do_download when scanning for .downloading files
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
* chore(backend): update copyright
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
Summary
When two download jobs target the same destination directory simultaneously, a TOCTOU (time-of-check to time-of-use) race between `glob("*.downloading")` and the subsequent `.stat()` call could raise a `FileNotFoundError` if a concurrent job completed and renamed its `.downloading` file in between. This surfaced as an intermittent test failure in `test_errors`, where the error recorded for the `broken` job was `FileNotFoundError` instead of the expected `HTTPError(NOT FOUND)`.
Note that this scenario rarely, if ever, occurs in real-world use. However, it has been causing increasingly frequent failures in the download and install manager unit tests.
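The race can be reproduced deterministically without threads by performing the rename by hand between the glob and the stat. The following is a minimal sketch under stated assumptions, not the project's code: `guarded_size` and `demo` are hypothetical names, the file name `model.bin` is made up, and the concurrent job is simulated by an inline `rename`. It also shows one general way to tolerate the disappearing file, by catching `FileNotFoundError` around the `stat()` call and falling back to a fresh download.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def guarded_size(candidate: Path) -> Tuple[Optional[Path], int]:
    """Return (path, size) for a partial file, or (None, 0) if it has vanished.

    Hypothetical helper for illustration only.
    """
    try:
        return candidate, candidate.stat().st_size
    except FileNotFoundError:
        # The file was renamed or removed after it was listed:
        # treat this as "nothing to resume" and start from byte 0.
        return None, 0


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp)
        partial = dest / "model.bin.downloading"  # made-up file name
        partial.write_bytes(b"12345")

        # Time of check: list the partial downloads in the directory.
        candidates = list(dest.glob("*.downloading"))
        before = guarded_size(candidates[0])

        # Simulate a concurrent job finishing and renaming its file.
        partial.rename(dest / "model.bin")

        # Time of use, unguarded: the listed path no longer exists.
        try:
            candidates[0].stat()
            unguarded = "ok"
        except FileNotFoundError:
            unguarded = "FileNotFoundError"

        # Time of use, guarded: falls back to a fresh download.
        after = guarded_size(candidates[0])
        return {"before": before[1], "unguarded": unguarded, "after": after}


result = demo()
print(result)
```

In this sketch the unguarded `stat()` raises once the file has been renamed, while the guarded call returns `(None, 0)`, which a caller could interpret as "no partial file, resume from offset 0".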
Fix: In `_do_download`, wrap the `candidates[0].stat().st_size` call in a `try`/`except FileNotFoundError`. If the file disappears between the glob and the stat, reset `job.download_path = None` and leave `resume_from = 0` so that the job proceeds as a fresh download.
Related Issues / Discussions
QA Instructions
Run `tests/app/services/download/test_download_queue.py::test_errors` repeatedly; it previously failed intermittently due to this race.
Merge Plan
Simple merge.
Checklist
What's New copy (if doing a release after this PR)