Fix(llm): retry transient failures and make concurrency configurable #29
Open
jhamze7 wants to merge 4 commits into
Verified on a self-hosted OpenAI-compatible backend (vLLM serving
LGTM for the #10 retry/backoff + configurable concurrency goal.
This pull request fixes LLM calls failing immediately on transient errors because there was no retry logic.
Changes:
llm_utils.py:
Created helper functions for the sync and async batch runners that retry LLM calls 4 times by default (this number is configurable with the max_attempts argument) with exponential backoff. These functions string-match the error message to keep the implementation universal across different providers.
llm_analyzer_base.py:
Wrapped the LLM calls in the helper functions created in llm_utils.py. Also set max_concurrency from the SKILLSPECTOR_MAX_CONCURRENCY environment variable, with the value defaulting to 5 if the variable is empty.
.env.example:
Created a SKILLSPECTOR_MAX_CONCURRENCY environment variable. This change is also noted in docs/DEVELOPMENT.md.
test_llm_analyzer_base.py:
Added sync and async tests verifying that LLM calls are retried when a simulated 429 error occurs. The tests also verify that processing succeeds after a transient failure and raises an exception once the retry limit is exhausted.
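For reviewers, the retry behavior described above can be pictured with a minimal sketch. This is not the code in llm_utils.py: the names call_with_retry, is_transient, and TRANSIENT_MARKERS, the base_delay parameter, and the list of matched strings are all hypothetical. It also assumes max_attempts counts total attempts, which this description does not confirm.

```python
import time

# Hypothetical markers; the PR description does not list the strings it matches.
TRANSIENT_MARKERS = ("429", "rate limit", "timeout", "503")


def is_transient(exc: Exception) -> bool:
    """Return True when the error message looks like a transient failure."""
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)


def call_with_retry(fn, *, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    Assumption: max_attempts is the total number of attempts, not the
    number of retries after the first call.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-transient errors, or once attempts are exhausted.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # The delay doubles on each attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Matching on the message text rather than on provider-specific exception classes is what keeps a helper like this independent of any one SDK, at the cost of depending on how each provider words its errors. The injectable sleep argument exists only so tests can run without waiting.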
Closes #10
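As a rough illustration of the concurrency setting, the sketch below reads SKILLSPECTOR_MAX_CONCURRENCY and caps in-flight calls with a semaphore. The helper names read_max_concurrency and run_batch are hypothetical. Treating an unset variable the same as an empty one, and using asyncio.Semaphore to enforce the cap, are assumptions rather than details taken from llm_analyzer_base.py.

```python
import asyncio
import os

DEFAULT_MAX_CONCURRENCY = 5


def read_max_concurrency(env=None) -> int:
    """Read SKILLSPECTOR_MAX_CONCURRENCY, falling back to 5 when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("SKILLSPECTOR_MAX_CONCURRENCY", "").strip()
    return int(raw) if raw else DEFAULT_MAX_CONCURRENCY


async def run_batch(items, worker, max_concurrency):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # Only max_concurrency coroutines can hold the semaphore at once.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

A lower value suits a rate-limited hosted API, while a self-hosted backend may tolerate a higher one.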