fix(llm): isolate Stage 2 batch failures and keep unanalysed findings #32
Open
nyxst4ck wants to merge 1 commit into
One exception anywhere in the arun_batches fan-out aborted the whole Stage 2 pass: asyncio.gather without return_exceptions propagated the first exception and abandoned the remaining batches, the meta-analyzer's blanket except caught the propagated error, and every file silently fell back to static-only results while the CLI still exited 0 (NVIDIA#9).

arun_batches now isolates failures per batch: a transient error (timeout, 429, oversized-chunk 400) is logged and costs only its own batch. ValueError and NotImplementedError still propagate, since they signal misconfiguration rather than infra trouble.

With partial results possible, apply_filter could no longer treat a missing confirmation as a rejection: a finding whose batch never returned would be silently dropped, a false negative manufactured by an infrastructure error (NVIDIA#11). The meta-analyzer now partitions findings by whether a returned batch actually carried them: analysed findings go through the normal confirm-or-drop filter, unanalysed ones are kept via the existing fallback path, and a WARNING logs how many findings were kept unfiltered so the gap is visible.

Net effect: an infra failure can only ever cost enrichment on a file, never the finding itself, and one bad call no longer turns off the semantic filter for the whole scan.

Fixes NVIDIA#9, fixes NVIDIA#11

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: nyxst4ck <289980115+nyxst4ck@users.noreply.github.com>
Summary
Fixes the two coupled Stage 2 resilience bugs, which the reporter noted should be considered together because fixing the abort without fixing the drop would make the scanner less safe:
apply_filter silently drops findings for files the LLM never analyzed (false negatives)

Changes
llm_analyzer_base.py: per-batch failure isolation. arun_batches now gathers with return_exceptions=True: a transient error (timeout, 429, oversized-chunk 400) is logged with the batch label and costs only its own batch instead of aborting the whole fan-out. ValueError and NotImplementedError still propagate: they signal misconfiguration (missing API key, wrong response_schema), which skipping a batch cannot fix. This also benefits the two discovery analyzers (semantic_developer_intent, semantic_quality_policy), which previously lost all discovered findings when any single batch failed.

meta_analyzer.py: no verdict means conservative keep, not delete. With partial results possible, the keep/drop loop must distinguish "the LLM rejected this" from "the LLM never saw this". The node now partitions findings by whether a returned batch actually carried them (each Batch already tracks its findings): analysed findings go through the normal confirm-or-drop filter; unanalysed ones are kept via the existing _fallback_filtered path; a WARNING logs how many findings were kept unfiltered so the gap is visible rather than silent. When all batches fail, behaviour matches today's net result (everything kept via fallback).

Net effect, as suggested in #11: an infra failure can only ever cost enrichment on a file, never the finding itself.
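For reviewers, the per-batch isolation described above can be sketched roughly as follows. This is a minimal illustration and not the code in llm_analyzer_base.py: run_isolated, FATAL, ok, flaky and misconfigured are hypothetical names, and the real arun_batches signature and return type may differ.

```python
import asyncio
import logging

logger = logging.getLogger("stage2_sketch")  # hypothetical logger name

# Assumption: these two types stand for misconfiguration and are re-raised.
FATAL = (ValueError, NotImplementedError)


async def run_isolated(batches):
    """Run (label, coroutine) pairs; return {label: result} for successes only."""
    labels = [label for label, _ in batches]
    # return_exceptions=True delivers each failure as a value in the result
    # list instead of raising the first one out of gather().
    outcomes = await asyncio.gather(
        *(coro for _, coro in batches), return_exceptions=True
    )
    results = {}
    for label, outcome in zip(labels, outcomes):
        if isinstance(outcome, FATAL):
            raise outcome  # skipping the batch cannot fix a misconfiguration
        if isinstance(outcome, BaseException):
            # Transient failure: log it with the batch label, lose only this batch.
            logger.warning("batch %s failed: %r", label, outcome)
            continue
        results[label] = outcome
    return results


async def ok(value):
    return value


async def flaky():
    raise TimeoutError("simulated timeout")


async def misconfigured():
    raise ValueError("simulated missing API key")
```

With this shape, a call such as asyncio.run(run_isolated([("a", ok(1)), ("b", flaky())])) returns only the result for "a", while a batch raising ValueError still fails the whole call.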
Tests
arun_batches: a failing batch doesn't abort the others; all-failed returns empty; ValueError still propagates (5 new tests in test_llm_analyzer_base.py).

tests/nodes/test_meta_analyzer.py: confirmed finding kept enriched + rejected finding dropped + unseen finding kept when one batch fails; all-batches-failed keeps everything; strict confirm-or-drop preserved when nothing fails.

All commits are signed off per the DCO.
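The three meta-analyzer scenarios listed under Tests can be illustrated with a small self-contained sketch of the partition rule. It is an assumption-laden model, not the code in meta_analyzer.py: Finding, Batch.confirmed_ids and filter_findings are hypothetical names, and the real Batch may track its findings differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    id: str  # hypothetical identifier for a static finding


@dataclass
class Batch:
    findings: list  # findings this batch carried to the LLM
    confirmed_ids: set = field(default_factory=set)  # ids the LLM confirmed


def filter_findings(all_findings, returned_batches):
    """Return (kept findings, number kept without any LLM verdict)."""
    # A finding counts as analysed only if a batch that returned carried it.
    analysed_ids = {f.id for b in returned_batches for f in b.findings}
    confirmed_ids = set().union(*(b.confirmed_ids for b in returned_batches))
    kept, unanalysed = [], 0
    for finding in all_findings:
        if finding.id not in analysed_ids:
            kept.append(finding)  # never seen by the LLM: conservative keep
            unanalysed += 1
        elif finding.id in confirmed_ids:
            kept.append(finding)  # seen and confirmed
        # seen and not confirmed: dropped by the normal filter
    return kept, unanalysed


a, b, c = Finding("a"), Finding("b"), Finding("c")
# Scenario: the batch carrying a and b returned and confirmed only a;
# the batch carrying c failed, so it is absent from returned_batches.
kept, unanalysed = filter_findings([a, b, c], [Batch([a, b], {"a"})])
```

In this scenario kept holds a (confirmed) and c (unseen), b is dropped, and unanalysed is the count a WARNING would report.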