fix(llm): isolate Stage 2 batch failures and keep unanalysed findings #32

Open
nyxst4ck wants to merge 1 commit into NVIDIA:main from nyxst4ck:fix/stage2-batch-isolation
@nyxst4ck

Summary

Fixes the two coupled Stage 2 resilience bugs: the fan-out abort (#9) and the silent drop of unanalysed findings (#11). The reporter noted that they should be considered together, because fixing the abort without fixing the drop would make the scanner less safe.

Changes

llm_analyzer_base.py — per-batch failure isolation. arun_batches now gathers with return_exceptions=True: a transient error (timeout, 429, oversized-chunk 400) is logged with the batch label and costs only its own batch instead of cancelling the whole fan-out. ValueError and NotImplementedError still propagate — they signal misconfiguration (missing API key, wrong response_schema), which skipping a batch cannot fix. This also benefits the two discovery analyzers (semantic_developer_intent, semantic_quality_policy), which previously lost all discovered findings when any single batch failed.
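
For reviewers who want the isolation pattern at a glance, here is a minimal sketch of it, not the code in this PR. The signature of `arun_batches`, the `analyze` callable, and the `FATAL_ERRORS` name are assumptions made for illustration; only the use of `asyncio.gather(..., return_exceptions=True)` and the re-raise of `ValueError` / `NotImplementedError` come from the description above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# Assumed name: exception types treated as misconfiguration and re-raised.
FATAL_ERRORS = (ValueError, NotImplementedError)


async def arun_batches(batches, analyze):
    """Run analyze(batch) concurrently; return results of the batches that succeeded.

    Hypothetical signature: the real method may take different arguments.
    """
    outcomes = await asyncio.gather(
        *(analyze(batch) for batch in batches),
        return_exceptions=True,  # a failing batch no longer cancels its siblings
    )
    results = []
    for batch, outcome in zip(batches, outcomes):
        if isinstance(outcome, BaseException):
            # Misconfiguration and non-Exception signals (e.g. cancellation)
            # still propagate; skipping a batch cannot fix them.
            if isinstance(outcome, FATAL_ERRORS) or not isinstance(outcome, Exception):
                raise outcome
            # Transient failure: log it with the batch label and move on.
            logger.warning("batch %r failed and was skipped: %r", batch, outcome)
            continue
        results.append(outcome)
    return results
```

In this sketch the caller receives only the successful results, so it has to work out which batches are missing by comparing against what it submitted.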

meta_analyzer.py — no verdict means conservative keep, not delete. With partial results possible, the keep/drop loop must distinguish "the LLM rejected this" from "the LLM never saw this". The node now partitions findings by whether a returned batch actually carried them (each Batch already tracks its findings): analysed findings go through the normal confirm-or-drop filter; unanalysed ones are kept via the existing _fallback_filtered path; a WARNING logs how many findings were kept unfiltered so the gap is visible rather than silent. When all batches fail, behaviour matches today's net result (everything kept via fallback).
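
The partition described above can be sketched as follows. This is an illustration under stated assumptions, not the node's actual code: `Finding`, `ReturnedBatch`, `filter_findings`, and their fields are hypothetical stand-ins, and the real node routes unanalysed findings through `_fallback_filtered` rather than appending them directly.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for a static finding."""
    id: str


@dataclass
class ReturnedBatch:
    """Hypothetical stand-in for a batch whose LLM call came back."""
    finding_ids: set[str]    # findings the batch carried
    confirmed_ids: set[str]  # subset the LLM confirmed


def filter_findings(findings: list[Finding], returned: list[ReturnedBatch]) -> list[Finding]:
    # A finding counts as analysed only if some returned batch carried it.
    seen = set().union(*(b.finding_ids for b in returned))
    confirmed = set().union(*(b.confirmed_ids for b in returned))

    kept, unanalysed = [], []
    for finding in findings:
        if finding.id not in seen:
            unanalysed.append(finding)  # the LLM never saw it: conservative keep
        elif finding.id in confirmed:
            kept.append(finding)        # the LLM confirmed it
        # otherwise the LLM saw it and rejected it: drop

    if unanalysed:
        logger.warning("%d finding(s) kept unfiltered: no LLM verdict", len(unanalysed))
    return kept + unanalysed
```

With no returned batches, `seen` is empty and every finding is kept, which matches the all-batches-failed case described above.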

Net effect, as suggested in #11: an infra failure can only ever cost enrichment on a file, never the finding itself.

Tests

  • arun_batches: a failing batch doesn't abort the others; all-failed returns empty; ValueError still propagates (5 new tests in test_llm_analyzer_base.py).
  • New tests/nodes/test_meta_analyzer.py: confirmed finding kept enriched + rejected finding dropped + unseen finding kept when one batch fails; all-batches-failed keeps everything; strict confirm-or-drop preserved when nothing fails.
  • The two node-level tests are red on main and green with this fix.
pytest tests/ --ignore=tests/integration   # 598 passed; failures identical to main (pre-existing, unrelated)
ruff check / ruff format                   # clean on touched files

All commits are signed off per the DCO.

One exception anywhere in the arun_batches fan-out aborted the whole
Stage 2 pass: asyncio.gather without return_exceptions cancelled the
remaining batches, the meta-analyzer's blanket except caught the
propagated error, and every file silently fell back to static-only
results while the CLI still exited 0 (NVIDIA#9).

arun_batches now isolates failures per batch: a transient error
(timeout, 429, oversized-chunk 400) is logged and costs only its own
batch. ValueError and NotImplementedError still propagate, since they
signal misconfiguration rather than infra trouble.

With partial results possible, apply_filter could no longer treat a
missing confirmation as a rejection: a finding whose batch never
returned would be silently dropped — a false negative manufactured by
an infrastructure error (NVIDIA#11). The meta-analyzer now partitions
findings by whether a returned batch actually carried them: analysed
findings go through the normal confirm-or-drop filter, unanalysed ones
are kept via the existing fallback path, and a WARNING logs how many
findings were kept unfiltered so the gap is visible.

Net effect: an infra failure can only ever cost enrichment on a file,
never the finding itself, and one bad call no longer turns off the
semantic filter for the whole scan.

Fixes NVIDIA#9, fixes NVIDIA#11

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: nyxst4ck <289980115+nyxst4ck@users.noreply.github.com>