fix(llm): isolate Stage 2 batch failures and keep unanalysed findings by nyxst4ck · Pull Request #32 · NVIDIA/SkillSpector

nyxst4ck · 2026-06-12T12:22:05Z

Summary

Fixes the two coupled Stage 2 resilience bugs, which the reporter noted should be considered together because fixing the abort without fixing the drop would make the scanner less safe:

Stage 2: one failed batch aborts the entire meta-analyzer pass and silently falls back to static (#9)
Stage 2: apply_filter silently drops findings for files the LLM never analyzed (false negatives) (#11)

Changes

llm_analyzer_base.py — per-batch failure isolation. arun_batches now gathers with return_exceptions=True: a transient error (timeout, 429, oversized-chunk 400) is logged with the batch label and costs only its own batch, instead of discarding every other batch's result. ValueError and NotImplementedError still propagate, because they signal misconfiguration (missing API key, wrong response_schema) that skipping a batch cannot fix. This also benefits the two discovery analyzers (semantic_developer_intent, semantic_quality_policy), which previously lost all discovered findings when any single batch failed.
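A minimal sketch of per-batch isolation, assuming a hypothetical `arun_batches(batches, run_one)` signature and a stand-in `Batch` dataclass (the real signatures in llm_analyzer_base.py may differ). It gathers with `return_exceptions=True`, re-raises the misconfiguration errors, logs and skips any other exception, and returns only the batches that came back:

```python
from __future__ import annotations

import asyncio
import logging
from collections.abc import Awaitable, Callable, Sequence
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)

# Misconfiguration errors: skipping a batch cannot fix them, so they propagate.
FATAL_ERRORS: tuple[type[BaseException], ...] = (ValueError, NotImplementedError)


@dataclass
class Batch:
    # Hypothetical stand-in for the project's Batch type.
    label: str
    findings: list[str] = field(default_factory=list)


async def arun_batches(
    batches: Sequence[Batch],
    run_one: Callable[[Batch], Awaitable[Any]],
) -> list[tuple[Batch, Any]]:
    """Run all batches concurrently; return (batch, result) for those that succeeded."""
    # return_exceptions=True: a raised error becomes a value in `results`
    # instead of propagating, so the other batches' results are kept.
    results = await asyncio.gather(
        *(run_one(batch) for batch in batches), return_exceptions=True
    )
    succeeded: list[tuple[Batch, Any]] = []
    for batch, result in zip(batches, results):
        if isinstance(result, FATAL_ERRORS):
            # Missing API key, wrong response_schema, ...: fail loudly.
            raise result
        if isinstance(result, Exception):
            # Transient infra error (timeout, 429, oversized-chunk 400):
            # log it with the batch label and lose only this batch.
            logger.warning("Stage 2 batch %s failed: %r", batch.label, result)
            continue
        if isinstance(result, BaseException):
            # CancelledError and similar are not batch failures; re-raise them.
            raise result
        succeeded.append((batch, result))
    return succeeded
```

`asyncio.gather` returns results in input order, which is what lets each exception be attributed to its batch label. One caveat of matching on `ValueError` in this sketch: `json.JSONDecodeError` is a `ValueError` subclass, so a single malformed LLM response would be treated as fatal rather than skipped.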

meta_analyzer.py — no verdict means conservative keep, not delete. With partial results possible, the keep/drop loop must distinguish "the LLM rejected this" from "the LLM never saw this". The node now partitions findings by whether a returned batch actually carried them (each Batch already tracks its findings): analysed findings go through the normal confirm-or-drop filter; unanalysed ones are kept via the existing _fallback_filtered path; a WARNING logs how many findings were kept unfiltered so the gap is visible rather than silent. When all batches fail, behaviour matches today's net result (everything kept via fallback).
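The keep/drop rule can be sketched as a pure function. `Finding`, `Batch`, `filter_findings`, and the shape of `returned` (each returned batch paired with the ids the LLM confirmed) are hypothetical simplifications, and enrichment is not modelled; the point is the three-way outcome: confirmed findings kept, analysed-but-unconfirmed findings dropped, never-analysed findings kept with a WARNING.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal finding: a stable id plus the file it was raised on.
    id: str
    file: str


@dataclass
class Batch:
    # Hypothetical: each batch records which findings it carried to the LLM.
    label: str
    findings: list[Finding] = field(default_factory=list)


def filter_findings(
    all_findings: list[Finding],
    returned: list[tuple[Batch, set[str]]],
) -> list[Finding]:
    """Apply confirm-or-drop only to findings that a returned batch actually carried."""
    analysed: set[str] = set()
    confirmed: set[str] = set()
    for batch, confirmed_ids in returned:
        analysed.update(f.id for f in batch.findings)
        confirmed.update(confirmed_ids)

    kept: list[Finding] = []
    unanalysed = 0
    for finding in all_findings:
        if finding.id not in analysed:
            # No verdict (batch failed or never ran): conservative keep.
            unanalysed += 1
            kept.append(finding)
        elif finding.id in confirmed:
            # Analysed and confirmed by the LLM.
            kept.append(finding)
        # Analysed but not confirmed: the LLM rejected it, so it is dropped.

    if unanalysed:
        logger.warning(
            "Stage 2: kept %d finding(s) unfiltered; their batch returned no verdict",
            unanalysed,
        )
    return kept
```

Input order is preserved, so the output is the input minus the findings the LLM explicitly saw and rejected. With no returned batches at all, everything is kept, matching the all-batches-failed behaviour described above.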

Net effect, as suggested in #11: an infra failure can only ever cost enrichment on a file, never the finding itself.

Tests

arun_batches: a failing batch doesn't abort the others; all-failed returns empty; ValueError still propagates (5 new tests in test_llm_analyzer_base.py).
New tests/nodes/test_meta_analyzer.py: confirmed finding kept enriched + rejected finding dropped + unseen finding kept when one batch fails; all-batches-failed keeps everything; strict confirm-or-drop preserved when nothing fails.
The two node-level tests are red on main and green with this fix.

pytest tests/ --ignore=tests/integration   # 598 passed; failures identical to main (pre-existing, unrelated)
ruff check / ruff format                   # clean on touched files

All commits are signed off per the DCO.

One exception anywhere in the arun_batches fan-out aborted the whole Stage 2 pass: asyncio.gather without return_exceptions propagated the first error and discarded the results of the remaining batches, the meta-analyzer's blanket except caught the propagated error, and every file silently fell back to static-only results while the CLI still exited 0 (NVIDIA#9).

arun_batches now isolates failures per batch: a transient error (timeout, 429, oversized-chunk 400) is logged and costs only its own batch. ValueError and NotImplementedError still propagate, since they signal misconfiguration rather than infra trouble.

With partial results possible, apply_filter could no longer treat a missing confirmation as a rejection: a finding whose batch never returned would be silently dropped, a false negative manufactured by an infrastructure error (NVIDIA#11). The meta-analyzer now partitions findings by whether a returned batch actually carried them: analysed findings go through the normal confirm-or-drop filter, unanalysed ones are kept via the existing fallback path, and a WARNING logs how many findings were kept unfiltered so the gap is visible.

Net effect: an infra failure can only ever cost enrichment on a file, never the finding itself, and one bad call no longer turns off the semantic filter for the whole scan.

Fixes NVIDIA#9, fixes NVIDIA#11

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: nyxst4ck <289980115+nyxst4ck@users.noreply.github.com>
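The standard-library behaviour behind NVIDIA#9 is easy to reproduce. This self-contained sketch uses only `asyncio` with toy coroutines (`ok`, `boom`, `default_gather`, and `isolated_gather` are illustrative names, not project code): with the default `return_exceptions=False` the first exception reaches the awaiting code and the successful results are never returned, while `return_exceptions=True` hands back every result, exceptions included, in input order.

```python
from __future__ import annotations

import asyncio


async def ok(label: str) -> str:
    # A batch that succeeds after a short delay.
    await asyncio.sleep(0.01)
    return label


async def boom() -> str:
    # A batch that fails immediately with a transient-looking error.
    raise TimeoutError("simulated transient failure")


async def default_gather() -> list[str] | None:
    # Default gather: the first exception propagates to the awaiting code,
    # so the results of the batches that succeeded are never returned.
    try:
        return await asyncio.gather(ok("a"), boom(), ok("c"))
    except TimeoutError:
        return None


async def isolated_gather() -> list[object]:
    # return_exceptions=True: exceptions come back as values,
    # positionally aligned with the inputs.
    return await asyncio.gather(ok("a"), boom(), ok("c"), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(default_gather()))
    results = asyncio.run(isolated_gather())
    print([r if isinstance(r, str) else type(r).__name__ for r in results])
```

Per the asyncio documentation, default `gather` does not itself cancel the other awaitables; their results are simply lost to the caller, which is why a single failure used to erase the whole Stage 2 pass.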

nyxst4ck wants to merge 1 commit into NVIDIA:main from nyxst4ck:fix/stage2-batch-isolation