Fix Anthropic provider: strip unsupported numeric-bound keywords from tool schemas by blakestone-x · Pull Request #3 · NVIDIA/SkillSpector

blakestone-x · 2026-05-31T06:20:53Z

Problem

Under SKILLSPECTOR_PROVIDER=anthropic, every LLM analyzer silently falls back to static analysis. The Anthropic provider points langchain_openai.ChatOpenAI at https://api.anthropic.com/v1/ and drives analyzers through with_structured_output(<Pydantic model>).

The structured-output schemas use Pydantic numeric-bound constraints:

llm_analyzer_base.py — start_line: int = Field(ge=1, …), confidence: float = Field(ge=0.0, le=1.0, …)
nodes/meta_analyzer.py — confidence: float = Field(ge=0.0, le=1.0, …)

These emit the JSON-schema keywords minimum / maximum (and exclusiveMinimum / exclusiveMaximum) on the generated tool/function schema. OpenAI and the NVIDIA endpoints tolerate them, but Anthropic's tool-schema validator rejects them with HTTP 400:

Error code: 400 - {'error': {'code': 'invalid_request_error',
  'message': "For 'integer' type, property 'minimum' is not supported",
  'type': 'invalid_request_error', 'param': None}}
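
To see where the rejected keywords come from, the sketch below builds a stand-in model with the same kinds of bounds and inspects its generated JSON schema. `Finding` is a hypothetical model, not the real `LLMAnalysisResult`, and the snippet assumes Pydantic v2 (`model_json_schema`).

```python
from pydantic import BaseModel, Field


class Finding(BaseModel):
    """Hypothetical stand-in for the real schemas; only the bounds matter."""

    start_line: int = Field(ge=1)
    confidence: float = Field(ge=0.0, le=1.0)


schema = Finding.model_json_schema()
props = schema["properties"]

# ge= / le= surface as JSON-schema "minimum" / "maximum" on each property,
# which is what ends up in the tool schema sent to the provider.
print(props["start_line"])
print(props["confidence"])
```

Using `gt=` / `lt=` instead of `ge=` / `le=` would produce the `exclusiveMinimum` / `exclusiveMaximum` variants.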

The 400 is swallowed by the analyzer's except Exception fallback, so the scan still "succeeds" — but with the LLM pass (including the meta-analyzer false-positive filter) entirely skipped. The failure is invisible unless you read debug logs.
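
The silent-fallback pattern described above can be illustrated with a small sketch. All function names here are hypothetical and are not taken from SkillSpector; the LLM pass is a stub that always raises, standing in for the 400. The sketch also shows one way a caller could make the fallback visible (a returned flag plus a warning-level log), which is an assumption and not something this PR changes.

```python
import logging

logger = logging.getLogger("analyzer_sketch")


def run_llm_pass(text: str) -> list[str]:
    """Hypothetical LLM pass that always fails, standing in for the 400."""
    raise RuntimeError("Error code: 400 - property 'minimum' is not supported")


def run_static_pass(text: str) -> list[str]:
    """Hypothetical static pass that needs no network call."""
    return ["static-finding"] if "eval(" in text else []


def analyze(text: str) -> tuple[list[str], bool]:
    """Return (findings, llm_used); the flag exposes a skipped LLM pass."""
    try:
        return run_llm_pass(text), True
    except Exception as exc:
        # A broad except keeps the scan alive; logging at WARNING rather
        # than DEBUG keeps the failure from disappearing entirely.
        logger.warning("LLM pass failed, using static analysis only: %s", exc)
        return run_static_pass(text), False


findings, llm_used = analyze("eval(user_input)")
print(findings, llm_used)
```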

Repro (live, against `api.anthropic.com`)

from skillspector.llm_analyzer_base import LLMAnalysisResult
from skillspector.llm_utils import get_chat_model

llm = get_chat_model(model="claude-sonnet-4-6")  # SKILLSPECTOR_PROVIDER=anthropic
llm.with_structured_output(LLMAnalysisResult).invoke("Return an empty findings list.")
# -> 400 "For 'integer' type, property 'minimum' is not supported"

Fix

A provider-aware structured-output builder, build_structured_llm(llm, schema) (new src/skillspector/structured_output.py):

Every provider except Anthropic → unchanged: llm.with_structured_output(schema), so the numeric-bound constraints are preserved in the request for endpoints that validate against them.
Anthropic provider only → the generated tool schema is recursively stripped of minimum / maximum / exclusiveMinimum / exclusiveMaximum before it is sent (strip_unsupported_numeric_bounds). Responses are still parsed and validated against the original Pydantic schema, so the bounds are enforced on output even though they are dropped from the request.

This keeps the constraints on the OpenAI/NVIDIA paths (the reason for not simply deleting the Field(ge=, le=) declarations) while unblocking Anthropic.
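
The recursive strip can be sketched in a few lines of standard-library Python. This is an illustration of the technique, not the PR's actual `strip_unsupported_numeric_bounds`; the function name and the special handling of `properties` / `$defs` / `definitions` (so that a field that happens to be named `minimum` is not dropped) are assumptions.

```python
from typing import Any

UNSUPPORTED_KEYS = frozenset(
    {"minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum"}
)
# Keys whose value maps user-chosen names to sub-schemas, not keywords.
NAME_MAPS = frozenset({"properties", "$defs", "definitions"})


def strip_numeric_bounds_sketch(node: Any, _names: bool = False) -> Any:
    """Return a copy of a JSON-schema fragment without numeric-bound keywords."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if not _names and key in UNSUPPORTED_KEYS:
                continue  # drop the keyword, keep everything else
            child_is_name_map = (not _names) and key in NAME_MAPS
            cleaned[key] = strip_numeric_bounds_sketch(value, child_is_name_map)
        return cleaned
    if isinstance(node, list):
        # Covers anyOf / allOf / oneOf members and tuple-style items.
        return [strip_numeric_bounds_sketch(item) for item in node]
    return node  # scalars are immutable, so returning them is safe


schema = {
    "type": "object",
    "properties": {
        "start_line": {"type": "integer", "minimum": 1},
        "confidence": {
            "anyOf": [
                {"type": "number", "minimum": 0.0, "maximum": 1.0},
                {"type": "null"},
            ]
        },
        # A field literally named "minimum" must survive as a property.
        "minimum": {"type": "integer", "exclusiveMinimum": 0},
    },
    "required": ["start_line"],
}

cleaned = strip_numeric_bounds_sketch(schema)
print(cleaned["properties"]["start_line"])
```

Because new dicts and lists are built at every level, the input schema is left untouched, which matches the "no input mutation" case listed under Tests.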

Provider detection is a new is_anthropic_provider() helper in skillspector.providers, gated on SKILLSPECTOR_PROVIDER=anthropic (the only way the Anthropic provider is selected).
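
A helper gated on the environment variable could look roughly like the following. Only the variable name `SKILLSPECTOR_PROVIDER` and the value `anthropic` come from the PR; the function name, the optional `env` parameter, and the whitespace and case normalization are assumptions made for this sketch.

```python
import os
from collections.abc import Mapping
from typing import Optional


def is_anthropic_provider_sketch(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when SKILLSPECTOR_PROVIDER selects the Anthropic provider."""
    source = os.environ if env is None else env
    # Normalizing case and whitespace is an assumption, not confirmed behavior.
    return source.get("SKILLSPECTOR_PROVIDER", "").strip().lower() == "anthropic"


print(is_anthropic_provider_sketch({"SKILLSPECTOR_PROVIDER": "anthropic"}))
print(is_anthropic_provider_sketch({"SKILLSPECTOR_PROVIDER": "openai"}))
```

Accepting an explicit mapping keeps the check easy to unit-test without patching `os.environ`.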

Before / after (same live key, `claude-sonnet-4-6`)

=== BEFORE (unsanitized with_structured_output) ===
got error: Error code: 400 - … "For 'integer' type, property 'minimum' is not supported"

=== AFTER (build_structured_llm, Anthropic-sanitized) ===
type: LLMAnalysisResult
findings: []
OK: structured Anthropic call succeeded

A full skillspector scan … --format json against a small SKILL.md with SKILLSPECTOR_PROVIDER=anthropic now runs the semantic discovery analyzers and the meta-analyzer LLM filter to completion (exit 0) instead of falling back to static-only.

Tests

tests/unit/test_structured_output.py adds:

a regression guard asserting the unsanitized schemas really do emit minimum/maximum,
recursive-strip coverage (nested properties, array items, anyOf, exclusive variants, no input mutation, unrelated keys preserved),
full-schema sanitization of the real LLMAnalysisResult / MetaAnalyzerResult,
provider routing (non-Anthropic → with_structured_output; Anthropic → sanitized bind_tools),
proof that output validation is not relaxed — an out-of-bound confidence still raises ValidationError at parse time.
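
The last point can be shown in isolation: even if the bounds are absent from the schema sent to the provider, parsing the response through the original Pydantic model still enforces them. `Verdict` is a hypothetical stand-in for the real schemas, the dicts stand in for tool-call arguments returned by the model, and Pydantic v2 is assumed.

```python
from pydantic import BaseModel, Field, ValidationError


class Verdict(BaseModel):
    """Hypothetical stand-in for a structured-output schema."""

    confidence: float = Field(ge=0.0, le=1.0)


# An in-range value parses normally.
ok = Verdict.model_validate({"confidence": 0.4})

# An out-of-range value is rejected at parse time, regardless of what
# schema the provider was shown.
try:
    Verdict.model_validate({"confidence": 1.7})
    rejected = False
except ValidationError:
    rejected = True

print(ok.confidence, rejected)
```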

Existing llm_analyzer_base / provider suites pass unchanged (non-Anthropic path is byte-for-byte the prior behavior).

Note / follow-up (separate from this PR)

The discovery analyzers can overflow Anthropic's 1M-token context on large repos: per-file inputs are batched but not chunked the way the meta-analyzer chunks oversized files, so a single very large file (or aggregate discovery input) can exceed the cap and error out. That's a distinct issue from this 400 and isn't addressed here — flagging it for a follow-up (input chunking for the discovery pass under the Anthropic token budget).

Pydantic Field(ge=, le=) constraints emit JSON-schema minimum/maximum (and exclusiveMinimum/exclusiveMaximum) on the tool/function schema that with_structured_output sends. Anthropic's tool-schema validator rejects these with HTTP 400 ("For 'integer' type, property 'minimum' is not supported"), so under SKILLSPECTOR_PROVIDER=anthropic every structured LLM call fails and each analyzer silently falls back to static analysis.

Add a provider-aware structured-output builder: for every provider that tolerates the keywords (OpenAI, NVIDIA) behavior is unchanged (with_structured_output keeps the constraints). For the Anthropic provider only, the generated tool schema is stripped of the numeric-bound keywords before it is sent, while responses are still parsed and validated against the original Pydantic schema, so output constraints are preserved even though they are dropped from the request.

- providers: add is_anthropic_provider()
- structured_output: strip_unsupported_numeric_bounds + build_structured_llm
- llm_analyzer_base: route with_structured_output through build_structured_llm
- tests: regression repro + sanitizer + provider-routing coverage

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: blakestone-x <blakeastone.frisco@gmail.com>

keshprad · 2026-06-10T00:56:39Z

This mostly looks good to me. However, I want to explore using langchain_anthropic.ChatAnthropic to see whether it handles this directly.

Currently, providers simply expose credentials and an OpenAI-compatible API, and SkillSpector continues to use langchain_openai.ChatOpenAI everywhere. Maybe this needs to change so that providers return a ChatOpenAI/ChatAnthropic/etc. instance.

Essentially, I want to explore whether it makes sense to make get_chat_model the responsibility of each provider implementation.

blakestone-x wants to merge 1 commit into NVIDIA:main from blakestone-x:fix/anthropic-tool-schema-numeric-bounds
