Fix Anthropic provider: strip unsupported numeric-bound keywords from tool schemas #3
Open
blakestone-x wants to merge 1 commit into
Pydantic Field(ge=, le=) constraints emit JSON-schema minimum/maximum
(and exclusiveMinimum/exclusiveMaximum) on the tool/function schema that
with_structured_output sends. Anthropic's tool-schema validator rejects
these with HTTP 400 ("For 'integer' type, property 'minimum' is not
supported"), so under SKILLSPECTOR_PROVIDER=anthropic every structured
LLM call fails and each analyzer silently falls back to static analysis.
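The silent fallback follows a common pattern. Here is a minimal sketch of why the scan still appears to succeed. The names (`analyze_with_fallback`, `rejected_structured_call`, the logger name) are hypothetical and are not SkillSpector's actual analyzer code:

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("skillspector.sketch")


def analyze_with_fallback(llm_pass: Callable[[], T], static_pass: Callable[[], T]) -> T:
    """Run the LLM pass; on any exception, quietly return the static result."""
    try:
        return llm_pass()
    except Exception as exc:  # broad catch: an HTTP 400 from the API lands here too
        # Logged only at DEBUG, so at default log levels the failure is invisible.
        logger.debug("LLM pass failed, using static analysis: %s", exc)
        return static_pass()


def rejected_structured_call() -> str:
    # Stand-in for a structured-output request that the API rejects with a 400.
    raise RuntimeError("400: For 'integer' type, property 'minimum' is not supported")


result = analyze_with_fallback(rejected_structured_call, lambda: "static-only findings")
print(result)  # prints: static-only findings
```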
Add a provider-aware structured-output builder: for every provider that
tolerates the keywords (OpenAI, NVIDIA), behavior is unchanged
(with_structured_output keeps the constraints). For the Anthropic
provider only, the generated tool schema is stripped of the numeric-bound
keywords before it is sent, while responses are still parsed and
validated against the original Pydantic schema, so output constraints are
preserved even though they are dropped from the request.
- providers: add is_anthropic_provider()
- structured_output: strip_unsupported_numeric_bounds + build_structured_llm
- llm_analyzer_base: route with_structured_output through build_structured_llm
- tests: regression repro + sanitizer + provider-routing coverage
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: blakestone-x <blakeastone.frisco@gmail.com>
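The sanitizer and provider gate from the bullets above can be sketched roughly as follows. This is a minimal illustration and not the PR's code. `select_request_schema` is a hypothetical helper that shows only the schema-selection step. It does not wrap the LLM the way `build_structured_llm` does. The case and whitespace normalization in `is_anthropic_provider` is an assumption.

```python
import os
from typing import Any

# JSON-schema keywords that, per this PR, Anthropic's tool-schema validator rejects.
NUMERIC_BOUND_KEYWORDS = frozenset(
    {"minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum"}
)

# Keywords whose values map user-chosen *names* to subschemas. A property that
# happens to be called "minimum" is data, not a keyword, so it must survive.
NAME_MAP_KEYWORDS = frozenset({"properties", "patternProperties", "$defs", "definitions"})


def strip_unsupported_numeric_bounds(schema: Any) -> Any:
    """Return a new schema with numeric-bound keywords removed at every depth.

    The input is never mutated, and all other keys are kept unchanged.
    """
    if isinstance(schema, list):
        # e.g. anyOf/oneOf/allOf branches, or tuple-style "items"
        return [strip_unsupported_numeric_bounds(item) for item in schema]
    if not isinstance(schema, dict):
        return schema  # JSON scalars are immutable, so sharing them is safe
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if key in NUMERIC_BOUND_KEYWORDS:
            continue
        if key in NAME_MAP_KEYWORDS and isinstance(value, dict):
            cleaned[key] = {
                name: strip_unsupported_numeric_bounds(sub) for name, sub in value.items()
            }
        else:
            cleaned[key] = strip_unsupported_numeric_bounds(value)
    return cleaned


def is_anthropic_provider() -> bool:
    # Assumption: whitespace and case are normalized; the PR only says the check
    # is gated on SKILLSPECTOR_PROVIDER=anthropic.
    return os.environ.get("SKILLSPECTOR_PROVIDER", "").strip().lower() == "anthropic"


def select_request_schema(tool_schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: pick the schema to send, based on the provider."""
    if is_anthropic_provider():
        return strip_unsupported_numeric_bounds(tool_schema)
    return tool_schema  # OpenAI/NVIDIA: send the constraints unchanged
```

On the Anthropic path, the stripped schema is what would be bound to the model. The response is still validated with the original Pydantic model, which keeps the bounds enforced on output.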
Member
This mostly looks good to me. However, I want to explore using langchain_anthropic.ChatAnthropic to see whether it handles this directly. Currently providers simply expose credentials and an OpenAI-compatible API, and SkillSpector continues to use langchain_openai.ChatOpenAI everywhere. Maybe this needs to change so providers return a ChatOpenAI/ChatAnthropic/etc. instance. Essentially, I want to explore whether it makes sense to move
Problem
Under `SKILLSPECTOR_PROVIDER=anthropic`, every LLM analyzer silently falls back to static analysis. The Anthropic provider points `langchain_openai.ChatOpenAI` at `https://api.anthropic.com/v1/` and drives analyzers through `with_structured_output(<Pydantic model>)`. The structured-output schemas use Pydantic numeric-bound constraints:
- `llm_analyzer_base.py`: `start_line: int = Field(ge=1, …)`, `confidence: float = Field(ge=0.0, le=1.0, …)`
- `nodes/meta_analyzer.py`: `confidence: float = Field(ge=0.0, le=1.0, …)`

These emit the JSON-schema keywords `minimum`/`maximum` (and `exclusiveMinimum`/`exclusiveMaximum`) on the generated tool/function schema. OpenAI and the NVIDIA endpoints tolerate them, but Anthropic's tool-schema validator rejects them with HTTP 400 ("For 'integer' type, property 'minimum' is not supported").

The 400 is swallowed by the analyzer's `except Exception` fallback, so the scan still "succeeds", but the LLM pass (including the meta-analyzer false-positive filter) is skipped entirely. The failure is invisible unless you read debug logs.

Repro (live, against api.anthropic.com)

Fix
A provider-aware structured-output builder, `build_structured_llm(llm, schema)` (new `src/skillspector/structured_output.py`):

- Non-Anthropic providers (OpenAI, NVIDIA): unchanged, `llm.with_structured_output(schema)`, so the numeric-bound constraints are preserved in the request for endpoints that validate against them.
- Anthropic: the generated tool schema is stripped of `minimum`/`maximum`/`exclusiveMinimum`/`exclusiveMaximum` before it is sent (`strip_unsupported_numeric_bounds`). Responses are still parsed and validated against the original Pydantic schema, so the bounds are enforced on output even though they are dropped from the request.

This keeps the constraints on the OpenAI/NVIDIA paths (the reason for not simply deleting the `Field(ge=, le=)` declarations) while unblocking Anthropic.

Provider detection is a new `is_anthropic_provider()` helper in `skillspector.providers`, gated on `SKILLSPECTOR_PROVIDER=anthropic` (the only way the Anthropic provider is selected).

Before / after (same live key, `claude-sonnet-4-6`)

A full `skillspector scan … --format json` against a small `SKILL.md` with `SKILLSPECTOR_PROVIDER=anthropic` now runs the semantic discovery analyzers and the meta-analyzer LLM filter to completion (exit 0) instead of falling back to static-only.

Tests
`tests/unit/test_structured_output.py` adds:

- sanitizer cases (`minimum`/`maximum`, `items`, `anyOf`, exclusive variants, no input mutation, unrelated keys preserved),
- checks against the `LLMAnalysisResult`/`MetaAnalyzerResult` schemas,
- provider routing (non-Anthropic → `with_structured_output`; Anthropic → sanitized `bind_tools`),
- output validation: an out-of-range `confidence` still raises `ValidationError` at parse time.

Existing `llm_analyzer_base` / provider suites pass unchanged (the non-Anthropic path is byte-for-byte the prior behavior).

Note / follow-up (separate from this PR)
The discovery analyzers can overflow Anthropic's 1M-token context on large repos. Per-file inputs are batched but not chunked the way the meta-analyzer chunks oversized files, so a single very large file (or the aggregate discovery input) can exceed the cap and error out. That's a distinct issue from this 400 and isn't addressed here. I'm flagging it for a follow-up: input chunking for the discovery pass under the Anthropic token budget.
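For that follow-up, one option is to split oversized inputs into budget-sized pieces before sending them. This sketch is hypothetical and not part of this PR. `chunk_by_token_budget` and the 4-characters-per-token estimate are assumptions, and the code does not reproduce how the meta-analyzer chunks files.

```python
from typing import Iterator

# Assumed heuristic: roughly 4 characters per token. A real implementation
# would measure tokens with a provider-side count instead.
CHARS_PER_TOKEN = 4


def chunk_by_token_budget(text: str, max_tokens: int) -> Iterator[str]:
    """Yield consecutive slices of `text`, each estimated to fit in `max_tokens`.

    Splits on line boundaries where possible; a single line longer than the
    budget is hard-split. Joining all chunks reproduces `text` exactly.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    max_chars = max_tokens * CHARS_PER_TOKEN
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Hard-split a line that alone exceeds the budget, flushing pending lines first.
        while len(line) > max_chars:
            if current:
                yield "".join(current)
                current, size = [], 0
            yield line[:max_chars]
            line = line[max_chars:]
        # Start a new chunk if adding this line would exceed the budget.
        if current and size + len(line) > max_chars:
            yield "".join(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        yield "".join(current)
```

Each chunk could then go through the discovery pass separately, with findings merged afterwards.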