
feat: InferenceParams specific for backend #126

Merged
Madzionator merged 11 commits into main from feat/inference-params-per-provider on Mar 19, 2026

Conversation

@Madzionator
Collaborator

This pull request refactors how inference parameters are handled for chat and agent contexts, moving from a generic InferenceParams class to a backend-specific interface and classes. Each cloud service now receives and validates its own typed inference parameters that are actually sent in API requests — but only when explicitly set by the user.

Key changes

  • IBackendInferenceParams interface with BackendType Backend as the only contract — each provider defines its own fields independently
  • 8 concrete param classes: LocalInferenceParams (Self), OpenAiInferenceParams, DeepSeekInferenceParams, GeminiInferenceParams, AnthropicInferenceParams, GroqCloudInferenceParams, XaiInferenceParams, OllamaInferenceParams
  • Nullable properties on cloud params: null means "don't send the field; let the API use its own default". This avoids errors with models that reject certain parameters (e.g. OpenAI reasoning models rejecting top_p)
  • Dictionary<string, object>? AdditionalParams on all param classes — catch-all for provider-specific parameters not covered by typed properties (e.g. max_completion_tokens for newer OpenAI models)
  • BackendParamsFactory — creates default params for a given BackendType
  • ApplyBackendParams() template method in OpenAiCompatibleService — each service override maps its typed params to API request body fields, skipping nulls
  • InvalidBackendParamsException — thrown when wrong params type is passed (e.g. OpenAiInferenceParams to DeepSeek service)
  • Chat.BackendParams defaults to null; ChatService auto-creates the correct params via the factory when not explicitly set
  • InferPage wired up — InitializeChatContext() now sets WithBackend() + WithInferenceParams() based on selected backend
  • Integration tests for all 7 cloud providers + local Self + param mismatch validation tests
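Taken together, the key pieces might look like the minimal C# sketch below. Only the names called out above (IBackendInferenceParams, BackendType Backend, OpenAiInferenceParams, GeminiInferenceParams, AdditionalParams, InvalidBackendParamsException, ApplyBackendParams) come from the PR description; the specific property sets and method bodies are illustrative assumptions, not the actual implementation:

```csharp
using System;
using System.Collections.Generic;

public enum BackendType { Self, OpenAi, DeepSeek, Gemini, Anthropic, GroqCloud, Xai, Ollama }

// The interface's only contract is which backend the params belong to.
public interface IBackendInferenceParams
{
    BackendType Backend { get; }
}

// Cloud params use nullable properties: null means "omit the field
// and let the API apply its own default".
public class OpenAiInferenceParams : IBackendInferenceParams
{
    public BackendType Backend => BackendType.OpenAi;
    public float? Temperature { get; set; }
    public float? TopP { get; set; }
    public int? MaxTokens { get; set; }
    // Catch-all for fields not covered by typed properties,
    // e.g. max_completion_tokens on newer OpenAI models.
    public Dictionary<string, object>? AdditionalParams { get; set; }
}

public class GeminiInferenceParams : IBackendInferenceParams
{
    public BackendType Backend => BackendType.Gemini;
    public float? Temperature { get; set; }
    public Dictionary<string, object>? AdditionalParams { get; set; }
}

public class InvalidBackendParamsException : Exception
{
    public InvalidBackendParamsException(Type expected, Type actual)
        : base($"Expected {expected.Name} but got {actual.Name}.") { }
}

// Template-method step: each service maps its typed params onto the
// request body, skipping nulls so unset values are never sent.
public static class RequestBuilder
{
    public static Dictionary<string, object> ApplyBackendParams(IBackendInferenceParams p)
    {
        if (p is not OpenAiInferenceParams o)
            throw new InvalidBackendParamsException(typeof(OpenAiInferenceParams), p.GetType());

        var body = new Dictionary<string, object>();
        if (o.Temperature is not null) body["temperature"] = o.Temperature;
        if (o.TopP is not null) body["top_p"] = o.TopP;
        if (o.MaxTokens is not null) body["max_tokens"] = o.MaxTokens;
        if (o.AdditionalParams is not null)
            foreach (var (k, v) in o.AdditionalParams) body[k] = v;
        return body;
    }
}
```

Because only non-null properties reach the request body, a reasoning model that rejects top_p simply never sees the key unless the user set it.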

What stays unchanged

  • MemoryParams — shared across all providers (RAG/kernel memory)
  • LocalInferenceParams — keeps concrete default values (llama.cpp needs them), only renamed from InferenceParams
  • No UI changes beyond wiring correct params on backend switch

Madzionator and others added 7 commits March 18, 2026 10:46
Introduce IProviderInferenceParams and provider-specific parameter types (LocalInferenceParams, AnthropicParams, OpenAiParams, GeminiParams, GroqCloudParams, DeepSeekParams, OllamaParams, XaiParams). Replace usages of the old InferenceParams with IProviderInferenceParams/LocalInferenceParams across the codebase: Chat now stores ProviderParams (with LocalParams and InferenceGrammar helpers), mappers updated to handle LocalInferenceParams, interfaces and AgentService adjusted to accept IProviderInferenceParams, and tests updated accordingly. LLM and OpenAI-compatible services now apply provider-specific settings (temperature, max_tokens, top_p, etc.) via ApplyProviderParams, and grammar handling was unified to use Chat.InferenceGrammar. Examples updated to pass LocalInferenceParams. These changes enable multi-backend provider support and provider-specific configuration handling.
Introduce InvalidProviderParamsException for clearer bad-request errors when provider params types mismatch. Add an ExpectedParamsType contract to OpenAiCompatibleService and implement it in several providers (Gemini, GroqCloud, Ollama, OpenAi, Xai, DeepSeek). Perform runtime type checks in OpenAiCompatibleService, LLMService (local inference), and AnthropicService to throw the new exception when chat.ProviderParams is the wrong type. Misc: remove TopK from GeminiParams and stop emitting top_k in Gemini requests; preserve Chat.Backend if already set; minor cleanup (pragmas/using/whitespace).
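The ExpectedParamsType contract described in this commit could be sketched as follows (using the commit-era names; the member shapes are assumptions for illustration):

```csharp
using System;

public interface IProviderInferenceParams { }
public sealed class GeminiParams : IProviderInferenceParams { }
public sealed class OpenAiParams : IProviderInferenceParams { }

public class InvalidProviderParamsException : Exception
{
    public InvalidProviderParamsException(Type expected, Type actual)
        : base($"Expected {expected.Name}, got {actual.Name}.") { }
}

// The base service declares which params type it expects; the shared
// code path performs one runtime check before building a request.
public abstract class OpenAiCompatibleService
{
    protected abstract Type ExpectedParamsType { get; }

    protected void ValidateParams(IProviderInferenceParams p)
    {
        if (!ExpectedParamsType.IsInstanceOfType(p))
            throw new InvalidProviderParamsException(ExpectedParamsType, p.GetType());
    }
}

public sealed class GeminiService : OpenAiCompatibleService
{
    protected override Type ExpectedParamsType => typeof(GeminiParams);
    public void Check(IProviderInferenceParams p) => ValidateParams(p);
}
```

Centralizing the check in the base class means each provider only declares a type, and a user passing OpenAiParams to a Gemini-backed chat fails fast with a clear error instead of a confusing API response.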
Add ProviderParamsTests integration test suite that verifies provider-specific inference parameters for OpenAI, Anthropic, Gemini, DeepSeek, GroqCloud, Xai, local (Gemma2), and Ollama models. Each passing test checks the model returns the expected answer when custom params are supplied; additional negative tests assert InvalidProviderParamsException is thrown when wrong provider params are used. Add Xunit.SkippableFact package reference to enable conditional skipping based on environment (API keys, local model file, or Ollama availability).
Introduce ProviderParamsFactory to create IProviderInferenceParams based on BackendType. Update Home.razor to set the backend and inference params on the new chat context (casting to IChatConfigurationBuilder) before preserving message history, ensuring the correct provider settings are applied when switching models. Also simplify ChatService by using the null-coalescing assignment (chat.Backend ??= settings.BackendType) to default the backend.
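The factory plus the null-coalescing defaulting described in this commit might look like this sketch (the Chat and ChatService shapes here are simplified assumptions; only the factory idea and the ??= defaulting come from the commit message):

```csharp
using System;

public enum BackendType { Self, OpenAi, Gemini }

public interface IBackendInferenceParams { BackendType Backend { get; } }
public sealed class LocalInferenceParams : IBackendInferenceParams
{ public BackendType Backend => BackendType.Self; }
public sealed class OpenAiInferenceParams : IBackendInferenceParams
{ public BackendType Backend => BackendType.OpenAi; }
public sealed class GeminiInferenceParams : IBackendInferenceParams
{ public BackendType Backend => BackendType.Gemini; }

// Maps a BackendType to its default params instance.
public static class BackendParamsFactory
{
    public static IBackendInferenceParams Create(BackendType backend) => backend switch
    {
        BackendType.Self => new LocalInferenceParams(),
        BackendType.OpenAi => new OpenAiInferenceParams(),
        BackendType.Gemini => new GeminiInferenceParams(),
        _ => throw new ArgumentOutOfRangeException(nameof(backend)),
    };
}

public class Chat
{
    public BackendType? Backend { get; set; }
    public IBackendInferenceParams? BackendParams { get; set; }
}

public class ChatService
{
    // ??= keeps any explicitly set value and fills in the default
    // otherwise, so switching backends never clobbers user settings.
    public void ApplyDefaults(Chat chat, BackendType settingsBackend)
    {
        chat.Backend ??= settingsBackend;
        chat.BackendParams ??= BackendParamsFactory.Create(chat.Backend.Value);
    }
}
```

This is why Chat.BackendParams can safely default to null: the service fills in the right typed params lazily, and the UI only has to set the backend when the user switches models.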
@Madzionator Madzionator linked an issue Mar 18, 2026 that may be closed by this pull request
Centralize message, image and tool handling into ChatHelper and update services to use it. Added ServiceConstants properties for ToolCalls/ToolCallId/ToolName and replaced hardcoded property keys in LLMService/OpenAiCompatibleService with ServiceConstants.Properties. Moved image extraction, message merging and message-array building logic out of Anthropic/OpenAi services into ChatHelper, made BuildAnthropicRequestBody asynchronous and now use ChatHelper.BuildMessagesArray. Removed duplicated helper methods and consolidated image MIME detection and message content construction.
wisedev-pstach previously approved these changes Mar 19, 2026
@Madzionator Madzionator merged commit 93b9913 into main Mar 19, 2026
1 check passed
@Madzionator Madzionator deleted the feat/inference-params-per-provider branch March 19, 2026 11:23


Development

Successfully merging this pull request may close these issues.

InferenceParams backend specific
