Add a No-Op chat client fallback for degraded startup #1214
Open
codymullins wants to merge 7 commits into
OpenSpec change scaffolding the No-Op IChatClient fallback path so the daemon starts in a degraded-but-operational mode when no valid inference provider/model configuration is detected, instead of failing host startup. Includes proposal, design (tri-state validation outcome, provider-level branch, fixed-message contract, doctor warn item, restart-required recovery), spec deltas for netclaw-model-providers and netclaw-onboarding, and the implementation task list.
When Netclaw starts without a valid inference provider/model configuration, the daemon now launches in degraded mode with a No-Op IChatClient instead of failing host startup. Chat turns return a fixed banner beginning with "No valid model configuration detected." plus recovery steps (netclaw doctor, netclaw model, edit netclaw.json) so operators can discover and fix the configuration problem.

Validation is now tri-state (Valid / NoProviderConfigured / Invalid): malformed configuration (declared provider missing credentials, schema violations, model points to an unconfigured provider) still fails startup loudly; only the genuinely-not-configured case selects the No-Op fallback. The previous silent local-ollama default in Program.cs is removed.

- IChatClientProvider.IsDegraded default interface method (false on real providers, true on NoOpChatClientProvider) so doctor and diagnostics can query the state without sniffing concrete types
- netclaw doctor: new Chat Client check reporting pass/warn/fail
- Init wizard: provider health check emits warn-level item (not fail) when no provider is configured; daemon still starts in degraded mode
- netclaw-operations skill, PRD-004, PRD-005 updated; skill version 2.9.0
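The `IsDegraded` default interface method described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `*Sketch` type names and the `Describe` member are hypothetical stand-ins, and only the shape of `IsDegraded` (default `false`, overridden to `true` on the No-Op provider) comes from the commit message.

```csharp
using System;

// Hypothetical stand-in for the real IChatClientProvider. Only the
// IsDegraded member mirrors what the commit message describes.
public interface IChatClientProviderSketch
{
    // Placeholder member so the interface has something besides the flag.
    string Describe();

    // Default interface member: existing providers inherit `false`
    // without any code change.
    bool IsDegraded => false;
}

// A "real" provider does not mention IsDegraded at all.
public sealed class RealProviderSketch : IChatClientProviderSketch
{
    public string Describe() => "real provider";
}

// The degraded fallback is the only implementation that overrides the default.
public sealed class NoOpProviderSketch : IChatClientProviderSketch
{
    public string Describe() => "no-op provider";

    public bool IsDegraded => true;
}
```

Callers would read the flag through the interface type (for example `IChatClientProviderSketch p = ...; if (p.IsDegraded) ...`), which is what lets diagnostics avoid checking for a concrete No-Op type. Note that a default member is reachable only through the interface, not through a class that does not redeclare it.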
Operator typo in Models:Main.Provider (e.g. "ollama-local1" vs configured "ollama-local") previously crashed the daemon with a raw ProviderPluginFactory stack trace because the validation classified the mismatch as Invalid but the DI extension fell through to lazy factory construction instead of pre-throwing.

Two fixes:

- ProviderRuntimeValidation: "model references unknown provider" now returns NoProviderConfigured. From the operator's standpoint this is the same remediation as genuinely-no-provider (fix the model section via `netclaw model` or edit netclaw.json); the No-Op banner's "Available providers:" line surfaces the configured names so the typo is obvious. The doctor check correspondingly reports warn instead of error for this case.
- DaemonProviderServiceExtensions: when validation does return Invalid (currently unreachable in practice, but a real path for future stricter validation), register an IChatClientProvider factory that throws immediately with the validation reason, so raw factory stack traces no longer surface from deep in the DI graph.

Includes regression tests at the validation, DI-registration, and doctor layers exercising the exact "ollama-local1" scenario from the user report.
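A rough sketch of the classification this commit describes, under stated assumptions: the `StatusSketch`, `ValidationSketch`, and `ValidationRules` names are hypothetical, providers are reduced to their names, and the case-insensitive name comparison is an assumption rather than confirmed behavior. The `Invalid` value is declared but never returned, matching the note that it is currently unreachable in practice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mirror of the tri-state outcome.
public enum StatusSketch { Valid, NoProviderConfigured, Invalid }

public sealed record ValidationSketch(
    StatusSketch Status,
    string? Reason,
    IReadOnlyList<string> Available);

public static class ValidationRules
{
    public static ValidationSketch Evaluate(
        IReadOnlyCollection<string> providerNames,
        string? mainProvider,
        string? mainModelId)
    {
        var available = providerNames.ToList();

        // Nothing configured at all: select the No-Op path.
        if (available.Count == 0 ||
            string.IsNullOrWhiteSpace(mainProvider) ||
            string.IsNullOrWhiteSpace(mainModelId))
        {
            return new(StatusSketch.NoProviderConfigured,
                "no provider or main model is configured", available);
        }

        // A typo such as "ollama-local1" lands here and is treated the same
        // way as "not configured", so the daemon can still start.
        // Assumption: provider names compare case-insensitively.
        if (!available.Contains(mainProvider, StringComparer.OrdinalIgnoreCase))
        {
            return new(StatusSketch.NoProviderConfigured,
                $"model 'Main' references provider '{mainProvider}' which is not configured (available: {string.Join(", ", available)})",
                available);
        }

        return new(StatusSketch.Valid, null, available);
    }
}
```

With a single configured provider `"ollama-local"`, calling `ValidationRules.Evaluate(new[] { "ollama-local" }, "ollama-local1", "gemma4")` yields `NoProviderConfigured` with a reason that names both the typo and the available providers, which is the information the banner and doctor check rely on.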
…value
When the daemon is running in No-Op degraded mode, `ncl chat` previously
showed the stale/invalid model name from netclaw.json (e.g. "gemma4" even
though that model lives on an unconfigured provider). The CLI builds its
own ModelCapabilities locally from netclaw.json without consulting the
daemon's actual chat-client state, so the broken config leaked into the
status bar.
CLI's ModelCapabilities factory now runs ProviderRuntimeValidation against
the same config it just loaded; when the validation outcome is non-Valid
it substitutes a sentinel ModelId ('(no model — run \`netclaw model\`)')
and zeros the context-window probe — matching the daemon's No-Op selection
without a round-trip.
…ealth
ncl status previously reported overall=healthy and showed the configured (broken) model values when the daemon was running with the No-Op chat client, e.g. "model: gemma4 (provider: ollama-local1, ...)", completely hiding the degraded state from operators.

Wire-level: DaemonRuntimeStatus.Model gains Degraded + DegradedReason. DaemonRuntimeStatusService populates them from the running IChatClientProvider.IsDegraded and the ProviderRuntimeValidation singleton. ResolveOverallStatus now flips to "degraded" whenever the chat client is No-Op, regardless of other subsystem health (the daemon can't actually serve model responses).

CLI formatter: when Degraded=true, the model section renders as "model: (none — No-Op chat client active)" with the validation reason and a recovery hint, instead of the stale ModelId/Provider line.

Status service constructor takes the new dependencies as optional trailing parameters so the seven existing tests don't need touching.
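The status behavior in this commit can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ModelStatusSketch` record, the `StatusSketches` helper, the `"unhealthy"` label, and the exact layout of the reason line are hypothetical. Only the "degraded wins" rule and the quoted model line come from the commit message.

```csharp
using System;

// Hypothetical wire model: just the fields the commit message mentions.
public sealed record ModelStatusSketch(
    string ModelId,
    string Provider,
    bool Degraded,
    string? DegradedReason);

public static class StatusSketches
{
    // A No-Op chat client forces "degraded" regardless of other subsystems.
    // The "unhealthy" label for the remaining case is an assumption.
    public static string ResolveOverallStatus(bool subsystemsHealthy, bool chatClientDegraded)
    {
        if (chatClientDegraded)
            return "degraded";

        return subsystemsHealthy ? "healthy" : "unhealthy";
    }

    // When degraded, the stale ModelId/Provider values are never printed.
    // The first line's text is copied from the commit message.
    public static string FormatModelLine(ModelStatusSketch model) =>
        model.Degraded
            ? $"model: (none — No-Op chat client active)\n  reason: {model.DegradedReason}"
            : $"model: {model.ModelId} (provider: {model.Provider})";
}
```

The point of the design is that the formatter branches on the daemon-reported flag rather than on the configured values, so a broken `ModelId` such as "gemma4" cannot leak into the output while the No-Op client is active.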
Resolve conflicts:

- CLI Program.cs: keep BuildModelCapabilities helper, retarget it to dev's new ContextWindowResolution.ResolveRuntimeAsync API.
- DaemonProviderServiceExtensions: merge both new params (validation + retryPolicy) into AddDaemonLlmProviders.
- Daemon Program.cs: register validation singleton and pass both validation and streamingRetryPolicy to AddDaemonLlmProviders.
- netclaw-operations SKILL.md: bump version to 2.11.2 (above dev's 2.11.1).
- DaemonRuntimeStatusServiceTests: port degraded-mode test onto dev's CreateService helper (IChannelRegistry); extend helper with chatClientProvider/providerValidation params.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces a No-Op chat client fallback and a startup-time validation result so the daemon can start in a degraded-but-operational mode when no inference provider/model is configured, while surfacing the state clearly via status/doctor/wizard UI.
Changes:
- Add `ProviderRuntimeValidation` + `NoOpChatClient`/`NoOpChatClientProvider` and wire them into daemon provider registration.
- Surface degraded chat state via runtime status (`Degraded` + `DegradedReason`) and CLI rendering (netclaw status, doctor check).
- Update onboarding wizard health checks to render “warn/degraded” distinct from failures.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/Netclaw.Daemon/Program.cs | Removes silent provider fallback; evaluates runtime validation and skips capability probing when degraded. |
| src/Netclaw.Daemon/Gateway/DaemonRuntimeStatusService.cs | Threads degraded state into overall/runtime status and exposes degraded reason. |
| src/Netclaw.Daemon/Configuration/DaemonProviderServiceExtensions.cs | Branches DI registration to No-Op provider (degraded) vs real provider, based on validation. |
| src/Netclaw.Daemon.Tests/Gateway/DaemonRuntimeStatusServiceTests.cs | Adds test coverage for degraded chat client affecting status output. |
| src/Netclaw.Daemon.Tests/Configuration/DaemonProviderServiceExtensionsTests.cs | Adds DI tests for No-Op registration behavior. |
| src/Netclaw.Configuration/ProviderRuntimeValidation.cs | Introduces tri-state validation type used in composition/diagnostics. |
| src/Netclaw.Configuration/NoOpChatClientProvider.cs | Adds IChatClientProvider implementation for degraded mode. |
| src/Netclaw.Configuration/NoOpChatClient.cs | Adds deterministic banner-only IChatClient implementation. |
| src/Netclaw.Configuration/IChatClientProvider.cs | Adds IsDegraded default interface member for diagnostics. |
| src/Netclaw.Configuration/DaemonRuntimeStatus.cs | Extends wire model with degraded fields. |
| src/Netclaw.Configuration.Tests/ProviderRuntimeValidationTests.cs | Adds unit tests for validation outcomes. |
| src/Netclaw.Configuration.Tests/NoOpChatClientTests.cs | Adds unit tests for banner content, tool suppression, and streaming behavior. |
| src/Netclaw.Configuration.Tests/NoOpChatClientProviderTests.cs | Adds unit tests for provider degraded flag and stable client instance per role. |
| src/Netclaw.Cli/Tui/Wizard/Steps/ProviderStepViewModel.cs | Updates wizard health check semantics to emit warn item when provider not selected. |
| src/Netclaw.Cli/Tui/Wizard/Steps/HealthCheckStepView.cs | Renders warn/degraded items distinctly from pass/fail. |
| src/Netclaw.Cli/Tui/InitWizardViewModel.cs | Extends HealthCheckItem with IsWarning. |
| src/Netclaw.Cli/Program.cs | Updates netclaw status rendering for degraded model; centralizes CLI model capabilities building with degraded short-circuit. |
| src/Netclaw.Cli/Doctor/DoctorRegistrationExtensions.cs | Registers new doctor check for chat client degraded/valid state. |
| src/Netclaw.Cli/Doctor/ChatClientDoctorCheck.cs | Implements doctor check using runtime validation derived from config. |
| src/Netclaw.Cli.Tests/Tui/Wizard/ProviderStepViewModelTests.cs | Adds tests ensuring warn-level health item appears when no provider is selected. |
| src/Netclaw.Cli.Tests/Doctor/ChatClientDoctorCheckTests.cs | Adds tests for new doctor check scenarios. |
| openspec/changes/add-noop-chat-client-fallback/tasks.md | Tracks implementation tasks for the No-Op fallback change. |
| openspec/changes/add-noop-chat-client-fallback/specs/netclaw-onboarding/spec.md | Specifies onboarding behavior changes for degraded startup. |
| openspec/changes/add-noop-chat-client-fallback/specs/netclaw-model-providers/spec.md | Specifies No-Op client contract + validation tri-state requirements. |
| openspec/changes/add-noop-chat-client-fallback/proposal.md | Documents motivation and behavioral impact. |
| openspec/changes/add-noop-chat-client-fallback/design.md | Documents key design decisions and tradeoffs. |
| openspec/changes/add-noop-chat-client-fallback/.openspec.yaml | Adds openspec metadata for this change. |
| feeds/skills/.system/files/netclaw-operations/SKILL.md | Adds operator-facing documentation for degraded No-Op mode and recovery steps. |
| docs/prd/PRD-005-model-provider-strategy.md | Updates product requirements for degraded startup behavior. |
| docs/prd/PRD-004-cli-onboarding-and-config.md | Updates doctor requirements to include Chat Client check behavior. |
Comment on lines +14 to +19

    public enum ProviderRuntimeStatus
    {
        Valid,
        NoProviderConfigured,
        Invalid,
    }

Comment on lines +31 to +39

    public static ProviderRuntimeValidation Evaluate(
        IReadOnlyDictionary<string, ProviderEntry> providers,
        ModelSelection models)
    {
        var available = providers.Keys.ToList();
        var main = models.Main;
        var providersEmpty = providers.Count == 0;
        var modelMissing = string.IsNullOrWhiteSpace(main.Provider) ||
                           string.IsNullOrWhiteSpace(main.ModelId);

Comment on lines +79 to +80

        return new(ProviderRuntimeStatus.Valid, null, available);
    }

Comment on lines +38 to +42

    return Task.FromResult(validation.Status switch
    {
        ProviderRuntimeStatus.Valid => DoctorCheckResult.Pass(
            CheckName,
            $"Real chat client configured for provider '{models.Main.Provider}' / model '{models.Main.ModelId}'."),

Comment on lines +63 to +77

    private static Dictionary<string, ProviderEntry> ReadProviders(JsonObject? root)
    {
        var providers = new Dictionary<string, ProviderEntry>(StringComparer.OrdinalIgnoreCase);
        if (root?["Providers"] is not JsonObject providersObj)
            return providers;

        foreach (var (name, value) in providersObj)
        {
            // We only need to know which provider keys exist for the validation
            // outcome — credentials and types are checked elsewhere.
            providers[name] = new ProviderEntry
            {
                Type = (value as JsonObject)?["Type"]?.GetValue<string>() ?? "",
            };
        }

Comment on lines +1861 to +1876

    static ModelCapabilities BuildModelCapabilities(IConfiguration configuration, DaemonApi daemonApi)
    {
        var providers = ProviderConfigurationLoader.Load(configuration.GetSection("Providers"));
        var models = configuration.GetSection("Models")
            .Get<ModelSelection>() ?? new ModelSelection();
        var validation = ProviderRuntimeValidation.Evaluate(providers, models);

        if (validation.Status != ProviderRuntimeStatus.Valid)
        {
            return new ModelCapabilities
            {
                ModelId = "(no model — run `netclaw model`)",
                ContextWindowTokens = 0,
                CompactionModelId = null,
            };
        }


        IReadOnlyDictionary<string, ProviderEntry> providers,
        ModelSelection models)
    {
        var available = providers.Keys.ToList();

Comment on lines +75 to +76

            $"model 'Main' references provider '{main.Provider}' which is not configured (available: {string.Join(", ", available)})",
            available);

Comment on lines +40 to +48

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Single chunk — no simulated token streaming.
        yield return new ChatResponseUpdate(ChatRole.Assistant, _banner);
        await Task.CompletedTask;
    }

Netclaw used to crash at startup when no inference provider was set up, or when `Models:Main.Provider` had a typo. This change lets the daemon start anyway. Chat turns return a fixed message that tells the operator what to fix. Doctor, the chat status bar, and `ncl status` all report the degraded state instead of pretending a model is live.

What changed
- `NoOpChatClient` and `NoOpChatClientProvider` in `Netclaw.Configuration`.
- `ProviderRuntimeValidation` with three outcomes: `Valid`, `NoProviderConfigured`, `Invalid`. Missing providers and dangling model references both pick the No-Op path. Malformed config (bad credentials, schema errors) still fails startup loudly.
- `IChatClientProvider.IsDegraded` (default interface method, `false` on real providers).
- The previous silent local-ollama default in `Program.cs` is removed.
- `DaemonRuntimeStatus.Model` gains `Degraded` and `DegradedReason`. `ncl status` renders a degraded model line and the overall status flips to degraded.
- The chat status bar shows a sentinel `ModelId` when degraded instead of the broken config value.

Test plan
- `dotnet test` across `Netclaw.Configuration.Tests`, `Netclaw.Daemon.Tests`, `Netclaw.Cli.Tests`, `Netclaw.Actors.Tests`: all green, no regressions.
- `dotnet slopwatch analyze`, file-header verify, `openspec validate` all pass.
- Manual check with a broken `Models:Main.Provider`: confirm `ncl daemon start` succeeds, `ncl chat` shows the sentinel, `ncl status` shows `overall: degraded` and the new model block, `ncl doctor` shows the warn item.