Add a No-Op chat client fallback for degraded startup #1214

Open
codymullins wants to merge 7 commits into netclaw-dev:dev from codymullins:openspec/add-noop-chat-client-fallback

codymullins (Contributor) commented May 28, 2026

Netclaw used to crash at startup when no inference provider was set up, or when Models:Main.Provider had a typo. This change lets the daemon start anyway. Chat turns return a fixed message that tells the operator what to fix. Doctor, the chat status bar, and ncl status all report the degraded state instead of pretending a model is live.

What changed

  • New NoOpChatClient and NoOpChatClientProvider in Netclaw.Configuration.
  • New ProviderRuntimeValidation with three outcomes: Valid, NoProviderConfigured, Invalid. Missing providers and dangling model references both pick the No-Op path. Malformed config (bad credentials, schema errors) still fails startup loudly.
  • IChatClientProvider.IsDegraded (default interface method, false on real providers).
  • Removed the silent local-ollama default in Program.cs.
  • Doctor: new Chat Client check (pass / warn / fail).
  • Init wizard: warn-level item when no provider is picked, so the daemon still starts.
  • DaemonRuntimeStatus.Model gains Degraded and DegradedReason. ncl status renders a degraded model line and the overall flips to degraded.
  • CLI chat status bar shows a sentinel ModelId when degraded instead of the broken config value.
  • Updated PRD-004, PRD-005, and the netclaw-operations skill (version 2.9.0).
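The IsDegraded item above relies on a C# default interface method. The following is a minimal sketch of that pattern, not the PR's code: the interface is reduced to the one member, and `OllamaChatClientProvider` and `Diagnostics.Describe` are hypothetical names used only for illustration.

```csharp
using System;

// Simplified stand-in: the real IChatClientProvider has other members not shown here.
public interface IChatClientProvider
{
    // Default interface method: existing providers inherit `false` with no code change.
    bool IsDegraded => false;
}

// Hypothetical real provider; it does not mention IsDegraded at all.
public sealed class OllamaChatClientProvider : IChatClientProvider { }

public sealed class NoOpChatClientProvider : IChatClientProvider
{
    // A public member with the same signature implements the interface member,
    // replacing the default.
    public bool IsDegraded => true;
}

public static class Diagnostics
{
    // Callers query through the interface, so no concrete-type checks are needed.
    public static string Describe(IChatClientProvider provider) =>
        provider.IsDegraded ? "degraded" : "ok";
}
```

One consequence of this design is that the default member is only reachable through the interface type; a variable typed as the concrete `OllamaChatClientProvider` would not expose `IsDegraded`.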

Test plan

  • dotnet test across Netclaw.Configuration.Tests, Netclaw.Daemon.Tests, Netclaw.Cli.Tests, Netclaw.Actors.Tests — all green, no regressions.
  • New tests cover all three validation outcomes, the No-Op banner shape, decorator interaction, doctor states, status wire fields, and the unknown-provider regression case.
  • dotnet slopwatch analyze, file-header verify, openspec validate all pass.
  • Smoke tape for the wizard no-provider path (deferred — wizard does not currently allow skipping the provider step, so the new warn rendering is unit-tested only).
  • Eval suite (no eval cases cover degraded mode, so no run is gated on this change).
  • Manual: start daemon with bad Models:Main.Provider, confirm ncl daemon start succeeds, ncl chat shows the sentinel, ncl status shows overall: degraded and the new model block, ncl doctor shows the warn item.

OpenSpec change scaffolding the No-Op IChatClient fallback path so the
daemon starts in a degraded-but-operational mode when no valid inference
provider/model configuration is detected, instead of failing host startup.

Includes proposal, design (tri-state validation outcome, provider-level
branch, fixed-message contract, doctor warn item, restart-required
recovery), spec deltas for netclaw-model-providers and netclaw-onboarding,
and the implementation task list.

When Netclaw starts without a valid inference provider/model configuration,
the daemon now launches in degraded mode with a No-Op IChatClient instead
of failing host startup. Chat turns return a fixed banner beginning with
"No valid model configuration detected." plus recovery steps (netclaw
doctor, netclaw model, edit netclaw.json) so operators can discover and
fix the configuration problem.
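The fixed-message contract can be sketched as a pure function. This is an illustration under assumptions, not the PR's implementation: `NoOpBanner.Build` is a hypothetical helper, and only the opening sentence, the three recovery steps, and the "Available providers:" label come from the commit messages; the exact wording and layout are guesses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the PR's NoOpChatClient may assemble its banner differently.
public static class NoOpBanner
{
    // Opening sentence quoted from the commit message.
    public const string Opening = "No valid model configuration detected.";

    public static string Build(IReadOnlyCollection<string> availableProviders)
    {
        // Listing the configured names makes a typo in Models:Main.Provider easy to spot.
        var names = availableProviders.Count == 0
            ? "(none)"
            : string.Join(", ", availableProviders);

        // Assumed layout: one fact per line, recovery steps numbered.
        return string.Join("\n",
            Opening,
            $"Available providers: {names}",
            "To recover:",
            "  1. Run `netclaw doctor` to see what is misconfigured.",
            "  2. Run `netclaw model` to pick a provider and model.",
            "  3. Or edit netclaw.json directly, then restart the daemon.");
    }
}
```

Because the output depends only on the provider names, every chat turn returns the same text for a given configuration, which keeps the banner easy to assert on in tests.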

Validation is now tri-state (Valid / NoProviderConfigured / Invalid):
malformed configuration (declared provider missing credentials, schema
violations, model points to an unconfigured provider) still fails startup
loudly — only the genuinely-not-configured case selects the No-Op fallback.
The previous silent local-ollama default in Program.cs is removed.

- IChatClientProvider.IsDegraded default interface method (false on real
  providers, true on NoOpChatClientProvider) so doctor and diagnostics can
  query the state without sniffing concrete types
- netclaw doctor: new Chat Client check reporting pass/warn/fail
- Init wizard: provider health check emits warn-level item (not fail) when
  no provider is configured; daemon still starts in degraded mode
- netclaw-operations skill, PRD-004, PRD-005 updated; skill version 2.9.0

Operator typo in Models:Main.Provider (e.g. "ollama-local1" vs configured
"ollama-local") previously crashed the daemon with a raw
ProviderPluginFactory stack trace because the validation classified the
mismatch as Invalid but the DI extension fell through to lazy factory
construction instead of pre-throwing.

Two fixes:

- ProviderRuntimeValidation: "model references unknown provider" now
  returns NoProviderConfigured. From the operator's standpoint this is
  the same remediation as genuinely-no-provider (fix the model section
  via `netclaw model` or edit netclaw.json); the No-Op banner's
  "Available providers:" line surfaces the configured names so the typo
  is obvious. The doctor check correspondingly reports warn instead of
  error for this case.

- DaemonProviderServiceExtensions: when validation does return Invalid
  (currently unreachable in practice, but a real path for future stricter
  validation), register an IChatClientProvider factory that throws
  immediately with the validation reason — no more raw factory stack
  traces deep in the DI graph.

Includes regression tests at the validation, DI-registration, and doctor
layers exercising the exact "ollama-local1" scenario from the user report.
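The classification after this fix can be sketched as follows. This is a simplified reconstruction under assumptions: `ModelRef`, the record shape, and the first two reason strings are hypothetical, and the case-insensitive provider lookup is borrowed from the doctor check's dictionary rather than confirmed for the validator. Only the enum members and the "references provider" message come from the PR excerpts.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum ProviderRuntimeStatus { Valid, NoProviderConfigured, Invalid }

// Simplified stand-ins for the PR's types; the real ones carry more fields.
public sealed record ModelRef(string? Provider, string? ModelId);
public sealed record ProviderRuntimeValidation(
    ProviderRuntimeStatus Status, string? Reason, IReadOnlyList<string> AvailableProviders);

public static class ProviderRuntimeValidationSketch
{
    public static ProviderRuntimeValidation Evaluate(
        IReadOnlyCollection<string> providerNames, ModelRef main)
    {
        var available = providerNames.ToList();

        // Nothing configured at all: degraded startup, not a crash.
        if (available.Count == 0)
            return new(ProviderRuntimeStatus.NoProviderConfigured,
                "no providers are configured", available);

        // Model section is blank or incomplete: same remediation.
        if (string.IsNullOrWhiteSpace(main.Provider) || string.IsNullOrWhiteSpace(main.ModelId))
            return new(ProviderRuntimeStatus.NoProviderConfigured,
                "model 'Main' has no provider or model id", available);

        // The "ollama-local1" typo case: now NoProviderConfigured rather than Invalid.
        if (!available.Contains(main.Provider, StringComparer.OrdinalIgnoreCase))
            return new(ProviderRuntimeStatus.NoProviderConfigured,
                $"model 'Main' references provider '{main.Provider}' which is not configured " +
                $"(available: {string.Join(", ", available)})",
                available);

        return new(ProviderRuntimeStatus.Valid, null, available);
    }
}
```

In this sketch nothing returns Invalid, which matches the commit's note that the Invalid path is currently unreachable in practice and is kept for future stricter validation.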
…value

When the daemon is running in No-Op degraded mode, `ncl chat` previously
showed the stale/invalid model name from netclaw.json (e.g. "gemma4" even
though that model lives on an unconfigured provider). The CLI builds its
own ModelCapabilities locally from netclaw.json without consulting the
daemon's actual chat-client state, so the broken config leaked into the
status bar.

The CLI's ModelCapabilities factory now runs ProviderRuntimeValidation
against the same config it just loaded. When the validation outcome is
non-Valid, it substitutes a sentinel ModelId
("(no model — run `netclaw model`)") and zeros the context-window probe,
matching the daemon's No-Op selection without a round-trip.
…ealth

ncl status previously reported overall=healthy and showed the configured
(broken) model values when the daemon was running with the No-Op chat
client — e.g. "model: gemma4 (provider: ollama-local1, ...)" — completely
hiding the degraded state from operators.

Wire-level: DaemonRuntimeStatus.Model gains Degraded + DegradedReason.
DaemonRuntimeStatusService populates them from the running
IChatClientProvider.IsDegraded and the ProviderRuntimeValidation singleton.
ResolveOverallStatus now flips to "degraded" whenever the chat client is
No-Op, regardless of other subsystem health (the daemon can't actually
serve model responses).

CLI formatter: when Degraded=true, the model section renders as
"model: (none — No-Op chat client active)" with the validation reason
and a recovery hint, instead of the stale ModelId/Provider line.
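The status behaviour described above can be sketched in two small functions. This is an illustration under assumptions: `ModelStatus` is a cut-down stand-in for DaemonRuntimeStatus.Model, `StatusSketch` is a hypothetical name, and the layout of the healthy line and of the reason and hint lines is guessed. Only the "degraded" override rule and the first line of the degraded rendering come from the commit message.

```csharp
#nullable enable
using System;

// Simplified stand-in for DaemonRuntimeStatus.Model; only the fields needed here.
public sealed record ModelStatus(
    string ModelId, string Provider, bool Degraded, string? DegradedReason);

public static class StatusSketch
{
    // A No-Op chat client forces "degraded" regardless of the status computed
    // from the other subsystems.
    public static string ResolveOverall(string statusFromOtherSubsystems, ModelStatus model) =>
        model.Degraded ? "degraded" : statusFromOtherSubsystems;

    public static string FormatModel(ModelStatus model)
    {
        // Healthy case: show the configured values (assumed line format).
        if (!model.Degraded)
            return $"model: {model.ModelId} (provider: {model.Provider})";

        // Degraded case: never echo the stale ModelId/Provider from netclaw.json.
        var reason = string.IsNullOrWhiteSpace(model.DegradedReason)
            ? "unknown"
            : model.DegradedReason;
        return "model: (none — No-Op chat client active)\n" +
               $"  reason: {reason}\n" +
               "  fix: run `netclaw model` or edit netclaw.json, then restart the daemon";
    }
}
```

For example, a status carrying `Degraded = true` for the "ollama-local1" typo would render the "(none ...)" line and the validation reason, and `ResolveOverall("healthy", model)` would return "degraded".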

Status service constructor takes the new dependencies as optional
trailing parameters so the seven existing tests don't need touching.

Resolve conflicts:
- CLI Program.cs: keep BuildModelCapabilities helper, retarget it to dev's
  new ContextWindowResolution.ResolveRuntimeAsync API.
- DaemonProviderServiceExtensions: merge both new params (validation +
  retryPolicy) into AddDaemonLlmProviders.
- Daemon Program.cs: register validation singleton and pass both validation
  and streamingRetryPolicy to AddDaemonLlmProviders.
- netclaw-operations SKILL.md: bump version to 2.11.2 (above dev's 2.11.1).
- DaemonRuntimeStatusServiceTests: port degraded-mode test onto dev's
  CreateService helper (IChannelRegistry); extend helper with
  chatClientProvider/providerValidation params.
codymullins marked this pull request as ready for review June 11, 2026 15:14
Copilot AI review requested due to automatic review settings June 11, 2026 15:14

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces a No-Op chat client fallback and a startup-time validation result so the daemon can start in a degraded-but-operational mode when no inference provider/model is configured, while surfacing the state clearly via status/doctor/wizard UI.

Changes:

  • Add ProviderRuntimeValidation + NoOpChatClient/NoOpChatClientProvider and wire them into daemon provider registration.
  • Surface degraded chat state via runtime status (Degraded + DegradedReason) and CLI rendering (netclaw status, doctor check).
  • Update onboarding wizard health checks to render “warn/degraded” distinct from failures.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/Netclaw.Daemon/Program.cs | Removes silent provider fallback; evaluates runtime validation and skips capability probing when degraded. |
| src/Netclaw.Daemon/Gateway/DaemonRuntimeStatusService.cs | Threads degraded state into overall/runtime status and exposes degraded reason. |
| src/Netclaw.Daemon/Configuration/DaemonProviderServiceExtensions.cs | Branches DI registration to No-Op provider (degraded) vs real provider, based on validation. |
| src/Netclaw.Daemon.Tests/Gateway/DaemonRuntimeStatusServiceTests.cs | Adds test coverage for degraded chat client affecting status output. |
| src/Netclaw.Daemon.Tests/Configuration/DaemonProviderServiceExtensionsTests.cs | Adds DI tests for No-Op registration behavior. |
| src/Netclaw.Configuration/ProviderRuntimeValidation.cs | Introduces tri-state validation type used in composition/diagnostics. |
| src/Netclaw.Configuration/NoOpChatClientProvider.cs | Adds IChatClientProvider implementation for degraded mode. |
| src/Netclaw.Configuration/NoOpChatClient.cs | Adds deterministic banner-only IChatClient implementation. |
| src/Netclaw.Configuration/IChatClientProvider.cs | Adds IsDegraded default interface member for diagnostics. |
| src/Netclaw.Configuration/DaemonRuntimeStatus.cs | Extends wire model with degraded fields. |
| src/Netclaw.Configuration.Tests/ProviderRuntimeValidationTests.cs | Adds unit tests for validation outcomes. |
| src/Netclaw.Configuration.Tests/NoOpChatClientTests.cs | Adds unit tests for banner content, tool suppression, and streaming behavior. |
| src/Netclaw.Configuration.Tests/NoOpChatClientProviderTests.cs | Adds unit tests for provider degraded flag and stable client instance per role. |
| src/Netclaw.Cli/Tui/Wizard/Steps/ProviderStepViewModel.cs | Updates wizard health check semantics to emit warn item when provider not selected. |
| src/Netclaw.Cli/Tui/Wizard/Steps/HealthCheckStepView.cs | Renders warn/degraded items distinctly from pass/fail. |
| src/Netclaw.Cli/Tui/InitWizardViewModel.cs | Extends HealthCheckItem with IsWarning. |
| src/Netclaw.Cli/Program.cs | Updates netclaw status rendering for degraded model; centralizes CLI model capabilities building with degraded short-circuit. |
| src/Netclaw.Cli/Doctor/DoctorRegistrationExtensions.cs | Registers new doctor check for chat client degraded/valid state. |
| src/Netclaw.Cli/Doctor/ChatClientDoctorCheck.cs | Implements doctor check using runtime validation derived from config. |
| src/Netclaw.Cli.Tests/Tui/Wizard/ProviderStepViewModelTests.cs | Adds tests ensuring warn-level health item appears when no provider is selected. |
| src/Netclaw.Cli.Tests/Doctor/ChatClientDoctorCheckTests.cs | Adds tests for new doctor check scenarios. |
| openspec/changes/add-noop-chat-client-fallback/tasks.md | Tracks implementation tasks for the No-Op fallback change. |
| openspec/changes/add-noop-chat-client-fallback/specs/netclaw-onboarding/spec.md | Specifies onboarding behavior changes for degraded startup. |
| openspec/changes/add-noop-chat-client-fallback/specs/netclaw-model-providers/spec.md | Specifies No-Op client contract + validation tri-state requirements. |
| openspec/changes/add-noop-chat-client-fallback/proposal.md | Documents motivation and behavioral impact. |
| openspec/changes/add-noop-chat-client-fallback/design.md | Documents key design decisions and tradeoffs. |
| openspec/changes/add-noop-chat-client-fallback/.openspec.yaml | Adds openspec metadata for this change. |
| feeds/skills/.system/files/netclaw-operations/SKILL.md | Adds operator-facing documentation for degraded No-Op mode and recovery steps. |
| docs/prd/PRD-005-model-provider-strategy.md | Updates product requirements for degraded startup behavior. |
| docs/prd/PRD-004-cli-onboarding-and-config.md | Updates doctor requirements to include Chat Client check behavior. |

Comment on lines +14 to +19
public enum ProviderRuntimeStatus
{
    Valid,
    NoProviderConfigured,
    Invalid,
}
Comment on lines +31 to +39
public static ProviderRuntimeValidation Evaluate(
    IReadOnlyDictionary<string, ProviderEntry> providers,
    ModelSelection models)
{
    var available = providers.Keys.ToList();
    var main = models.Main;
    var providersEmpty = providers.Count == 0;
    var modelMissing = string.IsNullOrWhiteSpace(main.Provider) ||
                       string.IsNullOrWhiteSpace(main.ModelId);
Comment on lines +79 to +80
    return new(ProviderRuntimeStatus.Valid, null, available);
}
Comment on lines +38 to +42
return Task.FromResult(validation.Status switch
{
    ProviderRuntimeStatus.Valid => DoctorCheckResult.Pass(
        CheckName,
        $"Real chat client configured for provider '{models.Main.Provider}' / model '{models.Main.ModelId}'."),
Comment on lines +63 to +77
private static Dictionary<string, ProviderEntry> ReadProviders(JsonObject? root)
{
    var providers = new Dictionary<string, ProviderEntry>(StringComparer.OrdinalIgnoreCase);
    if (root?["Providers"] is not JsonObject providersObj)
        return providers;

    foreach (var (name, value) in providersObj)
    {
        // We only need to know which provider keys exist for the validation
        // outcome — credentials and types are checked elsewhere.
        providers[name] = new ProviderEntry
        {
            Type = (value as JsonObject)?["Type"]?.GetValue<string>() ?? "",
        };
    }
Comment on lines +1861 to +1876
static ModelCapabilities BuildModelCapabilities(IConfiguration configuration, DaemonApi daemonApi)
{
    var providers = ProviderConfigurationLoader.Load(configuration.GetSection("Providers"));
    var models = configuration.GetSection("Models")
        .Get<ModelSelection>() ?? new ModelSelection();
    var validation = ProviderRuntimeValidation.Evaluate(providers, models);

    if (validation.Status != ProviderRuntimeStatus.Valid)
    {
        return new ModelCapabilities
        {
            ModelId = "(no model — run `netclaw model`)",
            ContextWindowTokens = 0,
            CompactionModelId = null,
        };
    }
    IReadOnlyDictionary<string, ProviderEntry> providers,
    ModelSelection models)
{
    var available = providers.Keys.ToList();
Comment on lines +75 to +76
$"model 'Main' references provider '{main.Provider}' which is not configured (available: {string.Join(", ", available)})",
available);
Comment on lines +40 to +48
public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
    IEnumerable<ChatMessage> messages,
    ChatOptions? options = null,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Single chunk — no simulated token streaming.
    yield return new ChatResponseUpdate(ChatRole.Assistant, _banner);
    await Task.CompletedTask;
}