
🤖 feat: support all-namespaces LIST across multiple CoderControlPlane instances #85

Merged
ThomasK33 merged 2 commits into main from query-ynej
Feb 13, 2026

Conversation

@ThomasK33
Member

Summary

Enable all-namespaces LIST to aggregate results across every eligible CoderControlPlane instance (one per namespace) for CoderTemplate and CoderWorkspace resources. Previously, kubectl get codertemplates -A failed with:

multiple eligible CoderControlPlane instances across namespaces; multi-instance support is planned

Background

The aggregated API server's storage assumed exactly one eligible CoderControlPlane when handling all-namespaces LIST (request namespace is empty). When multiple eligible control planes exist across namespaces, client list-watch loops fail, blocking kubectl get codertemplates -A, kubectl get coderworkspaces -A, and controllers/informers that use list+watch.

Implementation

New interface coder.NamespaceLister with EligibleNamespaces(ctx) ([]string, error):

  • Opt-in capability that lets storage enumerate namespaces a provider can serve.
  • Keeps storage decoupled from concrete provider types.

Provider implementations:

  • ControlPlaneClientProvider.EligibleNamespaces — discovers eligible CPs via findEligibleControlPlanes, groups by namespace, rejects duplicates within a namespace, returns sorted namespace list.
  • StaticClientProvider.EligibleNamespaces — returns the pinned namespace.

Storage fan-out in TemplateStorage.List and WorkspaceStorage.List:

  • When request namespace is empty and provider implements NamespaceLister: fan out across all eligible namespaces, query each, convert with correct per-namespace metadata, and return aggregated results sorted by (namespace, name).
  • Namespaced requests and providers without NamespaceLister keep existing behavior unchanged.
  • Error semantics: fail-fast (if any namespace fails, return error immediately).

Validation

  • make verify-vendor
  • make test ✅ (8 new tests + no regressions)
  • make build
  • make lint

Risks

Low risk — the fan-out path only activates for all-namespaces LIST when the provider supports NamespaceLister. Single-namespace requests and the static provider path are unchanged. Multiple eligible CPs within the same namespace are still explicitly rejected.


📋 Implementation Plan

Plan: Support querying multiple CoderControlPlane instances (multi-namespace aggregation)

Context / Why

Today, the aggregated API server’s CoderTemplate / CoderWorkspace storage assumes exactly one eligible CoderControlPlane when handling all-namespaces LIST (i.e. request namespace is empty). When multiple eligible control planes exist across namespaces, client list-watch loops fail with:

  • multiple eligible CoderControlPlane instances across namespaces; multi-instance support is planned

This blocks workflows like:

  • kubectl get codertemplates -A
  • controllers/informers that use list+watch against aggregation.coder.com/v1alpha1

Goal

Enable multi-instance querying by making all-namespaces LIST aggregate results across every eligible CoderControlPlane (one per namespace) for:

  • aggregation.coder.com/v1alpha1/CoderTemplate
  • aggregation.coder.com/v1alpha1/CoderWorkspace

Namespaced requests (-n <namespace>) must keep current behavior.

Non-goals (v1)

  • Supporting multiple eligible CoderControlPlane objects within the same Kubernetes namespace.
  • Implementing a “true” upstream-backed watch (watch events that reflect changes that occur directly in Coder without going through the aggregated API server).

Acceptance Criteria

  • kubectl get codertemplates -A returns templates from all eligible CoderControlPlane namespaces.
  • kubectl get coderworkspaces -A returns workspaces from all eligible CoderControlPlane namespaces.
  • kubectl get codertemplates -A --watch no longer fails due to the multi-instance discovery error (because the initial LIST succeeds).
  • Standalone aggregated-apiserver mode (static provider pinned to --coder-namespace) continues working unchanged.

Evidence (code references)

  • Single-instance guard + error message: internal/aggregated/coder/controlplane_provider.go
    • ClientForNamespace errors on len(eligible) > 1
    • DefaultNamespace errors on multiple eligible across namespaces
    • multipleEligibleControlPlaneMessage("") returns the exact string shown in the screenshot
  • LIST path uses default namespace + empty-namespace client resolution:
    • internal/aggregated/storage/template.go: (*TemplateStorage).List
    • internal/aggregated/storage/workspace.go: (*WorkspaceStorage).List
    • both call namespaceForListConversion(...) and then clientForNamespace(ctx, requestNamespace) where requestNamespace=="" triggers the provider’s “pick exactly one CP” logic
  • WATCH path is an in-memory broadcaster: internal/aggregated/storage/watch.go, template.go, workspace.go
    • the screenshot’s “Watcher failed …” is consistent with list-watch clients failing during the initial LIST phase

Implementation Details

1) Add an optional provider capability: list eligible namespaces

File: internal/aggregated/coder/provider.go

Add a small, opt-in interface that lets storage enumerate the set of namespaces it can serve.

// NamespaceLister can enumerate namespaces served by a ClientProvider.
// Used to implement all-namespaces LIST by fanning out across instances.
//
// Implementations should only return namespaces that are eligible/ready.
// Returned namespaces must be non-empty and should be deterministic (sorted).
type NamespaceLister interface {
    EligibleNamespaces(ctx context.Context) ([]string, error)
}

Rationale: keeps storage decoupled from concrete provider types (StaticClientProvider vs ControlPlaneClientProvider).

2) Implement NamespaceLister

2a) Dynamic control-plane provider

File: internal/aggregated/coder/controlplane_provider.go

Implement:

  • func (p *ControlPlaneClientProvider) EligibleNamespaces(ctx context.Context) ([]string, error)

Algorithm:

  1. eligibleCPs, err := p.findEligibleControlPlanes(ctx, "")
  2. If len(eligibleCPs) == 0: return ServiceUnavailable(noEligibleControlPlaneMessage("")).
  3. Group eligible CPs by cp.Namespace.
  4. If any namespace has >1 eligible CP: return BadRequest(multipleEligibleControlPlaneMessage(namespace)) (still not supported).
  5. Return the set of namespaces, sorted for determinism.

Defensive programming:

  • Assert/validate non-nil receiver and non-nil context.
  • Ensure returned namespaces are not empty; crash/assert if an eligible CP has an empty namespace (should be impossible).

2b) Static provider

File: internal/aggregated/coder/provider.go

Implement:

  • func (p *StaticClientProvider) EligibleNamespaces(ctx context.Context) ([]string, error)

Behavior:

  • If p.Namespace == "": return ServiceUnavailable("static provider has no default namespace") (consistent with existing behavior).
  • Else return []string{p.Namespace}.

3) Update storage LIST to fan out when request namespace is empty

3a) Templates

File: internal/aggregated/storage/template.go

Update func (s *TemplateStorage) List(ctx context.Context, _ *metainternalversion.ListOptions) ...:

  • If request namespace is non-empty: keep existing behavior.
  • If request namespace is empty:
    • If s.provider implements coder.NamespaceLister:
      • namespaces, err := lister.EligibleNamespaces(ctx) (return the error if enumeration fails)
      • For each namespace:
        • sdk := s.clientForNamespace(ctx, namespace)
        • templates := sdk.Templates(ctx, codersdk.TemplateFilter{})
        • Convert each template using convert.TemplateToK8s(namespace, template) and append.
      • Sort list.Items by (namespace, name) for deterministic output.
    • Else: fallback to existing namespaceForListConversion + clientForNamespace(ctx, "") behavior.

3b) Workspaces

File: internal/aggregated/storage/workspace.go

Update func (s *WorkspaceStorage) List(ctx context.Context, _ *metainternalversion.ListOptions) ... with the same fan-out strategy.

Concurrency (recommended, not required for correctness):

  • Use an errgroup.Group + a small semaphore/limit (e.g. 4) so large numbers of namespaces don’t produce N sequential slow requests.

Error semantics (v1):

  • Fail-fast: if any namespace fails to resolve a client or list objects, return an error (preserves today’s “LIST is authoritative” behavior).

4) Tests

4a) Provider unit tests

File: internal/aggregated/coder/controlplane_provider_test.go

Add tests for EligibleNamespaces:

  • Returns namespaces for multiple eligible CPs across namespaces.
  • Returns BadRequest when a single namespace has multiple eligible CPs.
  • Returns ServiceUnavailable when no eligible CPs exist.

4b) Storage aggregation tests

File: internal/aggregated/storage/storage_test.go

Add tests verifying all-namespaces aggregation:

  • Create two mock Coder servers with distinct seeded templates/workspaces.
  • Create a test provider implementing:
    • coder.ClientProvider (map namespace → client)
    • coder.NamespaceLister (returns both namespaces)
  • Assert:
    • TemplateStorage.List(context.Background(), nil) returns items from both namespaces and each item has the correct metadata.namespace.
    • same for WorkspaceStorage.List.

5) Docs / behavior notes

File: internal/aggregated/storage/doc.go

Update the “v1 semantics” comment to note:

  • all-namespaces LIST aggregates across eligible CoderControlPlane namespaces when the provider supports it.

6) Validation

Run locally after implementation:

  • make test
  • make build
  • make lint
  • make verify-vendor (expected no-op; only if deps are added)

Optional follow-ups (not required to fix the observed error)

  1. Upstream-backed watch: implement per-control-plane polling/websocket to generate watch events even when changes happen directly in Coder.

  2. Partial failure mode: consider returning partial results for all-namespaces LIST when one instance is down (would require a clear API/UX decision; Kubernetes APIs generally favor fail-fast).

  3. Performance & caching: optionally cache per-namespace SDK clients/tokens (with invalidation on secret changes) to reduce per-request secret reads.


Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $3.28

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!


@ThomasK33 added this pull request to the merge queue Feb 13, 2026
@ThomasK33
Member Author

Merged via the queue into main with commit 0cdc937 Feb 13, 2026
8 checks passed
@ThomasK33 deleted the query-ynej branch February 13, 2026 20:05