Merged
69 changes: 49 additions & 20 deletions .claude/agents/docs-expert.md
@@ -1,11 +1,11 @@
---
allowed-tools: Read, Bash(*), WebSearch, WebFetch
description: Subagent that maintains, grows, and evolves the project documentation — finding stale content, gaps for new features, and structural improvements, then reporting findings back to the orchestrator.
description: Subagent that keeps project documentation accurate, sharp, and well-scoped — fixing stale content, adding high-level orientation where missing, trimming low-value prose, and reporting findings back to the orchestrator.
---

# Docs Expert

You are a documentation expert. You receive a digest of recent code changes and your mission is threefold: keep the documentation accurate, grow it by writing new content for features and algorithms, and slowly evolve the structure of `docs/` so it stays navigable as the project grows. You report your findings back to the orchestrator — you do NOT open pull requests yourself.
You are a documentation expert. You receive a digest of recent code changes and your mission is to keep the documentation accurate, grow it where it adds value per the philosophy below, and shrink it where it doesn't — sharpening the content over time. You report your findings back to the orchestrator — you do NOT open pull requests yourself.

---

@@ -17,6 +17,29 @@ Before doing any investigation, read the `AGENTS.md` file in the repository root

You will be given a change digest that includes commit SHAs, file lists, and descriptions of what changed and why. Use this as your starting point.

## Documentation Philosophy

Good documentation is **not** a prose retelling of the code. A reader can already read the code. What they cannot get from the code alone:

- **High-level orientation** — what is this subsystem, what problem does it solve, what are its entry points?
- **Cross-component connections** — how do pieces relate? E.g. the CR API writes CRDs, the syncer also writes CRDs, both feed the same controller — that relationship is invisible when reading files in isolation.
- **Lifecycle and flow** — a mermaid diagram showing the happy path is worth more than three pages of prose. Use diagrams.
- **Non-obvious constraints** — design decisions that would surprise a reader, things that bit someone at 2am, invariants that aren't enforced by the type system.
- **Code pointers** — "looking for X? → `internal/scheduling/reservations/commitments/`" helps navigation.

What to avoid:
- Step-by-step descriptions of what a function or controller does — that's just reading the code out loud.
- Field-by-field descriptions of CRDs or structs — those belong as godoc on the type.
- Algorithm walkthroughs that mirror the implementation sequentially.

**Writing style**: Be concise and precise. Short sentences, no filler words, no restating the obvious. One example where it clarifies; none where the point stands without it. Avoid generic statements that could apply to any project — every sentence should be specific to this subsystem.

**Example of good scope**: A doc on CR reservations shows the entry points (CR API, syncer), the two CRD types, and a mermaid lifecycle diagram. It does not describe what each reconcile step does.

**Example of good scope**: A doc on pipeline options lists the available options and their intended use cases, notes any corner cases or gotchas, and points to where they are configured. It does not describe the scheduling algorithm internals.

---

## Documentation Scope

Everything under `docs/` is in scope. You may read any files there to build your understanding.
@@ -25,7 +48,7 @@ Everything under `docs/` is in scope. You may read any files there to build your

## Phase 1: Investigate

1. **Read all documentation files.** Load each doc file listed above. Build a mental model of what the docs currently cover and where they are thin or silent.
1. **Read all documentation files.** Build a mental model of what the docs currently cover and where they are thin, silent, or too verbose.

2. **Cross-reference against changes.** For each notable change in the digest, classify it:

@@ -34,28 +57,31 @@ Everything under `docs/` is in scope. You may read any files there to build your
| Docs say something that's now **wrong** | Fix it (highest priority) |
| Docs reference something that was **removed or deprecated** | Remove or update the section |
| A **new feature** was added but the docs don't mention it | Write new documentation for it |
| An **interesting algorithm or technique** was implemented | Document how it works and why it was chosen |
| A setup step, config option, or API changed | Update the relevant doc |
| An existing doc section is **clearly outdated** beyond this week's changes | Note it, but don't fix everything — pick the best one |
| The **docs structure** itself is becoming unwieldy (e.g. one file covers too many topics, related docs are scattered, a folder would group things better) | Note it as a structural improvement candidate |
| An **interesting algorithm or technique** was implemented | Document *why* it was chosen and what constraints drove it — not a step-by-step walkthrough |
| A setup step, config option, or API changed | Update the relevant doc — classify as **Conflict** if it makes existing docs wrong, otherwise **Minor gap** |
| An existing doc section is **clearly outdated** beyond this week's changes | Note it as a **Dead content** or **Conflict** finding; don't fix everything — pick the best one |
| An existing doc section is **too verbose or low-level** | Trim it — but only if the content is easily found by reading one or two source files. Keep it if it saves the reader from cross-checking many files, or if it captures something not obvious from the code alone. |
| The **docs structure** itself is becoming unwieldy (e.g. one file covers too many topics, related docs are scattered, a folder would group things better) | Note it as a **Structural** finding |

3. **Read the actual code.** Don't just rely on the digest. For new features and algorithms, read the implementation to understand the design, the tradeoffs, and the behavior well enough to explain it clearly.
3. **Read the actual code.** Don't just rely on the digest. For new features and algorithms, read the implementation to understand the design well enough to explain entry points, cross-component relationships, and the constraints that shaped the approach — not to transcribe what the code does.

4. **Assess the docs structure.** Step back and consider the `docs/` tree as a whole:
- Is a single file doing too much and should be split into focused pages?
- Are there multiple small files covering related topics that would read better as one?
- Would a new subdirectory help group related docs (e.g. `docs/algorithms/`, `docs/features/`)?
- Would a new subdirectory help group related docs (e.g. `docs/features/`, `docs/guides/`)?
- Are there orphan files that nothing links to, or dead files that cover removed functionality?

Structural changes are valuable but should be made **slowly and deliberately** — at most one structural change per run, and only when the improvement is clear. Don't reorganize for the sake of reorganizing.

5. **Prioritize what to do.** You will likely find more work than you can do in one pass. That's expected — your job is to make incremental progress each week. Use this priority order:
1. **Conflicts** — docs that are actively wrong
1. **Conflict** — docs that are actively wrong
2. **Dead content** — sections referencing removed or deprecated functionality
3. **New features** — undocumented capabilities that users or developers need to know about
4. **Algorithms and design** — interesting technical approaches worth explaining for future contributors
5. **Minor gaps** — small omissions in existing docs
6. **Structural improvements** — reorganizing files, splitting, merging, adding folders
3. **Verbose content** — prose that duplicates what one or two source files already say clearly
4. **New feature** — undocumented subsystems or entry points that readers have no orientation for
5. **Cross-component gap** — relationships between components that are invisible when reading files in isolation
6. **Algorithm** — why an approach was chosen and what constraints drove it (not how it works)
7. **Minor gap** — small omissions in existing docs
8. **Structural** — reorganizing files, splitting, merging, adding folders
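
The revised priority order can be read as a simple ranking over finding categories. The sketch below is one hypothetical way an orchestrator might sort a report by it; the `Finding` type, `priorityRank` table, and `sortFindings` helper are illustrative names and are not part of this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// priorityRank mirrors the revised priority order; a lower rank sorts first.
var priorityRank = map[string]int{
	"Conflict":            1,
	"Dead content":        2,
	"Verbose content":     3,
	"New feature":         4,
	"Cross-component gap": 5,
	"Algorithm":           6,
	"Minor gap":           7,
	"Structural":          8,
}

// Finding is a hypothetical record for one reported issue.
type Finding struct {
	Priority string
	Title    string
}

// sortFindings orders findings by priorityRank. Findings with the same rank
// keep their input order, and unknown categories sort last.
func sortFindings(fs []Finding) {
	rank := func(f Finding) int {
		if r, ok := priorityRank[f.Priority]; ok {
			return r
		}
		return len(priorityRank) + 1
	}
	sort.SliceStable(fs, func(i, j int) bool { return rank(fs[i]) < rank(fs[j]) })
}

func main() {
	fs := []Finding{
		{Priority: "Structural", Title: "split oversized page"},
		{Priority: "Conflict", Title: "wrong endpoint documented"},
		{Priority: "Verbose content", Title: "reconcile walkthrough"},
	}
	sortFindings(fs)
	for _, f := range fs {
		fmt.Printf("%s: %s\n", f.Priority, f.Title)
	}
}
```

A stable sort keeps the order in which findings were discovered within each category, which is usually what a reader of the report expects.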

## Phase 2: Reason over importance

Expand All @@ -76,15 +102,18 @@ Return a structured report of what you found. Do NOT open any pull requests or c
## Docs Expert Results

### Documentation Health
- Conflicts found: N (docs that are wrong)
- Dead content found: N (references to removed things)
- Undocumented features: N
- Undocumented algorithms/design: N
- Structural issues: N (files to split, merge, or reorganize)
- Conflicts: N (docs that are wrong)
- Dead content: N (references to removed things)
- Verbose content: N (candidates to trim)
- New features: N
- Cross-component gaps: N
- Algorithm gaps: N
- Minor gaps: N
- Structural: N (files to split, merge, or reorganize)

### Findings
For each issue found:
- **Priority**: [Conflict/Dead content/New feature/Algorithm/Minor gap/Structural]
- **Priority**: [Conflict/Dead content/Verbose content/New feature/Cross-component gap/Algorithm/Minor gap/Structural]
- **Title**: <short title>
- **File(s)**: <affected doc file paths>
- **Description**: <what is wrong or missing and why it matters>
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog

## 2026-07-01 — [#1001](https://github.com/cobaltcore-dev/cortex/pull/1001)

### cortex v0.2.1 (sha-44b8aab7)

Non-breaking changes:
- Include PAYG slots in MaxSlots guard — the reservation manager now counts existing + PAYG-created slots before allocating blind-scheduler slots, preventing total slot counts from exceeding `MaxSlotsPerCommitment` ([#987](https://github.com/cobaltcore-dev/cortex/pull/987))
- Use public Limes endpoint in knowledge datasource — fixes HTTP 400 errors after Limes enforced domain name matching on `Host` headers ([#999](https://github.com/cobaltcore-dev/cortex/pull/999))
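
As a rough illustration of the MaxSlots guard described in #987, the sketch below caps newly allocated blind-scheduler slots so that existing, PAYG-created, and new slots together stay within the limit. The function `remainingBlindSlots` and its parameters are hypothetical names chosen for this example, not the reservation manager's actual API; only the counting rule is taken from the changelog entry.

```go
package main

import "fmt"

// remainingBlindSlots is a hypothetical helper. It returns how many of the
// requested blind-scheduler slots may be allocated for one commitment so that
// existing + PAYG-created + blind slots never exceed maxSlotsPerCommitment.
func remainingBlindSlots(existing, paygCreated, requested, maxSlotsPerCommitment int) int {
	if requested <= 0 {
		return 0
	}
	// Count PAYG-created slots together with the existing ones.
	used := existing + paygCreated
	if used >= maxSlotsPerCommitment {
		return 0
	}
	budget := maxSlotsPerCommitment - used
	if requested > budget {
		return budget
	}
	return requested
}

func main() {
	fmt.Println(remainingBlindSlots(3, 2, 10, 8)) // prints 3: only 3 slots remain under the cap
	fmt.Println(remainingBlindSlots(6, 4, 2, 8))  // prints 0: the cap is already exceeded
	fmt.Println(remainingBlindSlots(1, 1, 2, 8))  // prints 2: the request fits
}
```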

### cortex-shim v0.1.4 (sha-44b8aab7)

Includes updated image sha-44b8aab7.

### cortex-nova v0.0.78

Includes updated charts cortex v0.2.1, cortex-postgres v0.6.7.

### cortex-cinder v0.0.78

Includes updated charts cortex v0.2.1, cortex-postgres v0.6.7.

### cortex-manila v0.0.78

Includes updated charts cortex v0.2.1, cortex-postgres v0.6.7.

### cortex-crds v0.0.78

Includes updated chart cortex v0.2.1.

### cortex-ironcore v0.0.78

Includes updated chart cortex v0.2.1.

### cortex-pods v0.0.78

Includes updated chart cortex v0.2.1.

### cortex-placement-shim v0.1.4

Includes updated chart cortex-shim v0.1.4.

## 2026-06-29 — [#990](https://github.com/cobaltcore-dev/cortex/pull/990)

### cortex v0.2.0 (sha-124ec226)
8 changes: 4 additions & 4 deletions helm/bundles/cortex-cinder/Chart.yaml
@@ -5,23 +5,23 @@ apiVersion: v2
name: cortex-cinder
description: A Helm chart deploying Cortex for Cinder.
type: application
version: 0.0.77
version: 0.0.78
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex-postgres
- name: cortex-postgres
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.6.6
version: 0.6.7

# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1
alias: cortex-knowledge-controllers
# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1
alias: cortex-scheduling-controllers

# Owner info adds a configmap to the kubernetes cluster with information on
4 changes: 2 additions & 2 deletions helm/bundles/cortex-crds/Chart.yaml
@@ -5,13 +5,13 @@ apiVersion: v2
name: cortex-crds
description: A Helm chart deploying Cortex CRDs.
type: application
version: 0.0.77
version: 0.0.78
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1

# Owner info adds a configmap to the kubernetes cluster with information on
# the service owner. This makes it easier to find out who to contact in case
4 changes: 2 additions & 2 deletions helm/bundles/cortex-ironcore/Chart.yaml
@@ -5,13 +5,13 @@ apiVersion: v2
name: cortex-ironcore
description: A Helm chart deploying Cortex for IronCore.
type: application
version: 0.0.77
version: 0.0.78
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1

# Owner info adds a configmap to the kubernetes cluster with information on
# the service owner. This makes it easier to find out who to contact in case
8 changes: 4 additions & 4 deletions helm/bundles/cortex-manila/Chart.yaml
@@ -5,23 +5,23 @@ apiVersion: v2
name: cortex-manila
description: A Helm chart deploying Cortex for Manila.
type: application
version: 0.0.77
version: 0.0.78
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex-postgres
- name: cortex-postgres
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.6.6
version: 0.6.7

# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1
alias: cortex-knowledge-controllers
# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1
alias: cortex-scheduling-controllers

# Owner info adds a configmap to the kubernetes cluster with information on
8 changes: 4 additions & 4 deletions helm/bundles/cortex-nova/Chart.yaml
@@ -5,23 +5,23 @@ apiVersion: v2
name: cortex-nova
description: A Helm chart deploying Cortex for Nova.
type: application
version: 0.0.77
version: 0.0.78
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex-postgres
- name: cortex-postgres
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.6.6
version: 0.6.7

# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1
alias: cortex-knowledge-controllers
# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1
alias: cortex-scheduling-controllers

# Owner info adds a configmap to the kubernetes cluster with information on
4 changes: 2 additions & 2 deletions helm/bundles/cortex-placement-shim/Chart.yaml
@@ -5,13 +5,13 @@ apiVersion: v2
name: cortex-placement-shim
description: A Helm chart deploying the Cortex placement shim.
type: application
version: 0.1.3
version: 0.1.4
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex-shim
- name: cortex-shim
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.1.3
version: 0.1.4
# Owner info adds a configmap to the kubernetes cluster with information on
# the service owner. This makes it easier to find out who to contact in case
# of issues. See: https://github.com/sapcc/helm-charts/pkgs/container/helm-charts%2Fowner-info
4 changes: 2 additions & 2 deletions helm/bundles/cortex-pods/Chart.yaml
@@ -5,13 +5,13 @@ apiVersion: v2
name: cortex-pods
description: A Helm chart deploying Cortex for Pods.
type: application
version: 0.0.77
version: 0.0.78
appVersion: 0.1.0
dependencies:
# from: file://../../library/cortex
- name: cortex
repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
version: 0.2.0
version: 0.2.1

# Owner info adds a configmap to the kubernetes cluster with information on
# the service owner. This makes it easier to find out who to contact in case
2 changes: 1 addition & 1 deletion helm/library/cortex-postgres/Chart.yaml
@@ -5,5 +5,5 @@ apiVersion: v2
name: cortex-postgres
description: Postgres setup for Cortex.
type: application
version: 0.6.6
version: 0.6.7
appVersion: "sha-af707446"
4 changes: 2 additions & 2 deletions helm/library/cortex-shim/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
name: cortex-shim
description: A Helm chart to distribute cortex shims.
type: application
version: 0.1.3
appVersion: "sha-124ec226"
version: 0.1.4
appVersion: "sha-44b8aab7"
icon: "https://example.com/icon.png"
dependencies: []
4 changes: 2 additions & 2 deletions helm/library/cortex/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
name: cortex
description: A Helm chart to distribute cortex.
type: application
version: 0.2.0
appVersion: "sha-124ec226"
version: 0.2.1
appVersion: "sha-44b8aab7"
icon: "https://example.com/icon.png"
dependencies: []
@@ -55,8 +55,9 @@ func (api *limesAPI) Init(ctx context.Context) error {
// See: https://github.com/sapcc/limes/blob/5ea068b/docs/users/api-example.md?plain=1#L23
provider := api.keystoneClient.Client()
serviceType := "resources"
sameAsKeystone := api.keystoneClient.Availability()
url, err := api.keystoneClient.FindEndpoint(sameAsKeystone, serviceType)
// Always use the public endpoint: Limes enforces that requests arrive on its configured public
// hostname (LIMES_API_DOMAIN_NAME_V1) and rejects internal-URL requests with 400.
url, err := api.keystoneClient.FindEndpoint("public", serviceType)
if err != nil {
return err
}
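
To make the effect of the hard-coded `"public"` argument concrete, here is a minimal, stdlib-only sketch of endpoint selection from a service catalog. The `Endpoint` type, `findEndpoint`, `limesURL`, and the example URLs are assumptions made for illustration; the repository's Keystone client and the real catalog format may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a simplified, hypothetical stand-in for one service catalog entry.
type Endpoint struct {
	ServiceType string // e.g. "resources" for Limes
	Interface   string // "public", "internal", or "admin"
	URL         string
}

var errNoEndpoint = errors.New("no matching endpoint in catalog")

// findEndpoint returns the URL of the first catalog entry that matches both
// the requested availability (interface) and the service type.
func findEndpoint(catalog []Endpoint, availability, serviceType string) (string, error) {
	for _, e := range catalog {
		if e.ServiceType == serviceType && e.Interface == availability {
			return e.URL, nil
		}
	}
	return "", errNoEndpoint
}

// limesURL always asks for the public interface, independent of the
// availability used to reach Keystone itself.
func limesURL(catalog []Endpoint) (string, error) {
	return findEndpoint(catalog, "public", "resources")
}

func main() {
	catalog := []Endpoint{
		{ServiceType: "resources", Interface: "internal", URL: "http://limes.internal.example/"},
		{ServiceType: "resources", Interface: "public", URL: "https://limes.example.com/"},
	}
	url, err := limesURL(catalog)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```

The design point is that the Limes lookup no longer inherits the Keystone availability setting, so a deployment that talks to Keystone over the internal interface still reaches Limes on the hostname it accepts.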
@@ -20,6 +20,29 @@ func setupLimesMockServer(handler http.HandlerFunc) (*httptest.Server, keystone.
return server, &testlibKeystone.MockKeystoneClient{Url: server.URL + "/"}
}

func TestLimesAPI_Init_UsesPublicEndpoint(t *testing.T) {
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	defer server.Close()

	var gotAvailability string
	k := &testlibKeystone.MockKeystoneClient{
		Url: server.URL + "/",
		FindEndpointOverride: func(availability, serviceType string) {
			if serviceType == "resources" {
				gotAvailability = availability
			}
		},
	}

	api := NewLimesAPI(datasources.Monitor{}, k, v1alpha1.LimesDatasource{}).(*limesAPI)
	if err := api.Init(t.Context()); err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if gotAvailability != "public" {
		t.Errorf("expected public availability for Limes endpoint, got %q", gotAvailability)
	}
}
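
The test above relies on a mock that reports the arguments of each `FindEndpoint` call through a callback. The real `MockKeystoneClient` is not shown in this diff, so the following is only a sketch of how such a hook could work; `mockKeystone` and its method body are assumptions, and only the field names `Url` and `FindEndpointOverride` come from the test.

```go
package main

import "fmt"

// mockKeystone is a hypothetical, trimmed-down mock. The repository's
// MockKeystoneClient may be implemented differently.
type mockKeystone struct {
	Url string
	// FindEndpointOverride, when set, observes the arguments of every
	// FindEndpoint call so a test can assert on them.
	FindEndpointOverride func(availability, serviceType string)
}

// FindEndpoint notifies the optional hook, then returns the fixed mock URL.
func (m *mockKeystone) FindEndpoint(availability, serviceType string) (string, error) {
	if m.FindEndpointOverride != nil {
		m.FindEndpointOverride(availability, serviceType)
	}
	return m.Url, nil
}

func main() {
	var gotAvailability string
	m := &mockKeystone{
		Url: "http://example.invalid/",
		FindEndpointOverride: func(availability, serviceType string) {
			if serviceType == "resources" {
				gotAvailability = availability
			}
		},
	}
	if _, err := m.FindEndpoint("public", "resources"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(gotAvailability)
}
```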

func TestNewLimesAPI(t *testing.T) {
mon := datasources.Monitor{}
k := &testlibKeystone.MockKeystoneClient{}
3 changes: 2 additions & 1 deletion internal/scheduling/reservations/commitments/client.go
@@ -89,7 +89,8 @@ func (c *commitmentsClient) Init(ctx context.Context, client client.Client, conf
Microversion: "2.61",
}

// Get the limes endpoint.
// Get the limes endpoint — always use public: Limes enforces that requests arrive on its
// configured public hostname (LIMES_API_DOMAIN_NAME_V1) and rejects internal-URL requests with 400.
url = must.Return(c.provider.EndpointLocator(gophercloud.EndpointOpts{
Type: "resources",
Availability: "public",