From 726cbaee4a56d987313eb6077a3564e54e59b044 Mon Sep 17 00:00:00 2001
From: Jesper Kristensen <cableman@linuxdev.dk>
Date: Mon, 18 May 2026 16:13:42 +0200
Subject: [PATCH 1/2] Added docs tool development

---
 _config.yml               |    5 +
 _config_dev.yml           |    5 +
 technical/agentic_tool.md | 1305 +++++++++++++++++++++++++++++++++++++
 technical/tool.md         |  739 +++++++++++++++++++++
 4 files changed, 2054 insertions(+)
 create mode 100644 technical/agentic_tool.md
 create mode 100644 technical/tool.md
diff --git a/_config.yml b/_config.yml
index 26bc95b..8fa04bc 100644
--- a/_config.yml
+++ b/_config.yml
@@ -37,5 +37,10 @@ callouts:
 # Makes Aux links open in a new tab. Default is false
 aux_links_new_tab: true
 
+# Enable Mermaid diagrams in fenced code blocks tagged `mermaid`.
+# https://just-the-docs.com/docs/ui-components/code/#mermaid-diagram-code-blocks
+mermaid:
+  version: "11.4.1"
+
 kramdown:
   syntax_highlighter: coderay
\ No newline at end of file
diff --git a/_config_dev.yml b/_config_dev.yml
index ee5b694..130f13b 100644
--- a/_config_dev.yml
+++ b/_config_dev.yml
@@ -40,5 +40,10 @@ callouts:
 # Makes Aux links open in a new tab. Default is false
 aux_links_new_tab: true
 
+# Enable Mermaid diagrams in fenced code blocks tagged `mermaid`.
+# https://just-the-docs.com/docs/ui-components/code/#mermaid-diagram-code-blocks
+mermaid:
+  version: "11.4.1"
+
 kramdown:
   syntax_highlighter: coderay
\ No newline at end of file
diff --git a/technical/agentic_tool.md b/technical/agentic_tool.md
new file mode 100644
index 0000000..f52e441
--- /dev/null
+++ b/technical/agentic_tool.md
@@ -0,0 +1,1305 @@
+---
+title: Build agentic tool
+parent: Technical documentation
+---
+
+# Building a new agent
+
+A developer guide for creating LLM-backed microservices that live alongside
+the [AarhusAI docker](https://github.com/AarhusAI/aarhusai-docker) stack.
+The [retrieval-agent](https://github.com/AarhusAI/retrieval-agent) is the worked example throughout - every pattern
+documented here is taken from there.
+
+## Table of contents
+
+1. [What this guide is for](#1-what-this-guide-is-for)
+2. [Prerequisites](#2-prerequisites)
+3. [Glossary](#3-glossary)
+4. [Overview](#4-overview)
+5. [Recommended stack](#5-recommended-stack)
+6. [Repository layout](#6-repository-layout)
+7. [Step-by-step setup](#7-step-by-step-setup)
+8. [PydanticAI patterns](#8-pydanticai-patterns)
+9. [Production - docker compose server](#9-production---docker-compose-sever)
+10. [Production builds (multi-arch)](#10-production-builds-multi-arch)
+11. [Verification checklist](#11-verification-checklist)
+12. [Common pitfalls / FAQ](#12-common-pitfalls--faq)
+13. [Reference index](#13-reference-index)
+
+## 1. What this guide is for
+
+This guide is for a developer **new to the project** who's been asked to build an agent-based tool that plugs into the
+AarhusAI platform. If you've never seen this codebase before, start here.
+
+[AarhusAI docker](https://github.com/AarhusAI/aarhusai-docker) is the Docker orchestration around a customised fork
+of [Open WebUI](https://github.com/open-webui/open-webui) - the AI chat platform users interact with. "Agents" in this
+org are **standalone FastAPI microservices** that extend the platform with LLM-backed workflows: retrieval,
+summarisation, classification, multi-step reasoning, anything that benefits from running outside the main UI process.
+The canonical example shipped today is [retrieval-agent](https://github.com/AarhusAI/retrieval-agent) - a sibling repo
+to `AarhusAI-docker/` - every pattern in this guide is taken from it.
+
+This guide focuses on building a *new* agent and is not a tour of the existing stack.
+
+## 2. Prerequisites
+
+### Install on your machine
+
+- **Docker** with buildx (recent Docker Desktop, or `docker-buildx-plugin` on Linux).
+- **Go Task** `brew install go-task` (macOS) or `apt install go-task` (Debian/Ubuntu).
+  Commands are invoked as `task <name>`.
+- An editor of choice. Python 3.11+ locally is helpful for IDE integration, but **all Python tooling actually runs
+  inside the container** - no local venv required.
+
+### Access you'll need
+
+- A local clone of [AarhusAI docker](https://github.com/AarhusAI/aarhusai-docker) ideally as a sibling directory to
+  where you'll create your new agent.
+- A **ghcr.io Personal Access Token** with `write:packages` scope, for publishing production images.
+- A **LiteLLM virtual key** if your agent actually calls models - ask whoever owns the stack.
+- Read access to the AarhusAI GitHub org for the existing agent repos.
+
+### Knowledge level assumed
+
+- Comfortable with Python and `async`/`await`.
+- Have used Docker and docker-compose before (you don't need to be an expert).
+- Have read or skimmed FastAPI's tutorial. PydanticAI is introduced with snippets in §8 - no prior exposure needed.
+
+## 3. Glossary
+
+The terms below show up throughout the rest of the doc. Skim once and refer back as needed.
+
+| Term               | Means                                                                                                           | Where you'll see it             |
+|--------------------|-----------------------------------------------------------------------------------------------------------------|---------------------------------|
+| AarhusAi docker    | This monorepo of Docker orchestration; the "parent stack".                                                      | Throughout                      |
+| Open WebUI         | The AI chat platform (forked at AarhusAI/open-webui), the user-facing UI.                                       | Parent stack; not modified here |
+| LiteLLM            | LLM proxy at `http://litellm:4000/v1` inside the stack; agents route every LLM call through it.                 | §4, §7, §8, §9                  |
+| Qdrant             | Vector database; used by retrieval-agent for RAG.                                                               | §9, retrieval-agent             |
+| Traefik            | Reverse proxy / ingress that fronts every public-facing service.                                                | §7.4, §9                        |
+| `frontend` network | External Docker network Traefik reads from; every service that takes user traffic joins it.                     | §7.4, §9                        |
+| `app` network      | Internal Docker network where services talk to each other (LiteLLM, Qdrant, Postgres, Redis).                   | §7.4, §9                        |
+| retrieval-agent    | Canonical example agent. Repo: <https://github.com/AarhusAI/retrieval-agent>.                                   | Throughout                      |
+| PydanticAI         | LLM agent framework - bounded loops, tool calls, structured output.                                             | §5, §8                          |
+| Go Task            | Task runner (`Taskfile.yml`) used across the stack. `task <name>` typically runs commands inside the container. | §7.5                            |
+| ghcr.io            | GitHub Container Registry. Production images publish to `ghcr.io/aarhusai/<repo>`.                              | §10                             |
+
+## 4. Overview
+
+A "new agent" in this stack is a **standalone FastAPI microservice** that:
+
+- Lives in **its own git repository**, sibling to `aarhusai-docker/`.
+- Builds a **multi-arch Docker image** (`linux/amd64,linux/arm64`) pushed to **`ghcr.io/aarhusai/<repo-name>`**.
+- Runs locally from its own `docker-compose.yml` (optional), joined to the parent stack's `app` network (for
+  service-to-service calls like LiteLLM, Qdrant, Postgres, Redis) and Traefik's `frontend` network (for ingress).
+- In production, is **referenced as a pre-built image** from the parent stack's `docker-compose.yml` /
+  `docker-compose.server.yml` - the parent stack pulls the image, no source mount.
+
+```mermaid
+flowchart LR
+    user([user])
+
+    subgraph frontend["frontend network"]
+        traefik[Traefik]
+    end
+
+    subgraph app["app network"]
+        agent["my-agent<br/>(FastAPI :8000)"]
+        litellm["litellm :4000"]
+        qdrant["qdrant :6333"]
+        postgres["postgres :5432"]
+        redis["redis :6379"]
+    end
+
+    user --> traefik --> agent
+    agent --> litellm
+    agent --> qdrant
+    agent -.-> postgres
+    agent -.-> redis
+```
+
+Solid arrows are typical dependencies (almost every agent calls LiteLLM; many call Qdrant). Dashed arrows are optional -
+most agents don't need Postgres or Redis directly. The parent stack provides all four; you join the `app` network and
+pick what you actually use.
+
+**Use this template when you need:**
+
+- An LLM-backed HTTP endpoint (summarization, classification, extraction, agentic workflows).
+- Structured output via PydanticAI.
+- Tool-calling against LiteLLM-proxied models.
+- A long-lived stateful client (vector DB, blob store, cache).
+
+**Don't use it for:**
+
+- Long-running batch jobs - use a task runner / cron worker pattern instead.
+- Pure frontend changes - patch Open WebUI directly via the parent stack's patch system.
+- Anything that needs Open WebUI's session/auth context. Open WebUI integration patterns (RAG external retrieval,
+  OpenAI-compatible endpoints, tool servers) are out of scope for this guide - wire that up per agent type once the
+  service exists.
+
+## 5. Recommended stack
+
+Use this stack unless you have a specific reason not to - sticking with them means future maintainers (and Claude) can
+read your repo without context-switching.
+
+| Concern                   | Choice                                                                       | Why                                                                                                  |
+|---------------------------|------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
+| Web framework             | `fastapi>=0.115` + `uvicorn[standard]>=0.30`                                 | Async, OpenAPI docs for free, mature dep-injection                                                   |
+| Config                    | `pydantic-settings>=2.0`                                                     | Env-first, typed, fails fast at import time                                                          |
+| Validation                | `pydantic>=2.0`                                                              | Already a peer dep; use the same models for I/O and config                                           |
+| Agent loop / tool-calling | `pydantic-ai-slim[openai]>=0.2.0,<1.0`                                       | Bounded iteration, structured output, tool-call retry                                                |
+| Single-shot LLM calls     | `httpx>=0.27`                                                                | Talk directly to the OpenAI-compatible API; cheaper than spinning up a PydanticAI Agent for one call |
+| HTTP client               | `httpx`                                                                      | Async, used for all outbound HTTP                                                                    |
+| Tests                     | `pytest>=8` + `pytest-asyncio>=0.25` (`asyncio_mode=auto`) + `pytest-cov>=7` | Async-native, mature fixtures                                                                        |
+| Lint + format             | `ruff>=0.9`                                                                  | One tool, replaces black/isort/flake8                                                                |
+| Python                    | `3.12-slim` container, `requires-python = ">=3.11"` in `pyproject.toml`      | Container parity, allows local 3.11 for tooling                                                      |
+| Container build           | Docker buildx + QEMU                                                         | Multi-arch (`linux/amd64,linux/arm64`); devs on Apple Silicon need arm64                             |
+| Task runner               | Go Task (`Taskfile.yml`)                                                     | Matches parent stack convention                                                                      |
+
+**PydanticAI vs raw httpx - rule of thumb:** if a request hits the LLM more than once (tool-calling, retries, multi-step
+reasoning), use PydanticAI. If it's a single completion (extract → respond), use plain `httpx` against the
+OpenAI-compatible endpoint. The [retrieval-agent](https://github.com/AarhusAI/retrieval-agent) does both:
+[`app/services/agent.py`][s5-agent] for the agent loop, and [`app/services/query_generation.py`][s5-qg] for
+single-shot calls.
+
+[s5-agent]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/agent.py
+
+[s5-qg]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/query_generation.py
+
+## 6. Repository layout
+
+Every file below has a job. The "routes are thin, services hold logic" convention is what keeps an agent readable as it
+grows - you can scan the routes to learn the public surface, then dive into services for behaviour.
+
+The canonical sibling-repo structure, mirroring [retrieval-agent](https://github.com/AarhusAI/retrieval-agent):
+
+```text
+my-agent/
+├── app/
+│   ├── __init__.py
+│   ├── main.py            # FastAPI app, lifespan, health endpoints
+│   ├── config.py          # pydantic-settings - env-driven config
+│   ├── auth.py            # Bearer-token middleware
+│   ├── models.py          # Request/response schemas (pydantic)
+│   ├── routes/            # Route handlers - thin shells, auth + validation only
+│   │   ├── __init__.py
+│   │   └── <endpoint>.py
+│   └── services/          # Business logic - one module per concern
+│       ├── __init__.py
+│       └── <service>.py
+├── tests/
+│   ├── __init__.py
+│   ├── conftest.py        # Env setup BEFORE app imports + autouse cleanup
+│   ├── test_health.py
+│   ├── test_auth.py
+│   └── services/
+│       └── test_<service>.py
+├── Dockerfile             # Multi-stage: base → dev → prod
+├── docker-compose.yml     # Standalone dev compose
+├── Taskfile.yml           # task wrapper (build, test, lint, build:image)
+├── pyproject.toml         # deps, ruff, pytest config
+├── .env.example           # Required + optional env vars, no secrets
+├── .dockerignore          # Allowlist style
+├── .gitignore             # Standard Python + .env
+├── README.md
+└── CLAUDE.md              # Agent-specific Claude Code instructions
+```
+
+**Convention: routes are thin, services hold the logic.** A route does (1) auth via `Depends(verify_api_key)`, (2)
+request validation via the pydantic body model, (3) one call into `app/services/...`. Everything else lives in services.
+Reference: [`retrieval-agent/app/routes/search.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/routes/search.py)
+is a 20-line shell over [`retrieval-agent/app/services/pipeline.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/pipeline.py).
+
+## 7. Step-by-step setup
+
+Each subsection introduces *what the file is for* before showing the template. Replace `my-agent` with your repo name
+throughout.
+
+### 7.1 Bootstrap the repo
+
+This is the housekeeping that keeps secrets and build artefacts out of git and the Docker build context.
+
+```bash
+mkdir my-agent && cd my-agent
+git init
+```
+
+Once `app/` and `tests/` exist (after §7.6 and §7.10), make sure each Python package directory has an empty
+`__init__.py`: `app/`, `app/routes/`, `app/services/`, `tests/`, `tests/services/`. Easy to forget - imports start
+failing silently otherwise.
+
+Create `.gitignore`:
+
+```gitignore
+__pycache__/
+*.py[cod]
+*.egg-info/
+.pytest_cache/
+.ruff_cache/
+.coverage
+htmlcov/
+.env
+.env.local
+.venv/
+```
+
+Create `.dockerignore` (allowlist style - only what the Dockerfile needs):
+
+```dockerfile
+# Exclude everything by default (allowlist approach)
+*
+
+# Allow only what the Dockerfile needs
+!pyproject.toml
+!app/
+
+# Deny patterns inside allowed dirs
+app/**/__pycache__
+app/**/*.pyc
+app/**/*.pyo
+```
+
+Reference: [`retrieval-agent/.dockerignore`](https://github.com/AarhusAI/retrieval-agent/blob/main/.dockerignore).
+
+### 7.2 `pyproject.toml`
+
+This is the single source of truth for dependencies, the lint/format config, and the test runner config. Container
+builds install from this file, so changes here mean rebuilding the container (see §12).
+
+Drop in this template, change `name`/`description`, then add agent-specific deps:
+
+```toml
+[project]
+name = "my-agent"
+version = "0.1.0"
+description = "Short description of what this agent does"
+requires-python = ">=3.11"
+dependencies = [
+    "fastapi>=0.115.0",
+    "uvicorn[standard]>=0.30.0",
+    "pydantic>=2.0",
+    "pydantic-settings>=2.0",
+    "httpx>=0.27.0",
+    "pydantic-ai-slim[openai]>=0.2.0,<1.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "ruff>=0.9.0",
+    "pytest>=8.0",
+    "pytest-asyncio>=0.25.0",
+    "pytest-cov>=7.0",
+]
+
+[build-system]
+requires = ["setuptools>=68.0"]
+build-backend = "setuptools.build_meta"
+
+[tool.ruff]
+target-version = "py311"
+line-length = 99
+exclude = ["build"]
+
+[tool.ruff.lint]
+select = ["E", "W", "F", "I", "UP", "B", "SIM", "RUF"]
+ignore = ["B008"]  # Allow Depends() in function defaults (FastAPI pattern)
+
+[tool.ruff.lint.isort]
+known-first-party = ["app"]
+
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+testpaths = ["tests"]
+```
+
+Reference: [`retrieval-agent/pyproject.toml`](https://github.com/AarhusAI/retrieval-agent/blob/main/pyproject.toml).
+
+### 7.3 `Dockerfile`
+
+Multi-stage build: `base` (shared deps) → `dev` (adds lint/test tools - the running container doubles as your test
+environment) → `prod` (slimmer image you ship). Both runnable targets run as non-root and expose 8000.
+
+```dockerfile
+FROM python:3.12-slim AS base
+
+WORKDIR /app
+
+RUN apt-get update \
+ && apt-get install -y --no-install-recommends curl \
+ && rm -rf /var/lib/apt/lists/*
+
+COPY pyproject.toml .
+
+# --- Dev target: includes test/lint tools ---
+FROM base AS dev
+RUN pip install --no-cache-dir ".[dev]"
+COPY app/ app/
+RUN adduser --system --no-create-home appuser
+USER appuser
+
+EXPOSE 8000
+HEALTHCHECK CMD curl -f http://localhost:8000/health || exit 1
+CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
+
+# --- Prod target: runtime deps only ---
+FROM base AS prod
+RUN pip install --no-cache-dir .
+COPY app/ app/
+RUN adduser --system --no-create-home appuser
+USER appuser
+
+EXPOSE 8000
+HEALTHCHECK CMD curl -f http://localhost:8000/health || exit 1
+CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
+```
+
+**If your agent needs an on-disk model cache** (HuggingFace, fastembed, etc.), add these lines to *both* `dev` and
+`prod` after `COPY app/ app/`, and mount a named volume to `/cache` in `docker-compose.yml`:
+
+```dockerfile
+RUN adduser --system --no-create-home appuser \
+ && mkdir -p /cache/hf /cache/fastembed \
+ && chown -R appuser /cache
+USER appuser
+```
+
+Docker's named-volume first-mount copies the directory's ownership into the volume, so creating `/cache` as `appuser` is
+what lets the non-root process write to the mounted volume. Reference:
+[`retrieval-agent/Dockerfile`](https://github.com/AarhusAI/retrieval-agent/blob/main/Dockerfile) (which uses this
+pattern for the BM42 sparse model cache).
+
+### 7.4 Docker compose
+
+In development (local), the agent is mounted and built from source.
+The [retrieval-agent](https://github.com/AarhusAI/retrieval-agent)'s existing entry (around
+`aarhusai-docker/docker-compose.yml:517-560`) is the template.
+
+Add this block to `docker-compose.yml`:
+
+```yaml
+services:
+  my-agent:
+    build:
+      dockerfile: Dockerfile
+      context: ./my-agent
+      target: ${TARGET:-dev}
+    command: [ "python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload" ]
+    networks:
+      - app
+      - frontend
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    ports:
+      - "8000"
+    healthcheck:
+      test: [ "CMD", "curl", "-f", "http://localhost:8000/health" ]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 10s
+    environment:
+      API_KEY: ${API_KEY:-CHANGE_ME_NOW}
+      AGENT_MODEL: ${AGENT_MODEL:-gpt-4o-mini}
+      AGENT_API_BASE_URL: ${AGENT_API_BASE_URL:-http://litellm:4000/v1}
+      AGENT_API_KEY: ${AGENT_API_KEY:-}
+      DEBUG: ${DEBUG:-false}
+    volumes:
+      - ./my-agent:/app
+    labels:
+      - "traefik.enable=true"
+      - "traefik.docker.network=frontend"
+      - "traefik.http.routers.${COMPOSE_PROJECT_NAME}.rule=Host(`${COMPOSE_DOMAIN}`)"
+      - "traefik.http.routers.${COMPOSE_PROJECT_NAME}.middlewares=redirect-to-https"
+      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
+```
+
+Reference:
+[`retrieval-agent/docker-compose.yml`](https://github.com/AarhusAI/retrieval-agent/blob/main/docker-compose.yml).
+
+### 7.5 `Taskfile.yml`
+
+Go Task is the dev-experience layer over Docker. `task up`, `task test`, `task lint` etc. all wrap
+`docker compose exec my-agent …` under the hood, so every Python command runs *inside* the running container - no local
+venv needed. If `task` isn't on your PATH yet, see §2.
+
+```yaml
+# https://taskfile.dev
+version: "3"
+
+dotenv: [ ".env.local", ".env" ]
+
+vars:
+  CONTAINER_RUNTIME: '{{.CONTAINER_RUNTIME | default "docker"}}'
+  DOCKER_COMPOSE: '{{.CONTAINER_COMPOSE | default "docker compose"}}'
+  SERVICE: my-agent
+  PYTHON: "{{.DOCKER_COMPOSE}} exec {{.SERVICE}}"
+
+tasks:
+  default:
+    desc: Show available tasks
+    cmds:
+      - task --list
+
+  # --- Code quality ---
+  lint:
+    desc: Run all linters
+    cmds:
+      - task: lint:check
+      - task: lint:format:check
+
+  lint:check:
+    desc: Check code with ruff
+    cmds:
+      - "{{.PYTHON}} ruff check ."
+
+  lint:fix:
+    desc: Fix code issues with ruff
+    cmds:
+      - "{{.PYTHON}} ruff check --fix ."
+
+  lint:format:
+    desc: Format code with ruff
+    cmds:
+      - "{{.PYTHON}} ruff format ."
+
+  lint:format:check:
+    desc: Check code formatting with ruff
+    cmds:
+      - "{{.PYTHON}} ruff format --check ."
+
+  # --- Testing ---
+  test:
+    desc: Run all tests
+    cmds:
+      - "{{.PYTHON}} pytest -v"
+
+  test:coverage:
+    desc: Run tests with coverage report
+    cmds:
+      - "{{.PYTHON}} pytest --cov=app --cov-report=term-missing -v"
+
+  # --- CI ---
+  ci:
+    desc: Run all CI checks (lint + test)
+    cmds:
+      - task: lint
+      - task: test
+
+  # --- Build & push production image (multi-arch) ---
+  build:image:
+    desc: "Build and push production image to ghcr.io (multi-arch). Override PLATFORMS to build one arch (e.g. PLATFORMS=linux/amd64) for faster local builds."
+    vars:
+      IMAGE: ghcr.io/aarhusai/my-agent
+      TAG: '{{.TAG | default "latest"}}'
+      PLATFORMS: '{{.PLATFORMS | default "linux/amd64,linux/arm64"}}'
+    cmds:
+      - task: build:image:builder
+      - docker buildx build --builder my-agent-builder --platform {{.PLATFORMS}} --target prod -t {{.IMAGE}}:{{.TAG}} --push .
+
+  build:image:builder:
+    desc: "Ensure the buildx builder + QEMU binfmt handlers exist (idempotent first-run setup)."
+    internal: true
+    silent: true
+    cmds:
+      - cmd: |
+          if ! docker buildx inspect my-agent-builder >/dev/null 2>&1; then
+            echo "First-run setup: registering QEMU binfmt handlers (cross-arch emulation)..."
+            docker run --privileged --rm tonistiigi/binfmt --install all
+            echo "Creating buildx builder 'my-agent-builder' (docker-container driver)..."
+            docker buildx create --name my-agent-builder --driver docker-container --bootstrap
+          fi
+```
+
+Reference: [`retrieval-agent/Taskfile.yml`](https://github.com/AarhusAI/retrieval-agent/blob/main/Taskfile.yml).
+
+### 7.6 `app/main.py`
+
+The entry point: FastAPI instance, a `lifespan` context manager for client setup/teardown, and two health endpoints.
+Liveness (`/health`) always returns 200 if the process is up. Readiness (`/health/ready`) actually probes downstreams -
+that's what the platform's orchestrator uses to decide if your service is ready for traffic.
+
+```python
+import logging
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI
+from fastapi.responses import JSONResponse
+
+from app.config import settings
+
+# from app.routes.<your_route> import router as <your_router>
+# from app.services import <external_clients>
+
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+if settings.debug:
+    logging.getLogger("app").setLevel(logging.DEBUG)
+log = logging.getLogger(__name__)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    log.info("Starting %s", app.title)
+    log.info("Agent model: %s (%s)", settings.agent_model, settings.agent_api_base_url)
+
+    # Eagerly initialize long-lived clients here so first-request latency
+    # doesn't include connection setup. Example:
+    # await some_service.preload()
+
+    yield
+
+    # Shutdown - close every long-lived client.
+    # await some_service.close()
+    log.info("%s shut down", app.title)
+
+
+app = FastAPI(
+    title="My Agent",
+    description="What this agent does",
+    version="0.1.0",
+    lifespan=lifespan,
+)
+
+
+# app.include_router(<your_router>)
+
+
+@app.get("/health")
+async def health():
+    return {"status": "ok"}
+
+
+@app.get("/health/ready")
+async def health_ready():
+    """Readiness probe - verifies downstream connectivity."""
+    try:
+        # Probe each downstream you depend on. Example:
+        # await some_service.ping()
+        return {"status": "ok"}
+    except Exception as exc:
+        log.warning("Readiness check failed: %s", exc)
+        return JSONResponse(
+            status_code=503,
+            content={"status": "error", "detail": str(exc)},
+        )
+```
+
+Reference: [`retrieval-agent/app/main.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/main.py).
+
+**Why eager init in lifespan, not lazy in services?** The first request shouldn't pay startup cost (cold model load, DNS
+resolution, vector-store capability probes). Lifespan runs once per process; failures here surface as clear startup
+errors instead of a confusing 500 on the first request.
+
+### 7.7 `app/config.py`
+
+All runtime config comes from environment variables - no `config.yaml`, no command-line flags. `pydantic-settings` turns
+env vars into a typed object you import everywhere. Define every setting with a type and (where reasonable) a default,
+and **instantiate `settings` at module level** so `Settings()` runs at import time and crashes fast on missing required
+vars.
+
+```python
+from pydantic_settings import BaseSettings
+
+
+class Settings(BaseSettings):
+    model_config = {"env_file": ".env", "env_file_encoding": "utf-8", "extra": "ignore"}
+
+    # --- Auth ---
+    api_key: str  # required - no default
+
+    # --- LLM (OpenAI-compatible API, typically LiteLLM proxy) ---
+    agent_model: str = "gpt-4o-mini"
+    agent_api_base_url: str = "http://litellm:4000/v1"
+    agent_api_key: str = ""
+    agent_timeout: int = 60
+
+    # --- Debug ---
+    debug: bool = False
+
+    # --- Server ---
+    host: str = "0.0.0.0"
+    port: int = 8000
+
+
+settings = Settings()
+```
+
+Things to know:
+
+- `extra: "ignore"` lets the same `.env` carry vars for other tools without choking validation.
+- A field without a default is **required** - missing it crashes at import time, which is what you want.
+- Document any **cross-system contracts** with an inline comment (e.g., "must match the value the X service was
+  configured with"). The [retrieval-agent](https://github.com/AarhusAI/retrieval-agent)'s `embedding_prefix_query` is
+  the canonical example.
+
+Reference: [`retrieval-agent/app/config.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/config.py).
+
+### 7.8 `app/auth.py`
+
+Bearer-token auth - whoever calls your agent (Open WebUI, another service, a curl command) must include the key in an
+`Authorization: Bearer <key>` header. FastAPI's `HTTPBearer` extracts the credential; `hmac.compare_digest` does a
+constant-time compare to prevent timing attacks against the key. The dependency returns the validated key so routes can
+declare `Depends(verify_api_key)` to gate themselves.
+
+```python
+import hmac
+
+from fastapi import Depends, HTTPException, status
+from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
+
+from app.config import settings
+
+_bearer = HTTPBearer()
+
+
+async def verify_api_key(
+        credentials: HTTPAuthorizationCredentials = Depends(_bearer),
+) -> str:
+    if not hmac.compare_digest(credentials.credentials, settings.api_key):
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Invalid API key",
+        )
+    return credentials.credentials
+```
+
+Reference: [`retrieval-agent/app/auth.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/auth.py).
+Usage on a route:
+
+```python
+@router.post("/do-thing")
+async def do_thing(req: MyRequest, _api_key: str = Depends(verify_api_key)) -> MyResponse:
+    return await my_service.handle(req)
+```
+
+### 7.9 Routes and services
+
+This is where your agent's actual behaviour lives. **Routes do auth + request validation only; services hold the work
+** - every agent in this org follows that split, so sticking to it keeps your code familiar to anyone reading it later.
+A minimal "echo with LLM completion" example:
+
+`app/models.py`:
+
+```python
+from pydantic import BaseModel, Field
+
+
+class EchoRequest(BaseModel):
+    text: str = Field(..., min_length=1, max_length=10_000)
+
+
+class EchoResponse(BaseModel):
+    original: str
+    rewritten: str
+```
+
+`app/routes/echo.py`:
+
+```python
+import logging
+
+from fastapi import APIRouter, Depends
+
+from app.auth import verify_api_key
+from app.models import EchoRequest, EchoResponse
+from app.services.rewrite import rewrite
+
+log = logging.getLogger(__name__)
+router = APIRouter()
+
+
+@router.post("/echo", response_model=EchoResponse)
+async def echo(req: EchoRequest, _api_key: str = Depends(verify_api_key)) -> EchoResponse:
+    log.info("Echo request: %d chars", len(req.text))
+    rewritten = await rewrite(req.text)
+    return EchoResponse(original=req.text, rewritten=rewritten)
+```
+
+`app/services/rewrite.py` - single-shot LLM call via `httpx` (no PydanticAI needed for one round-trip):
+
+```python
+import httpx
+
+from app.config import settings
+
+_client: httpx.AsyncClient | None = None
+
+
+def _get_client() -> httpx.AsyncClient:
+    global _client
+    if _client is None:
+        _client = httpx.AsyncClient(
+            headers={"Authorization": f"Bearer {settings.agent_api_key}"},
+            timeout=settings.agent_timeout,
+        )
+    return _client
+
+
+async def close_client() -> None:
+    global _client
+    if _client is not None:
+        await _client.aclose()
+        _client = None
+
+
+async def rewrite(text: str) -> str:
+    client = _get_client()
+    resp = await client.post(
+        f"{settings.agent_api_base_url}/chat/completions",
+        json={
+            "model": settings.agent_model,
+            "messages": [
+                {"role": "system", "content": "Rewrite the user message more concisely."},
+                {"role": "user", "content": text},
+            ],
+        },
+    )
+    resp.raise_for_status()
+    return resp.json()["choices"][0]["message"]["content"]
+```
+
+> **Don't use `httpx.AsyncClient(base_url=…)` here with a leading-slash path.** httpx follows RFC 3986 URL joining:
+`AsyncClient(base_url="http://litellm:4000/v1").post("/chat/completions")` resolves to
+`http://litellm:4000/chat/completions` - the `/v1` is dropped, LiteLLM returns 404. Either pass the full URL (as above)
+> or use a trailing slash on `base_url` *and* no leading slash on the path.
+
+Don't forget to wire the router in `app/main.py`:
+
+```python
+from app.routes.echo import router as echo_router
+
+# inside the file:
+app.include_router(echo_router)
+```
+
+And close the client on shutdown in `lifespan`:
+
+```python
+from app.services import rewrite as rewrite_service
+
+# inside lifespan after `yield`:
+await rewrite_service.close_client()
+```
+
+**Pattern: module-level client + `close_client()`.** Long-lived async clients are cached at module level (lazy init via
+`_get_client()`). Lifespan calls `close_client()` on shutdown. Tests reset `_client = None` between tests (see §7.10).
+
+For multi-step agentic flows, swap `rewrite.py` for a PydanticAI Agent - see §8.
+
+### 7.10 `tests/conftest.py`
+
+Tests run inside the same container as the app, but need to override env vars to point at fake downstreams. **The single
+most important rule:** set env vars *before any `app` import*. `Settings()` runs at import time, so by the time
+`from app.main import app` finishes, every config decision is baked in - late-set env vars are silently ignored. See §12
+if you hit this.
+
+```python
+import os
+
+# Override env vars BEFORE any app imports (Settings() runs at import time)
+os.environ["API_KEY"] = "test-api-key"
+os.environ["AGENT_API_BASE_URL"] = "http://fake-agent:4000/v1"
+os.environ["AGENT_API_KEY"] = "fake-agent-key"
+
+import pytest
+from httpx import ASGITransport, AsyncClient
+
+from app.config import settings
+from app.main import app
+from app.services import rewrite
+
+# Belt-and-braces: also force-overwrite attributes in case the container's
+# compose env disagrees with what tests expect.
+settings.api_key = "test-api-key"
+
+
+@pytest.fixture
+def api_headers():
+    return {"Authorization": "Bearer test-api-key"}
+
+
+@pytest.fixture
+async def client():
+    transport = ASGITransport(app=app)
+    async with AsyncClient(transport=transport, base_url="http://test") as c:
+        yield c
+
+
+@pytest.fixture(autouse=True)
+def reset_clients():
+    yield
+    # Reset module-level clients between tests so a mock in one test doesn't
+    # leak into the next. Add every service that caches a client.
+    rewrite._client = None
+```
+
+Reference: [`retrieval-agent/tests/conftest.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/tests/conftest.py).
+
+Mock external services (the LLM endpoint, vector DBs, …) at the call boundary - typically by patching the module-level
+client. `respx` is a useful drop-in for mocking `httpx` calls.
+
+### 7.11 `.env.example`
+
+Human-readable documentation of your agent's runtime knobs. Real secrets go in `.env` (gitignored); `.env.example` is
+committed and shows newcomers what they need to set. Document every variable, split required vs optional, never commit a
+real `.env`:
+
+```bash
+# --- Required ---
+API_KEY=change-me
+
+# --- LLM (OpenAI-compatible, typically LiteLLM proxy) ---
+AGENT_MODEL=gpt-4o-mini
+AGENT_API_BASE_URL=http://litellm:4000/v1
+AGENT_API_KEY=
+AGENT_TIMEOUT=60
+
+# --- Local dev plumbing ---
+COMPOSE_PROJECT_NAME=my-agent
+COMPOSE_DOMAIN=my-agent.local.itkdev.dk
+DEBUG=false
+```
+
+Generate the production `API_KEY` with:
+
+```bash
+python -c "import secrets; print(secrets.token_urlsafe(32))"
+```
+
+### 7.12 `README.md`
+
+`README.md the public-facing API doc. Cover:
+
+- One-paragraph what-it-does.
+- Quick start (`task setup`, env vars to set).
+- Endpoint list: method, path, request body, response, auth header.
+- `curl` examples for each endpoint.
+- Full env var reference (with defaults).
+
+## 8. PydanticAI patterns
+
+**Skip this section on your first build if you only need a single LLM call per request** - the §7.9 `httpx` example is
+enough. Come back here when you're adding tool-calling, multi-step reasoning, or structured output that justifies a real
+agent loop.
+
+PydanticAI is the recommended framework for any agent that hits the LLM more than once per request (tool-calling,
+retries, multi-step). For single-shot calls, stay on `httpx` (§7.9).
+
+The [retrieval-agent](https://github.com/AarhusAI/retrieval-agent)'s
+[`app/services/agent.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/agent.py) is the
+canonical worked example for the patterns below.
+
+### 8.1 Single-shot structured output
+
+```python
+from pydantic import BaseModel
+from pydantic_ai import Agent
+from pydantic_ai.models.openai import OpenAIModel
+from pydantic_ai.providers.openai import OpenAIProvider
+
+from app.config import settings
+
+
+class Classification(BaseModel):
+    label: str
+    confidence: float
+
+
+_agent: Agent[None, Classification] | None = None
+
+
+def _get_agent() -> Agent[None, Classification]:
+    global _agent
+    if _agent is None:
+        model = OpenAIModel(
+            settings.agent_model,
+            provider=OpenAIProvider(
+                base_url=settings.agent_api_base_url,
+                api_key=settings.agent_api_key,
+            ),
+        )
+        _agent = Agent(model, output_type=Classification, system_prompt="Classify the input.")
+    return _agent
+
+
+async def classify(text: str) -> Classification:
+    result = await _get_agent().run(text)
+    return result.output
+```
+
+### 8.2 Tool-calling loop with per-request state
+
+Use a frozen dataclass for per-request dependencies and tool-side state. Tools mutate the deps to accumulate
+side-channel results without dumping the full payload back into the agent's context.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any
+
+from pydantic_ai import Agent, RunContext
+from pydantic_ai.models.openai import OpenAIModel
+from pydantic_ai.providers.openai import OpenAIProvider
+
+from app.config import settings
+
+
+@dataclass
+class AgentDeps:
+    user_id: str
+    full_results: list[dict[str, Any]] = field(default_factory=list)
+
+
+model = OpenAIModel(
+    settings.agent_model,
+    provider=OpenAIProvider(
+        base_url=settings.agent_api_base_url,
+        api_key=settings.agent_api_key,
+    ),
+)
+
+agent = Agent(
+    model,
+    deps_type=AgentDeps,
+    system_prompt="Use the lookup tool to answer the user's question.",
+)
+
+
+@agent.tool
+async def lookup(ctx: RunContext[AgentDeps], query: str) -> list[dict]:
+    docs = await my_search(query)  # full results
+    ctx.deps.full_results.extend(docs)  # side channel - full payload
+    return [{"id": d["id"], "preview": d["text"][:200]} for d in docs]  # truncated for LLM
+
+
+async def handle(user_id: str, question: str) -> list[dict]:
+    deps = AgentDeps(user_id=user_id)
+    await agent.run(question, deps=deps)
+    return deps.full_results  # return the full payload, NOT the agent's reply
+```
+
+**Why side-channel results?** Token budgets. The tool returns the minimal preview the LLM needs to grade relevance; the
+full doc bodies / metadata never enter the conversation. The
+[retrieval-agent](https://github.com/AarhusAI/retrieval-agent)'s `AGENT_TOOL_PREVIEW_CHARS` (default 200)
+caps the per-doc preview, and `AGENT_PREVIEW_K` caps how many previews per tool call. The final API response is built
+from `deps.full_results`, not from anything the model saw.
+
+### 8.3 Wall-clock timeout with partial-result return
+
+The agent loop has two budgets: `AGENT_MAX_ITERATIONS` (logical iterations) and a wall-clock timeout. On timeout, return
+whatever the tools already wrote to deps.
+
+```python
+import asyncio
+from pydantic_ai.exceptions import UnexpectedModelBehavior
+
+
+async def handle(user_id: str, question: str) -> list[dict]:
+    deps = AgentDeps(user_id=user_id)
+    try:
+        await asyncio.wait_for(
+            agent.run(question, deps=deps),
+            timeout=settings.agent_timeout,
+        )
+    except (asyncio.TimeoutError, UnexpectedModelBehavior) as exc:
+        log.warning("Agent run cut short: %s", exc)
+    return deps.full_results
+```
+
+### 8.4 Strict-tools toggle
+
+Some models (Mistral, older Llamas) don't support OpenAI's strict tool schemas. Drive strict-mode from an env flag so
+you can flip it without code changes:
+
+```python
+# app/config.py
+agent_strict_tools: bool = True
+
+
+# app/services/<your_agent>.py
+@agent.tool(strict=settings.agent_strict_tools)
+async def lookup(ctx: RunContext[AgentDeps], query: str) -> list[dict]:
+    ...
+```
+
+Set `AGENT_STRICT_TOOLS=false` in `.env` for models that need it. Apply the same `strict=settings.agent_strict_tools` to
+every tool the agent owns.
+
+### 8.5 Fallback parser for off-protocol output
+
+When the model emits tool calls as plain text or vendor-specific syntax (e.g. Mistral's `[TOOL_CALLS]`) instead of an
+OpenAI tool call, parse it manually. Reference: `_parse_fallback_queries()` in
+[`retrieval-agent/app/services/agent.py`](https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/agent.py).
+Pattern: try JSON, then regex for vendor syntax, then bail; wrap downstream calls in try/except so embedding/DB failures
+inside the fallback return empty results, not 500s.
+
+### 8.6 Pointing at LiteLLM
+
+From inside a container on the `app` network: `AGENT_API_BASE_URL=http://litellm:4000/v1`. From outside (e.g. `task` on
+the host, or a non-stack environment): `https://litellm.itkdev.dk/v1`.
+
+## 9. Production - docker compose sever
+
+In production, the agent is **not built from source**.
+The [retrieval-agent](https://github.com/AarhusAI/retrieval-agent)'s existing entry (around
+`aarhusai-docker/docker-compose.server.yml:517-560`) is the template. Add this block to `docker-compose.server.yml`:
+
+```yaml
+  my-agent:
+    image: ghcr.io/aarhusai/my-agent:${MY_AGENT_VERSION:-latest}
+    restart: unless-stopped   # production only - omit in docker-compose.yml
+    command:
+      - "python"
+      - "-m"
+      - "uvicorn"
+      - "app.main:app"
+      - "--host"
+      - "0.0.0.0"
+      - "--port"
+      - "8000"
+    networks:
+      - app
+      - frontend
+    ports:
+      - "8000"
+    environment:
+      API_KEY: ${MY_AGENT_API_KEY:?}
+      AGENT_MODEL: ${MY_AGENT_MODEL:-gpt-4o-mini}
+      AGENT_API_BASE_URL: ${MY_AGENT_API_BASE_URL:-http://litellm:4000/v1}
+      AGENT_API_KEY: ${MY_AGENT_API_KEY_LLM:?}
+      DEBUG: ${MY_AGENT_DEBUG:-false}
+    healthcheck:
+      test: [ "CMD", "curl", "-f", "http://localhost:8000/health" ]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 10s
+    depends_on:
+      # Only declare deps for services your agent actually needs.
+      litellm:
+        condition: service_started
+    labels:
+      - "traefik.enable=true"
+      - "traefik.docker.network=frontend"
+      - "traefik.http.routers.my-agent.rule=Host(`my-agent.${COMPOSE_SERVER_DOMAIN}`)"
+      - "traefik.http.routers.my-agent.middlewares=redirect-to-https"
+      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
+```
+
+Then add the env vars to the parent stack's `.env` (or `.env.default` if they have sensible defaults):
+
+```bash
+MY_AGENT_VERSION=v0.1.0
+MY_AGENT_API_KEY=<secrets.token_urlsafe(32)>
+MY_AGENT_API_KEY_LLM=<LiteLLM virtual key>
+```
+
+Things to know:
+
+- Sanity-check the diff with `docker compose -f docker-compose.server.yml config` before merging.
+
+## 10. Production builds (multi-arch)
+
+You'll push to ghcr.io, which requires a Personal Access Token (one-time setup) - see §2 Prerequisites for what scope it
+needs. There is no GitHub Actions pipeline in this org; releases are triggered by hand via `task build:image`.
+
+The agent is built and pushed to **`ghcr.io/aarhusai/<repo-name>`** for both `linux/amd64` (production servers) and
+`linux/arm64` (Apple Silicon devs).
+
+### 10.1 One-time setup
+
+You need a ghcr.io Personal Access Token with `write:packages`:
+
+```bash
+echo "$GHCR_TOKEN" | docker login ghcr.io -u <github-username> --password-stdin
+```
+
+The buildx builder + QEMU binfmt handlers are bootstrapped automatically on the first `task build:image` (see the
+`build:image:builder` task in §7.5).
+
+### 10.2 Build & push
+
+```bash
+# Tagged release (recommended)
+task build:image TAG=v0.1.0
+
+# Latest only
+task build:image
+
+# Faster local build - single arch, useful for iterating
+task build:image TAG=v0.1.0-dev PLATFORMS=linux/amd64
+```
+
+Important details:
+
+- **`--target prod`** - never push the dev image. Prod target has no test/lint tools and a smaller surface.
+- **`--push`** - multi-arch manifests can only be pushed; Docker's local image store doesn't hold them.
+  `task build:image` hardcodes `--push`. To test a built image locally *without* pushing, bypass the task and run
+  `docker buildx build --target prod --platform linux/amd64 --load -t my-agent:test .` directly (single-arch only -
+  `--load` doesn't work with multi-arch).
+- **Two tags per release:** push both `vX.Y.Z` and `latest`. The Taskfile only pushes one tag at a time; run it twice:
+
+  ```bash
+  task build:image TAG=v0.1.0
+  task build:image TAG=latest
+  ```
+
+## 11. Verification checklist
+
+Walk through this before declaring your new agent done.
+
+- [ ] **Skeleton builds.** `task build` succeeds from a clean checkout.
+- [ ] **Service starts.** `task up` brings the container healthy (`docker compose ps` shows `(healthy)`).
+- [ ] **Health endpoints respond.** From inside the container itself - `task shell`, then:
+
+  ```bash
+  curl http://localhost:8000/health
+  curl http://localhost:8000/health/ready
+  ```
+
+  Or via the public Traefik URL from your host: `curl -k https://${COMPOSE_DOMAIN}/health` (uses your `.env`).
+- [ ] **Auth works.** From `task shell`, against the `/echo` example endpoint:
+
+  ```bash
+  # Wrong key → 401
+  curl -H "Authorization: Bearer wrong" -H "Content-Type: application/json" \
+       -d '{"text":"hi"}' http://localhost:8000/echo
+  # Correct key → 200
+  curl -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
+       -d '{"text":"hi"}' http://localhost:8000/echo
+  ```
+
+- [ ] **`task ci` is green.** Lint + format check + tests pass.
+- [ ] **`task test:coverage`** reports coverage and prints the missing-lines table.
+- [ ] **Single-arch build works.** `task build:image TAG=v0.0.1-test PLATFORMS=linux/amd64` - verify against a registry
+  you control, then untag.
+- [ ] **Multi-arch build works.** `task build:image TAG=v0.0.1-test` produces a manifest list. Inspect with
+  `docker buildx imagetools inspect ghcr.io/aarhusai/my-agent:v0.0.1-test` - should show both `linux/amd64` and
+  `linux/arm64`.
+- [ ] **Parent compose parses.** Add the service block from §9b to a working copy of the parent compose; run
+  `docker compose -f docker-compose.yml -f docker-compose.server.yml config` - exits 0 and emits the resolved YAML.
+- [ ] **README documents the public API.** Endpoints, auth header, request/response examples, env var reference. Use the
+  [retrieval-agent's `README.md`](https://github.com/AarhusAI/retrieval-agent/blob/main/README.md) as a model.
+
+## 12. Common pitfalls / FAQ
+
+The traps below catch nearly every newcomer at least once. Skim now; come back when something breaks.
+
+**1. `task up` fails with "frontend network does not exist."**
+The parent `openwebui-docker` stack (which owns Traefik) isn't running, or you haven't created the network yourself.
+Either start the parent stack first, or `docker network create frontend` if you're running standalone.
+
+**2. `task: command not found`.**
+Go Task isn't installed - and it's *not* GNU `make`. Install it: `brew install go-task` (macOS) or
+`apt install go-task` (Debian/Ubuntu). See §2.
+
+**3. Env var changes aren't picked up.**
+Task loads `.env.local` then `.env` via the `dotenv:` directive at the top of `Taskfile.yml`. After editing `.env`,
+recreate the container (`task restart`) - `up -d` alone won't push new env into a running container.
+
+**4. Hot reload doesn't pick up file changes.**
+Check that `docker-compose.yml` has *both* `volumes: ./:/app` *and* `--reload` on the uvicorn command. Editing files
+outside `./` won't propagate. On macOS, large `node_modules`-style directories also slow the file-event delivery - keep
+them out of the mounted tree.
+
+**5. Tests can't see my env overrides - `Settings()` still has the container env.**
+`Settings()` runs at import time. `os.environ[...] = ...` must happen *before* `from app.main import app`. The
+`tests/conftest.py` template (§7.10) puts env writes at the top of the file for this reason.
+
+**6. `task build:image` fails: `denied: permission_denied` to ghcr.io.**
+You haven't logged in. `echo $GHCR_TOKEN | docker login ghcr.io -u <github-username> --password-stdin`. The PAT needs
+`write:packages` scope.
+
+**7. `task build:image` fails: "multiple platforms feature is currently not supported for docker driver."**
+The buildx builder didn't get created (or wasn't selected). Run `task build:image:builder` directly, or delete the
+existing builder and retry: `docker buildx rm my-agent-builder`.
+
+**8. On Apple Silicon, the multi-arch build is painfully slow.**
+The `linux/amd64` half builds under QEMU emulation. For local iteration, build only your native arch:
+`task build:image PLATFORMS=linux/arm64`. The full multi-arch build is only needed for the actual release push.
+
+**9. Agent can't reach LiteLLM.**
+Container-internal URL is `http://litellm:4000/v1`; external is `https://litellm.itkdev.dk/v1`. If you point the
+container at the external URL by mistake, you'll either hit the public internet (and probably hit a firewall) or fail
+TLS verification. The same virtual key works at both URLs.
+
+**10. Healthcheck shows `(unhealthy)`.**
+`curl` must be available inside the image. The Dockerfile template installs it in the `base` stage (
+`apt-get install curl`). If you copied a slimmer base or removed the install line, the healthcheck silently fails and
+the container stays `(unhealthy)`.
+
+**11. `pip install` from your laptop fails or hangs.**
+Don't install Python deps on the host. Every Python tool runs inside the container: `task install`, or `task shell` then
+`pip ...`. The whole point of the Taskfile wrapper is that you never set up a local venv.
+
+**12. Added a dep to `pyproject.toml` but `task test` still errors with `ImportError`.**
+The container only re-installs deps when rebuilt. After editing `pyproject.toml`: `task build && task restart`, or just
+`task install` to install into the *running* container without a rebuild (faster for iteration).
+
+**13. `task lint` / `task test` says "service my-agent is not running."**
+Most tasks shell into a *running* container via `docker compose exec`. You need `task up` first.
+
+**14. Traefik shows "no route" or 404 for your agent's host.**
+Check that `${COMPOSE_PROJECT_NAME}` and `${COMPOSE_DOMAIN}` are set (Taskfile loads them via dotenv) and that they
+match the labels in your compose. Verify with `docker compose config` - if the rendered `Host(...)` is empty, the env
+var didn't propagate.
+
+**15. Works locally, won't start in production: `Permission denied: '/cache/...'`.**
+The non-root `appuser` doesn't own a volume that was mounted before the ownership fix. Recreate the named volume (
+`docker volume rm <vol>`), and double-check that the Dockerfile `chown -R appuser /cache` runs in the *prod* target
+too - easy to add to dev only and forget.
+
+---
+
+## 13. Reference index
+
+Every file in [retrieval-agent](https://github.com/AarhusAI/retrieval-agent) this guide leans on - open these when
+you're stuck. All `retrieval-agent/...` paths below link to `main` on GitHub.
+
+| File                                                              | What to look at                                                                                     |
+|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
+| [`retrieval-agent/Dockerfile`][ra-dockerfile]                     | Multi-stage base → dev → prod, non-root user, model cache pattern                                   |
+| [`retrieval-agent/docker-compose.yml`][ra-compose]                | Networks, healthcheck, Traefik labels, `target: ${TARGET:-dev}`                                     |
+| [`retrieval-agent/Taskfile.yml`][ra-taskfile]                     | Full task catalogue; `build:image` + `build:image:builder` for multi-arch                           |
+| [`retrieval-agent/pyproject.toml`][ra-pyproject]                  | Deps, optional dev deps, ruff + pytest config                                                       |
+| [`retrieval-agent/.dockerignore`][ra-dockerignore]                | Allowlist-style exclude-everything-then-allow pattern                                               |
+| [`retrieval-agent/app/main.py`][ra-main]                          | FastAPI app, lifespan eager init, health + readiness endpoints                                      |
+| [`retrieval-agent/app/config.py`][ra-config]                      | pydantic-settings pattern, cross-system contract comments                                           |
+| [`retrieval-agent/app/auth.py`][ra-auth]                          | `HTTPBearer` + `hmac.compare_digest` constant-time check                                            |
+| [`retrieval-agent/app/routes/search.py`][ra-search]               | Thin route shell - auth + validation + service call                                                 |
+| [`retrieval-agent/app/services/pipeline.py`][ra-pipeline]         | Where the actual work lives                                                                         |
+| [`retrieval-agent/app/services/agent.py`][ra-agent]               | PydanticAI agent loop, side-channel results, fallback parser, timeout handling, strict-tools toggle |
+| [`retrieval-agent/app/services/query_generation.py`][ra-qg]       | Single-shot LLM call via httpx (no PydanticAI)                                                      |
+| [`retrieval-agent/tests/conftest.py`][ra-conftest]                | Env-before-import setup + autouse client reset fixture                                              |
+| [`retrieval-agent/.env.example`][ra-envexample]                   | Documented env var template                                                                         |
+| [`retrieval-agent/README.md`][ra-readme]                          | Public-facing API docs and config reference                                                         |
+| [`retrieval-agent/CLAUDE.md`][ra-claude]                          | Non-obvious internals worth capturing for future maintainers                                        |
+| `aarhusai-docker/docker-compose.yml` (service `retrieval`)        | Dev embedding pattern in the parent stack                                                           |
+| `aarhusai-docker/docker-compose.server.yml` (service `retrieval`) | Production embedding pattern with `restart: unless-stopped`                                         |
+
+[ra-dockerfile]: https://github.com/AarhusAI/retrieval-agent/blob/main/Dockerfile
+
+[ra-compose]: https://github.com/AarhusAI/retrieval-agent/blob/main/docker-compose.yml
+
+[ra-taskfile]: https://github.com/AarhusAI/retrieval-agent/blob/main/Taskfile.yml
+
+[ra-pyproject]: https://github.com/AarhusAI/retrieval-agent/blob/main/pyproject.toml
+
+[ra-dockerignore]: https://github.com/AarhusAI/retrieval-agent/blob/main/.dockerignore
+
+[ra-main]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/main.py
+
+[ra-config]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/config.py
+
+[ra-auth]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/auth.py
+
+[ra-search]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/routes/search.py
+
+[ra-pipeline]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/pipeline.py
+
+[ra-conftest]: https://github.com/AarhusAI/retrieval-agent/blob/main/tests/conftest.py
+
+[ra-envexample]: https://github.com/AarhusAI/retrieval-agent/blob/main/.env.example
+
+[ra-readme]: https://github.com/AarhusAI/retrieval-agent/blob/main/README.md
+
+[ra-claude]: https://github.com/AarhusAI/retrieval-agent/blob/main/CLAUDE.md
diff --git a/technical/tool.md b/technical/tool.md
new file mode 100644
index 0000000..723fc0a
--- /dev/null
+++ b/technical/tool.md
@@ -0,0 +1,739 @@
+---
+title: Build a tool
+parent: Technical documentation
+---
+
+# Building an MCP tool
+
+Taking an external HTTP API and exposing it as a **Streamable HTTP MCP server** that Open WebUI (or any MCP client) can
+call as a tool. The tool function is a deterministic wrapper around one API call.
+
+## Table of contents
+
+1. [What this guide is for](#1-what-this-guide-is-for)
+2. [Prerequisites](#2-prerequisites)
+3. [Mental model](#3-mental-model)
+4. [The minimal MCP tool](#4-the-minimal-mcp-tool)
+5. [A real-world tool](#5-a-real-world-tool)
+6. [FastAPI coexistence — health and MCP at the same port](#6-fastapi-coexistence--health-and-mcp-at-the-same-port)
+7. [Plugging into Open WebUI](#7-plugging-into-open-webui)
+8. [Caching (when it's worth it)](#8-caching-when-its-worth-it)
+9. [Plugging into the parent stack](#9-plugging-into-the-parent-stack)
+10. [Testing](#10-testing)
+11. [Verification checklist](#11-verification-checklist)
+12. [Common pitfalls / FAQ](#12-common-pitfalls--faq)
+13. [Reference index](#13-reference-index)
+
+---
+
+## 1. What this guide is for
+
+You've got an external API — a weather service, an address-lookup endpoint, an internal CRM — and you want **Open WebUI
+** (or any other MCP client) to call it as a tool. The vehicle for that is an **MCP server**: your code exposes one or
+more *tools* over the Model Context Protocol; the MCP client (Open WebUI, Claude Desktop, etc.) discovers them via
+`tools/list` and invokes them via `tools/call`.
+
+The canonical real-world example in this org is [AarhusAI/search-agent](https://github.com/AarhusAI/search-agent),
+which wraps a SearXNG search endpoint as an MCP server. This guide is the same shape with a generic weather example.
+
+**Not in scope:**
+
+- LLM-backed workflows or agentic loops — start with [the agent guide](./agentic_tool.md). The `@agent.tool` decorator
+  from PydanticAI is a different "tool" concept and lives inside an agent process; this guide is about
+  *MCP-protocol* tools that external clients call.
+- The older HTTP+SSE MCP transport — superseded by Streamable HTTP; not covered.
+- Long-running batch jobs — a task runner / cron suits those better.
+
+## 2. Prerequisites
+
+- Read [the agent guide](./agentic_tool.md) §1–§7 first. This guide reuses the same skeleton (FastAPI, Taskfile, Docker,
+  networks) and only covers what's MCP-specific.
+- A target external API with a documented HTTP contract: known endpoint, request shape, response shape, error codes,
+  auth method.
+- A clone of `openwebui-docker` running locally so your tool sits on the `app` and `frontend` networks.
+- One sentence of MCP background: an MCP server exposes named **tools** (functions with typed parameters); clients
+  list and call them over JSON-RPC. The official spec is at [modelcontextprotocol.io](https://modelcontextprotocol.io/).
+
+## 3. Mental model
+
+The MCP client speaks to a single endpoint on your server using JSON-RPC over HTTP. Your server registers one or more *
+*tool functions**; each call routes to the matching function, which calls the upstream API and returns the result.
+
+```mermaid
+flowchart LR
+    subgraph stack["Docker app network"]
+        caller([Client])
+        tool["my-tool<br/>FastAPI + FastMCP<br/>http:\/\/my-tool:8000\/"]
+    end
+
+    api["api.example.com"]
+
+    caller -->|"MCP: initialize, tools/list, tools/call"| tool
+    tool -->|"GET /weather + auth header"| api
+    api -->|"JSON"| tool
+    tool -->|"MCP result (JSON)"| caller
+```
+
+The wrapper exists to centralise three things — upstream auth (your tool holds the API key, not the client), error
+normalisation (every caller sees the same error shape), and observability (you log and metric in one place).
+
+> **"Tool" is overloaded.** This guide is about *MCP* tools — functions exposed to external clients over the MCP
+> protocol. [The agent guide](./agentic_tool.md) §8 covers *PydanticAI* tools — `@agent.tool` functions called from
+> inside an agent's own LLM loop. They are unrelated despite the shared word.
+
+## 4. The minimal MCP tool
+
+The smallest working MCP server — one tool that calls the API and returns a result. Drop it into the §7 skeleton from
+[the agent guide](./agentic_tool.md), with one extra dependency.
+
+Add `mcp[cli]>=1.0` to `pyproject.toml`:
+
+```toml
+[project]
+dependencies = [
+    "fastapi>=0.115.0",
+    "uvicorn[standard]>=0.30.0",
+    "pydantic>=2.0",
+    "pydantic-settings>=2.0",
+    "httpx>=0.27.0",
+    "mcp[cli]>=1.0",
+]
+```
+
+`app/mcp_server.py`:
+
+```python
+import json
+
+import httpx
+from mcp.server.fastmcp import FastMCP
+
+mcp = FastMCP("my-tool")
+
+
+@mcp.tool()
+async def get_weather(city: str) -> str:
+    """Get the current weather for a city.
+
+    Use this when the user asks about current conditions, temperature, or
+    precipitation. Returns a JSON string with city, temperature in Celsius,
+    and a short description of conditions.
+    """
+    async with httpx.AsyncClient(timeout=10) as client:
+        resp = await client.get(
+            "https://api.example.com/weather",
+            params={"city": city},
+        )
+        resp.raise_for_status()
+        payload = resp.json()
+
+    return json.dumps({
+        "city": payload["city"],
+        "temperature_c": payload["current"]["temp_c"],
+        "conditions": payload["current"]["conditions"],
+    })
+```
+
+`app/main.py` - mount the MCP server on FastAPI and wrap the lifespan with the MCP session manager:
+
+```python
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI
+
+from app.mcp_server import mcp
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    async with mcp.session_manager.run():
+        yield
+
+
+app = FastAPI(title="My Tool", lifespan=lifespan)
+
+
+@app.get("/health")
+async def health():
+    return {"status": "ok"}
+
+
+app.mount("/", mcp.streamable_http_app())
+```
+
+That's enough to run. `task up`, then verify with an `initialize` call:
+
+```bash
+curl -X POST http://localhost:8000/ \
+     -H "Content-Type: application/json" \
+     -H "Accept: application/json, text/event-stream" \
+     -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl","version":"0"}}}'
+```
+
+You should get back a JSON-RPC response listing the server's capabilities. The rest of this guide makes the tool
+production-ready.
+
+---
+
+## 5. A real-world tool
+
+### 5.1 Module-level httpx client
+
+Creating an `httpx.AsyncClient` per call wastes connections. Lift it to module scope with lazy init, mirroring the
+pattern from [the agent guide §7.9](./new-agent.md#79-routes-and-services). Put it in `app/services/weather.py`:
+
+```python
+import httpx
+
+from app.config import settings
+
+_client: httpx.AsyncClient | None = None
+
+
+def _get_client() -> httpx.AsyncClient:
+    global _client
+    if _client is None:
+        _client = httpx.AsyncClient(
+            base_url=settings.weather_api_base_url,
+            headers={"Authorization": f"Bearer {settings.weather_api_key}"},
+            timeout=httpx.Timeout(10.0, connect=3.0),
+        )
+    return _client
+
+
+async def close_client() -> None:
+    global _client
+    if _client is not None:
+        await _client.aclose()
+        _client = None
+```
+
+Close the client on shutdown by extending the lifespan from §4 (full version shown in §6).
+
+### 5.2 Service layer
+
+Move the HTTP call out of the tool function into `app/services/weather.py`. The MCP tool stays thin (signature plus
+shaping the return value); the service holds the actual work.
+
+```python
+# app/services/weather.py
+async def fetch_weather(city: str, units: str) -> dict:
+    client = _get_client()
+    resp = await client.get("/weather", params={"city": city, "units": units})
+    resp.raise_for_status()
+    return resp.json()
+```
+
+### 5.3 The MCP tool
+
+Update the tool to call the service and return a shaped JSON string. The function signature is the **public contract**:
+parameter names, type hints, and the docstring all become part of the schema MCP clients see.
+
+```python
+# app/mcp_server.py
+import json
+from typing import Literal
+
+from mcp.server.fastmcp import FastMCP
+
+from app.services.weather import fetch_weather
+
+mcp = FastMCP("my-tool")
+
+
+@mcp.tool()
+async def get_weather(
+    city: str,
+    units: Literal["metric", "imperial"] = "metric",
+) -> str:
+    """Get the current weather for a city.
+
+    Use this when the user asks about current conditions, temperature, or
+    precipitation. Do not use it for forecasts more than 24 hours out — this
+    API only returns current observations.
+
+    Args:
+        city: City name in English (e.g. "Aarhus", "Copenhagen").
+        units: "metric" for Celsius/km, "imperial" for Fahrenheit/miles.
+    """
+    payload = await fetch_weather(city, units)
+    return json.dumps({
+        "city": payload["city"],
+        "temperature_c": payload["current"]["temp_c"],
+        "conditions": payload["current"]["conditions"],
+        "observed_at": payload["current"]["observed_at"],
+    })
+```
+
+Two rules:
+
+- **Precise types in the signature.** `Literal["metric", "imperial"]` becomes an enum in the JSON schema clients see;
+  callers (or the LLM driving them) can't invent values outside the set.
+- **The docstring is the model's instruction manual.** First line becomes the tool description; the rest tells the
+  client *when* to call it. Write it like help text for a teammate.
+
+The MCP SDK accepts several return types (string, dict, list, pydantic model). Returning a **JSON string** is the most
+portable shape across SDK versions and matches what search-agent does.
+
+### 5.4 Inbound transport security
+
+MCP's Streamable HTTP transport doesn't use bearer tokens by default — it uses **host validation**: the server only
+accepts requests whose `Host` header is on an allowlist. This prevents DNS-rebinding attacks against MCP servers that
+sit on `localhost` or internal hostnames.
+
+Configure it on the `FastMCP` instance:
+
+```python
+# app/mcp_server.py
+from mcp.server.fastmcp import FastMCP
+from mcp.server.transport_security import TransportSecuritySettings
+
+from app.config import settings
+
+mcp = FastMCP(
+    "my-tool",
+    transport_security=TransportSecuritySettings(
+        allowed_hosts=settings.mcp_allowed_hosts,
+    ),
+)
+```
+
+And in `app/config.py`:
+
+```python
+class Settings(BaseSettings):
+    mcp_allowed_hosts: list[str] = [
+        "my-tool:8000",        # service-to-service inside the app network
+        "localhost:8000",      # local curl/dev
+    ]
+```
+
+When Open WebUI calls your tool, it sends the upstream `Host` header it was configured to use; that host **must** be
+on the allowlist. Production deployments add their public hostname (e.g. `my-tool.itkdev.dk`) here too.
+
+If you also need bearer-token auth on the inbound side (to prevent any other container on the same network from
+calling your tool), wrap the MCP mount in a FastAPI middleware that checks an `Authorization` header before the
+request reaches the MCP app. The transport-security allowlist is a baseline; layered auth is fine.
+
+### 5.5 Outbound auth (upstream API)
+
+Same pattern as a regular HTTP service. The upstream API key lives in `Settings`:
+
+```python
+# app/config.py
+class Settings(BaseSettings):
+    weather_api_key: str  # required — no default
+    weather_api_base_url: str = "https://api.example.com"
+    mcp_allowed_hosts: list[str] = ["my-tool:8000", "localhost:8000"]
+```
+
+It's injected on the long-lived client (§5.1). Document everything in `.env.example`:
+
+```bash
+# Outbound — upstream API
+WEATHER_API_KEY=
+WEATHER_API_BASE_URL=https://api.example.com
+
+# Inbound — MCP transport security
+MCP_ALLOWED_HOSTS=["my-tool:8000","localhost:8000"]
+```
+
+### 5.6 Error handling
+
+The MCP SDK surfaces any exception raised inside a `@mcp.tool()` function as a JSON-RPC error to the client. Don't use
+FastAPI's `HTTPException` — it only works for HTTP routes, not for MCP tool functions.
+
+Raise a plain exception with a useful message:
+
+```python
+# app/services/weather.py
+import httpx
+import logging
+
+log = logging.getLogger(__name__)
+
+
+class WeatherError(RuntimeError):
+    """Raised when the upstream weather API can't fulfil a request."""
+
+
+async def fetch_weather(city: str, units: str) -> dict:
+    client = _get_client()
+    try:
+        resp = await client.get("/weather", params={"city": city, "units": units})
+        resp.raise_for_status()
+    except httpx.HTTPStatusError as exc:
+        if exc.response.status_code == 404:
+            raise WeatherError(f"City not found: {city}") from exc
+        if 400 <= exc.response.status_code < 500:
+            raise WeatherError("Invalid request to weather API") from exc
+        log.warning("Weather API 5xx for %s: %s", city, exc)
+        raise WeatherError("Upstream weather service unavailable") from exc
+    except (httpx.TimeoutException, httpx.TransportError) as exc:
+        log.warning("Weather API transport error for %s: %s", city, exc)
+        raise WeatherError("Weather service timed out") from exc
+
+    return resp.json()
+```
+
+The MCP client receives a JSON-RPC error with `code` and `message` fields. The message becomes user-visible in
+the client, so keep it short and human-friendly.
+
+### 5.7 Timeouts
+
+Two distinct timeouts, plus a third that's not yours to set:
+
+| Layer       | Set via                              | Typical value | What it bounds                          |
+|-------------|--------------------------------------|---------------|-----------------------------------------|
+| Per request | `httpx.Timeout` on the client (§5.1) | 10s           | One outbound HTTP call to the upstream  |
+| Server-side | Uvicorn keep-alive / Traefik         | 60–120s       | Total request handling time             |
+| Client-side | Client's MCP request timeout         | 60s default   | What the caller waits before giving up  |
+
+Rule of thumb: a single upstream call should be ≤ 10s. If the API is regularly slower, either cache (§8) or rethink
+whether this needs to be a synchronous MCP tool at all.
+
+## 6. FastAPI coexistence — health and MCP at the same port
+
+Your service is one FastAPI app: regular routes (health, debug) coexist with the MCP mount. The lifespan **must** wrap
+`yield` with `async with mcp.session_manager.run():` — without it the MCP endpoint accepts requests but can't handle
+them, and every call returns a 500.
+
+```python
+# app/main.py
+import logging
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI
+
+from app.config import settings
+from app.mcp_server import mcp
+from app.services import weather
+
+logging.basicConfig(level=logging.INFO)
+log = logging.getLogger(__name__)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    log.info("Starting %s with allowed hosts %s", app.title, settings.mcp_allowed_hosts)
+    async with mcp.session_manager.run():
+        yield
+    await weather.close_client()
+    log.info("%s shut down", app.title)
+
+
+app = FastAPI(title="My Tool", lifespan=lifespan)
+
+
+@app.get("/health")
+async def health():
+    return {"status": "ok"}
+
+
+@app.get("/health/ready")
+async def health_ready():
+    try:
+        client = weather._get_client()
+        await client.head("/")
+        return {"status": "ok"}
+    except Exception as exc:
+        log.warning("Readiness check failed: %s", exc)
+        return {"status": "error", "detail": str(exc)}
+
+
+app.mount("/", mcp.streamable_http_app())
+```
+
+**Mount order matters.** `app.mount("/", ...)` claims the root path. Add any non-MCP routes (`/health`,
+`/health/ready`, `/debug/...`) **before** the mount. Routes registered before a mount on `/` continue to win for
+their specific paths.
+
+---
+
+## 7. Plugging into Open WebUI
+
+Open WebUI's MCP client speaks Streamable HTTP, which is what this guide builds — so registration is
+configuration-only, no code changes.
+
+What Open WebUI needs:
+
+- **The MCP endpoint URL.** Inside the `app` network it's `http://my-tool:8000/`. From a host browser or a remote
+  client, use the Traefik-routed public URL.
+- **The host you registered with Open WebUI must be on the allowlist** (§5.4). If Open WebUI hits
+  `http://my-tool:8000/`, then `my-tool:8000` must be in `MCP_ALLOWED_HOSTS`. If it hits a Traefik hostname, add that
+  too.
+
+In Open WebUI, register the tool under **Settings → Tools → MCP servers** (exact path moves between releases — check
+the running version). The fields you typically need:
+
+| Field         | Value                                                |
+|---------------|------------------------------------------------------|
+| Name          | `my-tool`                                            |
+| Transport     | Streamable HTTP                                      |
+| URL           | `http://my-tool:8000/`                               |
+| Auth (if any) | Bearer token, only if you layered one on top of §5.4 |
+
+Open WebUI calls `initialize` and `tools/list` on registration. If the host is allowlisted and the lifespan is wrapped
+correctly, your `get_weather` tool appears immediately and can be enabled per-conversation.
+
+---
+
+## 8. Caching (when it's worth it)
+
+Cache only if **all three** hold:
+
+- The API is **idempotent** for the same arguments.
+- The API is **slow** or **rate-limited** — caching avoids the cost.
+- Stale-by-a-bit data is **acceptable** for your use case.
+
+Cache at the **service layer** (`fetch_weather`), not at the MCP layer — the service is the only place that knows
+which upstream calls are expensive.
+
+`functools.lru_cache` doesn't work with `async def` — it caches coroutine objects, not their results. Use a small
+TTL-aware dict, or `async-lru`:
+
+```python
+import time
+
+_cache: dict[tuple, tuple[float, dict]] = {}
+_TTL_SECONDS = 600
+
+
+def _cached(key: tuple) -> dict | None:
+    entry = _cache.get(key)
+    if entry is None:
+        return None
+    expires_at, value = entry
+    if time.time() > expires_at:
+        del _cache[key]
+        return None
+    return value
+
+
+async def fetch_weather(city: str, units: str) -> dict:
+    key = ("weather", city.lower(), units)
+    if (hit := _cached(key)) is not None:
+        return hit
+
+    # ... existing fetch logic ...
+    _cache[key] = (time.time() + _TTL_SECONDS, payload)
+    return payload
+```
+
+Caveats:
+
+- **Cache key must include every argument.** Forgetting `units` means metric callers get imperial cached results.
+- **Never cache errors.** A transient 503 will poison the cache for the whole TTL.
+- For multi-replica deployments, an in-process cache means cache misses on most requests. If that matters, use Redis
+  on the `app` network instead.
+
+---
+
+## 9. Plugging into the parent stack
+
+The FastAPI skeleton, Dockerfile, Taskfile, networks, and parent-stack integration are **identical** to
+[the agent guide](./agentic_tool.md) — same boilerplate, same conventions. Specifically:
+
+- **Skeleton + `pyproject.toml`** — [§7](./new-agent.md#7-step-by-step-setup). Drop `pydantic-ai-slim` from
+  `dependencies`; add `mcp[cli]>=1.0`.
+- **Dockerfile** — [§7.3](./new-agent.md#73-dockerfile). No model cache needed.
+- **docker-compose.yml** — [§7.4](./new-agent.md#74-docker-composeyml). Same networks, same Traefik labels.
+- **Taskfile.yml** — [§7.5](./new-agent.md#75-taskfileyml). Verbatim copy.
+- **Lifespan + health** — see §6 above; the lifespan must wrap `yield` with `mcp.session_manager.run()`.
+- **Embedding in the parent stack** — [§9](./new-agent.md#9-integrating-with-the-parent-stack). Same dev/prod split,
+  same `image:` reference pattern. **Add `MCP_ALLOWED_HOSTS` to the env block**, including every hostname Open WebUI
+  will use to reach you.
+- **Multi-arch image build** — [§10](./new-agent.md#10-production-builds-multi-arch). Verbatim.
+
+The only delta is what's in `app/mcp_server.py` and `app/services/` — the FastMCP setup, the tool registration, and
+the service-layer API wrapper.
+
+---
+
+## 10. Testing
+
+Two layers: service-layer with `respx`, MCP-layer with a direct call to the tool function.
+
+Add `respx` to dev deps in `pyproject.toml`:
+
+```toml
+[project.optional-dependencies]
+dev = [
+    # ... existing ...
+    "respx>=0.21",
+]
+```
+
+### Service-layer tests
+
+`tests/services/test_weather.py`:
+
+```python
+import pytest
+import respx
+from httpx import Response
+
+from app.services import weather
+from app.services.weather import WeatherError, fetch_weather
+
+
+@pytest.fixture(autouse=True)
+def _reset_client():
+    yield
+    weather._client = None
+
+
+@respx.mock
+async def test_fetch_weather_happy_path():
+    respx.get("https://api.example.com/weather").mock(
+        return_value=Response(200, json={
+            "city": "Aarhus",
+            "current": {
+                "temp_c": 14,
+                "conditions": "Partly cloudy",
+                "observed_at": "2026-05-18T12:00:00Z",
+            },
+        }),
+    )
+
+    result = await fetch_weather("Aarhus", "metric")
+
+    assert result["city"] == "Aarhus"
+
+
+@respx.mock
+async def test_fetch_weather_404_raises_weather_error():
+    respx.get("https://api.example.com/weather").mock(return_value=Response(404))
+
+    with pytest.raises(WeatherError, match="City not found"):
+        await fetch_weather("Atlantis", "metric")
+
+
+@respx.mock
+async def test_fetch_weather_5xx_raises_weather_error():
+    respx.get("https://api.example.com/weather").mock(return_value=Response(503))
+
+    with pytest.raises(WeatherError, match="unavailable"):
+        await fetch_weather("Aarhus", "metric")
+```
+
+### MCP-layer tests
+
+Invoke the tool function directly. FastMCP exposes the underlying coroutine via `.fn` on the decorated object:
+
+```python
+import json
+
+import respx
+from httpx import Response
+
+from app.mcp_server import get_weather
+
+
+@respx.mock
+async def test_get_weather_returns_json_string():
+    respx.get("https://api.example.com/weather").mock(
+        return_value=Response(200, json={
+            "city": "Aarhus",
+            "current": {
+                "temp_c": 14,
+                "conditions": "Partly cloudy",
+                "observed_at": "2026-05-18T12:00:00Z",
+            },
+        }),
+    )
+
+    result_json = await get_weather.fn("Aarhus", "metric")
+    result = json.loads(result_json)
+
+    assert result == {
+        "city": "Aarhus",
+        "temperature_c": 14,
+        "conditions": "Partly cloudy",
+        "observed_at": "2026-05-18T12:00:00Z",
+    }
+```
+
+For full integration tests (Streamable HTTP transport, real JSON-RPC framing) drive the FastAPI app with
+`httpx.AsyncClient` against the mounted endpoint — same approach as the agent guide §7.10 ASGI fixture. Most
+projects find the unit tests above sufficient.
+
+The `conftest.py` env-before-import rule from [the agent guide §7.10](./new-agent.md#710-testsconftestpy) still
+applies — set `WEATHER_API_KEY` and `MCP_ALLOWED_HOSTS` before `from app.main import app`.
+
+---
+
+## 11. Verification checklist
+
+Before declaring the tool done:
+
+- [ ] **Skeleton builds.** `task build` from a clean checkout.
+- [ ] **Service starts healthy.** `task up`; `docker compose ps` shows `(healthy)`.
+- [ ] **Health endpoints respond.** `/health` always 200; `/health/ready` 200 when upstream is reachable,
+  503 otherwise.
+- [ ] **MCP `initialize` works.** From `task shell`, the `curl` from §4 returns a JSON-RPC result with the server's
+  capabilities.
+- [ ] **MCP `tools/list` shows your tool.** Schema matches the function signature (parameters, enum values,
+  description).
+- [ ] **MCP `tools/call` succeeds.** Calling `get_weather` with a valid city returns the expected JSON payload.
+- [ ] **Transport security blocks wrong hosts.** A request with a `Host` header not on `MCP_ALLOWED_HOSTS` is
+  rejected.
+- [ ] **Upstream 4xx → meaningful error.** Unknown city → JSON-RPC error with the `City not found` message.
+- [ ] **Upstream 5xx / timeout → meaningful error.** Point `WEATHER_API_BASE_URL` at `http://127.0.0.1:1` and confirm
+  the MCP error mentions unavailability or timeout.
+- [ ] **`close_client()` is wired in `lifespan`.** Otherwise CI leaks connections.
+- [ ] **`.env.example` documents every env var.** `WEATHER_API_KEY`, `WEATHER_API_BASE_URL`, `MCP_ALLOWED_HOSTS`.
+- [ ] **Open WebUI registers the tool.** With the URL and allowlisted host, the tool appears in Open WebUI's tool
+  list and answers a real prompt end-to-end.
+- [ ] **`task ci` is green.** Lint + format + tests.
+
+---
+
+## 12. Common pitfalls / FAQ
+
+**1. Tool calls return 500, never reach my function.**
+The lifespan is missing `async with mcp.session_manager.run():` around `yield`. The Streamable HTTP transport needs
+that session manager to dispatch JSON-RPC calls. See §6.
+
+**2. Open WebUI says "MCP server unreachable" but `curl` works.**
+Host mismatch: Open WebUI sends a `Host` header your `MCP_ALLOWED_HOSTS` doesn't include. Add Open WebUI's hostname
+(the one it uses to reach you, not its own public hostname) to the allowlist and restart.
+
+**3. `HTTPException` from the tool isn't reaching the client.**
+`HTTPException` is FastAPI-only — it works for HTTP routes, not for MCP tool functions. Raise a plain exception
+instead; the MCP SDK turns it into a JSON-RPC error (§5.6).
+
+**4. The tool returns 200 but the response body is empty / weird.**
+Almost always a return-type issue. Return a JSON-serialisable **string** or **dict** from `@mcp.tool()` functions —
+returning a pydantic model directly works on newer SDK versions but is fragile across upgrades. Use `json.dumps(...)`
+to be safe.
+
+**5. Mount path collisions.**
+`app.mount("/", mcp.streamable_http_app())` claims the root path; any non-MCP route you want (health, debug) must be
+registered **before** the mount. If a route is registered after the mount, the mount wins.
+
+**6. "Why do I see two kinds of 'tool' in our docs?"**
+[The agent guide §8](./new-agent.md#8-pydanticai-patterns) covers PydanticAI `@agent.tool` — Python functions called
+from inside an *agent's own LLM loop*. This guide covers MCP tools — functions exposed to *external clients* over the
+MCP protocol. Different concepts, same word. A service can do both (search-agent does), but for most cases you want
+one or the other.
+
+**7. Tests pass locally, fail in CI with `ConnectError`.**
+`respx` only intercepts inside its context manager (`@respx.mock` or `with respx.mock():`). If a test forgets the
+decorator the real HTTP call fires against `api.example.com` and you get a transport error. The autouse fixture
+resetting `_client` to `None` (§10) is essential.
+
+---
+
+## 13. Reference index
+
+| Resource                                                                 | What it gives you                              |
+|--------------------------------------------------------------------------|------------------------------------------------|
+| [The agent guide §6](./new-agent.md#6-repository-layout)                 | Routes/services split; project layout          |
+| [The agent guide §7](./new-agent.md#7-step-by-step-setup)                | Reusable FastAPI/Docker/Taskfile skeleton      |
+| [The agent guide §7.10](./new-agent.md#710-testsconftestpy)              | Env-before-import test setup                   |
+| [The agent guide §9](./new-agent.md#9-integrating-with-the-parent-stack) | Networking, parent-stack embedding             |
+| [AarhusAI/search-agent](https://github.com/AarhusAI/search-agent)        | Canonical real-world MCP server in this org    |
+| [Python MCP SDK](https://github.com/modelcontextprotocol/python-sdk)     | API reference for `FastMCP`, transports, tools |
+| [MCP spec](https://modelcontextprotocol.io/)                             | Transport-level details and JSON-RPC framing   |
+| [respx](https://lundberg.github.io/respx/)                               | httpx mocking library used in tests            |

From 4ee4c1f2ad590dae7e7c515938909c5eecb9b6be Mon Sep 17 00:00:00 2001
From: Jesper Kristensen <cableman@linuxdev.dk>
Date: Tue, 19 May 2026 15:25:17 +0200
Subject: [PATCH 2/2] Fixed dead links

---
 technical/agentic_tool.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/technical/agentic_tool.md b/technical/agentic_tool.md
index f52e441..acda573 100644
--- a/technical/agentic_tool.md
+++ b/technical/agentic_tool.md
@@ -1273,8 +1273,8 @@ you're stuck. All `retrieval-agent/...` paths below link to `main` on GitHub.
 | [`retrieval-agent/.env.example`][ra-envexample]                   | Documented env var template                                                                         |
 | [`retrieval-agent/README.md`][ra-readme]                          | Public-facing API docs and config reference                                                         |
 | [`retrieval-agent/CLAUDE.md`][ra-claude]                          | Non-obvious internals worth capturing for future maintainers                                        |
-| `aarhusai-docker/docker-compose.yml` (service `retrieval`)        | Dev embedding pattern in the parent stack                                                           |
-| `aarhusai-docker/docker-compose.server.yml` (service `retrieval`) | Production embedding pattern with `restart: unless-stopped`                                         |
+| [`aarhusai-docker/docker-compose.yml`][ad-compose]                | Dev embedding pattern in the parent stack (service `retrieval`)                                     |
+| [`aarhusai-docker/docker-compose.server.yml`][ad-compose-server]  | Production embedding pattern with `restart: unless-stopped` (service `retrieval`)                   |
 
 [ra-dockerfile]: https://github.com/AarhusAI/retrieval-agent/blob/main/Dockerfile
 
@@ -1296,6 +1296,10 @@ you're stuck. All `retrieval-agent/...` paths below link to `main` on GitHub.
 
 [ra-pipeline]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/pipeline.py
 
+[ra-agent]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/agent.py
+
+[ra-qg]: https://github.com/AarhusAI/retrieval-agent/blob/main/app/services/query_generation.py
+
 [ra-conftest]: https://github.com/AarhusAI/retrieval-agent/blob/main/tests/conftest.py
 
 [ra-envexample]: https://github.com/AarhusAI/retrieval-agent/blob/main/.env.example
@@ -1303,3 +1307,7 @@ you're stuck. All `retrieval-agent/...` paths below link to `main` on GitHub.
 [ra-readme]: https://github.com/AarhusAI/retrieval-agent/blob/main/README.md
 
 [ra-claude]: https://github.com/AarhusAI/retrieval-agent/blob/main/CLAUDE.md
+
+[ad-compose]: https://github.com/AarhusAI/aarhusai-docker/blob/main/docker-compose.yml
+
+[ad-compose-server]: https://github.com/AarhusAI/aarhusai-docker/blob/main/docker-compose.server.yml