bench

Reproducible performance harness for the kplane apiserver.

The kplane apiserver multiplexes many virtual control planes (VCPs) onto a single Go process with shared per-resource watch caches and per-cluster isolation. This repo lifts the in-tree 200-VCP smoke test into a parameterized benchmark that anyone can run with one command and that reports CPU, heap, RSS, goroutines, and (with the mixed scenarios) read/write latency.

What you get per run:

  • per-phase CPU%, heap, RSS, goroutines, file descriptors scraped from /metrics
  • per-VCP normalized numbers in the same table (heap KiB / VCP, etc.)
  • read/write p50/p90/p99/max latency for the mixed scenarios
  • heap and goroutine pprof dumps for each phase
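The per-phase numbers are scraped from the Prometheus text exposition format that /metrics serves. A minimal sketch of plucking one gauge out of that format (a hypothetical helper for illustration; the bench's real scraper lives in internal/metrics/):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue extracts a single gauge (e.g. go_goroutines) from
// Prometheus text-exposition output such as the apiserver's /metrics.
// Illustrative only; it ignores labeled series and counters.
func gaugeValue(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "#") { // skip HELP/TYPE comment lines
			continue
		}
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == name {
			v, err := strconv.ParseFloat(fields[1], 64)
			return v, err == nil
		}
	}
	return 0, false
}

func main() {
	sample := `# TYPE go_goroutines gauge
go_goroutines 4821
process_resident_memory_bytes 5.24288e+08`
	if v, ok := gaugeValue(sample, "go_goroutines"); ok {
		fmt.Println(v) // 4821
	}
}
```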

Captured runs from the published kplanedev/apiserver:latest image live in results/ — you can browse the numbers without running anything.

Prerequisites

  • Docker with the docker compose v2 plugin (docker compose version should print v2.x). Default mode pulls kplanedev/apiserver:latest and quay.io/coreos/etcd:v3.5.18 and brings up both containers per run.
  • Go 1.24+ to build and run bench itself. The CLI is the entry point in default mode; it does not link against the apiserver.
  • Free ports 6443 (apiserver) and 2379 (etcd). The bench's docker runner does compose down -v --remove-orphans first, so leftover containers from an earlier run are not a problem, but anything else listening on those ports is.
  • Free memory ≥3 GiB for bootstrap-200, ≥6 GiB for bootstrap-1000, ≥8 GiB for bootstrap-2000 (peak RSS during the bootstrap burst).
  • Disk: <100 MB of outputs per run (mostly pprof).

That's it — no apiserver checkout needed unless you want to bench a local build (see Hacking on the apiserver).

Quickstart

git clone https://github.com/kplane-dev/bench
cd bench

make bootstrap-200          # ≈4 minutes on a recent laptop

Result lands in out/bootstrap-200/:

summary.md        # human-readable digest with per-VCP and delta tables
phases.csv        # one row per phase, all numbers
latency.csv       # header-only for bootstrap; populated for mixed-*
pprof/
  heap-bootstrap-200-post-bootstrap.pb.gz
  heap-bootstrap-200-steady.pb.gz
  goroutine-bootstrap-200-post-bootstrap.txt
  goroutine-bootstrap-200-steady.txt

out/ is gitignored. results/ (committed) holds reference runs.

Other one-liners

make bootstrap-500
make bootstrap-1000
make bootstrap-2000

make mixed-hot-1000         # 10% hot subset, 5 wps + 20 rps per hot cluster, 2-min load
make mixed-uniform-1000     # spread the same load across all 1000 clusters

Or directly via the CLI:

go run ./cmd/bench run --scenario=bootstrap --clusters=2000 \
  --warmup=60s --post-bootstrap-settle=90s --steady-wait=90s

Scenarios

| scenario | what it does | output of interest |
|---|---|---|
| bootstrap | Mirrors the in-tree 200-VCP smoke test, parameterized by --clusters. Creates N VCPs, each with one CRD. | per-VCP heap/RSS/goroutines, sustained idle CPU |
| mixed-hot | After bootstrap, drives concurrent CRUD on a hot subset (--hot-percent% of clusters) at --write-rps-per-cluster writes/s and --read-rps-per-cluster reads/s for --load-duration. | write/read p50/p90/p99 latency under "few noisy customers" load |
| mixed-uniform | Same as mixed-hot with --hot-percent=100, so every cluster sees the same per-VCP RPS. | same metrics under "even load across the fleet" |

How the benchmark gathers metrics

Per-VCP numbers are only meaningful if the baseline reflects a settled apiserver — startup churn (RBAC bootstrap, namespace controllers, watch caches warming) would otherwise leak into the per-VCP deltas.

Each run does, in order:

  1. Clean launch. In docker mode, compose down -v --remove-orphans then compose up -d --pull always --force-recreate. etcd's data volume is removed; the apiserver container is recreated. In local mode, a new apiserver child process is spawned with a unique etcd prefix so the baseline isn't polluted by earlier runs even if etcd itself is long-running.
  2. Warmup (--warmup, default 30s). Wait for the Go runtime, GC, and per-cluster controllers to quiet down.
  3. Baseline sample. This is the reference point for all per-VCP deltas.
  4. Bootstrap N VCPs concurrently across --workers (default 12). Each worker waits for the cluster's /readyz, waits for the default namespace, creates one CRD, waits for it to be Established, and waits for it to appear in discovery.
  5. Post-bootstrap settle (--post-bootstrap-settle, default 15s), then post-bootstrap sample + heap/goroutine pprof.
  6. Steady wait (--steady-wait, default 90s), then steady sample + heap/goroutine pprof.
  7. (mixed-* only) Load phase: pre-warm 5 CRs per hot cluster, then ticker-driven writers and readers issue Get + Update / Get against the pre-warmed objects for --load-duration. Latency is collected client-side and percentiled. Post-load sample + pprof afterward.
  8. Reports: summary.md, phases.csv, latency.csv written to the output dir.
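Step 4's fan-out over --workers can be sketched as a plain worker pool. This is a hypothetical simplification with the per-cluster sequence stubbed out, not the bench's actual runner code:

```go
package main

import (
	"fmt"
	"sync"
)

// bootstrapOne stands in for step 4's per-cluster sequence:
// wait for /readyz, wait for the default namespace, create one CRD,
// wait for it to be Established, wait for it in discovery.
func bootstrapOne(cluster int) error {
	_ = cluster // real code would issue the API calls here
	return nil
}

// bootstrapAll fans n clusters out over `workers` goroutines,
// mirroring the --clusters / --workers split described above.
func bootstrapAll(n, workers int) []error {
	jobs := make(chan int)
	errs := make([]error, n) // each worker writes only its own indices
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				errs[c] = bootstrapOne(c)
			}
		}()
	}
	for c := 0; c < n; c++ {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	return errs
}

func main() {
	errs := bootstrapAll(200, 12)
	fmt.Println(len(errs)) // 200
}
```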

The apiserver flag set used in docker mode mirrors apiserver/test/smoke/apiserver_test.go::startAPIServerWithOptions verbatim (RBAC + token-auth-file + service-account + shared service modes). Local mode reproduces the same flags against your sibling apiserver checkout.

Customization

--scenario           bootstrap | mixed-hot | mixed-uniform                    (default bootstrap)
--clusters           number of VCPs to bootstrap                              (default 200)
--workers            bootstrap concurrency                                    (default 12)

--apiserver-image    docker image tag                                         (default kplanedev/apiserver:latest)
--apiserver-bin      path to a prebuilt apiserver binary; switches to local mode
--apiserver-dir      path to apiserver source checkout; built on first use; switches to local mode
--etcd-endpoints     etcd URL for local mode; ETCD_ENDPOINTS env overrides
--apiserver-url      apiserver base URL (docker mode)                         (default https://127.0.0.1:6443)
--compose-file       path to docker-compose.yaml                              (default docker/docker-compose.yaml)

--out                output dir root                                          (default out)

--warmup                 wait this long after a clean launch before sampling baseline   (default 30s)
--post-bootstrap-settle  sleep before post-bootstrap sample                              (default 15s)
--steady-wait            sleep before steady-state sample                                (default 90s)

--load-duration          how long to drive mixed-* load                       (default 2m)
--hot-percent            mixed-hot subset size %                              (default 10)
--write-rps-per-cluster  writes/s per hot cluster                             (default 5)
--read-rps-per-cluster   reads/s per hot cluster                              (default 20)
--object-bytes           payload size in spec.data                            (default 1024)

Pin a specific apiserver version

go run ./cmd/bench run --clusters=1000 \
  --apiserver-image=kplanedev/apiserver:v0.5.0

The APISERVER_IMAGE env var also works.

Captured results

The runs in results/ were captured against kplanedev/apiserver:latest on Apple Silicon, Docker Desktop, 96 GiB host RAM, with --warmup=60s --post-bootstrap-settle=90s --steady-wait=90s.

| scenario | clusters | bootstrap time | heap @ steady | per-VCP heap | per-VCP RSS | per-VCP goroutines | idle CPU% |
|---|---|---|---|---|---|---|---|
| bootstrap | 200 | 12.0s | 500 MiB | 1.90 MiB | 2.02 MiB | 2.34 | 3.2% |
| bootstrap | 500 | 22.7s | 1.13 GiB | 1.97 MiB | 2.28 MiB | 1.81 | 3.8% |
| bootstrap | 2000 | 57.7s | 4.05 GiB | 2.00 MiB | 2.30 MiB | 1.08 | 8.1% |
| mixed-hot | 200 + 2m load | 12.0s | 523 MiB | 1.89 MiB | 2.32 MiB | 2.34 | 3.5% |

Per-VCP heap is essentially flat between 200 and 2000 VCPs at roughly 2 MiB / VCP. Per-VCP goroutine cost falls as N grows because the fixed per-process overhead (controllers, root cluster informers, HTTP/2 server plumbing) amortizes over more VCPs.
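That amortization is a linear model: a per-process fixed cost plus a marginal per-VCP cost gives (fixed + marginal·N)/N per VCP, which falls toward the marginal cost as N grows. Fitting the two constants from the 200- and 2000-VCP goroutine rows above (illustrative arithmetic, not something the bench computes):

```go
package main

import "fmt"

// fixedAndMarginal fits total(N) = fixed + marginal*N through two
// (N, per-VCP) points from the results table. Illustrative only.
func fixedAndMarginal(n1 int, per1 float64, n2 int, per2 float64) (fixed, marginal float64) {
	t1 := per1 * float64(n1) // total goroutines at n1
	t2 := per2 * float64(n2) // total goroutines at n2
	marginal = (t2 - t1) / float64(n2-n1)
	fixed = t1 - marginal*float64(n1)
	return fixed, marginal
}

func main() {
	// 2.34 goroutines/VCP at N=200, 1.08 at N=2000 (from the table above).
	fixed, marginal := fixedAndMarginal(200, 2.34, 2000, 1.08)
	fmt.Printf("fixed ≈ %.0f goroutines, marginal ≈ %.2f goroutines/VCP\n", fixed, marginal)
	// As N grows, fixed/N shrinks, pulling per-VCP cost toward `marginal`.
}
```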

Each scenario's directory contains:

  • summary.md — full per-phase + per-VCP + delta tables and (for mixed-hot) latency under load
  • phases.csv — one row per phase, suitable for plotting
  • latency.csv — one row of mixed-* latency stats (header-only for bootstrap)
  • pprof/heap-*.pb.gz and pprof/goroutine-*.txt — the usual artifacts; open with go tool pprof <file> to attribute heap to call sites

mixed-hot-200 traffic phase

The 200-VCP mixed-hot run drove sustained CRUD on the 10% hot subset (20 clusters) at 5 writes/s + 20 reads/s per cluster for 2 minutes:

| op | count | errors | p50 | p99 | max |
|---|---|---|---|---|---|
| write | 3,080 | 60 | 799.7 ms | 813.9 ms | 841.3 ms |
| read | 6,220 | 80 | 399.8 ms | 411.2 ms | 414.8 ms |

These latencies include Docker Desktop's port-forwarding overhead and a saturated single-core handling Get→Update sequences against a contended watch cache; they are an intentional stress on the most loaded path rather than a measurement of best-case latency. On a Linux node with direct networking and adequate cores, expect 5-10× lower numbers.

Pod sizing recipe

The bench gives you the four numbers you need to size a Kubernetes pod for an apiserver running N VCPs:

| pod field | what to use from the bench |
|---|---|
| resources.requests.memory | RSS at steady + 10% headroom |
| resources.limits.memory | RSS at post-bootstrap (peak during a cold start) + safety factor |
| resources.requests.cpu | average CPU% during the steady phase, in millicores. With no traffic this is small (~50-100m); with mixed-hot, use the post-load steady |
| resources.limits.cpu | peak CPU during the bootstrap phase, which saturates multiple cores while N VCPs come up; set 2-4 cores or leave unset and use HPA |

A worked example pulled from results/bootstrap-2000/ (the heaviest captured run):

| phase | observed | recommendation |
|---|---|---|
| baseline (post-warmup, 0 VCPs) | RSS 252 MiB, CPU 0% | |
| post-bootstrap | peak RSS 6.67 GiB, ~5 cores during the burst | limits.memory: 8Gi, limits.cpu: 4-6 |
| steady (no traffic) | RSS 4.85 GiB, CPU 8% | requests.memory: 5Gi, requests.cpu: 250m |
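Plugged into a pod spec, the bootstrap-2000 numbers would look roughly like this (values taken from the worked example above; treat them as a starting point for your own measurements, not a guarantee):

```yaml
# Sizing sketch for an apiserver pod expected to host ~2000 idle VCPs.
resources:
  requests:
    memory: "5Gi"    # steady RSS 4.85 GiB plus headroom
    cpu: "250m"      # idle steady CPU ~8% of one core
  limits:
    memory: "8Gi"    # post-bootstrap peak RSS 6.67 GiB plus safety factor
    cpu: "4"         # bootstrap burst used ~5 cores; cap here, or leave unset
```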

Per-VCP rule of thumb at idle (from the cross-N table above):

heap        ≈ 2.0 MiB / VCP
RSS         ≈ 2.3 MiB / VCP
goroutines  ≈ 1.1 / VCP at N≥1k (higher at smaller N because of fixed cost)
CPU         ≈ effectively idle at steady state; bootstrap is multi-core-bursty
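Those rules of thumb compose into a quick idle-memory estimate: a fixed baseline plus per-VCP RSS. A back-of-the-envelope sketch using the captured constants (re-measure for your own build before trusting it):

```go
package main

import "fmt"

// estimateIdleRSSMiB applies the idle rule of thumb above:
// measured 0-VCP baseline plus ~2.3 MiB of RSS per VCP.
// Both constants come from the captured runs in results/.
func estimateIdleRSSMiB(vcps int) float64 {
	const baselineMiB = 252.0 // post-warmup RSS with 0 VCPs (bootstrap-2000 run)
	const perVCPMiB = 2.3     // per-VCP RSS from the cross-N table
	return baselineMiB + perVCPMiB*float64(vcps)
}

func main() {
	// Sanity check against the captured run: predicts ~4852 MiB for
	// 2000 VCPs, close to the observed 4.85 GiB steady RSS.
	fmt.Printf("%.0f MiB\n", estimateIdleRSSMiB(2000))
}
```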

Re-measure with mixed-hot at your expected RPS before sizing for production — bootstrap-only undercounts the steady CPU and heap that real customer traffic will add. RSS in particular runs ~10-20% high on macOS Docker because of LinuxKit VM accounting; on a Linux node expect lower numbers.

Hacking on the apiserver

If you're iterating on kplane-dev/apiserver and want to bench a local build, switch to local mode:

# 1) start a local etcd (one-time per session)
docker run -d --name bench-etcd -p 2379:2379 \
  quay.io/coreos/etcd:v3.5.18 etcd \
  --advertise-client-urls=http://0.0.0.0:2379 \
  --listen-client-urls=http://0.0.0.0:2379

# 2) point at a sibling apiserver checkout (built on first use)
ETCD_ENDPOINTS=http://127.0.0.1:2379 \
go run ./cmd/bench run --clusters=500 --scenario=bootstrap \
  --apiserver-dir=../apiserver

# or a prebuilt binary
APISERVER_BIN=$KPLANE_DEV/apiserver/bin/apiserver \
ETCD_ENDPOINTS=http://127.0.0.1:2379 \
go run ./cmd/bench run --clusters=500

Passing either --apiserver-bin or --apiserver-dir (or their env-var equivalents APISERVER_BIN / APISERVER_DIR) switches the runner from docker to local mode automatically.

Building / testing the bench itself

make tidy              # populate go.sum after modifying go.mod
make build             # produces bin/bench
make test              # unit tests for pure helpers
make vet               # go vet ./...
make fmt               # gofmt -w .

Repo layout

cmd/bench/                 # cobra CLI entry point
internal/
  runner/                  # docker + local apiserver runners
  workload/                # bootstrap + mixed-traffic loops
  metrics/                 # /metrics + pprof scrape, CSV/MD writers
docker/                    # compose stack used by docker mode
scenarios/                 # YAML manifests of canonical runs (advisory)
results/                   # committed reference results
out/                       # transient run outputs (gitignored)
