| 1 | +# v2.0 Performance Acceptance Gates |
| 2 | + |
| 3 | +This document records the v1 baseline measurements that drive the |
| 4 | +two PRD §3.6 numeric acceptance criteria for libhttpserver v2.0: |
| 5 | + |
| 6 | +| PRD requirement | Criterion | Verified by | |
| 7 | +|---|---|---| |
| 8 | +| PRD-REQ-REQ-001 | `get_headers()` ≥10× faster than v1 | `test/bench_get_headers.cpp` | |
| 9 | +| PRD-REQ-REQ-003 | `sizeof(http_resource)` shrinks by roughly the size of an empty `std::map<std::string,bool>` | `test/bench_sizeof_http_resource.cpp` | |
| 10 | + |
| 11 | +The literal v1 constants live in |
| 12 | +[`test/v1_baseline/v1_constants.hpp`](v1_baseline/v1_constants.hpp); |
| 13 | +this file documents how they were obtained so future maintainers can |
| 14 | +re-measure if the build host's libstdc++ / libc++ / libmicrohttpd |
| 15 | +versions change materially. |
| 16 | + |
| 17 | +## Baseline measurement environment |
| 18 | + |
| 19 | +| Field | Value | |
| 20 | +|---|---| |
| 21 | +| `master` SHA at measurement | `d8b055e` — "Migrate to libmicrohttpd 1.0.0 API with new features (#370)" | |
| 22 | +| Host triple | `aarch64-apple-darwin25.3.0` (Apple Silicon) | |
| 23 | +| Compiler | Apple clang 21.0.0 (`clang-2100.0.123.102`) | |
| 24 | +| C++ standard library | libc++ (LLVM) | |
| 25 | +| Build profile | `./configure --enable-debug=no` (i.e. `-O3`, `NDEBUG`, no sanitizers) | |
| 26 | +| libmicrohttpd | 1.0.5 (only relevant to the ns/call measurement) | |
| 27 | + |
| 28 | +## Baseline values |
| 29 | + |
| 30 | +| Quantity | v1 value | Source | |
| 31 | +|---|---|---| |
| 32 | +| `sizeof(http_resource)` | 32 bytes | `v1_baseline/measure_v1_sizes.cpp` | |
| 33 | +| `sizeof(std::map<std::string,bool>)` | 24 bytes | `v1_baseline/measure_v1_sizes.cpp` | |
| 34 | +| `get_headers()` median ns/call (16 headers) | ~768 ns (committed: 760 ns, conservative) | `v1_baseline/measure_v1_get_headers.cpp` | |
| 35 | + |
| 36 | +The committed `V1_GET_HEADERS_NS_PER_CALL = 760.0` is the **lower** end |
| 37 | +of the observed 756–784 ns range, rounded to 760 ns. A smaller v1 baseline |
| 38 | +yields a smaller computed ratio, so the assertion stays conservative under host jitter. |
| 39 | + |
| 40 | +## v2.0 measured values (re-run `make bench` to refresh) |
| 41 | + |
| 42 | +| Quantity | v2.0 value | Ratio vs v1 | |
| 43 | +|---|---|---| |
| 44 | +| `sizeof(http_resource)` | 16 bytes | 50% of v1 | |
| 45 | +| `get_headers()` median ns/call (16 headers) | ~3.3 ns | ~230× faster than v1 | |
| 46 | + |
| 47 | +Concretely: on the maintainer reference host, `make bench` printed |
| 48 | +`bench_get_headers v1=760.000ns v2=3.293ns ratio=230.76x` on |
| 49 | +`feature/v2.0` HEAD = `c71b0e8`. |
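
The pass/fail decision behind that output line is a plain ratio check. The following is a minimal sketch of that arithmetic, not the contents of `test/bench_get_headers.cpp`; `speedup_ratio` and `passes_gate` are hypothetical names, and the numbers are the ones reported above.

```cpp
// Hypothetical helper: how many times faster v2 is than v1.
constexpr double speedup_ratio(double v1_ns, double v2_ns) {
    return v1_ns / v2_ns;
}

// Hypothetical helper: PRD-REQ-REQ-001 asks for at least a 10x speedup.
constexpr bool passes_gate(double v1_ns, double v2_ns, double min_ratio = 10.0) {
    return speedup_ratio(v1_ns, v2_ns) >= min_ratio;
}

// Reference run: committed v1 baseline (760 ns) and measured v2 median (3.293 ns).
static_assert(passes_gate(760.0, 3.293), "v2 must be >= 10x faster than v1");

// Against the 760 ns baseline, a v2 median of 80 ns is only 9.5x faster.
static_assert(!passes_gate(760.0, 80.0), "9.5x is below the 10x gate");
```

Put differently, with the committed baseline the gate tolerates a v2 median of up to 76 ns per call.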
| 50 | + |
| 51 | +## Methodology — `bench_get_headers` |
| 52 | + |
| 53 | +- **Fixture:** `create_test_request().header("X-Bench-00","v00")… |
| 54 | + .header("X-Bench-15","v15").build()` (16 headers). |
| 55 | +- **Path under test:** `http_request::get_headers()` — returns a |
| 56 | + `const http::header_view_map&` to a per-request memoised cache. |
| 57 | + The first call hits the cold path in |
| 58 | + `http_request_impl::ensure_headerlike_cache` (`src/http_request.cpp:173`); |
| 59 | + it populates `headers_cached_` and sets `headers_cache_built_=true`. |
| 60 | + Every subsequent call returns the cached reference unchanged (the |
| 61 | + warm path). The bench measures the warm path because that is what |
| 62 | + the PRD claim is about: a real consumer reads headers many times |
| 63 | + per request, and v2.0's improvement is in the steady-state cost. |
| 64 | +- **Warmup:** 10,000 iterations of `get_headers()` to populate the |
| 65 | +  cache and warm the icache and branch predictor. |
| 66 | +- **Measurement:** 11 outer repetitions × 1,000,000 inner iterations. |
| 67 | + Each outer rep is timed end-to-end with |
| 68 | + `std::chrono::steady_clock`. Per-call cost = elapsed_ns / 1,000,000. |
| 69 | +- **Reported number:** median over the 11 outer reps, i.e. |
| 70 | +  `samples_ns[OUTER/2]` = `samples_ns[5]` of the sorted samples. Because |
| 71 | +  OUTER=11 is odd, this is the unambiguous middle element and no |
| 72 | +  tie-breaking is needed. The median (not the mean) is used for |
| 73 | +  robustness to outliers on shared CI runners. |
| 74 | +- **Sink:** each call's return reference is fed through |
| 75 | +  `asm volatile("" : : "r,m"(&ref) : "memory")` so the compiler cannot |
| 76 | +  eliminate the call as dead code. |
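
The bullets above can be sketched as a small timing harness. This is an illustration under stated assumptions, not the actual bench TU: `sink`, `median_of`, and `median_ns_per_call` are hypothetical names, the constants simply restate the numbers in the list, and the barrier uses the simpler `"r"` constraint instead of `"r,m"` (GCC/Clang inline asm only).

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>

// Hypothetical constants mirroring the numbers quoted above.
constexpr std::size_t WARMUP = 10'000;
constexpr std::size_t OUTER = 11;         // odd, so the median is a single element
constexpr std::size_t INNER = 1'000'000;

// Optimization barrier: the compiler must assume the pointer is observed and
// that memory may change, so the call that produced it is not removed.
inline void sink(const void* p) {
    asm volatile("" : : "r"(p) : "memory");
}

// Median of an odd-length sample array: sort a copy, take the middle element.
template <std::size_t N>
double median_of(std::array<double, N> samples) {
    static_assert(N % 2 == 1, "an odd count gives an unambiguous middle element");
    std::sort(samples.begin(), samples.end());
    return samples[N / 2];
}

// Warms up, then times OUTER batches of INNER calls with steady_clock and
// returns the median per-call cost in nanoseconds.
// `fn` must return a reference (its address is passed to the sink).
template <typename Fn>
double median_ns_per_call(Fn&& fn) {
    for (std::size_t i = 0; i < WARMUP; ++i) sink(&fn());

    std::array<double, OUTER> samples_ns{};
    for (std::size_t rep = 0; rep < OUTER; ++rep) {
        const auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < INNER; ++i) sink(&fn());
        const auto stop = std::chrono::steady_clock::now();
        const double elapsed_ns =
            std::chrono::duration<double, std::nano>(stop - start).count();
        samples_ns[rep] = elapsed_ns / static_cast<double>(INNER);
    }
    return median_of(samples_ns);
}
```

Timing a whole batch and dividing by the batch size keeps the clock-read overhead out of the per-call figure, which matters when a single call costs only a few nanoseconds.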
| 77 | + |
| 78 | +### v1 side of the comparison |
| 79 | + |
| 80 | +v1's `get_headers()` does not have a per-request cache; every call |
| 81 | +runs `MHD_get_connection_values` against `underlying_connection`, |
| 82 | +which invokes the per-header callback that inserts into a freshly |
| 83 | +constructed, stack-allocated `header_view_map` (`master:src/http_request.cpp:177`). |
| 84 | +The dominant cost is the 16 `std::map<std::string,std::string>` |
| 85 | +node allocations, the 16 string copies, and the `std::map` destructor. |
| 86 | + |
| 87 | +Because v1's `get_headers()` requires a live MHD connection (and |
| 88 | +running an MHD daemon for a microbench would conflate the per-call |
| 89 | +cost with the network round-trip), the baseline TU |
| 90 | +`v1_baseline/measure_v1_get_headers.cpp` stubs |
| 91 | +`MHD_get_connection_values` with a shim that invokes the v1 callback |
| 92 | +16 times against synthetic header pairs. The function body of v1's |
| 93 | +`get_headerlike_values` is a structural transcription of |
| 94 | +`master:src/http_request.cpp:177`. This faithfully reproduces v1's |
| 95 | +per-call cost (heap-allocate a tree of 16 nodes + copy strings + |
| 96 | +destroy) without conflating it with MHD network noise. |
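
The stubbing approach described above can be illustrated without libmicrohttpd at all. The sketch below is not the code in `v1_baseline/measure_v1_get_headers.cpp`: `kv_callback`, `fake_get_connection_values`, `collect_header`, and `get_headers_v1_style` are hypothetical names, and the callback type is a simplified stand-in for the real MHD iterator type, which also receives a value-kind argument.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the MHD key/value iterator callback.
using kv_callback = bool (*)(void* cls, const char* key, const char* value);

// Hypothetical shim in place of MHD_get_connection_values: feeds 16 synthetic
// header pairs ("X-Bench-00" -> "v00" ... "X-Bench-15" -> "v15") to `cb`.
// Returns the number of pairs delivered.
inline int fake_get_connection_values(kv_callback cb, void* cls) {
    int delivered = 0;
    for (int i = 0; i < 16; ++i) {
        char key[16];
        char value[8];
        std::snprintf(key, sizeof key, "X-Bench-%02d", i);
        std::snprintf(value, sizeof value, "v%02d", i);
        ++delivered;
        if (!cb(cls, key, value)) break;  // the callback may stop the iteration
    }
    return delivered;
}

// Callback in the style described above: copy each pair into the map,
// which costs one node allocation plus string copies per header.
inline bool collect_header(void* cls, const char* key, const char* value) {
    auto* headers = static_cast<std::map<std::string, std::string>*>(cls);
    (*headers)[key] = value;
    return true;
}

// One "v1-style" call: build a fresh map, fill it via the shim, return it.
// The caller pays for the map's destruction when the result goes out of scope.
inline std::map<std::string, std::string> get_headers_v1_style() {
    std::map<std::string, std::string> headers;
    fake_get_connection_values(&collect_header, &headers);
    return headers;
}
```

Timing repeated calls to a function shaped like `get_headers_v1_style()` captures the allocate, copy, and destroy cycle without any socket or daemon involved.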
| 97 | + |
| 98 | +## Methodology — `sizeof(http_resource)` check |
| 99 | + |
| 100 | +- **Mechanism:** compile-time `static_assert` in |
| 101 | + `test/bench_sizeof_http_resource.cpp`. |
| 102 | +- **Algebra:** the assertion encodes that removing the v1 |
| 103 | + `std::map<std::string, bool> method_state` member saves at least |
| 104 | + its empty footprint, less the size of the new `method_set` field |
| 105 | + that replaced it (rounded up to alignment): |
| 106 | + |
| 107 | + ```cpp |
| 108 | + static_assert(sizeof(http_resource) + V1_STD_MAP_STRING_BOOL_SIZEOF |
| 109 | + <= V1_HTTP_RESOURCE_SIZEOF + sizeof(method_set) * 2, |
| 110 | + "..."); |
| 111 | + ``` |
| 112 | + |
| 113 | +  With macOS / libc++ numbers (v1=32, map=24, v2=16, method_set=4): |
| 114 | +  `16 + 24 = 40 <= 32 + 4*2 = 40`, which passes with equality. |
| 115 | + |
| 116 | +  With Linux / libstdc++ numbers (v1=56, map=48, v2=16, |
| 117 | +  method_set=4): `16 + 48 = 64 <= 56 + 4*2 = 64`, which also passes with equality. |
| 118 | + |
| 119 | +- A second `static_assert` requires `sizeof(http_resource) < |
| 120 | +  V1_HTTP_RESOURCE_SIZEOF` (strict shrinkage) as belt-and-suspenders. |
| 121 | + |
| 122 | +- If a future refactor reintroduces a per-resource heap container |
| 123 | +  or grows the bitmask storage, the build breaks. This is the |
| 124 | +  intended regression-guard. |
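
To make the arithmetic in the list above checkable in isolation, here is a minimal sketch that restates the inequality over plain numbers. `corrected_gate` is a hypothetical helper, not part of the test suite; the sizes are the ones quoted in this document.

```cpp
#include <cstddef>

// Hypothetical helper restating the corrected inequality with plain numbers:
//   sizeof(v2 resource) + sizeof(v1 map) <= sizeof(v1 resource) + 2 * sizeof(method_set)
constexpr bool corrected_gate(std::size_t v2_resource, std::size_t v1_map,
                              std::size_t v1_resource, std::size_t method_set) {
    return v2_resource + v1_map <= v1_resource + method_set * 2;
}

// macOS / libc++ numbers from this document: 16 + 24 <= 32 + 8.
static_assert(corrected_gate(16, 24, 32, 4), "macOS / libc++");

// Linux / libstdc++ numbers from this document: 16 + 48 <= 56 + 8.
static_assert(corrected_gate(16, 48, 56, 4), "Linux / libstdc++");

// Growing http_resource by one pointer (16 -> 24 bytes) would break the gate.
static_assert(!corrected_gate(24, 24, 32, 4), "regression is rejected");
```

Because both platforms pass with equality, any growth of `http_resource` fails the check.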
| 125 | + |
| 126 | +### Why not the literal task formulation |
| 127 | + |
| 128 | +TASK-039's action-item phrasing is: |
| 129 | + |
| 130 | +```cpp |
| 131 | +static_assert(sizeof(http_resource) |
| 132 | +              <= sizeof_v1_http_resource |
| 133 | +                 - sizeof(std::map<std::string, bool>)); |
| 134 | +``` |
| 135 | + |
| 136 | +The literal formula fails on every stdlib because v2.0 introduces a |
| 137 | +small new member (`method_set methods_allowed_`) plus padding to the |
| 138 | +next pointer boundary. The reduction we achieved is `(v1_size - |
| 139 | +v2_size)`, which equals `sizeof(empty_map) - sizeof(method_set) - |
| 140 | +padding` ≈ map_size minus ~8 bytes. The corrected algebra above |
| 141 | +captures the actual contract ("the map went away") without |
| 142 | +papering over the new field's cost. |
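
Plugging the measured sizes into the literal bound shows the failure directly. This is a small worked sketch; `literal_gate` and `bytes_saved` are hypothetical helpers, and the sizes are the ones quoted in this document.

```cpp
#include <cstddef>

// Hypothetical helper: the literal TASK-039 bound, v2 <= v1 - map.
constexpr bool literal_gate(std::size_t v2_resource, std::size_t v1_resource,
                            std::size_t v1_map) {
    return v2_resource <= v1_resource - v1_map;
}

// Bytes actually saved: v1 size minus v2 size.
constexpr std::size_t bytes_saved(std::size_t v1_resource, std::size_t v2_resource) {
    return v1_resource - v2_resource;
}

// macOS / libc++: 16 <= 32 - 24 = 8 is false, although 16 bytes were saved.
static_assert(!literal_gate(16, 32, 24), "literal bound fails on libc++");
static_assert(bytes_saved(32, 16) == 24 - 8, "map size minus 8 bytes");

// Linux / libstdc++: 16 <= 56 - 48 = 8 is false, although 40 bytes were saved.
static_assert(!literal_gate(16, 56, 48), "literal bound fails on libstdc++");
static_assert(bytes_saved(56, 16) == 48 - 8, "map size minus 8 bytes");
```

On both platforms the literal bound would require `http_resource` to fit in 8 bytes, while the saving actually achieved is the map size minus 8 bytes.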
| 143 | + |
| 144 | +## How to re-run on this branch |
| 145 | + |
| 146 | +```sh |
| 147 | +# From the build directory (must be release mode): |
| 148 | +cd build |
| 149 | +make bench |
| 150 | + |
| 151 | +# Sample output (maintainer reference host): |
| 152 | +# === Running bench: bench_sizeof_http_resource === |
| 153 | +# === Running bench: bench_get_headers === |
| 154 | +# bench_get_headers v1=760.000ns v2=3.293ns ratio=230.76x ... |
| 155 | +# PASS: ratio 230.76x >= 10.0x |
| 156 | +``` |
| 157 | + |
| 158 | +The bench binaries are listed in `EXTRA_PROGRAMS`, not |
| 159 | +`check_PROGRAMS`, so `make all` and `make check` do not build or run |
| 160 | +them. Only `make bench` does. |
| 161 | + |
| 162 | +## How to re-measure v1 |
| 163 | + |
| 164 | +See [`test/v1_baseline/README.md`](v1_baseline/README.md). |
| 165 | + |
| 166 | +## Why bench is not part of `make check` |
| 167 | + |
| 168 | +- **Sanitizer matrix:** ASan / MSan / TSan / UBSan instrumentation |
| 169 | + inflates per-call cost 10–50×, which would either make v2.0 look |
| 170 | + slower than v1 (false negative) or make the ratio meaningless. |
| 171 | + The verify-build CI matrix runs `make check` under sanitizers; we |
| 172 | + keep `bench` out of that path so the matrix never reports a false |
| 173 | + ratio failure. The bench TU additionally guards itself with |
| 174 | + `__SANITIZE_*` / `__has_feature(*_sanitizer)` and prints "skipped" |
| 175 | + if invoked under sanitizers, so direct `./bench_get_headers` |
| 176 | + invocations on a sanitizer build are no-ops. |
| 177 | +- **Noise sensitivity:** running bench on every contributor laptop |
| 178 | + (or every CI runner under variable background load) would produce |
| 179 | + flaky CI. Release-readiness is gated on `make bench` succeeding |
| 180 | + once on a quiet release-mode host. The release runbook |
| 181 | + (TASK-040+) calls it. |
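
A self-skipping guard of the kind described in the sanitizer bullet might look like the sketch below. It is an assumption-laden illustration, not the guard in the bench TU: `BENCH_UNDER_SANITIZER`, `built_with_sanitizer`, and `run_or_skip` are hypothetical names, and only the ASan, TSan, and MSan checks are shown.

```cpp
#include <cstdio>

// Clang reports sanitizers through __has_feature; provide a fallback so the
// checks below also compile on compilers that lack it.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// BENCH_UNDER_SANITIZER is a hypothetical macro name used only in this sketch.
// GCC defines __SANITIZE_ADDRESS__ / __SANITIZE_THREAD__ for ASan / TSan builds.
#if defined(__SANITIZE_ADDRESS__) || defined(__SANITIZE_THREAD__) || \
    __has_feature(address_sanitizer) || __has_feature(thread_sanitizer) || \
    __has_feature(memory_sanitizer)
#define BENCH_UNDER_SANITIZER 1
#else
#define BENCH_UNDER_SANITIZER 0
#endif

// True when the binary was built with ASan, TSan, or MSan.
constexpr bool built_with_sanitizer() {
    return BENCH_UNDER_SANITIZER != 0;
}

// Hypothetical entry point: under a sanitizer, print "skipped" and report
// success without timing anything; otherwise run the bench and return its status.
inline int run_or_skip(int (*bench)()) {
    if (built_with_sanitizer()) {
        std::puts("skipped: sanitizer build");
        return 0;
    }
    return bench();
}
```

Returning success on skip means a direct invocation on an instrumented build cannot be mistaken for a ratio failure.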
| 182 | + |
| 183 | +## Known noise sources / mitigations |
| 184 | + |
| 185 | +| Source | Mitigation | |
| 186 | +|---|---| |
| 187 | +| CPU frequency scaling | Pin the process to one core with `taskset` (Linux), which limits migration between cores but does not fix the clock frequency; run on AC power (macOS). Not enforced. | |
| 188 | +| Page faults / first-touch | Warmup phase covers this. | |
| 189 | +| Other tenants on the host | Median (not mean) over 11 reps. | |
| 190 | +| libstdc++ ABI changes | If you upgrade GCC across a major version, re-run `measure_v1_sizes.cpp` and update the constants. | |
| 191 | +| libmicrohttpd callback overhead changes | If libmicrohttpd's `MHD_get_connection_values` signature or per-call cost changes substantially, the v1 baseline ns/call number may drift; re-run `measure_v1_get_headers.cpp`. | |
| 192 | + |
| 193 | +## Adding a new bench |
| 194 | + |
| 195 | +`test/Makefile.am` has the recipe at the bottom; in summary: |
| 196 | + |
| 197 | +1. Append the new program name to `bench_targets`. |
| 198 | +2. Add `<name>_SOURCES = ...` and `<name>_LDADD = ...` lines. |
| 199 | +3. Run `make bench` from the build directory. |
| 200 | + |
| 201 | +Document the v1 baseline (if any) and the methodology here. |