perf(benchmarks): rigor overhaul — de-inflate baseline, DI scaling, reliability (v26.06.73)#100

Merged
ancongui merged 2 commits into main from perf/benchmark-overhaul on Jun 7, 2026

@ancongui ancongui commented Jun 7, 2026

Critical audit of benchmarks/run.py — measure what PyFly actually adds, keep ratios stable.

  1. Baseline de-inflated. The old "bare Starlette ~436µs/req" was ~99% TestClient/httpx round-trip artifact (confirmed: direct ASGI call = ~4.8µs; per-request asyncio.run alone = ~129µs). Request benchmarks now drive the ASGI app directly (one event loop, no httpx). PyFly's filter-chain overhead is reported as an absolute ~+43µs/req over the real ~4.8µs base — not a % of an inflated number. TestClient cost is called out + excluded.
  2. DI coverage expanded — transient with 1/3/5/10 deps + nested depth-3. The data shows resolution is linear (~+0.4µs/dep), no superlinearity.
  3. Dependency baselines labelled: pydantic model_dump_json → [dep] (it's Pydantic, not PyFly); bare Starlette → [base].
  4. Naming fixed: TransactionIdFilter documented as an MDC-style X-Transaction-Id correlation id, not @Transactional.
  5. Reliability — warmup + 9 GC-disabled runs; reports median/best/p99/spread (±% ~1–3%, i.e. stable). Access log (sink-dependent I/O) excluded from the CPU number + documented separately.
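For readers unfamiliar with item 1, here is a minimal sketch of what "driving the ASGI app directly" can look like. It is not the code in benchmarks/run.py: the stand-in `app`, `make_scope`, `call_once`, and `run_direct` are hypothetical names, and the real benchmark would pass the Starlette or PyFly application instead of the stub. The point it illustrates is that a single `asyncio.run` covers the whole batch, so neither httpx nor per-request event-loop setup is charged to each request.

```python
import asyncio
import time


async def app(scope, receive, send):
    """Stand-in ASGI app (hypothetical); swap in the framework app under test."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def make_scope(path="/"):
    # Minimal HTTP connection scope following the ASGI 3.0 spec.
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": "GET",
        "scheme": "http",
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
        "server": ("testserver", 80),
        "client": ("testclient", 50000),
    }


async def call_once(asgi_app, scope):
    """Invoke the app once and return the messages it sent."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await asgi_app(scope, receive, send)
    return messages


async def time_requests(asgi_app, n):
    scope = make_scope()
    start = time.perf_counter()
    for _ in range(n):
        # Copy the scope: frameworks may mutate it while routing.
        await call_once(asgi_app, dict(scope))
    return (time.perf_counter() - start) / n * 1e6  # microseconds per request


def run_direct(asgi_app, n=10_000):
    # One event loop for the whole batch, not one asyncio.run per request.
    return asyncio.run(time_requests(asgi_app, n))
```

Calling `run_direct(app)` returns the mean cost per request in microseconds; running it on the bare app and on the app with filters enabled gives the absolute overhead as a difference.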

README rewritten with methodology + honest numbers + a direct-ASGI per-filter decomposition. Gates: ruff+format (src tests benchmarks).
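The reliability protocol in item 5 (warmup, then 9 runs with the garbage collector disabled, then median/best/p99/spread) could be sketched as follows. This is an illustration under stated assumptions, not the harness in run.py: `measure` and `summarize` are hypothetical names, p99 is taken with the nearest-rank method over the per-run means, and spread is defined here as half the min-to-max range relative to the median. The actual definitions in the README may differ.

```python
import gc
import math
import statistics
import time


def measure(fn, *, runs=9, iters=1_000, warmup=200):
    """Return the mean cost per call (in microseconds) for each timed run."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before timing

    per_run_us = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed region
    try:
        for _ in range(runs):
            start = time.perf_counter()
            for _ in range(iters):
                fn()
            per_run_us.append((time.perf_counter() - start) / iters * 1e6)
    finally:
        if gc_was_enabled:
            gc.enable()
    return per_run_us


def summarize(per_run_us):
    """Summarise runs; the p99 and spread definitions are assumptions."""
    ordered = sorted(per_run_us)
    median = statistics.median(ordered)
    rank = max(1, math.ceil(0.99 * len(ordered)))  # nearest-rank percentile
    return {
        "median": median,
        "best": ordered[0],
        "p99": ordered[rank - 1],
        "spread_pct": (ordered[-1] - ordered[0]) / 2 / median * 100,
    }
```

For example, `summarize(measure(lambda: sum(range(100))))` yields one dictionary per benchmark row. Reporting the median alongside the spread makes it easy to see whether a difference between two rows is larger than the run-to-run noise.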

Andrés Contreras Guillén added 2 commits June 7, 2026 21:49
…p labels, reliability + bump v26.06.73

Critical audit of benchmarks/run.py (measure PyFly, not the harness):
- Baseline de-inflated: old 'bare Starlette ~436us' was ~99% TestClient/httpx round-trip artifact.
  Request benchmarks now drive ASGI directly (one loop, no httpx) -> real baseline ~4.8us; PyFly
  filter-chain overhead reported as absolute ~+43us/req over the REAL base. TestClient cost called
  out + excluded.
- DI coverage: transient + 1/3/5/10 deps + nested depth-3 -> data shows linear ~+0.4us/dep.
- Dependency labeling: pydantic model_dump_json tagged [dep]; bare Starlette [base].
- Naming: TransactionIdFilter documented as MDC-style X-Transaction-Id correlation id, NOT
  @transactional declarative transactions.
- Reliability: warmup + 9 GC-disabled runs; report median/best/p99/spread (±% ~1-3%). Access log
  (sink-dependent I/O) excluded from the CPU number + documented separately.
README rewritten with methodology + honest numbers + per-filter decomposition (direct ASGI).
… depth curve, fix rounding

Review fixes to benchmarks/README.md + run.py:
1. The unexplained ~17us: create_app's default chain has 7 filters, not 5 — the table omitted the
   default-on MetricsFilter (~9.4us) + HttpExchangeRecorderFilter (~8.8us). Added both; per-filter
   cumulative now ~44.3us, reconciling to the +44us full-chain overhead. Confirmed the zero-filter
   machinery is genuinely free (5.1 vs 4.9us bare, direct ASGI).
2. Depth over-claim: 'depth costs the same as width per node' was wrong (one data point). Added
   nested depth-1/3/5 rows to run.py; data shows depth is linear at ~2.7us/node (transient
   construction rate), DISTINCT from width's ~0.4us/cached-dep rate. README now states both with data.
3. Rounding: '~5x' -> '~5.3x' (15.6->2.95us), consistent with the rest of the doc.

Headline overhead aligned to the decomposition-backed ~44us. ruff+format clean.
@ancongui ancongui merged commit 4a0c15f into main Jun 7, 2026
5 checks passed
@ancongui ancongui deleted the perf/benchmark-overhaul branch June 7, 2026 20:02