perf(benchmarks): rigor overhaul — de-inflate baseline, DI scaling, reliability (v26.06.73)#100
Merged
added 2 commits
June 7, 2026 21:49
…p labels, reliability + bump v26.06.73

Critical audit of benchmarks/run.py (measure PyFly, not the harness):
- Baseline de-inflated: old 'bare Starlette ~436us' was ~99% TestClient/httpx round-trip artifact. Request benchmarks now drive ASGI directly (one loop, no httpx) -> real baseline ~4.8us; PyFly filter-chain overhead reported as absolute ~+43us/req over the REAL base. TestClient cost called out + excluded.
- DI coverage: transient + 1/3/5/10 deps + nested depth-3 -> data shows linear ~+0.4us/dep.
- Dependency labeling: pydantic model_dump_json tagged [dep]; bare Starlette [base].
- Naming: TransactionIdFilter documented as MDC-style X-Transaction-Id correlation id, NOT @transactional declarative transactions.
- Reliability: warmup + 9 GC-disabled runs; report median/best/p99/spread (±% ~1-3%). Access log (sink-dependent I/O) excluded from the CPU number + documented separately.

README rewritten with methodology + honest numbers + per-filter decomposition (direct ASGI).
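The core methodology change in this commit is to call the ASGI app in-process inside a single event loop, then report robust statistics over repeated GC-disabled runs. A minimal sketch of that kind of harness follows. It assumes a toy stand-in app (`bare_app`) rather than Starlette or PyFly. The names `drive` and `measure` and the hand-built `SCOPE` are hypothetical, as are the exact definitions of p99 (over all per-request latencies) and spread (half the range of per-run medians, as ±% of the median). None of this is necessarily what `benchmarks/run.py` does.

```python
import asyncio
import gc
import statistics
import time

# Hypothetical stand-in for a "bare" app: a raw ASGI callable returning 200 "ok".
async def bare_app(scope, receive, send):
    await receive()  # consume the single, empty request-body message
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"ok"})

# Hand-built HTTP scope; a real server would fill this in from the socket.
SCOPE = {
    "type": "http", "asgi": {"version": "3.0"}, "http_version": "1.1",
    "method": "GET", "scheme": "http", "path": "/", "raw_path": b"/",
    "query_string": b"", "root_path": "", "headers": [],
    "client": ("127.0.0.1", 50000), "server": ("127.0.0.1", 8000),
}

async def drive(app, n):
    """Call `app` n times in-process (no sockets, no httpx); return per-request µs."""
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # responses are discarded: only the app itself is timed

    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        await app(dict(SCOPE), receive, send)
        latencies.append((time.perf_counter() - t0) * 1e6)
    return latencies

def measure(app, n=2_000, runs=9, warmup=1):
    """Warm up, then time `runs` GC-disabled batches, all inside ONE event loop."""
    async def main():
        for _ in range(warmup):
            await drive(app, n)  # warmup results are discarded
        was_enabled = gc.isenabled()
        gc.disable()  # keep collector pauses out of the timed runs
        try:
            return [await drive(app, n) for _ in range(runs)]
        finally:
            if was_enabled:
                gc.enable()

    per_run = asyncio.run(main())  # a single asyncio.run, not one per request
    run_medians = sorted(statistics.median(r) for r in per_run)
    all_latencies = sorted(x for r in per_run for x in r)
    median = statistics.median(run_medians)
    return {
        "median_us": median,
        "best_us": run_medians[0],
        "p99_us": all_latencies[int(0.99 * (len(all_latencies) - 1))],
        "spread_pct": (run_medians[-1] - run_medians[0]) / 2 / median * 100,
    }
```

The key design choice is calling `asyncio.run` once per measurement rather than once per request. The PR notes that a per-request `asyncio.run` alone costs ~129µs, which would swamp a ~4.8µs app. Under this setup, a filter chain's absolute overhead is simply `measure(wrapped_app)["median_us"] - measure(bare_app)["median_us"]`, where `wrapped_app` is a hypothetical name for the app with filters installed.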
… depth curve, fix rounding

Review fixes to benchmarks/README.md + run.py:
1. The unexplained ~17us: create_app's default chain has 7 filters, not 5. The table omitted the default-on MetricsFilter (~9.4us) + HttpExchangeRecorderFilter (~8.8us). Added both; per-filter cumulative now ~44.3us, reconciling to the +44us full-chain overhead. Confirmed the zero-filter machinery is genuinely free (5.1 vs 4.9us bare, direct ASGI).
2. Depth over-claim: 'depth costs the same as width per node' was wrong (one data point). Added nested depth-1/3/5 rows to run.py; data shows depth is linear at ~2.7us/node (transient construction rate), DISTINCT from width's ~0.4us/cached-dep rate. README now states both with data.
3. Rounding: '~5x' -> '~5.3x' (15.6->2.95us), consistent with the rest of the doc. Headline overhead aligned to the decomposition-backed ~44us.

ruff+format clean.
Critical audit of `benchmarks/run.py`: measure what PyFly actually adds, keep ratios stable.

- Baseline de-inflated: the old "bare Starlette ~436µs" was ~99% TestClient/httpx round-trip artifact (confirmed: a direct ASGI call = ~4.8µs; a per-request `asyncio.run` alone = ~129µs). Request benchmarks now drive the ASGI app directly (one event loop, no httpx). PyFly's filter-chain overhead is reported as an absolute ~+43µs/req over the real ~4.8µs base, not as a % of an inflated number. TestClient cost is called out + excluded.
- Dependency labeling: pydantic `model_dump_json` → `[dep]` (it's Pydantic, not PyFly); bare Starlette → `[base]`.
- Naming: `TransactionIdFilter` documented as an MDC-style `X-Transaction-Id` correlation id, not `@Transactional`.
- Reliability: warmup + 9 GC-disabled runs; report median/best/p99/spread (`±%` ~1–3%, i.e. stable). Access log (sink-dependent I/O) excluded from the CPU number + documented separately.

README rewritten with methodology + honest numbers + a direct-ASGI per-filter decomposition. Gates: ruff+format (src tests benchmarks).
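The naming fix matters because a correlation id and declarative transactions are unrelated ideas. An MDC-style id only tags every log line of one request with the same value. Below is a minimal sketch of that pattern as ASGI middleware, assuming `contextvars` as Python's MDC equivalent and assuming the id is also echoed back on the response. `CorrelationIdMiddleware`, `transaction_id`, and the uuid4-hex format are hypothetical; this is not PyFly's `TransactionIdFilter`.

```python
import contextvars
import uuid

# Hypothetical request-scoped slot, playing the role of an MDC entry.
transaction_id = contextvars.ContextVar("transaction_id", default=None)

HEADER = b"x-transaction-id"  # ASGI header names are lower-cased bytes

class CorrelationIdMiddleware:
    """Propagate or mint an X-Transaction-Id per request.

    This is a log-correlation id only; no database transaction is involved.
    """

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        incoming = dict(scope.get("headers", [])).get(HEADER)
        tx_id = incoming.decode("latin-1") if incoming else uuid.uuid4().hex
        token = transaction_id.set(tx_id)  # visible to code running in this request

        async def send_with_id(message):
            # Echo the id on the response so clients can quote it in bug reports.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((HEADER, tx_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_id)
        finally:
            transaction_id.reset(token)  # never leak the id into the next request
```

A `logging.Filter` could copy `transaction_id.get()` onto each log record so that formatters can print it alongside every message from the same request.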