perf(benchmarks): rigor overhaul — de-inflate baseline, DI scaling, reliability (v26.06.73) by ancongui · Pull Request #100 · fireflyframework/fireflyframework-pyfly

ancongui · 2026-06-07T19:49:52Z

Critical audit of benchmarks/run.py — measure what PyFly actually adds, keep ratios stable.

- Baseline de-inflated. The old "bare Starlette ~436µs/req" was ~99% TestClient/httpx round-trip artifact (confirmed: direct ASGI call = ~4.8µs; per-request asyncio.run alone = ~129µs). Request benchmarks now drive the ASGI app directly (one event loop, no httpx). PyFly's filter-chain overhead is reported as an absolute ~+43µs/req over the real ~4.8µs base, not as a % of an inflated number. TestClient cost is called out + excluded.
- DI coverage expanded: transient with 1/3/5/10 deps + nested depth-3. The data shows resolution is linear (~+0.4µs/dep), no superlinearity.
- Dependency baselines labelled: pydantic model_dump_json → [dep] (it's Pydantic, not PyFly); bare Starlette → [base].
- Naming fixed: TransactionIdFilter documented as an MDC-style X-Transaction-Id correlation id, not @Transactional.
- Reliability: warmup + 9 GC-disabled runs; reports median/best/p99/spread (spread ~±1–3%, i.e. stable). Access log (sink-dependent I/O) excluded from the CPU number + documented separately.
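For reference, the warmup + GC-disabled multi-run pattern described in the reliability item could look roughly like the sketch below. This is not the code in benchmarks/run.py: the `measure` helper, its parameters, and the spread definition (half the min-max range relative to the median) are assumptions made for illustration. p99 is left out because the description does not say over which samples it is computed.

```python
import gc
import statistics
import time


def measure(fn, *, iterations=10_000, runs=9, warmup=1):
    """Hypothetical helper: per-call timing stats in µs over GC-disabled runs."""
    # Warmup passes are executed but never recorded.
    for _ in range(warmup):
        for _ in range(iterations):
            fn()

    per_call_us = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed region
    try:
        for _ in range(runs):
            start = time.perf_counter_ns()
            for _ in range(iterations):
                fn()
            elapsed_ns = time.perf_counter_ns() - start
            per_call_us.append(elapsed_ns / iterations / 1_000)
    finally:
        # Restore the collector to whatever state the caller had.
        if gc_was_enabled:
            gc.enable()

    median = statistics.median(per_call_us)
    best = min(per_call_us)
    # Assumed spread definition: half the min-max range as a % of the median.
    spread_pct = (max(per_call_us) - best) / 2 / median * 100
    return {
        "median_us": median,
        "best_us": best,
        "spread_pct": spread_pct,
        "runs": len(per_call_us),
    }
```

Reporting the median alongside the best run makes it easier to see whether a single noisy run is skewing the headline number.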

README rewritten with methodology + honest numbers + a direct-ASGI per-filter decomposition. Gates: ruff+format (src tests benchmarks).
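A minimal sketch of what "driving the ASGI app directly" means, assuming a bare ASGI callable in place of the real Starlette/PyFly app. The names `app`, `make_scope`, `call_once`, and `bench` are hypothetical and are not taken from benchmarks/run.py. The point is that every request is awaited on one event loop, with no TestClient, no httpx, and no per-request asyncio.run.

```python
import asyncio
import time


async def app(scope, receive, send):
    """Bare ASGI callable standing in for the app under test (hypothetical)."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})


def make_scope(path="/"):
    """Build a minimal HTTP connection scope for a GET request."""
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": "GET",
        "scheme": "http",
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "root_path": "",
        "headers": [],
        "client": ("127.0.0.1", 0),
        "server": ("testserver", 80),
    }


async def call_once(asgi_app, scope):
    """Invoke the app once and collect the messages it sends."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await asgi_app(scope, receive, send)
    return messages


async def bench(asgi_app, n=10_000):
    """Average µs per request, all requests awaited on the same event loop."""
    scope = make_scope()
    start = time.perf_counter_ns()
    for _ in range(n):
        await call_once(asgi_app, dict(scope))  # fresh scope copy per request
    return (time.perf_counter_ns() - start) / n / 1_000


if __name__ == "__main__":
    # asyncio.run is paid once for the whole benchmark, not once per request.
    print(f"{asyncio.run(bench(app)):.2f} µs/req")
```

Swapping `app` for a real application object with and without its filter chain would give the kind of absolute per-request difference the description reports.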


…p labels, reliability + bump v26.06.73

Critical audit of benchmarks/run.py (measure PyFly, not the harness):
- Baseline de-inflated: old 'bare Starlette ~436us' was ~99% TestClient/httpx round-trip artifact. Request benchmarks now drive ASGI directly (one loop, no httpx) -> real baseline ~4.8us; PyFly filter-chain overhead reported as absolute ~+43us/req over the REAL base. TestClient cost called out + excluded.
- DI coverage: transient + 1/3/5/10 deps + nested depth-3 -> data shows linear ~+0.4us/dep.
- Dependency labeling: pydantic model_dump_json tagged [dep]; bare Starlette [base].
- Naming: TransactionIdFilter documented as MDC-style X-Transaction-Id correlation id, NOT @transactional declarative transactions.
- Reliability: warmup + 9 GC-disabled runs; report median/best/p99/spread (±% ~1-3%). Access log (sink-dependent I/O) excluded from the CPU number + documented separately.

README rewritten with methodology + honest numbers + per-filter decomposition (direct ASGI).

… depth curve, fix rounding

Review fixes to benchmarks/README.md + run.py:
1. The unexplained ~17us: create_app's default chain has 7 filters, not 5; the table omitted the default-on MetricsFilter (~9.4us) + HttpExchangeRecorderFilter (~8.8us). Added both; per-filter cumulative now ~44.3us, reconciling to the +44us full-chain overhead. Confirmed the zero-filter machinery is genuinely free (5.1 vs 4.9us bare, direct ASGI).
2. Depth over-claim: 'depth costs the same as width per node' was wrong (one data point). Added nested depth-1/3/5 rows to run.py; data shows depth is linear at ~2.7us/node (transient construction rate), DISTINCT from width's ~0.4us/cached-dep rate. README now states both with data.
3. Rounding: '~5x' -> '~5.3x' (15.6->2.95us), consistent with the rest of the doc.

Headline overhead aligned to the decomposition-backed ~44us. ruff+format clean.
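As a quick check of the figures quoted in this commit message, the sketch below recomputes the rounding fix from item 3 and applies the two linear rates from item 2. The constant and function names are hypothetical, and the linear model is only an illustration of the stated per-dep and per-node rates, not a reproduction of the benchmark data.

```python
# Rates quoted in the commit message (approximate, in µs).
WIDTH_US_PER_DEP = 0.4   # per cached dependency (width)
DEPTH_US_PER_NODE = 2.7  # per nested transient node (depth)


def added_cost_us(count, rate_us):
    """Linear cost model: count units at a fixed per-unit rate."""
    return count * rate_us


# Item 3: 15.6us -> 2.95us, reported as '~5.3x' rather than '~5x'.
ratio = 15.6 / 2.95
print(f"{ratio:.1f}x")  # prints "5.3x"

# Item 2: depth and width scale at different per-unit rates.
width_cost = added_cost_us(10, WIDTH_US_PER_DEP)   # 10 cached deps
depth_cost = added_cost_us(5, DEPTH_US_PER_NODE)   # 5 nested nodes
```

Under these rates, five nested transient nodes cost more than ten cached dependencies, which is why the commit treats depth and width as separate curves.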

Andrés Contreras Guillén added 2 commits June 7, 2026 21:49

ancongui merged commit 4a0c15f into main Jun 7, 2026
5 checks passed

ancongui deleted the perf/benchmark-overhaul branch June 7, 2026 20:02


ancongui merged 2 commits into main from perf/benchmark-overhaul
