
docs(perf): close out the despeckle optimization round with final baselines #40

Merged: P4suta merged 1 commit into main from perf/despeckle-closeout-2 on Jun 10, 2026

P4suta (Owner) commented on Jun 10, 2026

Stacked on #33. Final PR of the despeckle round.

The round's result (200-page fixture, committed baselines)

| metric | before the round | after | Δ |
|---|---|---|---|
| conv (-j8) | 14.48s | 9.03s | −37.6% |
| conv (-j1) | 49.85s | 35.63s | −28.5% |
| despeckle stage (-j8) | 10.19s | 5.21s | −49% |
| despeckle share of conv | 71.6% | 57.7% | |
| clean() (pipeline path, single-threaded) | 174.9ms | 106.8ms | −39% |

Delivered by #31 (op-level benchmark), #32 (skip the metrics-only counting passes), #33 (exact DWA morphology routing).
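The Δ column is the plain relative change, (after − before) / before. The following minimal Python sketch recomputes it from the committed figures quoted in the table for the four timed rows. The dictionary keys and the `pct_change` helper are illustrative and do not come from the repository. The table's −49% and −39% are these values rounded to whole percent.

```python
# Committed baselines from the results table: (before the round, after).
# Times are seconds except the clean() row, which is milliseconds.
BASELINES = {
    "conv (-j8)": (14.48, 9.03),
    "conv (-j1)": (49.85, 35.63),
    "despeckle stage (-j8)": (10.19, 5.21),
    "clean() single-threaded (ms)": (174.9, 106.8),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in BASELINES.items():
        print(f"{name}: {pct_change(before, after):+.1f}%")
    # conv (-j8): -37.6%
    # conv (-j1): -28.5%
    # despeckle stage (-j8): -48.9%
    # clean() single-threaded (ms): -38.9%
```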

Negative results, measured rather than assumed

  • smallHoles De Morgan flip: implemented, measured at zero effect, reverted. The inverted-select penalty (22.2ms vs 15.3ms) lives in pixConnComp's extraction of the giant background component, not in the re-render the flip eliminated.
  • Single-labeling fusion: not pursued. The same cost-center insight caps its realistic gain at the acceptance-gate boundary (−4..6% conv), weighed against the round's largest FFM surface cost (the PIXA/BOXA lifecycle).

If more is ever needed

The remaining clean() time is dominated by the selectBySize block (~70%). The next levers are the extraction cost inside pixSelectBySize or the register stage (now 35% of conv).
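A stage's share bounds what optimizing it can buy. Amdahl's law gives the remaining time as (1 − p) + p/s when a part taking fraction p is made s times faster. The following is a minimal sketch that applies this to the shares quoted here. It assumes those shares are fractions of a single serial time budget. At -j8, pages overlap, so treat the results as rough bounds. The `remaining_fraction` helper is illustrative and is not how the round derived its −4..6% estimate.

```python
import math


def remaining_fraction(p: float, s: float) -> float:
    """Amdahl's law: fraction of total time left after a part taking
    fraction p of it is made s times faster (s = math.inf removes it)."""
    if not 0.0 <= p <= 1.0 or s <= 0.0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return (1.0 - p) + p / s


if __name__ == "__main__":
    conv_j8 = 9.03    # s, final -j8 conv baseline
    clean_ms = 106.8  # ms, final clean() baseline
    # Register stage is ~35% of conv: halve it, then remove it outright.
    print(f"{conv_j8 * remaining_fraction(0.35, 2.0):.2f}s")       # 7.45s
    print(f"{conv_j8 * remaining_fraction(0.35, math.inf):.2f}s")  # 5.87s
    # selectBySize is ~70% of clean(): halve it.
    print(f"{clean_ms * remaining_fraction(0.70, 2.0):.1f}ms")     # 69.4ms
```

Even eliminating the register stage entirely would leave at least 65% of conv under this model. A lever's ceiling is set by its share, not by how fast the stage itself can be made.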

🤖 Generated with Claude Code

Re-file of #34 from a fresh branch off main (the final regenerated baselines; the original was orphaned by stacked squash merges).


Regenerated both committed baselines on the full optimization stack:

- pipeline (200-page fixture): conv 9.03s at -j8 vs the original 14.48s
  baseline (-37.6%); despeckle stage 5.21s vs 10.19s (-49%), its share
  down from 71.6% to 57.7%. At -j1: conv 35.63s vs 49.85s (-28.5%).
- cleaner: clean() without component stats 106.8ms vs the 174.9ms
  round-opening baseline (-39%); the remaining distribution is the
  selectBySize block (~70%), dilate 11%, write/read ~7%.

Also records the round's negative results, measured rather than
assumed: the smallHoles De Morgan flip was implemented, measured at
zero effect (the inverted-select penalty lives in pixConnComp's
extraction of the giant background component, not in the re-render),
and reverted; the single-labeling fusion was not pursued (predicted
-4..6% conv sits at the acceptance gate with the round's largest FFM
surface cost). The next levers, should they ever be needed: the
extraction cost inside pixSelectBySize, or the register stage (now
35% of conv).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
P4suta merged commit 67c7bbd into main on Jun 10, 2026
20 checks passed
P4suta deleted the perf/despeckle-closeout-2 branch on June 10, 2026 at 11:07