UPSTREAM PR #1234: feat: add res_multistep, res_2s and bong tangent scheduler #37

Open
loci-dev wants to merge 4 commits into master from upstream-PR1234-branch_rmatif-more-samplers

Conversation

@loci-dev

Mirrored from leejet/stable-diffusion.cpp#1234

Add two more samplers, res_multistep and res_2s, along with the bong_tangent scheduler.

Results with Z-Image base:

| Scheduler | res_multisteps | res_2s | euler |
|---|---|---|---|
| bong_tangent | res_multisteps_bong | res_2s_bong | euler_bong |
| simple | res_multistep_simple | res_2s_simple | euler_simple |

References:

res_multistep
res_2s
bong tangent

loci-dev temporarily deployed to stable-diffusion-cpp-prod on January 29, 2026 18:47 with GitHub Actions
@loci-review

loci-review bot commented Jan 29, 2026

Performance Review Report: Stable Diffusion C++ Implementation

Impact Classification: Minor Impact

Executive Summary

Analysis of 11 functions across stable-diffusion.cpp reveals compiler-driven optimizations with one minor feature addition. Changes show 10-1000ns absolute impacts in non-critical utility functions, with no effect on inference performance.

Key Findings

Commit Context: 4 commits by rmatif added two new sampling methods (RES Multistep, RES 2S) to expand user options from 14 to 18 available samplers. File changes: 7 modified, 3 added, 3 deleted.

Function Changes:

  • 10 STL functions show compiler optimization differences without source changes
  • 1 application function (sample method lookup lambda) added 4 new map entries, increasing response time by 762ns (+6.7%) with 70ns throughput improvement (+15%)

Most Impacted Functions:

  • std::_Hashtable::end(): Improved 162ns (-58% response time) - hashtable accessor optimization
  • std::_Bit_iterator::operator++: Degraded 108ns (+46% response time) but gained 175% throughput
  • nlohmann::json::diyfp::mul: Degraded 82ns (+39% response time) in JSON serialization

Performance Assessment:
All changes occur in initialization, API validation, and container management code—not in inference hot paths. Cumulative overhead per API request: ~680ns. Typical inference time: 1-10 seconds. Impact: <0.0001% of total execution time.

Power Consumption:
7 of 11 functions show throughput improvements (15-175%), indicating net positive power efficiency for concurrent workloads. Target version optimizes for multi-threaded server scenarios.

Justification:
The 762ns overhead for expanded sampling method support is negligible and justified by feature enhancement. Compiler optimizations in STL code reflect modern toolchain improvements prioritizing parallelism over single-operation latency.

See the complete breakdown in Version Insights
Have questions? Tag @loci-dev to ask about this PR.

loci-dev force-pushed the master branch 3 times, most recently from 0219cb4 to 17a1e1e on February 1, 2026 14:11