UPSTREAM PR #1234: feat: add res_multistep, res_2s and bong tangent scheduler #37

Open
loci-dev wants to merge 4 commits into master from upstream-PR1234-branch_rmatif-more-samplers

Conversation

@loci-dev

Mirrored from leejet/stable-diffusion.cpp#1234

Add two more samplers, res_multistep and res_2s, along with the bong_tangent scheduler.

Results with Z-Image base:

| Scheduler | res_multisteps | res_2s | euler |
|---|---|---|---|
| bong_tangent | res_multisteps_bong | res_2s_bong | euler_bong |
| simple | res_multistep_simple | res_2s_simple | euler_simple |

References:

res_multistep
res_2s
bong tangent

loci-dev temporarily deployed to stable-diffusion-cpp-prod on January 29, 2026 18:47 with GitHub Actions
@loci-review

loci-review bot commented Jan 29, 2026

Performance Review Report: Stable Diffusion C++ Implementation

Impact Classification: Minor Impact

Executive Summary

Analysis of 11 functions across stable-diffusion.cpp reveals compiler-driven optimizations with one minor feature addition. Changes show 10-1000ns absolute impacts in non-critical utility functions, with no effect on inference performance.

Key Findings

Commit Context: 4 commits by rmatif added two new sampling methods (RES Multistep, RES 2S) to expand user options from 14 to 18 available samplers. File changes: 7 modified, 3 added, 3 deleted.

Function Changes:

  • 10 STL functions show compiler optimization differences without source changes
  • 1 application function (sample method lookup lambda) added 4 new map entries, increasing response time by 762ns (+6.7%) with 70ns throughput improvement (+15%)

Most Impacted Functions:

  • std::_Hashtable::end(): Improved 162ns (-58% response time) - hashtable accessor optimization
  • std::_Bit_iterator::operator++: Degraded 108ns (+46% response time) but gained 175% throughput
  • nlohmann::json::diyfp::mul: Degraded 82ns (+39% response time) in JSON serialization

Performance Assessment:
All changes occur in initialization, API validation, and container management code—not in inference hot paths. Cumulative overhead per API request: ~680ns. Typical inference time: 1-10 seconds. Impact: <0.0001% of total execution time.

Power Consumption:
7 of 11 functions show throughput improvements (15-175%), indicating net positive power efficiency for concurrent workloads. Target version optimizes for multi-threaded server scenarios.

Justification:
The 762ns overhead for expanded sampling method support is negligible and justified by feature enhancement. Compiler optimizations in STL code reflect modern toolchain improvements prioritizing parallelism over single-operation latency.

See the complete breakdown in Version Insights
Have questions? Tag @loci-dev to ask about this PR.

loci-dev force-pushed the master branch 3 times, most recently from 0219cb4 to 17a1e1e on February 1, 2026 14:11