UPSTREAM PR #1247: sd-server: set cfg_scale in the guidance parameters by loci-dev · Pull Request #49 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-02-03T06:47:38Z

Note

Source pull request: leejet/stable-diffusion.cpp#1247

What:
sd_xl_turbo_1.0_fp16 generates low-quality images.

Why:
cfg_scale is not being passed along from sd-server to stable-diffusion
sd_xl_turbo_1.0_fp16 requires setting cfg_scale to 1.0 for good result

How:
This update passes along cfg_scale from server's json request.value to stable-diffusion's gen_params.sample_params.guidance.txt_cfg parameter.

loci-review · 2026-02-03T07:54:09Z

Overview

Analysis of 48,154 functions across build.bin.sd-server and build.bin.sd-cli reveals net positive performance with 131 modified, 60 new, and 132 removed functions. Power consumption improved by -1.745% (512,977 nJ → 504,025 nJ) for build.bin.sd-server and -1.866% (479,167 nJ → 470,226 nJ) for build.bin.sd-cli. The single commit between versions modified guidance parameter handling (cfg_scale), unrelated to observed performance changes.

Function Analysis

ggml_compute_forward_flash_attn_ext_f16 (both binaries) shows exceptional improvement: response time decreased -47% (25,524 ns → 13,500 ns for server, 25,565 ns → 13,527 ns for CLI), saving ~12,000 ns per call. This performance-critical flash attention function is called hundreds of times per image generation across text encoding and denoising steps. The optimization originates from ggml submodule updates, not application code changes.

Multiple STL functions show significant regressions: __iter_equals_val (+237% response time, +185 ns), end() methods (+228% response time, +183 ns), and _S_key (+164% response time, +187 ns). These standard library functions experienced 200-300% throughput time increases likely due to compiler/toolchain differences rather than source code modifications. While percentages are high, absolute impacts are small (150-200 ns per call) and occur in non-critical paths.

path_str functions show +225% response time (+3,360 ns) but affect only initialization (backend registration), not inference hot paths. _M_destroy for Conv2d shows +180% throughput time (+189 ns) affecting object cleanup. make_shared instantiations show +113% throughput time but only +8-9% response time, indicating allocation overhead increases with minimal downstream impact.

Additional Findings

The flash attention optimization dominates the performance profile, providing 7-15 milliseconds improvement per image generation. This ML-critical function benefits text encoding (CLIP/T5) and all attention operations in U-Net/DiT/Flux architectures across 20-50 diffusion steps. The 47% speedup in this hot path far outweighs cumulative STL regressions (~20-50 microseconds in initialization, ~10-20 microseconds in inference), resulting in measurable end-to-end inference acceleration and reduced power consumption. The optimization scales with model complexity and resolution, providing greater benefits for larger models and higher-resolution generation.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

set cfg_scale in the guidance parameters

40c27b9

loci-dev temporarily deployed to stable-diffusion-cpp-prod February 3, 2026 06:47 — with GitHub Actions Inactive

loci-dev force-pushed the main branch from 3c05029 to d870714 Compare February 3, 2026 07:24

loci-dev force-pushed the main branch 10 times, most recently from 76645dd to 5bbc590 Compare February 7, 2026 04:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #1247: sd-server: set cfg_scale in the guidance parameters#49

UPSTREAM PR #1247: sd-server: set cfg_scale in the guidance parameters#49
loci-dev wants to merge 1 commit intomainfrom
loci/pr-1247-cfg_scale_pr

loci-dev commented Feb 3, 2026

Uh oh!

loci-review bot commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

loci-dev commented Feb 3, 2026

Uh oh!

loci-review bot commented Feb 3, 2026

Overview

Function Analysis

Additional Findings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant