Skip to content

fix: correct image to image DDIM and TCD#1410

Open
wbruna wants to merge 7 commits intoleejet:masterfrom
wbruna:sd_fix_ddim_tcd_i2i
Open

fix: correct image to image DDIM and TCD#1410
wbruna wants to merge 7 commits intoleejet:masterfrom
wbruna:sd_fix_ddim_tcd_i2i

Conversation

@wbruna
Copy link
Copy Markdown
Contributor

@wbruna wbruna commented Apr 11, 2026

This supersedes my initial attempt at #665 , and has a much better chance of actually being correct 🙂

In a nutshell, there are two issues with DDIM and TCD:

  • they assume the initial image is pure noise, and apply a noise scaling which happens to be inversely proportional to the initial sigma value. This factor gets very close to 1 for the typical initial sigma for text to image, but it throws lower noise levels (e.g. in low-strength image to image) completely out of scale;
  • they have their own internal sigma schedules, but we rely on the input sigmas vector to set the right values for image to image.

This series remove that initial noise scaling, adapt the samplers to follow the provided sigma vectors, and sets appropriate default sigma schedulers: simple for DDIM, and LCM for TCD. I've tried to keep each commit self-contained, to be easier to follow and test.

Fixes #663 .

wbruna added 7 commits April 11, 2026 16:11
We don't have the noise component isolated during img2img at the
sampler's code, so move the initial scaling to the latent
initialization code.

Note that, for normal txt2img, this scale factor is very close
to 1 due to the large initial sigma. But it gets larger for
small sigmas, e.g. with low-strength i2i.
Also tweaks the criteria to check for the last iteration, to
make a follow-up change easier to read.
Instead of the arbitrarily-chosen first alpha_cumprod (which is
very close to 1), use a constant 1, which is the value actually
used at the start of the cumulative product calculations.

This also avoids the need to check for the last iteration when
scaling x for the next loop.
Apart from the rounding criteria, Simple is equivalent to the
hardcoded DDIM scheduler. So, drop the local compvis_sigmas and
alphas_cumprod tables, and recover the alpha_cumprod values from
the provided sigmas vector.

This partially fixes DDIM behavior with image to image, since
we rely on the input sigmas vector to provide the appropriate
noise levels. It also allows combining DDIM with different
schedulers.
LCM and TCD timesteps are identical, so adapt the TCD code to use
the provided sigmas vector, keeping the internal tables to obtain
the timesteps from the sigmas.

As with DDIM, this partially fixes TCD for image to image operations.

An alternative could be using the input sigmas to obtain the timestep
and prev_timestep values, then obtain new sigma and alpha_cumprod
values from the tables, keeping most of the code as-is. It'd be more
complex, though: input sigmas that happened to be too close to one
another could end up rounded to the same timestep value, and it's
not clear how to best guard against that.
As explained in a previous commit, the initial noise scaling
has very little effect for text to image; but for image to
image, the lower the denoising strength, the stronger it gets,
until the model isn't able to compensate for it.

I've placed this change at the end of the series to make it
easier to test the results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

img2img not working with DDIM trailing or TCD

1 participant