fix: correct image to image DDIM and TCD#1410
Open
wbruna wants to merge 7 commits intoleejet:masterfrom
Open
Conversation
We don't have the noise component isolated during img2img at the sampler's code, so move the initial scaling to the latent initialization code. Note that, for normal txt2img, this scale factor is very close to 1 due to the large initial sigma. But it gets larger for small sigmas, e.g. with low-strength i2i.
Also tweaks the criteria to check for the last iteration, to make a follow-up change easier to read.
Instead of the arbitrarily-chosen first alpha_cumprod (which is very close to 1), use a constant 1, which is the value actually used at the start of the cumulative product calculations. This also avoids the need to check for the last iteration when scaling x for the next loop.
Apart from the rounding criteria, Simple is equivalent to the hardcoded DDIM scheduler. So, drop the local compvis_sigmas and alphas_cumprod tables, and recover the alpha_cumprod values from the provided sigmas vector. This partially fixes DDIM behavior with image to image, since we rely on the input sigmas vector to provide the appropriate noise levels. It also allows combining DDIM with different schedulers.
LCM and TCD timesteps are identical, so adapt the TCD code to use the provided sigmas vector, keeping the internal tables to obtain the timesteps from the sigmas. As with DDIM, this partially fixes TCD for image to image operations. An alternative could be using the input sigmas to obtain the timestep and prev_timestep values, then obtain new sigma and alpha_cumprod values from the tables, keeping most of the code as-is. It'd be more complex, though: input sigmas that happened to be too close to one another could end up rounded to the same timestep value, and it's not clear how to best guard against that.
As explained in a previous commit, the initial noise scaling has very little effect for text to image; but for image to image, the lower the denoising strength, the stronger it gets, until the model isn't able to compensate for it. I've placed this change at the end of the series to make it easier to test the results.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This supersedes my initial attempt at #665 , and has a much better chance of actually being correct 🙂
In a nutshell, there are two issues with DDIM and TCD:
This series remove that initial noise scaling, adapt the samplers to follow the provided sigma vectors, and sets appropriate default sigma schedulers: simple for DDIM, and LCM for TCD. I've tried to keep each commit self-contained, to be easier to follow and test.
Fixes #663 .