Added --blocks_to_swap_while_sampling, may allow faster sample image generation #2056
araleza wants to merge 1 commit into kohya-ss:sd3
Conversation
Maybe it would be better to not have the …

I'm sorry it took me so long to check. rockerBOO has a point. I think there may be a way to extend …

Thanks for the reviews, @rockerBOO and @kohya-ss. I'll take a look at the code soon and try to include your suggestions for improvement. :)
I've recently switched over to doing Flux full fine tuning instead of LoRA training, but I've found that sample image generation while training is very slow. I'm using --blocks_to_swap 35, which lets me have a batch size of 5. This block swapping persists during sample inference, increasing sample image generation time.

The reason block swapping is useful while training is that it saves a lot of VRAM, allowing a larger batch size and making room for the optimizer's state (e.g. momentum). But neither is needed when generating sample images: if I open nvtop, I can see that my VRAM is mostly unused during this time.
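The idea can be sketched as temporarily lowering the block-swap count around sample generation and restoring the training value afterwards. This is a minimal illustrative sketch, not the actual sd-scripts implementation; `enable_block_swap` is an assumed placeholder API.

```python
# Hypothetical sketch: swap fewer transformer blocks to CPU while sampling,
# then restore the training-time setting. `model.enable_block_swap(n)` is an
# illustrative/assumed method name, not the real sd-scripts interface.
from contextlib import contextmanager

@contextmanager
def blocks_to_swap_override(model, sampling_blocks, training_blocks):
    """Temporarily use a lower block-swap count during sample inference."""
    model.enable_block_swap(sampling_blocks)       # e.g. 2: most blocks stay in VRAM
    try:
        yield
    finally:
        model.enable_block_swap(training_blocks)   # e.g. 35: back to training setting

# Usage (names hypothetical):
# with blocks_to_swap_override(flux_model, sampling_blocks=2, training_blocks=35):
#     generate_sample_images(...)
```

The context manager shape guarantees the training value is restored even if sampling raises an exception.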
This new option allows the number of blocks to swap to be set to a lower value while generating sample images, which may speed up image generation. For example, on my current setup with 50 sampling steps per image, setting --blocks_to_swap_while_sampling 2 reduces the time per image from around 3 minutes to around 1 minute 48 seconds. That might not sound like a big difference at first, but over a run that generates around 100 images, it saves around 2 hours in total.
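The claimed savings check out as simple arithmetic, using the per-image timings quoted above:

```python
# Back-of-envelope check of the reported savings (numbers from the PR text).
per_image_before = 3 * 60        # ~180 s per sample image with --blocks_to_swap 35
per_image_after = 60 + 48        # ~108 s with --blocks_to_swap_while_sampling 2
images_per_run = 100             # roughly 100 sample images over a training run

saved_seconds = (per_image_before - per_image_after) * images_per_run
print(saved_seconds / 3600)      # -> 2.0 hours saved over the run
```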