When training from scratch on my own dataset, the AE loss decreases gradually from 1.02×10⁵ to −3.41×10⁵ over roughly the first 100k steps, and then starts producing NaN values.
However, when loading the author’s pretrained model, the AE loss starts at −3.59×10⁵ and becomes NaN after only a few hundred steps.
I have two main questions:
- Why is the loss scale so large? Is this normal?
- What could be the possible causes of the NaN issue?
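To at least localize the NaN issue, one simple approach is to guard the loss each step and fail fast on the first non-finite value, so the offending step (and the values just before it) can be inspected. A minimal sketch in plain Python; the `check_finite` helper name is hypothetical and not part of the original codebase:

```python
import math

def check_finite(name, value, step):
    """Raise as soon as a tracked scalar becomes NaN/Inf, instead of
    letting training continue silently with corrupted values."""
    if math.isnan(value) or math.isinf(value):
        raise FloatingPointError(f"{name} became {value} at step {step}")
    return value

# Example: simulate a loss trace that turns NaN.
losses = [1.02e5, -3.41e5, float("inf") - float("inf")]  # last entry is NaN
for step, loss in enumerate(losses):
    try:
        check_finite("ae_loss", loss, step)
    except FloatingPointError as err:
        print(err)  # reports the first non-finite step
        break
```

In a PyTorch training loop, the same idea can be applied to the loss tensor before `backward()`, or `torch.autograd.set_detect_anomaly(True)` can be enabled temporarily to trace which backward op first produces the NaN.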