Great work! Could you share the window_size and sequence_length used for the model during pre-training, fine-tuning, and testing with large-seer? These settings seem important for training. I have also observed that the loss fluctuates during the early stages of training; roughly how many epochs does it take for the loss to stabilise?