Test Mosaic Tutorial Post 2.11 #3809
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3809
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 78d59fe with merge base cc4874c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Switches gradient_checkpointing_enable() to use non-reentrant checkpointing, which properly preserves dropout RNG state during recomputation and resolves the SystemError during loss.backward(). Issue: #3774
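For reference, a minimal sketch of what the non-reentrant switch described in this commit might look like. It assumes the model is a Hugging Face `transformers` model whose `gradient_checkpointing_enable()` accepts a `gradient_checkpointing_kwargs` argument (present in recent `transformers` releases). The helper name `enable_non_reentrant_checkpointing` is hypothetical and not taken from the tutorial.

```python
def enable_non_reentrant_checkpointing(model):
    """Enable activation checkpointing with the non-reentrant implementation.

    Hypothetical helper: `model` is assumed to expose Hugging Face's
    `gradient_checkpointing_enable(gradient_checkpointing_kwargs=...)`.
    The kwargs are forwarded to `torch.utils.checkpoint.checkpoint`.
    """
    kwargs = {"use_reentrant": False}
    model.gradient_checkpointing_enable(gradient_checkpointing_kwargs=kwargs)
    # Return the kwargs so callers can log or inspect what was requested.
    return kwargs
```

With a real model this would be called once before training, e.g. `enable_non_reentrant_checkpointing(model)` in place of a bare `model.gradient_checkpointing_enable()`.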
This reverts commit 6e486af.
Disable dropout (resid_pdrop=0, attn_pdrop=0, embd_pdrop=0) in the run_training_ac function to avoid SystemError from _VF.dropout returning NULL during backward recomputation of GPT2Block. Dropout is irrelevant to the memory profiling purpose of this tutorial. Issue: #3774
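A rough sketch of the dropout workaround, under the assumption that `run_training_ac` builds its model from a `GPT2Config`. The three field names come from the commit message above; the helper `zero_dropout_kwargs` and the `n_layer=2` value are illustrative only and do not appear in the tutorial.

```python
# Dropout-related GPT2Config fields named in the commit message, all set to zero.
NO_DROPOUT = {"resid_pdrop": 0.0, "attn_pdrop": 0.0, "embd_pdrop": 0.0}


def zero_dropout_kwargs(**overrides):
    """Merge caller overrides with the zero-dropout settings.

    Hypothetical helper: the zero-dropout values are applied last, so they
    win even if the caller passes a non-zero dropout probability.
    """
    return {**overrides, **NO_DROPOUT}


# Usage with Hugging Face transformers (not executed here):
#   from transformers import GPT2Config, GPT2LMHeadModel
#   config = GPT2Config(**zero_dropout_kwargs(n_layer=2))
#   model = GPT2LMHeadModel(config)
```

Keeping the overrides in one dictionary makes the workaround easy to remove once the underlying issue (#3774) is fixed.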
@basilwong here is an attempt to get this to work. It is a band-aid at best and still needs an actual fix.
Hi @sekyondaMeta! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Test mosaic tutorial post 2.11 release
Disabled GPT-2 dropout (resid_pdrop=0, attn_pdrop=0, embd_pdrop=0) in run_training_ac() to work around a PyTorch 2.11 bug where the CUDA dropout kernel crashes during gradient checkpointing recomputation (#3774). Dropout has no impact on this tutorial's purpose of memory profiling with Mosaic.
cc @basilwong