Enable unsloth as training backend #667
FilippoBoni1921 wants to merge 9 commits into PrimeIntellect-ai:main from
Conversation
willccbb: Definitely want to have a nice Unsloth integration, and potentially open to adding something like this, but could we restructure this PR in a way that is less breaking/intrusive? We shouldn't add unsloth as a required dependency, or remove the existing 'rl' dependencies (maybe use a new dependency group?); config classes should live in the same 'rl' subfolder and follow the established style/dev patterns. In general, we're not planning many feature changes to the included RLTrainer, which is intended as a "minimal demo trainer" that people can easily read and modify -- for performance and customizability, we generally recommend https://github.com/PrimeIntellect-ai/prime-rl, but it doesn't have all of the fancy Unsloth features for quantized models (and likely won't any time soon, below fp8 at least). Also confused by some of the other changes, which seem unrelated (e.g. the https protocol option)?
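For reference, one common way to follow the "don't make it a required dependency" suggestion is to guard the Unsloth import so the default install path is unaffected. The sketch below is illustrative only and not part of this PR; the `HAS_UNSLOTH` flag name is a hypothetical choice.

```python
# Illustrative sketch only (not from this PR): keep Unsloth optional by guarding
# the import, so environments without the extra installed still work unchanged.
try:
    from unsloth import FastLanguageModel  # present only if the optional extra is installed
    HAS_UNSLOTH = True
except ImportError:
    FastLanguageModel = None
    HAS_UNSLOTH = False
```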
FilippoBoni1921: Thank you @willccbb for your reply! I know the PR was not very well structured; I put it together quickly and opened it as a draft while waiting for your feedback. Let me know if I can help with other issues in 'verifiers' or 'prime-rl'.
Description
This PR makes it possible to select Unsloth as the training backend in order to improve GPU usage efficiency.
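As a rough illustration of what selecting Unsloth as the backend could look like, here is a minimal sketch assuming a backend-selection flag; the `load_backend` helper, the `use_unsloth` parameter, and the 4-bit loading choice are illustrative assumptions, not code taken from this PR.

```python
# Hypothetical sketch of backend selection; names and defaults are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_backend(model_name: str, use_unsloth: bool = False, max_seq_length: int = 2048):
    if use_unsloth:
        # Unsloth's patched loader returns a memory-efficient (model, tokenizer) pair.
        from unsloth import FastLanguageModel
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_name,
            max_seq_length=max_seq_length,
            load_in_4bit=True,  # quantized loading is one source of the GPU savings
        )
    else:
        # Default path: plain Hugging Face transformers.
        model = AutoModelForCausalLM.from_pretrained(model_name)
        tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer
```

With a layout like this, the existing transformers path keeps working when Unsloth is not installed.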
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes