‼️ If you want to work on this issue: please comment below and wait until a maintainer assigns it to you before opening a PR, so we avoid duplicate work on the same issue. Thanks! 😊
✨ What You’ll Do
Right now, we have an implementation of Flash Attention 3 in Pruna, but it does not yet support quantization. Your task is to add FP8 quantization where it applies.
🤖 Useful Resources
- The Flash Attention 3 kernel in src/pruna/algorithms/kernels/flash_attn3.py; the new option could be exposed along the lines of "quantize=True/False". A rough sketch of what that could look like is below.
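For orientation only, here is a minimal sketch of per-tensor FP8 (E4M3) quantization of the q/k/v inputs behind a `quantize` flag. The helper names (`fp8_quantize`, `maybe_quantize_qkv`) and the flag wiring are illustrative assumptions, not existing Pruna APIs; the actual FA3 call signature and how it consumes descale factors should be taken from the flash-attn 3 interface.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest representable magnitude in torch.float8_e4m3fn


def fp8_quantize(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Per-tensor FP8 (E4M3) quantization: returns the casted tensor and its descale factor."""
    amax = x.abs().amax().clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax            # scale inputs into the FP8 dynamic range
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    descale = 1.0 / scale                  # the kernel multiplies by this to recover magnitudes
    return x_fp8, descale


def maybe_quantize_qkv(q, k, v, quantize: bool = False):
    """Hypothetical wrapper step: cast q/k/v to FP8 only when the `quantize` flag is set."""
    if not quantize:
        return q, k, v, None
    (q8, dq), (k8, dk), (v8, dv) = fp8_quantize(q), fp8_quantize(k), fp8_quantize(v)
    return q8, k8, v8, (dq, dk, dv)
```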
✅ Acceptance Criteria
- Code style: the implementation follows the project's style guidelines.
- Tests & Docs: all existing and new unit tests pass, and the documentation is updated.
And don’t forget to give us a ⭐️!
❓ Questions?
Feel free to jump into the #contributing Discord channel if you hit any roadblocks. Can’t wait to see your contribution! 🚀