Use pytorch_tokenizer in coreml static runner #16606
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16606
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Pending, 2 Unrelated Failures as of commit b4ba65d with merge base 9510334.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from f025e17 to 162b667.
Pull request overview
This PR updates the static runner to support HuggingFace tokenizers (like those used by Qwen models) by replacing the custom tokenizer wrapper with pytorch_tokenizers.get_tokenizer. Additionally, it fixes the RMSNorm usage in static attention to use the custom RMSNorm implementation instead of torch.nn.RMSNorm.
Changes:
- Replaced the custom tokenizer wrapper with `pytorch_tokenizers.get_tokenizer` to support HuggingFace tokenizers
- Added a `get_stop_tokens` helper function to handle different tokenizer interfaces
- Changed `torch.nn.RMSNorm` to the custom `RMSNorm` in StaticAttention initialization
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/models/llama/static_attention.py | Replaces torch.nn.RMSNorm with custom RMSNorm import for QK normalization layers |
| examples/apple/coreml/llama/run_static_llm.py | Removes custom Tokenizer class and switches to pytorch_tokenizers library for broader tokenizer support |
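For context on the QK-normalization change: RMSNorm scales each input by the reciprocal of its root-mean-square, then applies a learned per-element weight. The following is a minimal pure-Python sketch of that math only; it is not ExecuTorch's `RMSNorm` implementation, which operates on tensors and handles dtype casting.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # y_i = x_i / sqrt(mean_j(x_j^2) + eps) * w_i
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

out = rms_norm([3.0, 4.0], [1.0, 1.0])  # ≈ [0.8485, 1.1314]
```

Unlike LayerNorm, no mean is subtracted; only the magnitude is normalized, which is why custom implementations can differ from `torch.nn.RMSNorm` in details such as the precision used for the reduction.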
Force-pushed from 162b667 to b4ba65d.
Summary
LoRA models created with Unsloth use HFTokenizer, which is not supported by the static runner.
Test plan
- Export llama1b LoRA model
- Run llama1b LoRA model