Conversation

@lucylq lucylq commented Jan 14, 2026

Summary

LoRA models created with Unsloth use HFTokenizer, which is not supported by the static runner. Switch the static runner to pytorch_tokenizers.get_tokenizer so HuggingFace tokenizers are supported as well.
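
For context, a minimal sketch (under assumptions, not the PR diff itself) of how a runner can load a tokenizer through pytorch_tokenizers.get_tokenizer; the import path comes from the review below, while the encode/decode signatures are assumed:

    # Hedged sketch: get_tokenizer picks the backend (HF tokenizer.json,
    # Tiktoken, SentencePiece) from the file it is given. Signatures assumed.
    from pytorch_tokenizers import get_tokenizer

    tokenizer = get_tokenizer("tokenizer.json")  # HF tokenizer.json from the Unsloth LoRA export
    tokens = tokenizer.encode("What is 15% of 80?", bos=True, eos=False)
    print(tokenizer.decode(tokens))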

Test plan

Export the Llama 1B LoRA model

python export_static_llm_coreml.py \
      --checkpoint $LLAMA1B/original/consolidated.00.pth  \
      --params $LLAMA1B/original/params.json \
      --adapter_checkpoint $LLAMA1B/lora/adapter_model.safetensors \
      --adapter_config $LLAMA1B/lora/adapter_config.json \
      --output coreml-llama1b-lora.pte \
      --max_context_len 1024

Run the Llama 1B LoRA model

(executorch) lfq@lfq-mbp llama % python run_static_llm.py \
    --model /Users/lfq/executorch/examples/apple/coreml/llama/coreml-llama1b-lora.pte \
    --params $LLAMA1B/original/params.json \
    --tokenizer $LLAMA1B/tokenizer.json \
    --tokenizer_config $LLAMA1B/tokenizer_config.json \
    --prompt "What is 15% of 80?" \
    --max_new_tokens 100 
W0114 16:23:07.644390 81771 site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
W0114 16:23:08.208112 81771 site-packages/torch/utils/flop_counter.py:45] triton not found; flop counting will not work for triton kernels
[... 12 identical "triton not found" warnings omitted ...]
I tokenizers:regex.cpp:27] Registering override fallback regex
Model config: 16 layers, dim=2048
Input length: 32, Cache length: 992
Loading model from /Users/lfq/executorch/examples/apple/coreml/llama/coreml-llama1b-lora.pte...
[program.cpp:154] InternalConsistency verification requested but not available
[ETCoreMLModelManager.mm:474] Cache Hit: Successfully retrieved compiled model with identifier=executorch_2d7b5a72-14ac-4133-b35d-269dd19a3ed5_cpu_and_ne from the models cache.
[ETCoreMLModelManager.mm:474] Cache Hit: Successfully retrieved compiled model with identifier=executorch_9d2dc1da-5080-4b9c-a49d-031352db1b03_cpu_and_ne from the models cache.
[ETCoreMLModelManager.mm:474] Cache Hit: Successfully retrieved compiled model with identifier=executorch_4dd01157-98ab-4a24-b89b-abd4b98b1f3e_cpu_and_ne from the models cache.
Method metadata: num_inputs=36, num_outputs=33

Prompt: What is 15% of 80?
Prompt tokens: 10
--------------------------------------------------
Prefilling... done in 0.15s

What is 15% of 80? - 15% of 80 is equal to 12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12. The answer is 0.12
--------------------------------------------------
Prefill: 10 tokens in 0.15s
Decode: 100 tokens in 7.18s (13.92 tok/s)


pytorch-bot bot commented Jan 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16606

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Pending, 2 Unrelated Failures

As of commit b4ba65d with merge base 9510334:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Jan 14, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@lucylq lucylq force-pushed the lfq.use-pytorch-tokenizer-static-runner branch from f025e17 to 162b667 on January 14, 2026 22:20
@lucylq lucylq marked this pull request as ready for review January 14, 2026 22:20
Copilot AI review requested due to automatic review settings January 14, 2026 22:20

Copilot AI left a comment

Pull request overview

This PR updates the static runner to support HuggingFace tokenizers (like those used by Qwen models) by replacing the custom tokenizer wrapper with pytorch_tokenizers.get_tokenizer. Additionally, it fixes the RMSNorm usage in static attention to use the custom RMSNorm implementation instead of torch.nn.RMSNorm.

Changes:

  • Replaced custom tokenizer wrapper with pytorch_tokenizers.get_tokenizer to support HuggingFace tokenizers
  • Added get_stop_tokens helper function to handle different tokenizer interfaces (see the sketch after this list)
  • Changed torch.nn.RMSNorm to custom RMSNorm in StaticAttention initialization
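
A minimal sketch of what such a stop-token helper could look like; the attribute names probed here (stop_tokens, eos_id) are assumptions about the tokenizer interfaces, not the PR's exact code:

    def get_stop_tokens(tokenizer) -> set:
        # Different tokenizer backends expose stop tokens differently; probe
        # the interface and fall back to a single EOS id. Names are assumed.
        if hasattr(tokenizer, "stop_tokens") and tokenizer.stop_tokens:
            return set(tokenizer.stop_tokens)
        eos = getattr(tokenizer, "eos_id", None)
        if callable(eos):
            eos = eos()
        if eos is not None:
            return {eos}
        raise ValueError("tokenizer exposes no stop-token or EOS interface")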

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • examples/models/llama/static_attention.py: replaces torch.nn.RMSNorm with the custom RMSNorm import for the QK normalization layers (illustrated below)
  • examples/apple/coreml/llama/run_static_llm.py: removes the custom Tokenizer class and switches to the pytorch_tokenizers library for broader tokenizer support
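
For reference, a generic RMSNorm of the kind the custom import provides; a minimal illustrative sketch using the usual RMSNorm definition, not the repository's implementation:

    import torch

    class RMSNorm(torch.nn.Module):
        # Root-mean-square layer norm: scale activations by the reciprocal RMS
        # over the last dimension, then apply a learned per-channel weight.
        # Shown only to illustrate what the QK normalization layers compute.
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = torch.nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return x * rms * self.weight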


@lucylq lucylq force-pushed the lfq.use-pytorch-tokenizer-static-runner branch from 162b667 to b4ba65d on January 14, 2026 23:26
This was referenced Jan 15, 2026
@lucylq lucylq changed the title Use pytorch_tokenizer in static runner Use pytorch_tokenizer in coreml static runner Jan 15, 2026
@lucylq lucylq merged commit 33974d5 into main Jan 15, 2026
310 of 323 checks passed
@lucylq lucylq deleted the lfq.use-pytorch-tokenizer-static-runner branch January 15, 2026 18:29