Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Pull request overview
Adds support for the gpt-5.4 model name to the ML.NET TiktokenTokenizer model-to-encoding mapping (using O200kBase) and extends the unit tests to validate both exact-name and prefix/variant recognition.
Changes:
- Add `gpt-5.4` exact-name mapping to `O200kBase` and a `gpt-5.4-` prefix mapping for variant recognition.
- Extend `TiktokenTests` to cover `gpt-5.4` and a variant (`gpt-5.4-nano`) in supported model name tests, and add `gpt-5.4` to the model-creation test data.
- Add a static `GPT5_4` tokenizer instance for parity with other GPT-5.x test fixtures.
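The exact-name plus prefix lookup described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual `TiktokenTokenizer` source: the `ModelEncodingSketch` class, its `Resolve` method, the table layout, and the use of plain strings for encoding names are all hypothetical, and ordinal (case-sensitive) comparison is an assumption made only for the sketch.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: an exact name, a variant caught by the prefix, and an unmapped name.
Console.WriteLine(ModelEncodingSketch.Resolve("gpt-5.4"));                          // prints "O200kBase"
Console.WriteLine(ModelEncodingSketch.Resolve("gpt-5.4-nano"));                     // prints "O200kBase"
Console.WriteLine(ModelEncodingSketch.Resolve("gpt-4.1") ?? "(not recognized)");    // prints "(not recognized)"

// Hypothetical stand-in for a model-name-to-encoding table; not the library's API.
static class ModelEncodingSketch
{
    // Exact model names, checked first.
    private static readonly Dictionary<string, string> ExactNames = new()
    {
        ["gpt-5.4"] = "O200kBase",
    };

    // Prefixes, checked only when no exact name matches, so that
    // variants such as "gpt-5.4-nano" are still recognized.
    private static readonly (string Prefix, string Encoding)[] Prefixes =
    {
        ("gpt-5.4-", "O200kBase"),
    };

    // Returns the encoding name for a model, or null if the name is not mapped.
    public static string? Resolve(string modelName)
    {
        if (ExactNames.TryGetValue(modelName, out string? encoding))
        {
            return encoding;
        }

        foreach (var (prefix, prefixEncoding) in Prefixes)
        {
            if (modelName.StartsWith(prefix, StringComparison.Ordinal))
            {
                return prefixEncoding;
            }
        }

        return null;
    }
}
```

Because the prefix in this sketch ends with a hyphen, the bare name `gpt-5.4` does not match it, which is why both an exact-name entry and a prefix entry are needed.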
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Adds gpt-5.4 to both exact-name and prefix-based model mappings to O200kBase. |
| test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Adds gpt-5.4 coverage to supported model name and creation tests, plus a GPT5_4 static tokenizer. |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff
| Metric | main | #7591 | +/- |
|---|---|---|---|
| Coverage | 69.54% | 69.54% | -0.01% |
| Files | 1484 | 1484 | |
| Lines | 273206 | 273209 | +3 |
| Branches | 27919 | 27919 | |
| Hits | 190012 | 190014 | +2 |
| Misses | 75831 | 75834 | +3 |
| Partials | 7363 | 7361 | -2 |
Summary
Validation
- `dotnet build /home/runner/work/machinelearning/machinelearning/src/Microsoft.ML.Tokenizers/Microsoft.ML.Tokenizers.csproj`
- `dotnet build /home/runner/work/machinelearning/machinelearning/test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --no-restore`
- `dotnet test /home/runner/work/machinelearning/machinelearning/test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --no-build --filter "FullyQualifiedName~Microsoft.ML.Tokenizers.Tests.TiktokenTests.TestAllSupportedModelNames|FullyQualifiedName~Microsoft.ML.Tokenizers.Tests.TiktokenTests.TestCreationUsingModel"`
Security Summary