Add gpt-5.4 to tiktoken tokenizer #7591

Merged: stephentoub merged 2 commits into main from copilot/add-gpt-5-4-to-tiktoken on Mar 12, 2026
Copilot AI commented Mar 12, 2026

Summary

  • add gpt-5.4 to the TiktokenTokenizer GPT-5.x model mappings using O200kBase
  • recognize gpt-5.4 variant names through the existing prefix-based scheme, including variants such as gpt-5.4-nano
  • extend tokenizer tests to cover the new exact model name and variant recognition

Validation

  • dotnet build /home/runner/work/machinelearning/machinelearning/src/Microsoft.ML.Tokenizers/Microsoft.ML.Tokenizers.csproj
  • dotnet build /home/runner/work/machinelearning/machinelearning/test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --no-restore
  • dotnet test /home/runner/work/machinelearning/machinelearning/test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --no-build --filter "FullyQualifiedName~Microsoft.ML.Tokenizers.Tests.TiktokenTests.TestAllSupportedModelNames|FullyQualifiedName~Microsoft.ML.Tokenizers.Tests.TiktokenTests.TestCreationUsingModel"

Security Summary

  • Ran automated code review with no findings.
  • Ran CodeQL checker; no analyzable code changes were detected, so no security issues were reported.

Copilot AI and others added 2 commits March 12, 2026 16:30
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
stephentoub marked this pull request as ready for review March 12, 2026 17:14
stephentoub requested review from Copilot and tarekgh March 12, 2026 17:14
Copilot AI left a comment

Pull request overview

Adds support for the gpt-5.4 model name to the ML.NET TiktokenTokenizer model-to-encoding mapping (using O200kBase) and extends the unit tests to validate both exact-name and prefix/variant recognition.

Changes:

  • Add gpt-5.4 exact-name mapping to O200kBase and a gpt-5.4- prefix mapping for variant recognition.
  • Extend TiktokenTests to cover gpt-5.4 and a variant (gpt-5.4-nano) in supported model name tests, and add gpt-5.4 to the model-creation test data.
  • Add a static GPT5_4 tokenizer instance for parity with other GPT-5.x test fixtures.
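
The two-step lookup described above (exact name first, then prefix) can be sketched roughly as follows. This is a minimal illustration, not the code in TiktokenTokenizer.cs: the `EncodingName` enum, the `ModelEncodingLookup` class, and the `TryResolve` method are hypothetical names introduced here, and only the `gpt-5.4` and `gpt-5.4-` entries come from this pull request.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's encoding identifiers.
enum EncodingName { Unknown, O200kBase }

static class ModelEncodingLookup
{
    // Step 1: exact model names (entry taken from this PR).
    private static readonly Dictionary<string, EncodingName> ExactNames =
        new Dictionary<string, EncodingName>(StringComparer.Ordinal)
        {
            ["gpt-5.4"] = EncodingName.O200kBase,
        };

    // Step 2: prefixes that catch variants such as "gpt-5.4-nano".
    private static readonly (string Prefix, EncodingName Encoding)[] Prefixes =
    {
        ("gpt-5.4-", EncodingName.O200kBase),
    };

    // Returns true and sets 'encoding' when the name matches exactly or by prefix.
    public static bool TryResolve(string modelName, out EncodingName encoding)
    {
        if (ExactNames.TryGetValue(modelName, out encoding))
        {
            return true;
        }

        foreach ((string prefix, EncodingName candidate) in Prefixes)
        {
            if (modelName.StartsWith(prefix, StringComparison.Ordinal))
            {
                encoding = candidate;
                return true;
            }
        }

        encoding = EncodingName.Unknown;
        return false;
    }
}

static class Program
{
    static void Main()
    {
        foreach (string name in new[] { "gpt-5.4", "gpt-5.4-nano", "gpt-5.40" })
        {
            bool found = ModelEncodingLookup.TryResolve(name, out EncodingName encoding);
            Console.WriteLine($"{name}: {(found ? encoding.ToString() : "not recognized")}");
        }
    }
}
```

In this sketch the trailing hyphen in the prefix matters: `gpt-5.4-nano` resolves through the prefix entry, while a name such as `gpt-5.40` matches neither entry and is reported as not recognized.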

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Adds gpt-5.4 to both exact-name and prefix-based model mappings to O200kBase. |
| test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Adds gpt-5.4 coverage to supported model name and creation tests, plus a GPT5_4 static tokenizer. |

stephentoub enabled auto-merge (squash) March 12, 2026 18:31
codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.54%. Comparing base (def7a4a) to head (f6e581b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7591      +/-   ##
==========================================
- Coverage   69.54%   69.54%   -0.01%     
==========================================
  Files        1484     1484              
  Lines      273206   273209       +3     
  Branches    27919    27919              
==========================================
+ Hits       190012   190014       +2     
- Misses      75831    75834       +3     
+ Partials     7363     7361       -2     
| Flag | Coverage Δ |
| --- | --- |
| Debug | 69.54% <100.00%> (-0.01%) ⬇️ |
| production | 63.81% <100.00%> (-0.01%) ⬇️ |
| test | 89.59% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | 81.18% <100.00%> (+0.04%) ⬆️ |
| ...est/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | 99.14% <100.00%> (+<0.01%) ⬆️ |

... and 3 files with indirect coverage changes

4 participants