
Add get_special_tokens and is_special_token methods#1945

Open
ArthurZucker wants to merge 1 commit into main from feature/special-tokens-api

@ArthurZucker
Collaborator

Summary

  • Add get_special_tokens() method to return the list of special tokens
  • Add is_special_token(token) method to check if a token is a special token
  • Expose these methods on both AddedVocabulary and Tokenizer
  • Add Python bindings with type hints
  • Add comprehensive tests
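The intended semantics of the two methods can be sketched in pure Python. This is a hypothetical stand-in, not the PR's code: the real methods live on the Rust `AddedVocabulary`/`Tokenizer` and are exposed through the bindings. The class name `AddedVocabularySketch`, its `add_special_tokens` helper, and the choice to return tokens in insertion order are assumptions for illustration only.

```python
from typing import List, Set


class AddedVocabularySketch:
    """Hypothetical stand-in for the Rust AddedVocabulary (illustration only)."""

    def __init__(self) -> None:
        # Ordered listing of special tokens (ordering is an assumption).
        self._special_tokens: List[str] = []
        # Mirrors the internal special_tokens_set: average constant-time lookups.
        self.special_tokens_set: Set[str] = set()

    def add_special_tokens(self, tokens: List[str]) -> int:
        """Register tokens as special; returns how many were newly added."""
        added = 0
        for token in tokens:
            if token not in self.special_tokens_set:
                self.special_tokens_set.add(token)
                self._special_tokens.append(token)
                added += 1
        return added

    def get_special_tokens(self) -> List[str]:
        # Return a copy so callers cannot mutate the internal state.
        return list(self._special_tokens)

    def is_special_token(self, token: str) -> bool:
        return token in self.special_tokens_set


vocab = AddedVocabularySketch()
vocab.add_special_tokens(["[CLS]", "[SEP]", "[CLS]"])
print(vocab.get_special_tokens())       # ['[CLS]', '[SEP]']
print(vocab.is_special_token("[SEP]"))  # True
print(vocab.is_special_token("hello"))  # False
```

Keeping a set next to a list gives fast `is_special_token` lookups while still allowing a stable listing; whether the Rust implementation guarantees any particular order for `get_special_tokens()` is not stated in this PR.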

Motivation

These methods expose the internal special_tokens_set from AddedVocabulary, making it easier to:

  • Inspect which tokens are marked as special
  • Programmatically check if a given token is special
  • Debug tokenizer behavior related to special token handling

Test plan

  • Rust unit tests pass (cargo test --lib)
  • Python binding tests pass (pytest tests/bindings/test_tokenizer.py)

🤖 Generated with Claude Code

Add methods to query special tokens:
- get_special_tokens(): returns the list of special tokens
- is_special_token(token): checks if a token is a special token

These methods expose the internal special_tokens_set from AddedVocabulary,
making it easier to inspect and work with special tokens programmatically.

Includes Python bindings and tests.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
