Describe the bug
BERTUnfactoredDisambiguator.pretrained() clips the last token of the input when the input contains the special character �.
MLEDisambiguator.pretrained() handles the same input correctly and does not clip any tokens.
To Reproduce
from camel_tools.disambig.mle import MLEDisambiguator
from camel_tools.disambig.bert import BERTUnfactoredDisambiguator
mle = MLEDisambiguator.pretrained()
brt = BERTUnfactoredDisambiguator.pretrained()
mle.disambiguate("أعضاء مجلس � الإدارة المحترمون يتفضلون".split())
brt.disambiguate("أعضاء مجلس � الإدارة المحترمون يتفضلون".split())
Expected behavior
Disambiguation output for every input token, as MLEDisambiguator already provides.
Screenshots

Desktop (please complete the following information):
- WSL2 under Windows 11 home.
- WSL version: 2.5.9.0
- Kernel version: 6.6.87.2-1
- WSLg version: 1.0.66
- MSRDC version: 1.2.6074
- Direct3D version: 1.611.1-81528511
- DXCore version: 10.0.26100.1-240331-1435.ge-release
- Windows version: 10.0.26100.4349
- Python version: 3.12
- CAMeL Tools version and installation source: 1.5.6, installed with pip
Additional context
I understand it's not an expected character in Arabic text. But it happens occasionally in automated workflows. Thank you.
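In case it helps others hitting this, here is a rough workaround sketch. It assumes the special character is U+FFFD (the Unicode replacement character) and that the clipping only occurs when a token containing it is passed in; I have not confirmed the root cause. The helper name disambiguate_skipping_replacement_char is hypothetical and not part of CAMeL Tools. It filters out the offending tokens, calls whatever disambiguate function it is given, and puts the results back at their original positions, leaving None where a token was skipped.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Assumption: the problematic character is U+FFFD (Unicode replacement character).
REPLACEMENT_CHAR = "\ufffd"


def disambiguate_skipping_replacement_char(
    tokens: Sequence[str],
    disambiguate: Callable[[List[str]], Sequence[T]],
) -> List[Optional[T]]:
    """Hypothetical helper: run `disambiguate` only on tokens without U+FFFD.

    Returns one entry per input token; skipped tokens get None.
    """
    # Indices of tokens that are safe to pass to the disambiguator.
    keep = [i for i, tok in enumerate(tokens) if REPLACEMENT_CHAR not in tok]
    results = disambiguate([tokens[i] for i in keep])

    # Fail loudly if the disambiguator still dropped something.
    if len(results) != len(keep):
        raise ValueError(
            f"expected {len(keep)} results, got {len(results)}"
        )

    # Re-align results with the original token positions.
    aligned: List[Optional[T]] = [None] * len(tokens)
    for i, res in zip(keep, results):
        aligned[i] = res
    return aligned
```

With the objects from the reproduction above, the call would be disambiguate_skipping_replacement_char(sentence.split(), brt.disambiguate). The length check is there so that any remaining clipping raises an error instead of silently misaligning tokens and analyses.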