A web application for comparing tokenization methods from several AI providers (OpenAI, HuggingFace, Google Gemini) alongside traditional NLP tokenizers.
- Input text and see how different tokenizers process it
- Compare results across multiple tokenization algorithms
- View token IDs, text representations, and token values
- Support for multiple tokenizer providers:
  - OpenAI (cl100k, p50k, r50k)
  - Google Gemini
  - HuggingFace (BERT, GPT-2)
  - Natural.js (Word, WordPunct, Treebank)
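To make the comparison idea concrete, here is a minimal sketch of how several tokenizers could sit behind one shared interface and run over the same input. The `Token` and `Tokenizer` shapes, `regexTokenizer`, and `compareTokenizers` are hypothetical names chosen for illustration, not the project's actual code. The two regex tokenizers are simplified stand-ins, not the Natural.js implementations.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface Token {
  id: number; // position-based here; real BPE tokenizers return vocabulary IDs
  text: string;
}

interface Tokenizer {
  name: string;
  tokenize(text: string): Token[];
}

// Build a tokenizer from a global regex that matches one token at a time.
function regexTokenizer(name: string, pattern: RegExp): Tokenizer {
  return {
    name,
    tokenize(text: string): Token[] {
      const matches: string[] = text.match(pattern) ?? [];
      return matches.map((t, i) => ({ id: i, text: t }));
    },
  };
}

// Simplified stand-ins (assumption): words only, and words plus punctuation runs.
const wordTokenizer = regexTokenizer("word", /[A-Za-z0-9_]+/g);
const wordPunctTokenizer = regexTokenizer(
  "wordpunct",
  /[A-Za-z0-9_]+|[^A-Za-z0-9_\s]+/g
);

// Run every tokenizer on the same text and key the results by tokenizer name.
function compareTokenizers(
  text: string,
  tokenizers: Tokenizer[]
): Record<string, Token[]> {
  const results: Record<string, Token[]> = {};
  for (const t of tokenizers) {
    results[t.name] = t.tokenize(text);
  }
  return results;
}
```

Calling `compareTokenizers("Hello, world!", [wordTokenizer, wordPunctTokenizer])` yields two tokens for `word` (`Hello`, `world`) and four for `wordpunct` (`Hello`, `,`, `world`, `!`), which is the kind of side-by-side difference the app is meant to surface.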
# Install dependencies
npm install
# Create a .env file with your API keys (see .env.example)
cp .env.example .env
# Start the development server
npm run dev

Open http://localhost:3000 with your browser to see the result.
This project is configured for easy deployment on Railway.
- Fork this repository
- Create a new project on Railway and connect it to your GitHub repository
- Add the environment variables (optional):
  - OPENAI_API_KEY - Your OpenAI API key
  - NEXT_PUBLIC_GEMINI_API_KEY - Your Google Gemini API key
  - HUGGINGFACE_API_TOKEN - Your HuggingFace API token
- Deploy the app
Railway will automatically build and deploy your application.
Here is an example deployment: My Railway Deployment.
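Because the environment variables in the deployment steps are optional, the app has to cope with any of them being absent. The sketch below shows one way to work out which providers are usable from a given environment. The `PROVIDER_KEYS` mapping and `availableProviders` are hypothetical, and the assumption that each variable gates exactly one provider is not confirmed by the project.

```typescript
// A plain record stands in for process.env so the function is easy to test.
type Env = Record<string, string | undefined>;

// Assumed mapping from provider to the variable named in the deployment steps.
const PROVIDER_KEYS = {
  openai: "OPENAI_API_KEY",
  gemini: "NEXT_PUBLIC_GEMINI_API_KEY",
  huggingface: "HUGGINGFACE_API_TOKEN",
} as const;

type Provider = keyof typeof PROVIDER_KEYS;

// Return the providers whose variable is set to a non-blank value.
function availableProviders(env: Env): Provider[] {
  return (Object.keys(PROVIDER_KEYS) as Provider[]).filter((provider) => {
    const value = env[PROVIDER_KEYS[provider]];
    return value !== undefined && value.trim() !== "";
  });
}
```

In a Next.js app this could be called as `availableProviders(process.env)` on the server. Note that Next.js exposes variables prefixed with `NEXT_PUBLIC_` to browser code, so the Gemini key is visible to clients under that name.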
- Next.js
- TypeScript
- Tailwind CSS
- OpenAI API
- Google Generative AI API
- HuggingFace Tokenizers
- Natural.js
To learn more about Next.js, take a look at the following resources:
- Next.js Documentation - learn about Next.js features and API.
- Learn Next.js - an interactive Next.js tutorial.
Apache 2.0