Overview
- Project: `Token_Analyzer` — a lightweight Streamlit app for counting and estimating tokens and cost for several LLM models.
- Primary file: `app.py` — Streamlit UI and token-count logic.
What it does
- Loads text or a prompt (system + user), tokenizes it using `tiktoken` when an exact tokenizer is available, and shows:
  - Token count
  - Token IDs and decoded token text (when available)
  - Estimated cost for the selected model
- Falls back to a token-estimate heuristic for models where exact tokenizers are not available
Supported models (from `app.py`)
- OpenAI (exact token counts when `tiktoken` supports the tokenizer):
  - GPT-4o (`gpt-4o`) — price per 1M: $5.00
  - GPT-4o Mini (`gpt-4o-mini`) — price per 1M: $0.15
  - GPT-4 Turbo (Legacy) (`gpt-4-turbo`) — tokenizer: `cl100k_base`, price per 1M: $10.00
  - GPT-3.5 Turbo (`gpt-3.5-turbo`) — tokenizer: `cl100k_base`, price per 1M: $0.50
- Google / Anthropic (estimated tokens — the app uses a heuristic):
  - Gemini 1.5 Flash — estimated price per 1M: $0.35
  - Gemini 1.5 Pro — estimated price per 1M: $3.50
  - Claude 3.5 Sonnet — estimated price per 1M: $3.00
  - Claude 3 Haiku — estimated price per 1M: $0.25
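For reference, the table above could be encoded as a dictionary along these lines (a sketch only: the field names and the non-OpenAI model IDs are illustrative assumptions; the authoritative `MODEL_CONFIG` lives in `app.py`):

```python
# Illustrative shape of the MODEL_CONFIG dictionary; app.py's actual
# schema and the Gemini/Claude IDs shown here may differ.
MODEL_CONFIG = {
    "GPT-4o":               {"id": "gpt-4o",            "exact_tokens": True,  "price_per_1m": 5.00},
    "GPT-4o Mini":          {"id": "gpt-4o-mini",       "exact_tokens": True,  "price_per_1m": 0.15},
    "GPT-4 Turbo (Legacy)": {"id": "gpt-4-turbo",       "exact_tokens": True,  "tokenizer": "cl100k_base", "price_per_1m": 10.00},
    "GPT-3.5 Turbo":        {"id": "gpt-3.5-turbo",     "exact_tokens": True,  "tokenizer": "cl100k_base", "price_per_1m": 0.50},
    "Gemini 1.5 Flash":     {"id": "gemini-1.5-flash",  "exact_tokens": False, "price_per_1m": 0.35},
    "Gemini 1.5 Pro":       {"id": "gemini-1.5-pro",    "exact_tokens": False, "price_per_1m": 3.50},
    "Claude 3.5 Sonnet":    {"id": "claude-3-5-sonnet", "exact_tokens": False, "price_per_1m": 3.00},
    "Claude 3 Haiku":       {"id": "claude-3-haiku",    "exact_tokens": False, "price_per_1m": 0.25},
}
```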
Tokenization behavior
- If the selected model's `MODEL_CONFIG` entry has `exact_tokens: True`, the app attempts to use the `tiktoken` encoding returned by `tiktoken.get_encoding(...)` or `tiktoken.encoding_for_model(...)` (via `get_tokenizer()` in `app.py`) to calculate an exact token count and display token breakdowns.
- For models marked `exact_tokens: False`, the app uses a simple heuristic — each token ≈ 4 characters (i.e., `len(text) // 4`) — as a conservative estimate, and displays a warning that the value is estimated.
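The two branches above can be sketched as a single helper (a minimal sketch, not the exact code in `app.py`; the name `count_tokens` is illustrative):

```python
def count_tokens(text: str, exact_tokens: bool, encoding_name: str = "cl100k_base") -> int:
    """Exact count via tiktoken when available, otherwise the ~4-chars-per-token heuristic."""
    if exact_tokens:
        import tiktoken  # imported lazily so the heuristic path works without the package
        enc = tiktoken.get_encoding(encoding_name)
        return len(enc.encode(text))
    # Fallback heuristic: roughly one token per 4 characters.
    return len(text) // 4
```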
Cost formula used in the app:
cost = (token_count / 1_000_000) * price_per_1m
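For example, 1,234 tokens on a model priced at $5.00 per 1M tokens:

```python
token_count = 1234
price_per_1m = 5.00  # GPT-4o pricing from the table above

# Same formula as the app: scale tokens to millions, then multiply by the rate.
cost = (token_count / 1_000_000) * price_per_1m
print(f"${cost:.6f}")  # prints $0.006170
```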
Setup and Run
Prerequisites
- Python 3.8+ (the app uses `streamlit` and `tiktoken`).
- A virtual environment is recommended.
Install dependencies
- Create and activate a virtual environment (PowerShell examples):

      python -m venv .venv
      .\.venv\Scripts\Activate.ps1

- Install requirements:

      pip install -r requirements.txt

Run the app

    streamlit run app.py

Usage
- Open the Streamlit URL shown in the terminal (usually `http://localhost:8501`).
- Enter your text in the main text box OR use the Prompt Playground to enter a `System Prompt` and a `User Prompt`. The app combines these when both are provided.
- Select a model from the `Select Model` dropdown and click `Analyze`.
- When `exact_tokens` is True for the selected model, you'll see token IDs and a token breakdown. When False, you'll see an estimated token count and a warning.
- If the selected model is an OpenAI model, an optional checkbox lets you compare token counts with the `gpt-4o-mini` tokenizer.
Troubleshooting
- If `tiktoken` raises a `KeyError` while trying to resolve an encoding, the app attempts to call `tiktoken.get_encoding(...)` for known encoding names (like `cl100k_base`). If you encounter other encoding names that `tiktoken` does not support, update the `MODEL_CONFIG` in `app.py` to provide a supported tokenizer, or set `exact_tokens` to `False` for that model.
- If Streamlit or `tiktoken` fails to import, ensure your environment is activated and that `pip install -r requirements.txt` completed without errors.
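The `KeyError` recovery described above might look roughly like this (assumed signature; see `get_tokenizer()` in `app.py` for the real implementation):

```python
def get_tokenizer(model_id: str, fallback_encoding: str = "cl100k_base"):
    """Resolve a tiktoken encoding for a model, falling back to a known encoding name."""
    import tiktoken  # requires the tiktoken package

    try:
        # Ask tiktoken for the model's own encoding first.
        return tiktoken.encoding_for_model(model_id)
    except KeyError:
        # Unknown model: fall back to a known encoding such as cl100k_base.
        return tiktoken.get_encoding(fallback_encoding)
```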
Files of interest
- `app.py` — main Streamlit app that includes `MODEL_CONFIG`, the input UI, and the tokenization & cost logic.
- `requirements.txt` — Python dependencies used by the project (install with `pip`).
Notes for maintainers / contributions
- The model list and pricing are defined in the `MODEL_CONFIG` dictionary inside `app.py`. Update that dictionary to add/remove models or to change prices.
- Keep the `exact_tokens` flag aligned with whether `tiktoken` provides a reliable tokenizer for the model.
- If you add models with custom tokenizers, update `get_tokenizer()` in `app.py` accordingly.
License
- To be added; will be updated soon.
Contact / Further improvements
- Consider adding tests for the tokenizer fallback behavior and for cost calculations.
- Optionally, add a small example `texts/` folder containing sample prompts for quick testing.
Enjoy! — Open `app.py` and run the Streamlit app to analyze tokens for your prompts and text.