
🧠 TokenKit

TokenKit — A professional .NET 8.0 library and CLI for tokenization, validation, cost estimation, and model registry management across multiple LLM providers (OpenAI, Anthropic, Gemini, etc.).


✨ Features

| Category | Description |
| --- | --- |
| 🔢 Tokenization | Analyze text or files and count tokens using multiple encoder engines (simple, SharpToken, ML.Tokenizers) |
| 💰 Cost Estimation | Automatically calculate the estimated API cost based on model metadata |
| Prompt Validation | Validate prompt length against model context limits |
| 🧩 Model Registry | Manage model metadata (maxTokens, pricing, encodings, providers) via a JSON registry |
| ⚙️ CLI & SDK | Use TokenKit as a .NET library or as a global CLI tool |
| 🧮 Multi-Encoder Support | Dynamically select tokenization engines via the --engine flag (see the example below this table) |
| 📦 Self-contained Data | Local registry stored in Registry/models.data.json, auto-updatable |
| 🔍 Live Model Scraper | Optional OpenAI API key support to fetch real-time model data |
| 📊 Structured Logging | All CLI commands logged to tokenkit.log with rotation (1 MB max) |
| 🤫 Quiet & JSON Modes | Machine-readable (--json) and silent (--quiet) output modes for automation |
| 🎨 CLI Polish | Colorized output, ASCII banner, and improved user experience |
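
For example, an engine can be selected when analyzing text. The simple identifier matches the Engine field in the JSON output shown later in this guide; the exact identifiers for the SharpToken and ML.Tokenizers engines are not shown in this README and may differ:

tokenkit analyze "Hello from TokenKit!" --model gpt-4o --engine simple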

⚙️ Installation

📦 As a Library (NuGet)

dotnet add package TokenKit

💻 As a Global CLI Tool

dotnet tool install -g TokenKit

🚀 Usage (All-in-One Guide)

🔹 Analyze Inline Text

tokenkit analyze "Hello from TokenKit!" --model gpt-4o

🔹 Analyze File Input

tokenkit analyze prompt.txt --model gpt-4o

🔹 Pipe Input (stdin)

echo "This is piped text input" | tokenkit analyze --model gpt-4o

Example Output:

{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 4,
  "EstimatedCost": 0.00002,
  "Valid": true
}
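
For context, the estimate above is consistent with the gpt-4o registry entry shown later in this README, assuming the cost is computed as the token count divided by 1,000 times a per-1K price:

EstimatedCost ≈ 4 / 1000 × 0.005 = 0.00002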

🔹 Validate Prompt Length

tokenkit validate "A very long prompt to validate" --model gpt-4o
{
  "IsValid": true,
  "Message": "OK"
}

🔹 List Registered Models

tokenkit models list

Filter by Provider

tokenkit models list --provider openai

JSON Output

tokenkit models list --json

🔹 Update Model Data

Default Update (Offline Fallback)

tokenkit update-models

Using OpenAI API Key

tokenkit update-models --openai-key sk-xxxx

From JSON (stdin)

cat newmodels.json | tokenkit update-models

Example Input:

[
  {
    "Id": "gpt-4o-mini",
    "Provider": "OpenAI",
    "MaxTokens": 64000,
    "InputPricePer1K": 0.002,
    "OutputPricePer1K": 0.01,
    "Encoding": "cl100k_base"
  }
]

🔹 Scrape Latest Model Data (Preview)

tokenkit scrape-models --openai-key sk-xxxx

If no key is provided, TokenKit uses the local offline model registry.

Example Output:

🔍 Fetching latest OpenAI model data...
✅ Retrieved 3 models:
  - OpenAI: gpt-4o (128000 tokens)
  - OpenAI: gpt-4o-mini (64000 tokens)
  - OpenAI: gpt-3.5-turbo (4096 tokens)

🔹 CLI Output Modes

JSON Mode

tokenkit analyze "Hello" --model gpt-4o --json

Outputs pure JSON:

{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 7,
  "EstimatedCost": 0.000105,
  "Engine": "simple",
  "Valid": true
}

Quiet Mode

tokenkit analyze "Silent test" --model gpt-4o --quiet

No console output. Log entry saved to tokenkit.log.


🧩 Programmatic SDK Example

using TokenKit.Registry;
using TokenKit.Services;

// Look up model metadata (context limit, pricing, encoding) from the local registry.
var model = ModelRegistry.Get("gpt-4o");
var tokenizer = new TokenizerService();

// Count tokens for the prompt, then estimate cost from the model's per-1K pricing.
var result = tokenizer.Analyze("Hello from TokenKit!", model!.Id);
var cost = CostEstimator.Estimate(model, result.TokenCount);

Console.WriteLine($"Tokens: {result.TokenCount}, Cost: ${cost}");

📦 Model Registry

TokenKit stores all model metadata in:

Registry/models.data.json

Each entry includes:

{
  "Id": "gpt-4o",
  "Provider": "OpenAI",
  "MaxTokens": 128000,
  "InputPricePer1K": 0.005,
  "OutputPricePer1K": 0.015,
  "Encoding": "cl100k_base"
}
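
As an illustration only, an entry of this shape can be read back with System.Text.Json. The ModelEntry record below is a hypothetical stand-in for TokenKit's internal model type, which may be named and shaped differently:

using System.Text.Json;

// Hypothetical record mirroring the fields stored in Registry/models.data.json.
var json = File.ReadAllText("Registry/models.data.json");
var models = JsonSerializer.Deserialize<List<ModelEntry>>(json);

foreach (var m in models ?? new List<ModelEntry>())
    Console.WriteLine($"{m.Provider}: {m.Id} ({m.MaxTokens} tokens)");

record ModelEntry(
    string Id,
    string Provider,
    int MaxTokens,
    decimal InputPricePer1K,
    decimal OutputPricePer1K,
    string Encoding);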

🧪 Testing & Quality Assurance

TokenKit maintains 100% test coverage, with tests written in xUnit and coverage reported through Codecov.

Run tests locally:

dotnet test --collect:"XPlat Code Coverage"
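
For reference, a minimal xUnit test in this style might look like the sketch below. It reuses the SDK surface from the programmatic example above and is illustrative only, not taken from the repository's test suite:

using TokenKit.Registry;
using TokenKit.Services;
using Xunit;

public class TokenizerServiceTests
{
    [Fact]
    public void Analyze_ReturnsPositiveTokenCount()
    {
        // Same SDK calls as the programmatic example earlier in this README.
        var model = ModelRegistry.Get("gpt-4o");
        var tokenizer = new TokenizerService();

        var result = tokenizer.Analyze("Hello from TokenKit!", model!.Id);

        Assert.True(result.TokenCount > 0);
    }
}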

🧭 Future Enhancements

| Feature | Description |
| --- | --- |
| 🌐 Extended Provider Support | Add Gemini, Claude, and Mistral integrations |
| 💾 Persistent Config Profiles | Store model defaults and pricing overrides per project |
| 🧮 Batch Analysis | Analyze multiple files or prompts in a single command |
| 📊 Report Generation | Export CSV/JSON summaries of token usage and estimated cost |
| 🧠 LLM-Aware Cost Planner | Simulate conversation cost across multi-turn dialogues |
| 🧩 IDE Integrations | VS Code and JetBrains plugins for inline token analysis |
| ⚙️ Custom Encoders | Support community-built encoders and language models |

💡 License

Licensed under the MIT License.
© 2025 Andrew Clements — Flow Labs / TokenKit