Test the speed of models served through OpenAI Chat and Anthropic format APIs, measuring first token latency and tokens per second.
- Support OpenAI Chat and Anthropic streaming APIs
- Measure first token latency (ms) and tokens per second
- Detect invalid models early via content-type check
- Support the `reasoning_content` field for reasoning models
- Export results to CSV
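The measurement features above can be sketched roughly as follows. This is a minimal illustration, not the code in `llmspeed.py`. It assumes OpenAI Chat style stream deltas that have already been decoded into dicts and paired with local timestamps. It counts each non-empty chunk as one token, which only approximates the real token count, and it computes tokens per second over the interval between the first and last text chunks. `is_event_stream`, `delta_text`, `measure`, and `SpeedResult` are hypothetical names. Anthropic's streaming events have a different shape and are not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


def is_event_stream(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header denotes a server-sent event stream.

    Assumption: a valid streaming request answers with text/event-stream,
    while an invalid model usually returns a JSON error body instead, so
    this check lets a test fail fast before reading any chunks.
    """
    if not content_type:
        return False
    return content_type.split(";")[0].strip().lower() == "text/event-stream"


def delta_text(delta: dict) -> str:
    """Extract streamed text from an OpenAI-style delta.

    Falls back to reasoning_content so that a reasoning model's first
    "thinking" chunk counts as its first token.
    """
    return delta.get("content") or delta.get("reasoning_content") or ""


@dataclass
class SpeedResult:
    first_token_latency: float  # milliseconds, -1 if failed
    tokens_per_second: float    # -1 if failed


def measure(start: float, events: Iterable[Tuple[float, dict]]) -> SpeedResult:
    """Compute both metrics from (timestamp_seconds, delta) pairs.

    start is the time the request was sent. Each chunk with non-empty
    text is counted as one token (an approximation).
    """
    first: Optional[float] = None
    last: Optional[float] = None
    tokens = 0
    for ts, delta in events:
        if not delta_text(delta):
            continue  # skip role-only or empty chunks
        if first is None:
            first = ts
        last = ts
        tokens += 1
    if first is None or last is None:
        return SpeedResult(-1, -1)  # no text arrived: treat as failed
    latency_ms = (first - start) * 1000
    elapsed = last - first
    # tokens - 1 intervals separate the first and last text chunks
    tps = (tokens - 1) / elapsed if elapsed > 0 else -1
    return SpeedResult(latency_ms, tps)
```

For example, a stream whose first text chunk arrives 0.30 s after the request and whose next two chunks arrive at 0.40 s and 0.50 s gives a first token latency of about 300 ms and about 10 tokens per second.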
Example configuration:

```yaml
test_prompt: "Please introduce yourself"
models:
  - base_url: "https://api.openai.com/v1"
    type: "openai-chat"
    models:
      - "gpt-4o"
      - "gpt-4o-mini"
    api_key: "sk-xxx"
  - base_url: "https://api.anthropic.com"
    type: "anthropic"
    models:
      - "claude-3-5-sonnet-20241022"
    api_key: "sk-ant-xxx"
```

Install dependencies:

```bash
pip install openai anthropic pyyaml
```
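The configuration format groups several model names under one endpoint and API key. A minimal sketch of how a script could flatten such a file into one test per model is shown below. `expand_jobs` and `load_config` are hypothetical names, not taken from `llmspeed.py`, and the type check assumes only the two documented types (`openai-chat` and `anthropic`) are valid.

```python
from typing import Any, Dict, List

SUPPORTED_TYPES = ("openai-chat", "anthropic")


def expand_jobs(config: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten each endpoint entry into one job per listed model."""
    prompt = config["test_prompt"]
    jobs: List[Dict[str, str]] = []
    for provider in config["models"]:
        if provider["type"] not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type: {provider['type']}")
        for model in provider["models"]:
            jobs.append({
                "base_url": provider["base_url"],
                "type": provider["type"],
                "model": model,
                "api_key": provider["api_key"],
                "prompt": prompt,
            })
    return jobs


def load_config(path: str) -> Dict[str, Any]:
    """Parse a YAML config file with PyYAML's safe loader."""
    # Imported lazily so expand_jobs can be used without PyYAML installed.
    import yaml
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)
```

With the example configuration, `expand_jobs(load_config(path))` would yield three jobs: two for the OpenAI endpoint and one for the Anthropic endpoint.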
Run the speed test:

```bash
python llmspeed.py
```

Use a custom config file:

```bash
python llmspeed.py my_config.yaml
```

Results are saved to `results.csv` with the following columns:
| Column | Description |
|---|---|
| `base_url` | API endpoint |
| `type` | API type (`openai-chat` or `anthropic`) |
| `model` | Model name |
| `first_token_latency` | First token latency in ms (-1 if failed) |
| `tokens_per_second` | Tokens per second (-1 if failed) |
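Because failures are recorded as -1 rather than left empty, downstream analysis has to filter them out explicitly. A minimal sketch using Python's standard `csv` module is shown below. It assumes the column names from the table above; `write_results` and `successful` are hypothetical helpers, not part of `llmspeed.py`.

```python
import csv
from typing import Dict, List

FIELDS = ["base_url", "type", "model", "first_token_latency", "tokens_per_second"]


def write_results(rows: List[Dict[str, object]], path: str = "results.csv") -> None:
    """Write one row per tested model; failed tests carry -1 in both metrics."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops keys such as api_key so secrets
        # never end up in the CSV file.
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def successful(path: str = "results.csv") -> List[Dict[str, str]]:
    """Read results back, dropping rows that use the -1 failure marker."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if float(row["tokens_per_second"]) != -1]
```

Sorting the output of `successful` by `float(row["tokens_per_second"])` would then rank only the models that actually streamed a response.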