Home

Model Chat CLI — Wiki

╔════════════════════════════════════════════════════════════════════════╗
║   A terminal command center for local AI servers                       ║
║   Discover · Chat · Benchmark · Battle · Probe agentic capability      ║
╚════════════════════════════════════════════════════════════════════════╝

Welcome. This wiki is the deep-dive companion to the README. The README answers what the tool does — the wiki answers how it works, why it was built that way, and how to bend it to your setup.

Start here

If you want to…	Go to
Install and run for the first time	Installation · Quick-Start
Understand the network scan	Discovery
Read the streaming / TPS / TTFT internals	Chat
Compare models head-to-head	Arena
Find the best system prompt for your task	Prompt-Arena
Hammer a server with load	Stress-Testing
Score tool-calling capability	Tool-Calling-Benchmark
See how the pieces fit together	Architecture
Look up a function or class	API-Reference
Fix something that broke	Troubleshooting
Add a feature or a new server backend	Contributing

Tier-zero facts

No telemetry. Nothing leaves your machine except the requests you make to your own servers.
No cloud calls. Discovery, chat, arena, stress tests, and the tool bench all run against models you host.
OpenAI + Ollama protocols. Anything that speaks /v1/chat/completions or /api/chat works.
One config file. ~/.model_chat_cache.json — that's it.

Five things people miss on first read

The chat TPS number is decode-only. The timer starts on the first token, so the reported t/s reflects actual generation speed, not wall-clock with TTFT mixed in. See Chat#performance-metrics.
The tool bench has six tiers, not one. Quick (7) is a smoke test. EXTREME (8) includes prompt-injection resistance and conflicting tool sources. See Tool-Calling-Benchmark#difficulty-tiers.
Realistic-user mode uses Poisson arrivals, not a fixed RPS. It models bursty real traffic with multi-turn growing context. See Stress-Testing#realistic-user.
Arena tournament judging is suite-specific. Coding prompts are scored on correctness, not "creativity" or "tone". See Arena#tournament-judging.
Discovery caches healthy servers. First scan is slow, every subsequent launch is instant — re-validation runs in parallel against the cache. See Discovery#caching.

License & origin

MIT. Built by @mtecnic for people who actually run local models and want to know what they're doing.

Model Chat CLI · MIT · repo · issues · No telemetry · No cloud calls · No surprises

Model Chat CLI

Getting started

Features

Internals

Operating

GitHub repo →

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Home

Model Chat CLI — Wiki

Start here

Tier-zero facts

Five things people miss on first read

License & origin

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Model Chat CLI

Clone this wiki locally