-
Notifications
You must be signed in to change notification settings - Fork 0
Home
mtecnic edited this page May 28, 2026
·
2 revisions
╔════════════════════════════════════════════════════════════════════════╗
║ A terminal command center for local AI servers ║
║ Discover · Chat · Benchmark · Battle · Probe agentic capability ║
╚════════════════════════════════════════════════════════════════════════╝
Welcome. This wiki is the deep-dive companion to the README. The README answers what the tool does — the wiki answers how it works, why it was built that way, and how to bend it to your setup.
| If you want to… | Go to |
|---|---|
| Install and run for the first time | Installation · Quick-Start |
| Understand the network scan | Discovery |
| Read the streaming / TPS / TTFT internals | Chat |
| Compare models head-to-head | Arena |
| Find the best system prompt for your task | Prompt-Arena |
| Hammer a server with load | Stress-Testing |
| Score tool-calling capability | Tool-Calling-Benchmark |
| See how the pieces fit together | Architecture |
| Look up a function or class | API-Reference |
| Fix something that broke | Troubleshooting |
| Add a feature or a new server backend | Contributing |
- No telemetry. Nothing leaves your machine except the requests you make to your own servers.
- No cloud calls. Discovery, chat, arena, stress tests, and the tool bench all run against models you host.
-
OpenAI + Ollama protocols. Anything that speaks
/v1/chat/completionsor/api/chatworks. -
One config file.
~/.model_chat_cache.json— that's it.
-
The chat TPS number is decode-only. The timer starts on the first token, so the reported
t/sreflects actual generation speed, not wall-clock with TTFT mixed in. See Chat#performance-metrics. -
The tool bench has six tiers, not one.
Quick (7)is a smoke test.EXTREME (8)includes prompt-injection resistance and conflicting tool sources. See Tool-Calling-Benchmark#difficulty-tiers. - Realistic-user mode uses Poisson arrivals, not a fixed RPS. It models bursty real traffic with multi-turn growing context. See Stress-Testing#realistic-user.
- Arena tournament judging is suite-specific. Coding prompts are scored on correctness, not "creativity" or "tone". See Arena#tournament-judging.
- Discovery caches healthy servers. First scan is slow, every subsequent launch is instant — re-validation runs in parallel against the cache. See Discovery#caching.
MIT. Built by @mtecnic for people who actually run local models and want to know what they're doing.
Getting started
Features
Internals
Operating