API Reference

A pointer-style index of the public surface — what each module exposes and where to find it. For walk-through prose, see Architecture.

`main.py`

| Symbol | Type | Purpose |
|---|---|---|
| `AppState` | enum | DISCOVERY · CHAT · STRESS_TEST · PROMPT_ARENA · ARENA · QUIT |
| `ModelChatCLI` | class | Main controller; holds `state`, `selected_server`, `selected_model` |
| `ModelChatCLI.run` | coro | Top-level dispatch loop |
| `main()` | fn | Entry point: `asyncio.run(app.run())` |

`scanner.py`

| Symbol | Returns | Purpose |
|---|---|---|
| `COMMON_PORTS` | `list[int]` | `[11434, 1234, 5000, 8000, 8080]` |
| `CACHE_FILE` | `Path` | `~/.model_chat_cache.json` |
| `check_endpoint(client, url, endpoint)` | `dict \| None` | GETs an endpoint; returns the parsed JSON or `None` |
| `probe_server(ip, port, client, semaphore)` | `dict \| None` | Probes one IP:port; returns a server dict |
| `scan_network(progress_callback)` | `list[dict]` | Full subnet scan |
| `check_server_health(server)` | `dict` | Adds `status` and `response_time` to the server dict |
| `save_cache(servers)` | `None` | Writes the cache JSON |
| `load_cache()` | `list[dict] \| None` | Reads the cache JSON |
| `quick_validate_cache(servers, progress_callback)` | `list[dict]` | Re-validates cached servers and keeps the healthy ones |

Server dict shape:

{
    "ip": "10.0.1.42",
    "port": 11434,
    "url": "http://10.0.1.42:11434",
    "type": "openai" | "ollama",
    "models": ["model-name", …],
    "status": "discovered" | "healthy" | "error",
    "response_time": 12.4,          # ms, only after check_server_health
}
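The shape above can also be written as a typed structure. The following is a minimal sketch, not the project's own definition: the names `ServerInfo` and `make_server` are hypothetical, and it assumes `response_time` is the only key that may be absent.

```python
from typing import List, Literal, TypedDict


class _ServerRequired(TypedDict):
    # Keys that every server dict carries, per the shape above.
    ip: str
    port: int
    url: str
    type: Literal["openai", "ollama"]
    models: List[str]
    status: Literal["discovered", "healthy", "error"]


class ServerInfo(_ServerRequired, total=False):
    # Assumed optional: present only after check_server_health has run.
    response_time: float  # milliseconds


def make_server(ip: str, port: int, kind: str, models: List[str]) -> ServerInfo:
    """Build a freshly discovered server record (hypothetical helper)."""
    return {
        "ip": ip,
        "port": port,
        "url": f"http://{ip}:{port}",
        "type": kind,
        "models": list(models),
        "status": "discovered",
    }


server = make_server("10.0.1.42", 11434, "ollama", ["model-name"])
```

A static type checker can then flag a misspelled key or an unknown `status` value before the dict reaches the UI.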

`client.py`

Symbol	Purpose
`ChatMetrics` (dataclass)	`prompt_tokens, completion_tokens, prompt_eval_duration_ns, eval_duration_ns, ttft`
`ModelClient(server, model)`	The client
`ModelClient.chat_stream(message, history=None, enable_thinking=None)`	Async generator of tokens
`ModelClient.chat(...)`	Non-streaming wrapper that collects full response
`ModelClient.last_metrics`	Most recent `ChatMetrics` (populated by `chat_stream`)

Streaming protocols:

OpenAI: `POST /v1/chat/completions` with `stream: true`; parses `data: {...}` SSE lines.
Ollama: `POST /api/chat` with `stream: true`; parses NDJSON (one JSON object per line).
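As a rough illustration of the two wire formats, the sketch below extracts the token text from a single streamed line. The function names are hypothetical and this is not the code in `client.py`; it assumes the usual chunk layouts, `choices[0].delta.content` for OpenAI-compatible servers and `message.content` for Ollama.

```python
import json
from typing import Optional


def parse_openai_sse_line(line: str) -> Optional[str]:
    """Return the token text carried by one SSE line, or None if there is none."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank keep-alives, comments, other SSE fields
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def parse_ollama_ndjson_line(line: str) -> Optional[str]:
    """Return the token text carried by one NDJSON line, or None if there is none."""
    line = line.strip()
    if not line:
        return None
    chunk = json.loads(line)
    return chunk.get("message", {}).get("content")
```

Both helpers return `None` for lines that carry no text, so a caller can skip them without special-casing either protocol.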

`stress_tester.py`

| Symbol | Purpose |
|---|---|
| `TestResult` (dataclass) | Per-request record; see Stress-Testing#shared-mechanics |
| `TestStats` (dataclass) | Aggregate stats with percentiles + drift |
| `StressTester(server, model, …)` | Engine |
| `StressTester.run_throughput_test(concurrency, total, ...)` | Mode 1 |
| `StressTester.run_token_stress_test(...)` | Mode 2 |
| `StressTester.run_sustained_load_test(rpm, duration, ...)` | Mode 3 |
| `StressTester.run_consistency_test(n, ...)` | Mode 4 |
| `StressTester.run_realistic_user_test(arrival_rate, profile, ...)` | Mode 5 |
| `StressTester.run_tool_bench_test(tier, ...)` | Mode 6 |

All modes return a `TestStats` and write a markdown summary to `logs/`.
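To make "percentiles + drift" concrete, here is a small sketch of how such aggregates could be computed from per-request latencies. It is not the `TestStats` implementation: `summarize_latencies` is a hypothetical name, and "drift" is assumed here to mean the mean latency of the second half of the run minus that of the first half, which this page does not confirm.

```python
import statistics
from typing import Dict, List


def summarize_latencies(latencies_ms: List[float]) -> Dict[str, float]:
    """Aggregate request latencies, given in completion order, into summary stats."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    half = len(latencies_ms) // 2
    first, second = latencies_ms[:half], latencies_ms[half:]
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        # Assumed definition of drift: late-run mean minus early-run mean.
        "drift": statistics.fmean(second) - statistics.fmean(first),
    }


stats = summarize_latencies([float(i) for i in range(1, 101)])
```

Under this definition, a positive drift suggests the server slowed down as the run progressed.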

`tool_bench.py`

| Symbol | Purpose |
|---|---|
| `safe_calc(expression)` | AST-restricted calculator (no `eval`) |
| `AgentTask` (dataclass) | One benchmark task definition |
| `TaskScore` (dataclass) | Per-task result |
| `Trajectory` (dataclass) | Full run trace: tool calls, errors, final answer |
| `ToolCallRecord` (dataclass) | One tool invocation: name, args, result, malformed? |
| `TASKS` | List of all 45 `AgentTask` instances |
| `HARD_SUBSET`, `BRUTAL_SUBSET`, `REALISTIC_SUBSET` | Frozen sets of task IDs by tier |
| `score_task(task, trajectory)` | Produces `TaskScore` |
| `extract_numbers(text)` | Regex-extract numeric tokens (commas, scientific) |
| `numeric_match(expected, actuals, rel_tol, abs_tol)` | Numeric comparison |
| `word_present(word, text)` | Word-boundary regex match |
| `call_matches(call, name, constraints)` | Dict-subset arg matcher |
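For readers unfamiliar with the "AST-restricted calculator" idea behind `safe_calc`, the sketch below shows the general technique: parse the expression with `ast`, then evaluate only an allow-list of node types. It is an illustration under assumptions, not the project's function; `safe_calc_sketch` is a hypothetical name, and the real operator set may differ.

```python
import ast
import operator

# Allow-list of operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc_sketch(expression: str):
    """Evaluate plain arithmetic without eval(); raise ValueError otherwise."""
    tree = ast.parse(expression, mode="eval")

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only int and float literals (bool, str, etc. are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(tree)
```

Names, calls, and attribute access never reach an evaluator, so an input such as `__import__('os')` is rejected rather than executed. Exponentiation is left out of this sketch because unbounded exponents can be very expensive to compute.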

AgentTask key fields:

task_id: str
difficulty: str            # "Q" | "H" | "B" | "R" | "E"
prompt: str
expected_numbers: tuple[float, ...]
expected_words: tuple[tuple[str, ...], ...]    # tuples = synonyms
required_calls: tuple[tuple[str, dict], ...]
forbidden_tools: frozenset[str]
expect_zero_tools: bool
tool_use_required: bool
min_tool_calls: int
max_tool_calls: int | None
max_iterations: int
numeric_rel_tol: float
numeric_abs_tol: float
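The fields above suggest how an answer might be checked: numbers within a tolerance, words by boundary match, and tool calls by argument subset. The following sketch illustrates those three checks with hypothetical `*_sketch` helpers. The default tolerances, the case-insensitive word match, and the `{"name": ..., "args": ...}` call shape are assumptions rather than the behavior of `tool_bench.py`.

```python
import math
import re
from typing import Iterable, List, Mapping

# Matches 42, -3.5, 1,234.5 and 6.02e23 style tokens.
_NUMBER_RE = re.compile(r"[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?(?:[eE][-+]?\d+)?")


def extract_numbers_sketch(text: str) -> List[float]:
    """Pull numeric tokens out of free text, dropping thousands separators."""
    return [float(m.group().replace(",", "")) for m in _NUMBER_RE.finditer(text)]


def numeric_match_sketch(expected: float, actuals: Iterable[float],
                         rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """True if any candidate is close enough to the expected value."""
    return any(math.isclose(expected, a, rel_tol=rel_tol, abs_tol=abs_tol)
               for a in actuals)


def word_present_sketch(word: str, text: str) -> bool:
    """Whole-word match; case-insensitivity is an assumption."""
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None


def call_matches_sketch(call: Mapping, name: str, constraints: Mapping) -> bool:
    """True if the call has this name and its args contain every constraint."""
    if call.get("name") != name:
        return False
    args = call.get("args", {})
    return all(key in args and args[key] == value
               for key, value in constraints.items())
```

Subset matching lets a task require only the arguments that matter, so a model that passes extra arguments is not penalized for them.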

`prompt_arena.py`

| Symbol | Purpose |
|---|---|
| `SYSTEM_PROMPTS` | Dict of `{key: {name, prompt}}`; 7 built-ins |
| `TEST_QUESTIONS` | Categorized sample questions for multi-round mode |
| `PromptResponse` (dataclass) | Generation result for one prompt |
| `JudgeResult` (dataclass) | One pairwise judgment |
| `ArenaMatchup` (dataclass) | One A-vs-B matchup with its `JudgeResult` |
| `ArenaResult` (dataclass) | All matchups for one question |
| `ArenaStats` (dataclass) | Aggregate wins / win-rate across rounds |
| `PromptArena` | Engine class |

`think_parser.py`

| Symbol | Purpose |
|---|---|
| `split_thinking(text)` | Returns `(thinking_str, content_str)` from a full response |
| `strip_thinking(text)` | Returns `content_str` only |

Both recognize `<think>…</think>` tags. The streaming chat path applies them to the accumulated buffer; the stress tester applies them post-hoc when computing token counts.
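A minimal sketch of the splitting behavior described above, using a non-greedy regex over closed `<think>` blocks. The `*_sketch` names are hypothetical, and choices such as joining multiple blocks with newlines and ignoring an unclosed tag are assumptions, not documented behavior of `think_parser.py`.

```python
import re
from typing import Tuple

# DOTALL lets a thinking block span multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_thinking_sketch(text: str) -> Tuple[str, str]:
    """Return (thinking, content) for a complete response."""
    thinking = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    content = _THINK_RE.sub("", text).strip()
    return thinking, content


def strip_thinking_sketch(text: str) -> str:
    """Return only the user-visible content."""
    return split_thinking_sketch(text)[1]
```

For example, `split_thinking_sketch("<think>plan</think>Answer")` yields `("plan", "Answer")`, and text with no tags comes back unchanged as the content.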

`logger.py`

| Symbol | Purpose |
|---|---|
| `setup_logger(name, log_dir="logs")` | Returns a logger with console and file handlers |
| `log_request_error(logger, req_id, payload, error)` | Structured request-error entry |
| `log_vllm_error(logger, response)` | vLLM-specific error context |
| `log_test_summary(logger, stats)` | Summary block at end of a run |
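A logger with console and file handlers, as `setup_logger` is described, can be assembled from the standard `logging` module along these lines. This is only a sketch: `setup_logger_sketch`, the log file name, the format string, and the per-handler levels are all assumptions.

```python
import logging
import tempfile
from pathlib import Path


def setup_logger_sketch(name: str, log_dir: str = "logs") -> logging.Logger:
    """Return a logger that writes to the console and to <log_dir>/<name>.log."""
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger  # already configured; avoid duplicate handlers
    logger.setLevel(logging.DEBUG)
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)  # assumed: console shows INFO and above
    console.setFormatter(formatter)

    file_handler = logging.FileHandler(Path(log_dir) / f"{name}.log", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # assumed: file keeps everything
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger


# Usage: write to a temporary directory so the example leaves no files behind in cwd.
demo_dir = tempfile.mkdtemp()
demo_logger = setup_logger_sketch("stress_demo", log_dir=demo_dir)
demo_logger.info("run started")
```

The early return guards against attaching the same handlers twice when the function is called repeatedly with one name.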

`storage/history.py`

| Symbol | Purpose |
|---|---|
| `ChatHistoryManager()` | Save / load / search / delete conversations |
| `ChatHistoryManager.save(messages, model, server)` | Write JSON to `~/.model_chat_history/` |
| `ChatHistoryManager.load(filename)` | Restore a conversation |
| `ChatHistoryManager.list_all()` | Index of saved conversations |
| `ChatHistoryManager.search(query)` | Substring search across saved files |
| `ChatHistoryManager.delete(filename)` | Remove a conversation |

Currently not wired into the UI — see Architecture#what-storagehistorypy-is-for.

`ui/`

| Module | Symbol | Purpose |
|---|---|---|
| `ui/theme.py` | `APP_THEME` | Rich `Theme` with semantic color names |
| `ui/components.py` | (functions) | `create_model_table`, `create_chat_message`, `render_markdown_with_code`, `estimate_tokens` |
| `ui/discovery.py` | `DiscoveryView` | Scan + cache + model picker |
| `ui/chat.py` | `ChatView` | Streaming chat with metrics |
| `ui/multi_arena.py` | `MultiArenaView` | Multi-model arena (quick / battle / tournament) |
| `ui/arena.py` | `ArenaView` | Prompt-comparison arena |
| `ui/stress_test.py` | `StressTestView` | Stress test dashboard + tool-bench summary |

Each view exposes an `async def run()` that returns a string signal; see Architecture#1-state-machine-in-mainpy.
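The view contract and the dispatch loop in `ModelChatCLI.run` can be pictured with the sketch below. It uses stub views and a reduced `AppState`; the signal strings (`"chat"`, `"quit"`, and so on) and the `SIGNAL_TO_STATE` mapping are assumptions made for illustration, not the values used in `main.py`.

```python
import asyncio
from enum import Enum, auto
from typing import List


class AppState(Enum):
    # Reduced to three states for the sketch.
    DISCOVERY = auto()
    CHAT = auto()
    QUIT = auto()


class StubDiscoveryView:
    async def run(self) -> str:
        return "chat"  # pretend the user picked a model


class StubChatView:
    async def run(self) -> str:
        return "quit"  # pretend the user exited the chat


# Hypothetical mapping from a view's returned signal to the next state.
SIGNAL_TO_STATE = {
    "discovery": AppState.DISCOVERY,
    "chat": AppState.CHAT,
    "quit": AppState.QUIT,
}


async def run_app() -> List[AppState]:
    """Run views until one signals QUIT; return the states visited."""
    views = {AppState.DISCOVERY: StubDiscoveryView(), AppState.CHAT: StubChatView()}
    state = AppState.DISCOVERY
    visited = []
    while state is not AppState.QUIT:
        visited.append(state)
        signal = await views[state].run()
        # Unknown signals fall back to QUIT so the loop always terminates.
        state = SIGNAL_TO_STATE.get(signal, AppState.QUIT)
    return visited


visited_states = asyncio.run(run_app())
```

Keeping the transition table in one place means a view only needs to know which signal to return, not which view comes next.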

Model Chat CLI · MIT · repo · issues · No telemetry · No cloud calls · No surprises
