-
Notifications
You must be signed in to change notification settings - Fork 0
API Reference
mtecnic edited this page May 28, 2026
·
1 revision
A pointer-style index of the public surface — what each module exposes and where to find it. For walk-through prose, see Architecture.
| Symbol | Type | Purpose |
|---|---|---|
AppState |
enum | DISCOVERY · CHAT · STRESS_TEST · PROMPT_ARENA · ARENA · QUIT |
ModelChatCLI |
class | Main controller; holds state, selected_server, selected_model
|
ModelChatCLI.run |
coro | Top-level dispatch loop |
main() |
fn | Entry point — asyncio.run(app.run())
|
| Symbol | Returns | Purpose |
|---|---|---|
COMMON_PORTS |
list[int] |
[11434, 1234, 5000, 8000, 8080] |
CACHE_FILE |
Path |
~/.model_chat_cache.json |
check_endpoint(client, url, endpoint) |
dict | None |
GET an endpoint; returns JSON or None |
probe_server(ip, port, client, semaphore) |
dict | None |
One IP:port probe; returns server dict |
scan_network(progress_callback) |
list[dict] |
Full subnet scan |
check_server_health(server) |
dict |
Adds status and response_time to server |
save_cache(servers) |
None |
Writes cache JSON |
load_cache() |
list[dict] | None |
Reads cache JSON |
quick_validate_cache(servers, progress_callback) |
list[dict] |
Re-validates and filters healthy |
Server dict shape:
{
"ip": "10.0.1.42",
"port": 11434,
"url": "http://10.0.1.42:11434",
"type": "openai" | "ollama",
"models": ["model-name", …],
"status": "discovered" | "healthy" | "error",
"response_time": 12.4, # ms, only after check_server_health
}| Symbol | Purpose |
|---|---|
ChatMetrics (dataclass) |
prompt_tokens, completion_tokens, prompt_eval_duration_ns, eval_duration_ns, ttft |
ModelClient(server, model) |
The client |
ModelClient.chat_stream(message, history=None, enable_thinking=None) |
Async generator of tokens |
ModelClient.chat(...) |
Non-streaming wrapper that collects full response |
ModelClient.last_metrics |
Most recent ChatMetrics (populated by chat_stream) |
Streaming protocols:
- OpenAI: POST
/v1/chat/completionswithstream: true, parsesdata: {...}SSE. - Ollama: POST
/api/chatwithstream: true, parses NDJSON.
| Symbol | Purpose |
|---|---|
TestResult (dataclass) |
Per-request record — see Stress-Testing#shared-mechanics |
TestStats (dataclass) |
Aggregate stats with percentiles + drift |
StressTester(server, model, …) |
Engine |
StressTester.run_throughput_test(concurrency, total, ...) |
Mode 1 |
StressTester.run_token_stress_test(...) |
Mode 2 |
StressTester.run_sustained_load_test(rpm, duration, ...) |
Mode 3 |
StressTester.run_consistency_test(n, ...) |
Mode 4 |
StressTester.run_realistic_user_test(arrival_rate, profile, ...) |
Mode 5 |
StressTester.run_tool_bench_test(tier, ...) |
Mode 6 |
All modes return a TestStats (and write a markdown summary to logs/).
| Symbol | Purpose |
|---|---|
safe_calc(expression) |
AST-restricted calculator (no eval) |
AgentTask (dataclass) |
One benchmark task definition |
TaskScore (dataclass) |
Per-task result |
Trajectory (dataclass) |
Full run trace: tool calls, errors, final answer |
ToolCallRecord (dataclass) |
One tool invocation: name, args, result, malformed? |
TASKS |
List of all 45 AgentTask instances |
HARD_SUBSET, BRUTAL_SUBSET, REALISTIC_SUBSET
|
Frozen sets of task IDs by tier |
score_task(task, trajectory) |
Produces TaskScore
|
extract_numbers(text) |
Regex-extract numeric tokens (commas, scientific) |
numeric_match(expected, actuals, rel_tol, abs_tol) |
Numeric comparison |
word_present(word, text) |
Word-boundary regex match |
call_matches(call, name, constraints) |
Dict-subset arg matcher |
AgentTask key fields:
task_id: str
difficulty: str # "Q" | "H" | "B" | "R" | "E"
prompt: str
expected_numbers: tuple[float, ...]
expected_words: tuple[tuple[str, ...], ...] # tuples = synonyms
required_calls: tuple[tuple[str, dict], ...]
forbidden_tools: frozenset[str]
expect_zero_tools: bool
tool_use_required: bool
min_tool_calls: int
max_tool_calls: int | None
max_iterations: int
numeric_rel_tol: float
numeric_abs_tol: float| Symbol | Purpose |
|---|---|
SYSTEM_PROMPTS |
Dict of {key: {name, prompt}} — 7 built-ins |
TEST_QUESTIONS |
Categorized sample questions for multi-round mode |
PromptResponse (dataclass) |
Generation result for one prompt |
JudgeResult (dataclass) |
One pairwise judgment |
ArenaMatchup (dataclass) |
One A-vs-B matchup with its JudgeResult
|
ArenaResult (dataclass) |
All matchups for one question |
ArenaStats (dataclass) |
Aggregate wins / win-rate across rounds |
PromptArena |
Engine class |
| Symbol | Purpose |
|---|---|
split_thinking(text) |
Returns (thinking_str, content_str) from a full response |
strip_thinking(text) |
Returns content_str only |
Both recognize <think>…</think> tags. The streaming chat path uses these on the accumulated buffer; the stress tester uses them post-hoc when computing token counts.
| Symbol | Purpose |
|---|---|
setup_logger(name, log_dir="logs") |
Returns a logger with console+file handlers |
log_request_error(logger, req_id, payload, error) |
Structured request-error entry |
log_vllm_error(logger, response) |
vLLM-specific error context |
log_test_summary(logger, stats) |
Summary block at end of a run |
| Symbol | Purpose |
|---|---|
ChatHistoryManager() |
Save / load / search / delete conversations |
ChatHistoryManager.save(messages, model, server) |
Write JSON to ~/.model_chat_history/
|
ChatHistoryManager.load(filename) |
Restore a conversation |
ChatHistoryManager.list_all() |
Index of saved conversations |
ChatHistoryManager.search(query) |
Substring search across saved files |
ChatHistoryManager.delete(filename) |
Remove a conversation |
Currently not wired into the UI — see Architecture#what-storagehistorypy-is-for.
| Module | Class | Purpose |
|---|---|---|
ui/theme.py |
APP_THEME |
Rich Theme with semantic color names |
ui/components.py |
(functions) |
create_model_table, create_chat_message, render_markdown_with_code, estimate_tokens
|
ui/discovery.py |
DiscoveryView |
Scan + cache + model picker |
ui/chat.py |
ChatView |
Streaming chat with metrics |
ui/multi_arena.py |
MultiArenaView |
Multi-model arena (quick / battle / tournament) |
ui/arena.py |
ArenaView |
Prompt-comparison arena |
ui/stress_test.py |
StressTestView |
Stress test dashboard + tool-bench summary |
Each view exposes an async def run() returning a string signal — see Architecture#1-state-machine-in-mainpy.
Getting started
Features
Internals
Operating