API Reference

mtecnic edited this page May 28, 2026 · 1 revision

A pointer-style index of the public surface — what each module exposes and where to find it. For walk-through prose, see Architecture.


main.py

| Symbol | Type | Purpose |
| --- | --- | --- |
| AppState | enum | DISCOVERY · CHAT · STRESS_TEST · PROMPT_ARENA · ARENA · QUIT |
| ModelChatCLI | class | Main controller; holds state, selected_server, selected_model |
| ModelChatCLI.run | coro | Top-level dispatch loop |
| main() | fn | Entry point; calls asyncio.run(app.run()) |
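
The dispatch pattern in the table can be sketched as follows. This is a minimal illustration, not the project's code: `DemoController`, its handler methods, the `visited` list, and the enum values are hypothetical; only the `AppState` member names and the `asyncio.run(app.run())` entry point come from the table.

```python
import asyncio
from enum import Enum, auto


class AppState(Enum):
    # Member names follow the table; the values are an assumption.
    DISCOVERY = auto()
    CHAT = auto()
    STRESS_TEST = auto()
    PROMPT_ARENA = auto()
    ARENA = auto()
    QUIT = auto()


class DemoController:
    """Hypothetical stand-in for ModelChatCLI; only the loop shape is shown."""

    def __init__(self):
        self.state = AppState.DISCOVERY
        self.visited = []  # illustration only: records the states entered

    async def _discovery(self):
        # A real view would scan and let the user pick a model.
        return AppState.CHAT

    async def _chat(self):
        # A real view would run the chat session until the user leaves.
        return AppState.QUIT

    async def run(self):
        # Each handler returns the next state; the loop ends on QUIT.
        handlers = {
            AppState.DISCOVERY: self._discovery,
            AppState.CHAT: self._chat,
        }
        while self.state is not AppState.QUIT:
            self.visited.append(self.state)
            self.state = await handlers[self.state]()


def main():
    app = DemoController()
    asyncio.run(app.run())
    return app
```

Keeping the transition logic in one loop means each view only has to report where to go next.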

scanner.py

| Symbol | Returns | Purpose |
| --- | --- | --- |
| COMMON_PORTS | list[int] | [11434, 1234, 5000, 8000, 8080] |
| CACHE_FILE | Path | ~/.model_chat_cache.json |
| check_endpoint(client, url, endpoint) | dict \| None | GET an endpoint; returns JSON or None |
| probe_server(ip, port, client, semaphore) | dict \| None | One IP:port probe; returns server dict |
| scan_network(progress_callback) | list[dict] | Full subnet scan |
| check_server_health(server) | dict | Adds status and response_time to server |
| save_cache(servers) | None | Writes cache JSON |
| load_cache() | list[dict] \| None | Reads cache JSON |
| quick_validate_cache(servers, progress_callback) | list[dict] | Re-validates and filters healthy |
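
The `semaphore` parameter of `probe_server` suggests that probes run concurrently under a cap. The sketch below shows that pattern with the network call replaced by a lookup in a fake set of open endpoints, so it runs offline. `probe_sketch`, `scan_sketch`, `fake_open`, and `max_concurrency` are hypothetical names, and the real scanner may be structured differently.

```python
import asyncio

COMMON_PORTS = [11434, 1234, 5000, 8000, 8080]


async def probe_sketch(ip, port, semaphore, fake_open):
    """Probe one IP:port while holding a semaphore slot."""
    async with semaphore:
        await asyncio.sleep(0)  # stands in for the HTTP request
        if (ip, port) in fake_open:
            return {
                "ip": ip,
                "port": port,
                "url": f"http://{ip}:{port}",
                "status": "discovered",
            }
        return None


async def scan_sketch(ips, fake_open, max_concurrency=50):
    """Probe every ip/port pair with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [
        probe_sketch(ip, port, semaphore, fake_open)
        for ip in ips
        for port in COMMON_PORTS
    ]
    results = await asyncio.gather(*tasks)
    # Drop the probes that found nothing.
    return [server for server in results if server is not None]
```

For example, `asyncio.run(scan_sketch(["10.0.1.42"], {("10.0.1.42", 11434)}))` yields a single server dict for port 11434.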

Server dict shape:

{
    "ip": "10.0.1.42",
    "port": 11434,
    "url": "http://10.0.1.42:11434",
    "type": "openai" | "ollama",
    "models": ["model-name", …],
    "status": "discovered" | "healthy" | "error",
    "response_time": 12.4,          # ms, only after check_server_health
}
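
For type-checked code, the shape above could be written as a `TypedDict`. This is a sketch under assumptions: the names `Server`, `make_server`, and `mark_healthy` are hypothetical and do not appear in scanner.py; only the keys and allowed values come from the shape shown.

```python
from typing import List, Literal, TypedDict


class _ServerRequired(TypedDict):
    ip: str
    port: int
    url: str
    type: Literal["openai", "ollama"]
    models: List[str]
    status: Literal["discovered", "healthy", "error"]


class Server(_ServerRequired, total=False):
    # Milliseconds; present only after a health check.
    response_time: float


def make_server(ip, port, server_type, models) -> Server:
    """Build a freshly discovered server record (hypothetical helper)."""
    return {
        "ip": ip,
        "port": port,
        "url": f"http://{ip}:{port}",
        "type": server_type,
        "models": list(models),
        "status": "discovered",
    }


def mark_healthy(server: Server, response_time_ms: float) -> Server:
    """Return a copy with the fields a health check would add."""
    updated = dict(server)
    updated["status"] = "healthy"
    updated["response_time"] = response_time_ms
    return updated
```

Splitting required and optional keys across two classes keeps `response_time` optional while the other keys stay mandatory.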

client.py

| Symbol | Purpose |
| --- | --- |
| ChatMetrics (dataclass) | prompt_tokens, completion_tokens, prompt_eval_duration_ns, eval_duration_ns, ttft |
| ModelClient(server, model) | The client |
| ModelClient.chat_stream(message, history=None, enable_thinking=None) | Async generator of tokens |
| ModelClient.chat(...) | Non-streaming wrapper that collects full response |
| ModelClient.last_metrics | Most recent ChatMetrics (populated by chat_stream) |

Streaming protocols:

  • OpenAI: POST /v1/chat/completions with stream: true, parses data: {...} SSE.
  • Ollama: POST /api/chat with stream: true, parses NDJSON.
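
The two wire formats differ mainly in framing. The sketch below parses a single line of each, assuming the usual payload layouts: `choices[0].delta.content` with a `data: [DONE]` terminator for OpenAI-style SSE, and `message.content` for Ollama's `/api/chat` NDJSON. The function names are hypothetical, and client.py may handle more cases (errors, metrics fields, reasoning content).

```python
import json


def parse_openai_sse_line(line):
    """Return the text delta carried by one SSE line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank keep-alives, comments, other SSE fields
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def parse_ollama_ndjson_line(line):
    """Return the text carried by one NDJSON line, or None."""
    line = line.strip()
    if not line:
        return None
    chunk = json.loads(line)
    return (chunk.get("message") or {}).get("content")
```

A streaming loop would call the matching parser on each received line and yield every non-empty result.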

stress_tester.py

| Symbol | Purpose |
| --- | --- |
| TestResult (dataclass) | Per-request record; see Stress-Testing#shared-mechanics |
| TestStats (dataclass) | Aggregate stats with percentiles + drift |
| StressTester(server, model, …) | Engine |
| StressTester.run_throughput_test(concurrency, total, ...) | Mode 1 |
| StressTester.run_token_stress_test(...) | Mode 2 |
| StressTester.run_sustained_load_test(rpm, duration, ...) | Mode 3 |
| StressTester.run_consistency_test(n, ...) | Mode 4 |
| StressTester.run_realistic_user_test(arrival_rate, profile, ...) | Mode 5 |
| StressTester.run_tool_bench_test(tier, ...) | Mode 6 |

All modes return a TestStats (and write a markdown summary to logs/).
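
This page does not say how TestStats computes its percentiles, so the following is only one common approach: linear interpolation between the two nearest ranks. The names `percentile` and `summarize_latencies`, and the choice of p50/p95/p99, are assumptions for illustration. The drift metric is not sketched because it is not defined here.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize_latencies(latencies_ms):
    """Hypothetical aggregate: a few headline percentiles plus the mean."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
```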


tool_bench.py

| Symbol | Purpose |
| --- | --- |
| safe_calc(expression) | AST-restricted calculator (no eval) |
| AgentTask (dataclass) | One benchmark task definition |
| TaskScore (dataclass) | Per-task result |
| Trajectory (dataclass) | Full run trace: tool calls, errors, final answer |
| ToolCallRecord (dataclass) | One tool invocation: name, args, result, malformed? |
| TASKS | List of all 45 AgentTask instances |
| HARD_SUBSET, BRUTAL_SUBSET, REALISTIC_SUBSET | Frozen sets of task IDs by tier |
| score_task(task, trajectory) | Produces TaskScore |
| extract_numbers(text) | Regex-extract numeric tokens (commas, scientific) |
| numeric_match(expected, actuals, rel_tol, abs_tol) | Numeric comparison |
| word_present(word, text) | Word-boundary regex match |
| call_matches(call, name, constraints) | Dict-subset arg matcher |
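
An AST-restricted calculator of the kind `safe_calc` describes can be sketched as below: the expression is parsed, and only whitelisted node types are evaluated. The name `safe_calc_sketch` and the set of allowed operators are assumptions; the real function may accept more (or fewer) operations.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc_sketch(expression):
    """Evaluate basic arithmetic without calling eval()."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            left, right = evaluate(node.left), evaluate(node.right)
            return _BINARY_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        # Names, calls, attributes, etc. all end up here.
        raise ValueError(f"unsupported element: {type(node).__name__}")

    return evaluate(ast.parse(expression, mode="eval"))
```

Because names and calls are never evaluated, an input such as `__import__('os')` raises `ValueError` instead of executing.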

AgentTask key fields:

task_id: str
difficulty: str            # "Q" | "H" | "B" | "R" | "E"
prompt: str
expected_numbers: tuple[float, ...]
expected_words: tuple[tuple[str, ...], ...]    # tuples = synonyms
required_calls: tuple[tuple[str, dict], ...]
forbidden_tools: frozenset[str]
expect_zero_tools: bool
tool_use_required: bool
min_tool_calls: int
max_tool_calls: int | None
max_iterations: int
numeric_rel_tol: float
numeric_abs_tol: float
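
The tolerance and synonym fields feed the matcher helpers listed for this module. The sketch below shows one plausible reading of each helper; every `_sketch` name is hypothetical, the regex and case-insensitive matching are assumptions, and `call` is assumed to be a plain dict with `name` and `args` keys rather than a ToolCallRecord.

```python
import math
import re

# Integers or decimals, optional thousands separators and exponent.
_NUMBER_RE = re.compile(
    r"[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?(?:[eE][-+]?\d+)?"
)


def extract_numbers_sketch(text):
    """Return every numeric token in text as a float."""
    return [float(m.group().replace(",", "")) for m in _NUMBER_RE.finditer(text)]


def numeric_match_sketch(expected, actuals, rel_tol, abs_tol):
    """True if any actual value is within tolerance of expected."""
    return any(
        math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)
        for actual in actuals
    )


def word_present_sketch(word, text):
    """Whole-word match; case-insensitivity is an assumption."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def synonyms_present_sketch(expected_words, text):
    """Each inner tuple is a synonym group; one hit per group suffices."""
    return all(
        any(word_present_sketch(word, text) for word in group)
        for group in expected_words
    )


def call_matches_sketch(call, name, constraints):
    """True if the call has this name and its args include constraints."""
    args = call.get("args", {})
    return call.get("name") == name and all(
        key in args and args[key] == value for key, value in constraints.items()
    )
```

Under this reading, a final answer would be checked by extracting its numbers, comparing each expected number with the task's `numeric_rel_tol` and `numeric_abs_tol`, and then testing the synonym groups.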

prompt_arena.py

| Symbol | Purpose |
| --- | --- |
| SYSTEM_PROMPTS | Dict of {key: {name, prompt}}; 7 built-ins |
| TEST_QUESTIONS | Categorized sample questions for multi-round mode |
| PromptResponse (dataclass) | Generation result for one prompt |
| JudgeResult (dataclass) | One pairwise judgment |
| ArenaMatchup (dataclass) | One A-vs-B matchup with its JudgeResult |
| ArenaResult (dataclass) | All matchups for one question |
| ArenaStats (dataclass) | Aggregate wins / win-rate across rounds |
| PromptArena | Engine class |

think_parser.py

| Symbol | Purpose |
| --- | --- |
| split_thinking(text) | Returns (thinking_str, content_str) from a full response |
| strip_thinking(text) | Returns content_str only |

Both recognize <think>…</think> tags. The streaming chat path uses these on the accumulated buffer; the stress tester uses them post-hoc when computing token counts.
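
A regex-based version of the two helpers might look like this. It is a sketch, not the module's code: the `_sketch` names are hypothetical, joining multiple think blocks with newlines is an assumption, and an unclosed `<think>` tag (as can occur mid-stream) is left in the content rather than handled specially.

```python
import re

# Non-greedy so that several think blocks are matched separately.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking_sketch(text):
    """Return (thinking_str, content_str) from a full response."""
    thinking = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    content = _THINK_RE.sub("", text).strip()
    return thinking, content


def strip_thinking_sketch(text):
    """Return only the content outside <think> blocks."""
    return split_thinking_sketch(text)[1]
```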


logger.py

| Symbol | Purpose |
| --- | --- |
| setup_logger(name, log_dir="logs") | Returns a logger with console+file handlers |
| log_request_error(logger, req_id, payload, error) | Structured request-error entry |
| log_vllm_error(logger, response) | vLLM-specific error context |
| log_test_summary(logger, stats) | Summary block at end of a run |
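
A logger with console and file handlers, as `setup_logger` is described, can be built from the standard `logging` module alone. In the sketch below the function name, the log file name (`<name>.log`), the levels, and the format string are all assumptions; only the signature and the two-handler idea come from the table.

```python
import logging
from pathlib import Path


def setup_logger_sketch(name, log_dir="logs"):
    """Return a logger that writes to the console and to log_dir."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Guard against duplicate handlers when called more than once.
    if not logger.handlers:
        formatter = logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"
        )
        console = logging.StreamHandler()
        console.setLevel(logging.INFO)  # keep the terminal less noisy
        console.setFormatter(formatter)
        file_handler = logging.FileHandler(
            Path(log_dir) / f"{name}.log", encoding="utf-8"
        )
        file_handler.setFormatter(formatter)
        logger.addHandler(console)
        logger.addHandler(file_handler)
    return logger
```

The handler guard matters because `logging.getLogger` returns the same object for the same name, so repeated setup calls would otherwise duplicate every log line.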

storage/history.py

| Symbol | Purpose |
| --- | --- |
| ChatHistoryManager() | Save / load / search / delete conversations |
| ChatHistoryManager.save(messages, model, server) | Write JSON to ~/.model_chat_history/ |
| ChatHistoryManager.load(filename) | Restore a conversation |
| ChatHistoryManager.list_all() | Index of saved conversations |
| ChatHistoryManager.search(query) | Substring search across saved files |
| ChatHistoryManager.delete(filename) | Remove a conversation |

Currently not wired into the UI — see Architecture#what-storagehistorypy-is-for.


ui/

| Module | Class | Purpose |
| --- | --- | --- |
| ui/theme.py | APP_THEME | Rich Theme with semantic color names |
| ui/components.py | (functions) | create_model_table, create_chat_message, render_markdown_with_code, estimate_tokens |
| ui/discovery.py | DiscoveryView | Scan + cache + model picker |
| ui/chat.py | ChatView | Streaming chat with metrics |
| ui/multi_arena.py | MultiArenaView | Multi-model arena (quick / battle / tournament) |
| ui/arena.py | ArenaView | Prompt-comparison arena |
| ui/stress_test.py | StressTestView | Stress test dashboard + tool-bench summary |

Each view exposes an async def run() returning a string signal — see Architecture#1-state-machine-in-mainpy.
