Structure-aware code search that cuts LLM token burn by 37%.
Wonk indexes your codebase with Tree-sitter to understand code structure -- definitions, call graphs, imports, and scopes -- then ranks search results so definitions surface first and tests sort last. A built-in MCP server exposes 22 tools for AI coding assistants, and a background daemon keeps the index fresh. Single static binary, zero runtime dependencies.
Searching for Blueprint in the Flask repo with ripgrep returns 225 lines of unsorted noise (changelogs, docs, tests, and definitions all mixed together). Wonk returns the same matches ranked and deduplicated: definitions first, usages next, comments and tests last, with re-exports collapsed into (+N other locations) annotations.
[Side-by-side output comparison: rg Blueprint (225 lines) vs. wonk search Blueprint (213 lines)]
Same data, structured for an LLM context window -- definitions surface instantly instead of being buried on line 67.
LLM coding agents grep aggressively. A single query can stuff hundreds of noisy, unranked lines into the context window -- raw matches with no sense of what is a definition, what is a test, and what is a re-export. That is wasted tokens and wasted money.
Wonk pre-indexes your codebase with Tree-sitter so it understands code structure: definitions vs. usages, symbol kinds, scopes, imports, and dependencies. When you search, results come back ranked, deduplicated, and grouped by relevance -- definitions first, tests last. The index stays fresh via a background file watcher, and a built-in MCP server exposes 22 tools for AI coding assistants.
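The ranking and deduplication described above can be sketched in a few lines. This is an illustration only, not Wonk's implementation: the `Match` type, the `KIND_RANK` tiers, the `rank_and_dedupe` helper, and the sample file paths are all hypothetical, and a real index would classify matches from the Tree-sitter parse rather than take the kind as input.

```python
from dataclasses import dataclass

# Lower rank sorts earlier. The tier order follows the text above;
# the numeric values are an assumption made for this sketch.
KIND_RANK = {"definition": 0, "usage": 1, "comment": 2, "test": 3}


@dataclass(frozen=True)
class Match:
    file: str
    line: int
    content: str
    kind: str  # one of the KIND_RANK keys


def rank_and_dedupe(matches):
    """Sort matches by tier, then collapse repeated content into one entry."""
    ordered = sorted(matches, key=lambda m: (KIND_RANK[m.kind], m.file, m.line))
    groups = {}  # (kind, content) -> [first match, number of extra locations]
    for m in ordered:
        key = (m.kind, m.content.strip())
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [m, 0]
    lines = []
    for m, extra in groups.values():  # dicts keep insertion order
        suffix = f" (+{extra} other locations)" if extra else ""
        lines.append(f"{m.file}:{m.line}:{m.content}{suffix}")
    return lines


# Made-up matches for illustration.
matches = [
    Match("tests/test_bp.py", 10, "bp = Blueprint('x', __name__)", "test"),
    Match("pkg/__init__.py", 5, "from .blueprints import Blueprint", "usage"),
    Match("pkg/blueprints.py", 20, "class Blueprint:", "definition"),
    Match("pkg/app.py", 8, "from .blueprints import Blueprint", "usage"),
]
results = rank_and_dedupe(matches)
for line in results:
    print(line)
```

The definition prints first, the two identical import lines collapse into one entry with a (+1 other locations) annotation, and the test match comes last.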
┌─────────┐     ┌────────┐     ┌──────────────────────────┐     ┌────────┐     ┌────────┐
│  Query  │────>│ Router │────>│ SQLite index             │────>│ Ranker │────>│ Budget │──> Output
│ (CLI or │     │        │     │ (symbols, refs, imports, │     │ Def >  │     │--budget│
│  MCP)   │     │        │     │  deps, call edges)       │     │ Use >  │     │   N    │
└─────────┘     │        │────>│ grep fallback            │────>│ Test   │     └────────┘
                └────────┘     └──────────────────────────┘     └────────┘
                                             ▲
                                  ┌──────────┴─────────┐
                                  │  Daemon (notify)   │
                                  │  keeps index fresh │
                                  └────────────────────┘
| Mode | What it does |
|---|---|
| wonk search <pattern> | Full-text search ranked by code structure. Definitions first, tests last, re-exports collapsed. Add --semantic for hybrid RRF fusion. |
| wonk sym / sig / show / ref | Direct symbol lookup. Find definitions, view signatures, read full source, or trace references -- no regex needed. |
| wonk ask <query> | Semantic search via Ollama embeddings. Natural-language queries over code meaning. Requires Ollama + nomic-embed-text. |
Search
- Smart ranking: definitions first, tests last, re-exports deduplicated
- Semantic search via Ollama embeddings (wonk ask)
- Hybrid RRF fusion blends structural + semantic results (--semantic)
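For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of the general technique: each result earns 1 / (k + rank) from every ranked list it appears in, and the summed scores decide the final order. The constant k = 60 is the value commonly used in the literature; whether Wonk uses this constant or this exact formula is an assumption, and the `rrf_fuse` helper and result identifiers are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Made-up result identifiers for illustration.
structural = ["a.py:10", "b.py:4", "c.py:7"]
semantic = ["c.py:7", "a.py:10", "d.py:1"]
fused = rrf_fuse([structural, semantic])
print(fused)
```

Results that appear in both lists (a.py:10 and c.py:7) outrank results that appear in only one, which is the property that makes RRF a reasonable way to blend structural and semantic rankings without comparing their raw scores.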
Code intelligence
- Symbol lookup, signatures, and full source display (sym, sig, show)
- Call graph traversal: callers, callees, shortest call path
- Blast radius analysis with severity tiers and risk levels
- Execution flow tracing from entry points
- Changed symbol detection with blast/flow chaining
Architecture
- Single static binary -- SQLite, tree-sitter grammars, and grep engine bundled
- 12 languages: TypeScript/TSX, JavaScript, Python, Rust, Go, Java, C, C++, Ruby, PHP, C#
- Background daemon keeps index fresh via filesystem watcher
- Worktree isolation -- separate index per git worktree
- 22 MCP tools for AI coding assistants (JSON-RPC 2.0 over stdio)
- Token budget (--budget N) caps output and preserves top-ranked results
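The token budget idea can be illustrated as a greedy cut over already-ranked output: keep lines in rank order until the next one would exceed the budget. This is a sketch under stated assumptions, not Wonk's code. The `apply_budget` helper is hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
def apply_budget(ranked_lines, budget, count_tokens=lambda s: len(s.split())):
    """Keep top-ranked lines until adding one more would exceed the budget."""
    kept, used = [], 0
    for line in ranked_lines:
        cost = count_tokens(line)
        if used + cost > budget:
            break  # everything after this point is lower-ranked anyway
        kept.append(line)
        used += cost
    return kept


# Already ranked: definition, then usage, then test.
ranked = [
    "def handle(req):",
    "handle(req)  # call site",
    "assert handle(req) is None",
]
kept = apply_budget(ranked, budget=6)
print(kept)
```

Because the input is ranked before the cut, the budget trims from the bottom, so definitions survive even under a tight limit.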
25 code-understanding tasks across 5 real-world repos (ripgrep, tokio, httpx, pydantic, fastify), 5 runs each, median reported. Measures Claude Code token consumption with vs without wonk.
| Category | Baseline (avg) | Wonk (avg) | Reduction | Quality (B→W) |
|---|---|---|---|---|
| symbol_location | 100k | 61k | 33% | 0.85→0.85 |
| reference_tracing | 96k | 57k | 28% | 0.92→0.88 |
| architecture | 162k | 101k | 29% | 0.90→0.96 |
| multi_step | 143k | 104k | 23% | 0.93→0.93 |
| structural | 130k | 69k | 46% | 0.95→0.88 |
Overall: 37.4% total reduction (median per-task 29.7%, best 68.5%). Quality maintained at 0.90 vs 0.91 baseline.
curl -fsSL https://raw.githubusercontent.com/etr/wonk/main/install.sh | sh

cargo install wonk

git clone https://github.com/etr/wonk.git && cd wonk
cargo build --release
# Binary: target/release/wonk

cd your-project
wonk search "handleRequest" # ranked full-text search
wonk sym "UserService" # find symbol definitions
wonk callers "dispatch" # who calls this?
wonk blast "processPayment" # what breaks if this changes?
wonk changes --blast --flows # changed symbols + impact analysis
wonk ask "error handling logic" # semantic search (requires Ollama)

Indexing happens automatically on first use.
Wonk's primary audience is AI coding agents. Three integration paths:
MCP server -- wonk mcp serve exposes 22 JSON-RPC tools over stdio. Agents call wonk_search, wonk_sym, wonk_callers, wonk_blast, etc. with structured parameters and JSON responses. See MCP server.
Claude Code plugin -- the wonk plugin bundles the MCP server, a skill that teaches Claude when to prefer wonk over grep/glob, and a session hook. See Claude Code plugin.
CLI via Bash tool -- agents run wonk commands directly. Use --format toon -q for compact output and --budget N to cap token consumption.
| Task | Command |
|---|---|
| Find a definition | wonk sym X or wonk show X |
| Full context (def + callers + callees) | wonk context X --budget 4000 |
| Trace forward call graph | wonk callees X --depth 3 |
| Shortest path A → B | wonk callpath A B |
| Impact of a change | wonk blast X or wonk changes --blast |
| Module overview | wonk summary src/api --depth 1 |
The wonk plugin integrates wonk into Claude Code as a native tool provider. It bundles the MCP server, an agent skill that teaches Claude when to prefer wonk over grep/glob, and a session hook that keeps the index fresh.
# Recommended: install via Groundwork Marketplace
claude plugin marketplace add https://github.com/etr/groundwork-marketplace
claude plugin install wonk

See the wonk-plugin repo for alternative installation methods.
Wonk includes a built-in MCP server for AI coding assistants. Add to your .mcp.json:
{
  "mcpServers": {
    "wonk": {
      "command": "wonk",
      "args": ["mcp", "serve"]
    }
  }
}

22 tools exposed: search, sym, ref, sig, show, deps, rdeps, callers, callees, callpath, summary, flows, blast, changes, context, ask, cluster, impact, init, update, status, repos. All tools accept an optional repo parameter for multi-repo setups.
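To make the wire format concrete, the sketch below builds the kind of JSON-RPC 2.0 message an MCP client sends to invoke a tool. The tools/call method and the name/arguments parameter shape come from the MCP specification, and wonk_search is the tool name mentioned earlier. The `tools_call_request` helper and the "pattern" argument key are assumptions made for illustration; the real argument names are defined by the tool's schema. A real session also performs an initialize handshake before any tool call, which is omitted here.

```python
import json


def tools_call_request(request_id, tool, arguments):
    """Serialize one MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "pattern" is a hypothetical argument name; check the tool schema for the real one.
request_line = tools_call_request(1, "wonk_search", {"pattern": "handleRequest"})
print(request_line)
```

Over the stdio transport each message is written as one line, which is why the helper returns a single-line string.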
| Command | Description |
|---|---|
| Search | |
| search <pattern> | Full-text search with smart ranking, dedup, --semantic fusion |
| ask <query> | Semantic search via embedding similarity |
| Symbol lookup | |
| sym <name> | Symbol definitions by name, kind, or exact match |
| ref <name> | Find references to a symbol |
| sig <name> | Show function/method signatures |
| show <name> | Show full source body (--shallow for containers) |
| Code structure | |
| ls [path] | List files and symbols (--tree for structure) |
| deps <file> | Show file dependencies (imports) |
| rdeps <file> | Show reverse dependencies |
| summary <path> | Structural summary with optional --semantic description |
| Call graph | |
| callers <name> | Find callers with transitive --depth expansion |
| callees <name> | Find callees with transitive --depth expansion |
| callpath <from> <to> | Shortest call chain between two symbols |
| Program analysis | |
| flows [entry] | Detect entry points and trace execution flows |
| blast <symbol> | Blast radius with severity tiers and risk levels |
| changes | Changed symbols with optional --blast / --flows chaining |
| context <name> | Full symbol context: callers, callees, flows, children |
| impact <file> | Symbol-level change impact analysis |
| Semantic | |
| cluster <path> | Cluster symbols by semantic similarity (K-Means) |
| Index management | |
| init | Build index (auto-runs on first query) |
| update | Rebuild index |
| status | Show index stats |
| repos list\|clean | Manage tracked repositories |
| Daemon | |
| daemon start\|stop\|status\|list | Manage background file watcher |
| Integration | |
| mcp serve | Start MCP server (JSON-RPC 2.0 over stdio) |
Full flag reference: docs/commands.md
| | wonk | ripgrep | ctags/LSP |
|---|---|---|---|
| Structural ranking | Definitions first, tests last | No ranking | N/A |
| Deduplication | Re-export collapsing | None | N/A |
| Call graph | Callers, callees, callpath, blast radius | No | LSP only (running server) |
| Semantic search | Embedding similarity (Ollama) | No | No |
| Token budget | --budget N caps output | No | No |
| Setup | Single binary, auto-indexes | Single binary | Language server per language |
| MCP server | 22 tools built-in | No | Via adapter |
| Output | grep-compatible + JSON + TOON | grep + JSON | Protocol-specific |
grep (default) -- standard grep-compatible format, pipe-friendly:
src/main.rs:42:fn main() {}
json (--format json) -- NDJSON, one object per line:
{"file":"src/main.rs","line":42,"col":1,"content":"fn main() {}"}
toon (--format toon) -- compact, indentation-based, minimal punctuation:
file: src/main.rs
line: 42
content: fn main() {}
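A script consuming these formats might parse them as follows. This sketch uses the sample lines shown above as input; the `parse_ndjson` and `parse_grep_line` helpers are hypothetical names, and the grep parser assumes file paths contain no colons (so it would need adjusting for Windows drive letters).

```python
import json


def parse_ndjson(text):
    """Parse NDJSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_grep_line(line):
    """Split a grep-style 'file:line:content' record into its fields."""
    file, lineno, content = line.split(":", 2)
    return {"file": file, "line": int(lineno), "content": content}


ndjson_sample = '{"file":"src/main.rs","line":42,"col":1,"content":"fn main() {}"}\n'
records = parse_ndjson(ndjson_sample)
grep_record = parse_grep_line("src/main.rs:42:fn main() {}")

print(records[0]["file"], records[0]["line"])
print(grep_record)
```

Splitting with a maximum of two separators keeps any colons inside the matched source line intact.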
TypeScript (TSX), JavaScript (JSX), Python, Rust, Go, Java, C, C++, Ruby, PHP, C#
Wonk's core features work out of the box with zero external dependencies. Advanced features require:
- Ollama -- for semantic search and AI-generated summaries. Pull nomic-embed-text (embeddings) and llama3.2:3b (summaries).
- git -- only needed for wonk impact --since and wonk changes --scope compare. Most likely already installed.
Layered TOML config: built-in defaults < ~/.wonk/config.toml < <repo>/.wonk/config.toml.
Full reference: docs/configuration.md
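The layering order above means later layers override earlier ones. A minimal sketch of that precedence, assuming nested tables are merged key by key rather than replaced wholesale (the actual merge semantics are described in docs/configuration.md). The `deep_merge` helper and the `output.format` / `output.budget` keys are hypothetical, and plain dicts stand in for the parsed TOML files.

```python
def deep_merge(base, override):
    """Return a new dict where values in override win over values in base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical keys; dicts stand in for parsed TOML.
defaults = {"output": {"format": "grep", "budget": 0}}
user_config = {"output": {"format": "json"}}   # ~/.wonk/config.toml
repo_config = {"output": {"budget": 4000}}     # <repo>/.wonk/config.toml

effective = deep_merge(deep_merge(defaults, user_config), repo_config)
print(effective)
```

Here the user-level file changes the format, the repo-level file sets the budget, and any key neither file mentions keeps its built-in default.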
Built with Claude Code and Groundwork.
MIT