Wonk

Structure-aware code search that cuts LLM token burn by 37%.

Wonk indexes your codebase with Tree-sitter to understand code structure -- definitions, call graphs, imports, and scopes -- then ranks search results so definitions surface first and tests sort last. A built-in MCP server exposes 22 tools for AI coding assistants, and a background daemon keeps the index fresh. Single static binary, zero runtime dependencies.

Before / after

Searching for Blueprint in the Flask repo -- ripgrep returns 225 lines of unsorted noise (changelogs, docs, tests, definitions all mixed together). Wonk returns the same matches ranked and deduplicated: definitions first, usages next, comments and tests last, with re-exports collapsed into (+N other locations) annotations.

rg Blueprint (225 lines):

./src/flask/app.py:1119:  ...Blueprint`
./src/flask/app.py:1427:  ...Blueprint...
./src/flask/blueprints.py:10:from ...
./src/flask/blueprints.py:11:from ...
./src/flask/blueprints.py:18:class Blueprint...
./src/flask/__init__.py:3:from ...
./src/flask/debughelpers.py:8:from ...
./src/flask/debughelpers.py:146:  ...Blueprint
  ... 217 more unsorted lines ...

wonk search Blueprint (213 lines):

-- definitions --
src/flask/blueprints.py:18:class Blueprint(...)
  (+13 other locations)
src/flask/sansio/blueprints.py:119:class Blueprint(...)
  (+13 other locations)
-- usages --
src/flask/debughelpers.py:146:  ...Blueprint
src/flask/sansio/app.py:374:  self.blueprints: ...
  ... sorted by relevance ...
-- tests --
tests/test_blueprints.py:9:  ...

Same data, structured for an LLM context window -- definitions surface immediately instead of being buried on line 67.

The problem

LLM coding agents grep aggressively. A single query can stuff hundreds of noisy, unranked lines into the context window -- raw matches with no sense of what is a definition, what is a test, and what is a re-export. That is wasted tokens and wasted money.

How it works

Wonk pre-indexes your codebase with Tree-sitter so it understands code structure: definitions vs. usages, symbol kinds, scopes, imports, and dependencies. When you search, results come back ranked, deduplicated, and grouped by relevance -- definitions first, tests last. The index stays fresh via a background file watcher, and a built-in MCP server exposes 22 tools for AI coding assistants.

┌─────────┐    ┌────────┐    ┌────────────────────────────┐    ┌────────┐    ┌────────┐
│  Query  │───>│ Router │───>│  SQLite index              │───>│ Ranker │───>│ Budget │──> Output
│ (CLI or │    │        │    │  (symbols, refs, imports,  │    │ Def >  │    │--budget│
│   MCP)  │    │        │    │   deps, call edges)        │    │ Use >  │    │   N    │
└─────────┘    │        │───>│  grep fallback             │───>│ Test   │    └────────┘
               └────────┘    └────────────────────────────┘    └────────┘
                                          ▲
                                ┌─────────┴───────────┐
                                │  Daemon (notify)    │
                                │  keeps index fresh  │
                                └─────────────────────┘
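
The ranker stage in the diagram can be pictured as a sort over classified matches. The sketch below is an illustration only, not wonk's actual implementation: the Match type, the kind labels, and the exact-duplicate dedup rule are all hypothetical, and the sample rows are borrowed from the Flask example above.

```python
from dataclasses import dataclass

# Hypothetical priority order: definitions first, usages next, tests last.
KIND_PRIORITY = {"definition": 0, "usage": 1, "test": 2}

@dataclass(frozen=True)
class Match:
    file: str
    line: int
    content: str
    kind: str  # "definition", "usage", or "test" (assumed labels)

def rank(matches):
    """Drop exact duplicates, then sort by kind priority, file, and line."""
    unique = list(dict.fromkeys(matches))  # keeps the first occurrence of each match
    return sorted(unique, key=lambda m: (KIND_PRIORITY[m.kind], m.file, m.line))

matches = [
    Match("tests/test_blueprints.py", 9, "...", "test"),
    Match("src/flask/debughelpers.py", 146, "...Blueprint", "usage"),
    Match("src/flask/blueprints.py", 18, "class Blueprint(...)", "definition"),
    Match("src/flask/debughelpers.py", 146, "...Blueprint", "usage"),  # duplicate
]
ranked = rank(matches)
for m in ranked:
    print(f"{m.file}:{m.line}:{m.content}")
```

A real ranker would need a classifier that assigns the kind from the Tree-sitter index; here the label is supplied by hand.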

Search modes

| Mode | What it does |
| --- | --- |
| wonk search <pattern> | Full-text search ranked by code structure. Definitions first, tests last, re-exports collapsed. Add --semantic for hybrid RRF fusion. |
| wonk sym / sig / show / ref | Direct symbol lookup. Find definitions, view signatures, read full source, or trace references -- no regex needed. |
| wonk ask <query> | Semantic search via Ollama embeddings. Natural-language queries over code meaning. Requires Ollama + nomic-embed-text. |

Features at a glance

Search

  • Smart ranking: definitions first, tests last, re-exports deduplicated
  • Semantic search via Ollama embeddings (wonk ask)
  • Hybrid RRF fusion blends structural + semantic results (--semantic)
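
For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of the general technique. It is not taken from wonk's source: the constant k=60 is the conventional default for RRF and may differ from what wonk uses, and the result identifiers (including scaffold.py:50) are made up for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Higher fused score first; ties broken by name so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical result lists from a structural ranker and a semantic ranker.
structural = ["blueprints.py:18", "app.py:1119", "debughelpers.py:146"]
semantic = ["blueprints.py:18", "scaffold.py:50", "app.py:1119"]

fused = rrf_fuse([structural, semantic])
print(fused)
```

Items that appear near the top of both lists accumulate the largest scores, so agreement between the two rankers is rewarded without having to compare their raw scores directly.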

Code intelligence

  • Symbol lookup, signatures, and full source display (sym, sig, show)
  • Call graph traversal: callers, callees, shortest call path
  • Blast radius analysis with severity tiers and risk levels
  • Execution flow tracing from entry points
  • Changed symbol detection with blast/flow chaining

Architecture

  • Single static binary -- SQLite, tree-sitter grammars, and grep engine bundled
  • 12 languages: TypeScript/TSX, JavaScript, Python, Rust, Go, Java, C, C++, Ruby, PHP, C#
  • Background daemon keeps index fresh via filesystem watcher
  • Worktree isolation -- separate index per git worktree
  • 22 MCP tools for AI coding assistants (JSON-RPC 2.0 over stdio)
  • Token budget (--budget N) caps output and preserves top-ranked results
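
To illustrate what a token budget over ranked output might look like, here is a small sketch. The token estimate (roughly four characters per token) and the cut-off rule are assumptions made for this example; wonk's real tokenizer and truncation behavior are not documented in this section.

```python
def estimate_tokens(text):
    """Crude token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)

def apply_budget(ranked_lines, budget):
    """Keep top-ranked lines in order until the next one would exceed the budget."""
    kept, used = [], 0
    for line in ranked_lines:
        cost = estimate_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return kept

# Already ranked: definition, then usage, then test.
ranked = [
    "src/flask/blueprints.py:18:class Blueprint(...)",
    "src/flask/debughelpers.py:146:  ...Blueprint",
    "tests/test_blueprints.py:9:  ...",
]
kept = apply_budget(ranked, budget=25)
print("\n".join(kept))
```

Because the input is already ranked, truncating from the tail drops the least relevant lines first, which is the property the --budget bullet describes.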

Benchmarks

25 code-understanding tasks across 5 real-world repos (ripgrep, tokio, httpx, pydantic, fastify), 5 runs each, median reported. Measures Claude Code token consumption with and without wonk.

| Category | Baseline (avg) | Wonk (avg) | Reduction | Quality (B→W) |
| --- | --- | --- | --- | --- |
| symbol_location | 100k | 61k | 33% | 0.85→0.85 |
| reference_tracing | 96k | 57k | 28% | 0.92→0.88 |
| architecture | 162k | 101k | 29% | 0.90→0.96 |
| multi_step | 143k | 104k | 23% | 0.93→0.93 |
| structural | 130k | 69k | 46% | 0.95→0.88 |

Overall: 37.4% total reduction (median per-task 29.7%, best 68.5%). Quality maintained at 0.90 vs 0.91 baseline.
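
As a rough check on the overall figure, the category averages from the table can be totaled. This assumes, without confirmation from the source, that the overall reduction is tokens saved divided by baseline tokens across all tasks, and that every category contains the same number of tasks. The table averages are rounded to the nearest thousand, so the result lands near 37.4% rather than exactly on it.

```python
# Category averages in thousands of tokens, copied from the benchmark table.
baseline = {
    "symbol_location": 100,
    "reference_tracing": 96,
    "architecture": 162,
    "multi_step": 143,
    "structural": 130,
}
wonk = {
    "symbol_location": 61,
    "reference_tracing": 57,
    "architecture": 101,
    "multi_step": 104,
    "structural": 69,
}

total_baseline = sum(baseline.values())
total_wonk = sum(wonk.values())
# Share of baseline tokens that were saved, as a percentage.
total_reduction = 100 * (total_baseline - total_wonk) / total_baseline
print(f"total reduction from rounded averages: {total_reduction:.1f}%")
```

The per-category Reduction column does not equal the ratio of the two averages in each row, so it presumably reports a different aggregate such as a per-task median; it is not recomputed here.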

Installation

curl (Linux / macOS)

curl -fsSL https://raw.githubusercontent.com/etr/wonk/main/install.sh | sh

Cargo

cargo install wonk

Building from source

git clone https://github.com/etr/wonk.git && cd wonk
cargo build --release
# Binary: target/release/wonk

Quick start

cd your-project
wonk search "handleRequest"       # ranked full-text search
wonk sym "UserService"            # find symbol definitions
wonk callers "dispatch"           # who calls this?
wonk blast "processPayment"       # what breaks if this changes?
wonk changes --blast --flows      # changed symbols + impact analysis
wonk ask "error handling logic"   # semantic search (requires Ollama)

Indexing happens automatically on first use.

How agents use wonk

Wonk's primary audience is AI coding agents. Three integration paths:

MCP server -- wonk mcp serve exposes 22 JSON-RPC tools over stdio. Agents call wonk_search, wonk_sym, wonk_callers, wonk_blast, etc. with structured parameters and JSON responses. See MCP server.

Claude Code plugin -- the wonk plugin bundles the MCP server, a skill that teaches Claude when to prefer wonk over grep/glob, and a session hook. See Claude Code plugin.

CLI via Bash tool -- agents run wonk commands directly. Use --format toon -q for compact output and --budget N to cap token consumption.
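
For agents or harnesses that shell out rather than use MCP, a thin wrapper along these lines may be convenient. This is a sketch under assumptions: the helper names are hypothetical, and the flag placement (global flags after the subcommand arguments, and -q combined with the context subcommand) is a guess based on the flags mentioned above, not something checked against docs/commands.md.

```python
import shutil
import subprocess

def build_command(subcommand, *args, budget=None, fmt="toon", quiet=True):
    """Assemble a wonk invocation as an argv list (flag placement is assumed)."""
    cmd = ["wonk", subcommand, *args, "--format", fmt]
    if quiet:
        cmd.append("-q")
    if budget is not None:
        cmd += ["--budget", str(budget)]
    return cmd

def run_wonk(cmd, cwd="."):
    """Run the command and return its stdout, or None if wonk is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=False)
    return result.stdout

cmd = build_command("context", "UserService", budget=4000)
print(" ".join(cmd))
```

Passing an argv list instead of a shell string avoids quoting problems when a symbol name or pattern contains spaces or shell metacharacters.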

Workflow patterns

| Task | Command |
| --- | --- |
| Find a definition | wonk sym X or wonk show X |
| Full context (def + callers + callees) | wonk context X --budget 4000 |
| Trace forward call graph | wonk callees X --depth 3 |
| Shortest path A → B | wonk callpath A B |
| Impact of a change | wonk blast X or wonk changes --blast |
| Module overview | wonk summary src/api --depth 1 |

Claude Code plugin

The wonk plugin integrates wonk into Claude Code as a native tool provider. It bundles the MCP server, an agent skill that teaches Claude when to prefer wonk over grep/glob, and a session hook that keeps the index fresh.

# Recommended: install via Groundwork Marketplace
claude plugin marketplace add https://github.com/etr/groundwork-marketplace
claude plugin install wonk

See the wonk-plugin repo for alternative installation methods.

MCP server

Wonk includes a built-in MCP server for AI coding assistants. Add to your .mcp.json:

{
  "mcpServers": {
    "wonk": {
      "command": "wonk",
      "args": ["mcp", "serve"]
    }
  }
}

22 tools exposed: search, sym, ref, sig, show, deps, rdeps, callers, callees, callpath, summary, flows, blast, changes, context, ask, cluster, impact, init, update, status, repos. All tools accept an optional repo parameter for multi-repo setups.
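
For orientation, this is roughly what a tool invocation looks like on the wire. MCP clients send JSON-RPC 2.0 requests with the method tools/call, carrying the tool name and an arguments object. The tool name wonk_search and the optional repo parameter come from this README; the pattern argument name and the repo value are guesses for illustration, and a real client must complete the MCP initialize handshake before calling tools.

```python
import json

def tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# "pattern" is an assumed argument name; "repo" is the optional parameter noted above.
request = tool_call(1, "wonk_search", {"pattern": "Blueprint", "repo": "flask"})

# Over stdio, each message is written as a single line of JSON.
line = json.dumps(request)
print(line)
```

In practice an MCP-capable assistant builds these messages itself; the sketch is mainly useful when debugging the server by hand.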

Commands

| Group | Command | Description |
| --- | --- | --- |
| Search | search <pattern> | Full-text search with smart ranking, dedup, --semantic fusion |
| Search | ask <query> | Semantic search via embedding similarity |
| Symbol lookup | sym <name> | Symbol definitions by name, kind, or exact match |
| Symbol lookup | ref <name> | Find references to a symbol |
| Symbol lookup | sig <name> | Show function/method signatures |
| Symbol lookup | show <name> | Show full source body (--shallow for containers) |
| Code structure | ls [path] | List files and symbols (--tree for structure) |
| Code structure | deps <file> | Show file dependencies (imports) |
| Code structure | rdeps <file> | Show reverse dependencies |
| Code structure | summary <path> | Structural summary with optional --semantic description |
| Call graph | callers <name> | Find callers with transitive --depth expansion |
| Call graph | callees <name> | Find callees with transitive --depth expansion |
| Call graph | callpath <from> <to> | Shortest call chain between two symbols |
| Program analysis | flows [entry] | Detect entry points and trace execution flows |
| Program analysis | blast <symbol> | Blast radius with severity tiers and risk levels |
| Program analysis | changes | Changed symbols with optional --blast / --flows chaining |
| Program analysis | context <name> | Full symbol context: callers, callees, flows, children |
| Program analysis | impact <file> | Symbol-level change impact analysis |
| Semantic | cluster <path> | Cluster symbols by semantic similarity (K-Means) |
| Index management | init | Build index (auto-runs on first query) |
| Index management | update | Rebuild index |
| Index management | status | Show index stats |
| Index management | repos list\|clean | Manage tracked repositories |
| Daemon | daemon start\|stop\|status\|list | Manage background file watcher |
| Integration | mcp serve | Start MCP server (JSON-RPC 2.0 over stdio) |

Full flag reference: docs/commands.md

Comparison with alternatives

| Feature | wonk | ripgrep | ctags/LSP |
| --- | --- | --- | --- |
| Structural ranking | Definitions first, tests last | No ranking | N/A |
| Deduplication | Re-export collapsing | None | N/A |
| Call graph | Callers, callees, callpath, blast radius | No | LSP only (running server) |
| Semantic search | Embedding similarity (Ollama) | No | No |
| Token budget | --budget N caps output | No | No |
| Setup | Single binary, auto-indexes | Single binary | Language server per language |
| MCP server | 22 tools built-in | No | Via adapter |
| Output | grep-compatible + JSON + TOON | grep + JSON | Protocol-specific |

Output formats

grep (default) -- standard grep-compatible format, pipe-friendly:

src/main.rs:42:fn main() {}

json (--format json) -- NDJSON, one object per line:

{"file":"src/main.rs","line":42,"col":1,"content":"fn main() {}"}

toon (--format toon) -- compact, indentation-based, minimal punctuation:

file: src/main.rs
line: 42
content: fn main() {}
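
Consumers of the grep and json formats can normalize both into the same record shape. The sketch below parses the two sample lines shown above; the function names are hypothetical, and the grep parser assumes the file path itself contains no colon (so it would mis-split a Windows drive-letter path).

```python
import json

def parse_grep_line(line):
    """Parse `file:line:content`; the content part may itself contain colons."""
    file, lineno, content = line.split(":", 2)
    return {"file": file, "line": int(lineno), "content": content}

def parse_ndjson(text):
    """Parse NDJSON output: one JSON object per non-empty line."""
    return [json.loads(row) for row in text.splitlines() if row.strip()]

grep_record = parse_grep_line("src/main.rs:42:fn main() {}")
json_records = parse_ndjson(
    '{"file":"src/main.rs","line":42,"col":1,"content":"fn main() {}"}\n'
)

print(grep_record)
print(json_records[0])
```

The json format carries a col field that the grep format lacks, so it is the better choice when a tool needs exact positions.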

Supported languages

TypeScript (TSX), JavaScript (JSX), Python, Rust, Go, Java, C, C++, Ruby, PHP, C#

Optional dependencies

Wonk's core features work out of the box with zero external dependencies. Advanced features require:

  • Ollama -- for semantic search and AI-generated summaries. Pull nomic-embed-text (embeddings) and llama3.2:3b (summaries).
  • git -- only needed for wonk impact --since and wonk changes --scope compare. Most likely already installed.

Configuration

Layered TOML config: built-in defaults < ~/.wonk/config.toml < <repo>/.wonk/config.toml.

Full reference: docs/configuration.md
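
The layering order above means later files override earlier ones. A minimal sketch of that precedence follows, using plain dicts in place of parsed TOML. The keys (output.format, output.budget) are invented for the example, and merging nested tables key by key rather than replacing them wholesale is an assumption; docs/configuration.md is the authority on both.

```python
def merge(base, override):
    """Recursively merge override into base; the later layer wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical settings, as a TOML parser would return them for each layer.
defaults = {"output": {"format": "grep", "budget": 0}}  # built-in defaults
user = {"output": {"format": "toon"}}                   # ~/.wonk/config.toml
repo = {"output": {"budget": 4000}}                     # <repo>/.wonk/config.toml

config = merge(merge(defaults, user), repo)
print(config)
```

With this precedence a repository can pin project-specific settings while each developer keeps personal defaults in their home directory.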

Acknowledgments

Built with Claude Code and Groundwork.

License

MIT
