Add size limits to prevent context overflow in large repos #19

JordanCoin · 2026-01-29T06:49:12Z

Summary

Session-start hook now uses adaptive depth based on repo size (depth 2-4)
Both hook and MCP get_structure enforce 60KB max output (~15k tokens, <10% of context)
Truncates cleanly at line boundaries with helpful message
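
To illustrate the line-boundary truncation in the last item, here is a minimal Go sketch. The function name `truncateAtLine`, its parameters, and the notice text are hypothetical and are not taken from `cmd/hooks.go` or `mcp/main.go`; the actual implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateAtLine returns s unchanged when it fits within limit bytes.
// Otherwise it cuts s back to the last complete line that fits and
// appends notice. The notice is added on top of the limit.
// (Hypothetical helper; names are assumptions for illustration.)
func truncateAtLine(s string, limit int, notice string) string {
	if len(s) <= limit {
		return s
	}
	cut := strings.LastIndexByte(s[:limit], '\n')
	if cut < 0 {
		// No newline inside the limit: drop the partial line entirely.
		return notice
	}
	return s[:cut+1] + notice
}

func main() {
	tree := "src/\n  main.go\n  hooks.go\n"
	fmt.Print(truncateAtLine(tree, 12, "... (output truncated)\n"))
}
```

Cutting at a newline byte also avoids splitting a multi-byte UTF-8 character, which matters if the tree output uses box-drawing characters.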

Problem

Large repos (10k+ files like Rails monoliths) were outputting 1.3MB+ of tree structure on session start, consuming 250%+ of Claude Code's context window before any conversation even started.

Root Cause: Hook Output Goes to "Messages"

The critical insight is that hook output gets injected into the "Messages" portion of Claude's context, not into the system prompt or tools. This means:

Hook output competes directly with conversation history
/clear doesn't help because hooks re-run on session start
Even with no conversation, "Messages" can show 500k+ tokens
Users see "Context limit reached" immediately on fresh sessions

Why This Matters for Hook Architecture

Hooks need to be context-aware. A hook that's helpful for a 500-file project becomes destructive for a 10k-file project. Current hooks assume unlimited output is fine - it's not.

Principles for future hook design:

Hooks should NEVER output more than ~10% of context window (~20k tokens / ~80KB)
Output should scale inversely with repo size (bigger repo = less detail)
Consider making hooks return structured data instead of free-form text
Add a global hook output budget that hooks share
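
The arithmetic behind the ~10% figure can be sketched in Go as follows. This assumes a context window of roughly 200k tokens and a rough heuristic of 4 bytes per token; both constants and the function names are assumptions for illustration, and real tokenizers can deviate noticeably, especially on dense tree output.

```go
package main

import "fmt"

const (
	contextWindowTokens = 200_000 // assumed approximate context window size
	bytesPerToken       = 4       // rough heuristic (assumption); real tokenizers vary
)

// budgetBytes returns the byte budget for a given share of the
// context window, expressed in percent.
func budgetBytes(percent int) int {
	return contextWindowTokens * percent / 100 * bytesPerToken
}

// estimateTokens gives a rough token estimate for n bytes of output.
func estimateTokens(n int) int {
	return n / bytesPerToken
}

func main() {
	fmt.Println(budgetBytes(10))       // 10% of the window, in bytes: prints 80000
	fmt.Println(estimateTokens(60000)) // a 60KB cap under the same heuristic: prints 15000
}
```

Under these assumptions a 10% budget is 20k tokens, or about 80KB, so a 60KB cap leaves some headroom.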

The Quick Fix (This PR)

Adaptive depth: >5000 files → depth 2, >2000 files → depth 3, else depth 4
Hard cap: 60KB max output with clean truncation
Applies to both CLI hooks and MCP get_structure tool
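
The adaptive depth thresholds above can be expressed as a small Go function. This is a sketch only: the name `depthForFileCount` is hypothetical, and the code in `cmd/hooks.go` may structure this differently.

```go
package main

import "fmt"

// depthForFileCount maps a repo's file count to a tree depth using the
// thresholds from this PR: >5000 files -> 2, >2000 files -> 3, else 4.
// (Hypothetical helper name, for illustration.)
func depthForFileCount(files int) int {
	switch {
	case files > 5000:
		return 2
	case files > 2000:
		return 3
	default:
		return 4
	}
}

func main() {
	for _, n := range []int{500, 2001, 10144} {
		fmt.Printf("%d files -> depth %d\n", n, depthForFileCount(n))
	}
}
```

Because the comparisons are strict, a repo with exactly 5000 files still gets depth 3, and one with exactly 2000 files gets depth 4.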

Future Refactoring Ideas

Hook output budget: Global limit shared across all hooks
Structured hook responses: Return JSON that Claude Code can format/truncate
Lazy loading: Show summary first, let user request details
Project-specific config: Allow .codemap/config.json to tune hook behavior
Hook priority system: Critical info first, nice-to-have truncated
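
One possible shape for the shared budget and priority ideas above, sketched in Go. Nothing here exists in codemap or Claude Code today: the `hookOutput` type, the `applyBudget` function, and the hook names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// hookOutput is a hypothetical record of one hook's output.
type hookOutput struct {
	Name     string
	Priority int // lower value = more important
	Text     string
}

// applyBudget walks the outputs in priority order and keeps each one
// that still fits in the remaining shared byte budget. An output that
// does not fit is skipped whole; smaller, lower-priority outputs may
// still be kept afterwards.
func applyBudget(outputs []hookOutput, budget int) []hookOutput {
	sorted := append([]hookOutput(nil), outputs...) // copy, leave input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})
	var kept []hookOutput
	for _, o := range sorted {
		if len(o.Text) > budget {
			continue
		}
		budget -= len(o.Text)
		kept = append(kept, o)
	}
	return kept
}

func main() {
	outs := []hookOutput{
		{Name: "recent-changes", Priority: 2, Text: "12345"},
		{Name: "tree", Priority: 1, Text: "12345678"},
		{Name: "tips", Priority: 3, Text: "1234"},
	}
	for _, o := range applyBudget(outs, 12) {
		fmt.Println(o.Name)
	}
}
```

A real design would probably truncate an oversized output rather than drop it, but dropping keeps the sketch short.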

Test Results

Tested on gumroad repo (10,144 files, 303MB):
- Before: 1,375,873 bytes (1.3MB) → 500k+ tokens
- After: 60,973 bytes (61KB) → ~15k tokens
Both codemap and codemap-mcp rebuild successfully
Claude Code sessions now start with normal context usage

Files Changed

cmd/hooks.go: Adaptive depth + size limit for session-start hook
mcp/main.go: Size limit for get_structure MCP tool

🤖 Generated with Claude Code

- Session-start hook now uses adaptive depth based on repo size:
  - >5000 files: depth 2
  - >2000 files: depth 3
  - Otherwise: depth 4
- Both hook and MCP get_structure enforce 60KB max output (~15k tokens)
- Truncates cleanly at line boundaries with helpful message
- Prevents consuming >10% of LLM context window

Fixes issue where 10k+ file repos (like Rails monoliths) would output 1.3MB+ of tree structure, overwhelming Claude Code's context.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Hook output goes directly into Claude's "Messages" context, not system prompt. This means hook output competes with conversation history for the ~200k token limit. A 1.3MB output (like a full tree of a 10k file repo) equals ~500k tokens, causing instant context overflow. The size limits (adaptive depth + 60KB cap) are critical safeguards.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

JordanCoin and others added 2 commits January 29, 2026 01:49
