Local search memory for Markdown document folders.
This is a personal project I built for searching converted Markdown documents from an agent workflow. I am sharing it because the local vector, DirectML, NPU, and CodeGraph workflow may save someone else some setup time.
DocMemory builds a SQLite index inside a target documentation folder. It supports:
- SQLite FTS keyword search
- optional local vector search with FastEmbed
- hybrid keyword + vector search
- experimental DirectML and OpenVINO/NPU vector builds
- a read-only MCP server for agents
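How DocMemory merges keyword and vector results in hybrid mode is not specified here. As an illustration only, reciprocal rank fusion is one common way to combine two ranked lists. The function name, the `k=60` constant, and the chunk ids below are assumptions for this sketch, not DocMemory internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    so ids ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for stable output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result orders from a keyword search and a vector search.
keyword_hits = ["retry.md#3", "api.md#1", "worker.md#7"]
vector_hits = ["worker.md#7", "retry.md#3", "design.md#2"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Chunks found by both searches end up ahead of chunks found by only one, which is the behavior a hybrid mode is usually after.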
DocMemory pairs well with CodeGraph, a separate open-source tool for source code structure analysis. CodeGraph is not part of DocMemory; the two tools are complementary:
- Use CodeGraph for source code structure, callers, callees, and impact analysis.
- Use DocMemory for design docs, converted PDFs, specs, operation notes, and historical Markdown.
- Ask the agent to compare both before making code changes.
Recommended agent workflow:
1. Use CodeGraph to locate the relevant code path.
2. Use DocMemory to find the matching design/spec documentation.
3. Compare code behavior against the document.
4. Report mismatches with file paths, doc paths, and line references.
5. Only then edit code or docs.
Example prompt:
Use CodeGraph to trace the call path for OrderService.
Then use DocMemory to search the docs for "order retry timeout".
Compare the implementation with the design docs and list any mismatches.
Do not edit files yet.
Another useful prompt:
Use CodeGraph to find what calls formatOrderPayload.
Use DocMemory to find the document that describes order payload fields.
Tell me whether the current code matches the latest documented field rules.
DocMemory is designed for agent workflows where loading an entire documentation folder into context is too expensive.
Instead of reading hundreds of Markdown files, an agent can call docmemory_search and receive only the most relevant snippets:
large doc folder -> ranked snippets -> smaller prompt
This usually saves tokens in three ways:
- the agent avoids scanning unrelated files
- search results include only focused chunks and line ranges
- repeated questions reuse the local SQLite/vector index instead of re-reading raw docs
Hypothetical context reduction:
context reduction ~= 1 - (tokens returned by search / tokens in full docs)
These numbers are examples, not benchmarks. They assume a large Markdown folder and snippets around 300 tokens each; actual results depend on your chunk size and document structure.
Document-only comparison:
| Approach | Context sent to agent | Approx. tokens | Approx. reduction vs full-load baseline |
|---|---|---|---|
| Load full docs | Entire Markdown folder | 500,000 | 0% |
| Search top 10 | 10 snippets x 300 tokens | 3,000 | 99.4% |
| Search top 5 | 5 snippets x 300 tokens | 1,500 | 99.7% |
| Search top 3 | 3 snippets x 300 tokens | 900 | 99.8% |
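The table rows follow directly from the context reduction formula. A minimal sketch that reproduces them, using the same hypothetical 500,000-token folder and 300-token snippets (the function name is illustrative):

```python
def context_reduction(tokens_returned: int, tokens_full: int) -> float:
    """Return 1 - (tokens returned by search / tokens in full docs)."""
    if tokens_full <= 0:
        raise ValueError("tokens_full must be positive")
    return 1.0 - tokens_returned / tokens_full


FULL_DOCS_TOKENS = 500_000  # hypothetical size of the whole Markdown folder
SNIPPET_TOKENS = 300        # hypothetical average snippet size

for top_n in (10, 5, 3):
    returned = top_n * SNIPPET_TOKENS
    reduction = context_reduction(returned, FULL_DOCS_TOKENS)
    print(f"top {top_n}: {returned} tokens, {reduction:.1%} reduction")
```

Swap in your own folder size and snippet size to estimate the reduction for your documents.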
Code + docs workflow estimate:
| Workflow | Context sent to agent | Approx. tokens | Approx. reduction vs full-load baseline |
|---|---|---|---|
| Load code + docs directly | Source tree + documentation folder | 800,000 | 0% |
| CodeGraph only | Relevant code symbols and call paths | 8,000 | 99.0% |
| DocMemory only | Top 5 doc snippets | 1,500 | 99.8% |
| CodeGraph + DocMemory | Code path + top 5 doc snippets | 9,500 | 98.8% |
The combined workflow uses more tokens than either tool alone (9,500 versus 1,500 for DocMemory only in this example), but it answers a harder question: whether code and docs agree.
Compared with loading a whole documentation folder, search-first workflows can send far less context to the agent. The exact reduction depends on how many snippets and source files you open.
For best results, keep search limits small:
uv run --extra vector docmemory search <DOC_DIR> "background worker retry behavior" --hybrid -n 5
For agents, prefer CodeGraph for code context, then MCP search for docs, then open original files only when the snippet is relevant.
- Convert or drop documents into a Markdown folder.
- Initialize the folder once to write DocMemory config.
- Sync when Markdown files change to rebuild the index.
- Search from the CLI, or let an agent search through MCP.
Markdown docs -> .docmemory/docmemory.sqlite -> CLI / MCP search
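The pipeline above can be pictured with a small SQLite FTS5 example. This is a sketch under assumptions, not DocMemory's actual schema: the `chunks` table, its columns, and the fixed 20-line chunking are hypothetical, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_index(conn, docs, lines_per_chunk=20):
    """Index Markdown text as line-range chunks. docs maps path -> text."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks "
        "USING fts5(path UNINDEXED, start_line UNINDEXED, "
        "end_line UNINDEXED, body)"
    )
    for path, text in docs.items():
        lines = text.splitlines()
        for start in range(0, len(lines), lines_per_chunk):
            block = lines[start:start + lines_per_chunk]
            conn.execute(
                "INSERT INTO chunks (path, start_line, end_line, body) "
                "VALUES (?, ?, ?, ?)",
                (path, start + 1, start + len(block), "\n".join(block)),
            )
    conn.commit()


def search(conn, query, limit=5):
    """Return (path, start_line, end_line) for the best-matching chunks."""
    # bm25() returns lower values for better matches, so sort ascending.
    return conn.execute(
        "SELECT path, start_line, end_line FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, {
    "retry.md": "# Retry\nOrder retry timeout is 30 seconds.",
    "worker.md": "# Worker\nBackground worker architecture notes.",
})
hits = search(conn, "retry timeout")
print(hits)
```

Returning the path and line range instead of the whole file is what keeps the context sent to the agent small.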
Clone and run locally:
git clone https://github.com/kuchris/DocMemory.git
cd DocMemory
uv run docmemory --help
The examples use PowerShell because this project was built and tested on Windows.
Initialize an index inside a document folder:
uv run docmemory init -i <DOC_DIR>
Here, -i means "initialize this target folder" and writes .docmemory/config.ini inside the document folder.
Build keyword + vector indexes:
uv run --extra vector docmemory sync <DOC_DIR> --vector
Search:
uv run --extra vector docmemory search <DOC_DIR> "payment retry design" --hybrid -n 5
Check status:
uv run docmemory status <DOC_DIR>
Keyword search uses SQLite FTS:
uv run docmemory search <DOC_DIR> "API-REFERENCE"
Vector-only search is useful when the query has few exact words:
uv run --extra vector docmemory search <DOC_DIR> "which document explains the background worker architecture" --vector -n 5
Hybrid search combines keyword and vector results:
uv run --extra vector docmemory search <DOC_DIR> "payment retry timeout design" --hybrid -n 5
Default model:
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
FastEmbed stores model files under:
.models/
Override the model cache if needed:
$env:DOCMEMORY_MODEL_DIR = "D:\models\docmemory"
Use the model name in commands, not the local cache folder name.
The DirectML command is separate from the stable CPU command:
uv run --extra directml docmemory-dml sync <DOC_DIR> --vector --model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Defaults:
model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
batch: 32
The first embedding batch may pause while ONNX Runtime compiles the graph. Later batches print timing.
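To see why the first batch can look slower than the rest, it helps to time each batch separately. The sketch below is not DocMemory's code: `embed_in_batches` and `fake_embed` are hypothetical names, and `fake_embed` stands in for a real embedding model so the example runs without any model download.

```python
import time


def embed_in_batches(texts, embed_batch, batch_size=32):
    """Embed texts in fixed-size batches and print per-batch timing."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        t0 = time.perf_counter()
        vectors.extend(embed_batch(batch))
        elapsed = time.perf_counter() - t0
        number = start // batch_size + 1
        print(f"batch {number}: {len(batch)} texts in {elapsed:.3f}s")
    return vectors


def fake_embed(batch):
    """Stand-in for a real model: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in batch]


texts = [f"chunk {i}" for i in range(70)]
vectors = embed_in_batches(texts, fake_embed)
```

With a real ONNX Runtime backend, any one-time graph compilation would show up as a longer time on batch 1 only.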
Probe DirectML before a long rebuild:
uv run --extra directml python scripts\probe_directml_fastembed.py
The NPU path uses a separate Python environment at .venv-npu because onnxruntime-openvino may conflict with other ONNX Runtime builds.
Defaults:
model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
batch: 32
device: NPU
precision: FP16
parallel: 2
max chars: 600
Useful NPU knobs:
$env:DOCMEMORY_NPU_BATCH_SIZE = "32"
$env:DOCMEMORY_NPU_PARALLEL = "2"
$env:DOCMEMORY_NPU_MAX_CHARS = "600"
$env:DOCMEMORY_NPU_PRECISION = "FP16"
DOCMEMORY_NPU_MAX_CHARS trims long chunks before embedding to reduce wasted tokenization and inference work. The full text still stays in the SQLite text index.
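A minimal sketch of how these knobs could be read and applied, assuming the documented defaults. The helper names are hypothetical, and how DocMemory parses the variables internally may differ.

```python
import os


def npu_settings(env=None):
    """Read the NPU knobs, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "batch_size": int(env.get("DOCMEMORY_NPU_BATCH_SIZE", "32")),
        "parallel": int(env.get("DOCMEMORY_NPU_PARALLEL", "2")),
        "max_chars": int(env.get("DOCMEMORY_NPU_MAX_CHARS", "600")),
        "precision": env.get("DOCMEMORY_NPU_PRECISION", "FP16"),
    }


def text_for_embedding(chunk_text, max_chars):
    """Trim only the copy sent to the embedder; the caller keeps the full text."""
    return chunk_text[:max_chars]


settings = npu_settings({"DOCMEMORY_NPU_MAX_CHARS": "10"})
full_text = "x" * 25                      # what the text index would keep
trimmed = text_for_embedding(full_text, settings["max_chars"])
print(len(full_text), len(trimmed))
```

Trimming affects only the vector for a chunk, so keyword search still sees every word of long chunks.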
NPU cache:
.models/openvino-cache/
Probe NPU:
<DOCMEMORY_DIR>\.venv-npu\Scripts\python.exe scripts\probe_npu_fastembed.py
Optional .bat launchers can be placed beside a document folder for one-click rebuilds.
Recommended launcher behavior:
- rebuild vectors for the folder containing the .bat
- keep the database inside that folder at .docmemory/docmemory.sqlite
- use MiniLM for daily sync
- keep NPU/GPU launchers separate from the stable CPU launcher
DocMemory includes a read-only MCP server for agents.
Tools:
docmemory_status
docmemory_search
Example Codex config:
[mcp_servers.docmemory]
command = "uv"
args = ["run", "--directory", "<DOCMEMORY_DIR>", "--extra", "mcp", "docmemory-mcp"]
enabled = true
[mcp_servers.docmemory.env]
DOCMEMORY_TARGET = "<DOC_DIR>"
Use CLI or .bat files to rebuild vectors. Use MCP for agent search.
Ignored by default:
.docmemory
_history
.git
.svn
__pycache__
node_modules
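As a rough illustration of what ignoring these folders means during a scan, the sketch below walks a folder and skips the default names plus any extras. It is an assumption about the general approach, not DocMemory's scanner; `iter_markdown_files` is a hypothetical helper.

```python
import os
import tempfile

DEFAULT_IGNORED = {
    ".docmemory", "_history", ".git", ".svn", "__pycache__", "node_modules",
}


def iter_markdown_files(root, extra_ignored=()):
    """Yield .md paths relative to root, skipping ignored folder names."""
    ignored = DEFAULT_IGNORED | set(extra_ignored)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored folders.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored)
        for name in sorted(filenames):
            if name.lower().endswith(".md"):
                yield os.path.relpath(os.path.join(dirpath, name), root)


# Demo on a throwaway folder tree.
with tempfile.TemporaryDirectory() as root:
    for rel in ("a.md", ".git/x.md", "old/b.md", "sub/c.md", "sub/notes.txt"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("# test\n")
    found = list(iter_markdown_files(root, extra_ignored=["old"]))

print(found)
```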
Add more ignored folders during init:
uv run docmemory init -i <DOC_DIR> --ignore old --ignore backup
The database is stored inside the target folder:
<DOC_DIR>/.docmemory/docmemory.sqlite
Model files are stored inside the DocMemory project by default:
<DOCMEMORY_DIR>/.models/
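The default location and the DOCMEMORY_MODEL_DIR override can be summarized in a few lines. This is a sketch of the lookup order as described, with a hypothetical helper name; the real resolution logic may handle more cases.

```python
import os
from pathlib import Path


def resolve_model_dir(project_dir, env=None):
    """Prefer DOCMEMORY_MODEL_DIR if set, else <project_dir>/.models."""
    env = os.environ if env is None else env
    override = env.get("DOCMEMORY_MODEL_DIR")
    if override:
        return Path(override)
    return Path(project_dir) / ".models"


default_dir = resolve_model_dir("DocMemory", env={})
custom_dir = resolve_model_dir(
    "DocMemory", env={"DOCMEMORY_MODEL_DIR": "shared-models"}
)
print(default_dir, custom_dir)
```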
Apache-2.0. See LICENSE.
If this project saves you time, please consider giving it a GitHub star. It helps other people find the repo.