Spectral Cortex is a compact Rust implementation of a Spectral Memory Graph (SMG) designed to be used as a short-term and long-term memory store for AI agents that reason over a project's git history. It converts commit messages and other short text chunks into embeddings, builds a spectral graph of semantic relationships, clusters related content, and exposes a retrieval API tuned for agent workflows.
This README is targeted at developers who want a local, explainable memory backing for AI agents that need to answer questions, recall past decisions, or link present context to repository history.
Highlights
- Purpose-built for agent memory over git history (commits, PR messages, notes).
- AST-Aware Split Mode: Use tree-sitter (Rust, TS, JS, Python) to map commit messages directly to code symbols.
- Structural Fusion: Fuses semantic similarity with the repo's call-graph and structural links.
- Embedding Deduplication: Batch-level de-duplication of identical content (e.g., multiple AST symbols in one commit) to drastically reduce embedding costs.
- Hybrid Search: Combines vector similarity with keyword-based metadata boosting for symbols and file paths.
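The batch-level de-duplication mentioned above can be illustrated with a minimal sketch. This is not the project's actual implementation: the function name `embed_deduplicated` and the closure-based `embed_batch` parameter are hypothetical, chosen only to show how identical chunks in one batch can share a single embedding call.

```rust
use std::collections::HashMap;

/// Embed each distinct text once, then fan the vectors back out so the
/// result lines up index-for-index with `contents`.
/// `embed_batch` is a hypothetical stand-in for a real embedder.
fn embed_deduplicated<F>(contents: &[&str], mut embed_batch: F) -> Vec<Vec<f32>>
where
    F: FnMut(&[&str]) -> Vec<Vec<f32>>,
{
    let mut unique: Vec<&str> = Vec::new();
    let mut slot_of: HashMap<&str, usize> = HashMap::new();
    let mut slots: Vec<usize> = Vec::with_capacity(contents.len());

    for &text in contents {
        // First occurrence claims a new slot; repeats reuse it.
        let slot = *slot_of.entry(text).or_insert_with(|| {
            unique.push(text);
            unique.len() - 1
        });
        slots.push(slot);
    }

    // Only the distinct texts reach the (expensive) embedder.
    let vectors = embed_batch(&unique);
    assert_eq!(vectors.len(), unique.len(), "embedder must return one vector per input");

    slots.into_iter().map(|slot| vectors[slot].clone()).collect()
}
```

With three chunks of which two are identical, the embedder in this sketch sees a batch of two, and the first and third outputs are the same vector.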
Contents
- Quick start
- MCP server (markdown-first tools)
- Agent-oriented workflows & examples
- CLI reference (important flags)
- Temporal re-ranking behavior (defaults & control)
- Library API & data model
- Persistence format
- Extensibility notes (hooks for agents)
- Testing & development
- Contributing and license
Clone and build:
# Clone and build
git clone https://github.com/mrorigo/spectral-cortex.git
cd spectral-cortex
cargo build --release
Install from this repository (single binary with CLI + MCP subcommand):
cargo install --path crates/spectral-cortex-cli --force
On macOS, build/install copies Torch runtime dylibs to a sibling libtorch/ directory and embeds an rpath to @executable_path/libtorch.
- cargo install ... installs:
  - ~/.cargo/bin/spectral-cortex
  - ~/.cargo/bin/libtorch/*.dylib
- local cargo build places dylibs under:
  - target/<profile>/libtorch/*.dylib
Run with:
spectral-cortex --help
If you move or copy the binary manually on macOS, keep libtorch/ beside it (same directory level) so dylib loading continues to work.
Ingest a repository and build the SMG (recommended CLI flow):
spectral-cortex ingest --repo /path/to/repo --out smg.json
Query the saved SMG programmatically (JSON output suitable for agents):
spectral-cortex query --query "why did we add X" --smg smg.json --json --top-k 10
A dedicated MCP subcommand is available for agent workflows that need compact, markdown-first responses instead of verbose JSON.
Run it over stdio:
spectral-cortex mcp --smg smg.json
Available tools:
- graph_summary: compact graph metadata for an SMG file
- query_graph: semantic query with markdown tables and compact related-note summaries
- inspect_note: inspect one note and related notes with spectral similarity
- long_range_links: list top long-range links in markdown table format
- get_structural_hotspots: find the most frequently modified AST symbols (the "brittle" parts)
- inspect_symbol_history: deep dive into the chronological evolution of a specific structural symbol (class/function)
MCP client wiring example (recommended):
{
"mcpServers": {
"spectral-cortex": {
"command": "spectral-cortex",
"args": ["mcp", "--smg", "/path/to/smg.json"]
}
}
}
If the binary is not on PATH, use an absolute path:
{
"mcpServers": {
"spectral-cortex": {
"command": "/absolute/path/to/spectral-cortex",
"args": ["mcp", "--smg", "/absolute/path/to/smg.json"]
}
}
}
Development fallback (build and run from source on each launch):
{
"mcpServers": {
"spectral-cortex": {
"command": "cargo",
"args": ["run", "-p", "spectral-cortex", "--release", "--", "mcp", "--smg", "smg.json"],
"cwd": "/Users/origo/src/spectral-cortex"
}
}
}
Tool input examples:
graph_summary
{}
query_graph
{
"query": "mcp protocol",
"top_k": 5,
"links_k": 3
}
inspect_note
{
"note_id": 5071,
"links_k": 10
}
long_range_links
{
"top_k": 20
}
All MCP tool responses are markdown-first and intentionally compact to reduce token usage.
The typical flow for an agent using the SMG as memory:
- Periodic ingestion: run the ingest job (cron / CI hook) and persist smg.json.
- At runtime, load smg.json once per agent process or cache it in memory.
- For a user or agent query:
  - Get top-K relevant turn IDs and associated note metadata via the CLI or library API (JSON).
  - Retrieve the source commit ids, timestamps, and content snippets for context.
  - Use the returned snippets + candidate commit ids as evidence to feed into your agent's prompt or grounding layer.
- Optionally: store agent feedback (relevance labels) externally for tuning ranking weights in future enhancements.
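The "evidence" step in the flow above can be sketched as follows. The `Evidence` struct and `build_evidence_block` function are hypothetical names for illustration; they assume the agent has already parsed the query JSON into commit ids, timestamps, scores, and snippets, and they are not part of the Spectral Cortex API.

```rust
/// One retrieved hit, already parsed from the query output (hypothetical shape).
struct Evidence {
    commit_id: Option<String>,
    timestamp: u64, // unix epoch seconds
    score: f32,
    snippet: String,
}

/// Render hits as a numbered evidence list for a prompt.
/// Evidence lines that would push the text past `max_bytes` are skipped;
/// the header is always kept.
fn build_evidence_block(question: &str, hits: &[Evidence], max_bytes: usize) -> String {
    let mut out = format!("Question: {question}\nEvidence from repository history:\n");
    for (i, hit) in hits.iter().enumerate() {
        let commit = hit.commit_id.as_deref().unwrap_or("unknown");
        let line = format!(
            "{}. [commit {} | ts {} | score {:.2}] {}\n",
            i + 1,
            commit,
            hit.timestamp,
            hit.score,
            hit.snippet.trim()
        );
        if out.len() + line.len() > max_bytes {
            break; // stay within the caller's context budget
        }
        out.push_str(&line);
    }
    out
}
```

Keeping the commit id next to each snippet lets the agent cite its sources, and the byte budget is a simple stand-in for whatever token accounting the agent framework uses.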
Why this is suited to agents
- Small and self-contained: you can run entirely on a developer machine or container.
- Deterministic local embedder available for tests; real MiniLM used by default for realistic retrieval.
- Outputs structured JSON that an agent can parse to build prompts or context windows.
- Temporal re-ranking biases results toward recent, likely more actionable history — useful for agents that should prefer recent fixes or regression-causing commits.
The spectral-cortex binary exposes: ingest, update, query, note, and mcp.
Ingest (collect commits -> SMG):
cargo run -p spectral-cortex --release -- ingest --repo /path/to/repo --out smg.json
Update (incremental append ingest; only new commits are embedded):
cargo run -p spectral-cortex --release -- \
update --repo /path/to/repo --out smg.json --git-filter-preset git-noise
Query (default, temporal enabled):
cargo run -p spectral-cortex --release -- \
query --query "refactor" --smg smg.json --json --top-k 10
Inspect one note:
cargo run -p spectral-cortex --release -- \
note --smg smg.json --note-id 42 --json
Run MCP server with preloaded SMG:
cargo run -p spectral-cortex --release -- \
mcp --smg smg.json
mcp also accepts --smd as an alias for --smg.
Key query flags (agent-friendly):
- --top-k <n>: how many final results to return (default 5).
- --candidate-k <n>: how many candidates to retrieve from vector search before filtering (defaults to top_k * 5).
- --min-score <float>: inclusive threshold applied to the combined final_score (default 0.7).
- --no-temporal: disable temporal re-ranking for this query (temporal is enabled by default).
- --temporal-weight <0..1>: control recency influence (default 0.20).
- --temporal-half-life-days <float>: half-life for exponential decay (default 14.0).
- --file <string>: filter results by file path (substring match).
- --symbol <string>: filter results by symbol ID (substring match).
- --keyword-weight <float>: weight for hybrid metadata boosting (default 0.3).
- --json: emit machine-readable JSON (recommended for agents).
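How `--candidate-k`, `--min-score`, `--file`, and `--top-k` interact can be pictured with a small sketch. The `Candidate` struct and `select_results` function are hypothetical, and the order of the filtering steps is an assumption; only the defaults (`top_k * 5`, inclusive threshold, substring match) come from the flag descriptions above.

```rust
/// A scored candidate from vector search (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    note_id: u32,
    final_score: f32,
    file_path: Option<String>,
}

/// Default candidate pool size when --candidate-k is not given.
fn default_candidate_k(top_k: usize) -> usize {
    top_k * 5
}

/// Apply the inclusive score threshold and optional file filter,
/// then keep the best `top_k` by descending score.
fn select_results(
    mut candidates: Vec<Candidate>,
    top_k: usize,
    min_score: f32,
    file_filter: Option<&str>,
) -> Vec<Candidate> {
    // Inclusive: a candidate scoring exactly `min_score` is kept.
    candidates.retain(|c| c.final_score >= min_score);
    if let Some(needle) = file_filter {
        // Substring match; candidates without a file path are dropped.
        candidates.retain(|c| c.file_path.as_deref().map_or(false, |p| p.contains(needle)));
    }
    candidates.sort_by(|a, b| b.final_score.total_cmp(&a.final_score));
    candidates.truncate(top_k);
    candidates
}
```

A larger candidate pool matters most when filters are strict: with the default `top_k` of 5, up to 25 candidates are available to survive the threshold and path filters.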
Key ingest/update filtering flags:
- --git-filter-preset git-noise: drop common metadata lines (e.g. Co-authored-by, Signed-off-by).
- --git-filter-drop <regex>: repeatable custom line-drop regex.
- --git-filter-case-insensitive: case-insensitive regex matching.
- --git-commit-split-mode <off|auto|strict|ast>: split multi-change commit messages. ast uses tree-sitter.
- --git-commit-split-max-segments <n>: cap segments per commit.
- --git-commit-split-min-confidence <0..1>: confidence threshold for auto.
- --num-spectral-dims <n>: number of spectral dimensions (k) to compute (default 8).
- --min-clusters <n>: minimum clusters allowed (default 2).
- --max-clusters <n>: maximum clusters allowed (default 8).
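The effect of line-drop filtering on a commit message can be illustrated with a simplified sketch. The CLI matches lines with regular expressions; to stay within the standard library, this example swaps in plain `key:` prefix matching instead, and the function name `drop_noise_lines` is hypothetical.

```rust
/// Remove lines that start with any of the given trailer keys followed by ':'.
/// Prefix matching is a simplification of the regex-based CLI filters.
fn drop_noise_lines(message: &str, trailer_keys: &[&str], case_insensitive: bool) -> String {
    message
        .lines()
        .filter(|line| {
            let trimmed = line.trim_start();
            let is_noise = trailer_keys.iter().any(|key| {
                let prefix = format!("{key}:");
                if case_insensitive {
                    trimmed
                        .to_ascii_lowercase()
                        .starts_with(&prefix.to_ascii_lowercase())
                } else {
                    trimmed.starts_with(&prefix)
                }
            });
            !is_noise
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Dropping trailers before embedding keeps author and sign-off metadata from dominating the similarity between otherwise unrelated commits.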
For local agent memory that stays fresh automatically, wire the update command into a git post-commit hook.
Example .git/hooks/post-commit:
#!/usr/bin/env bash
set -euo pipefail
spectral-cortex update \
--repo . \
--out smg.json \
--git-filter-preset git-noise
Make it executable:
chmod +x .git/hooks/post-commit
Temporal re-ranking is enabled by default because agents typically benefit from fresher context when interpreting repository state. The default strategy is:
- Mode: exponential decay
- Weight: 0.20 (20% recency influence)
- Half-life: 14 days
Combination formula (final score): final = (1 - weight) * semantic_score + weight * temporal_score
Notes:
- Missing timestamps are treated as very old (temporal_score = 0).
- --no-temporal disables temporal scoring when you need canonical, time-agnostic retrieval.
- --min-score is applied to final_score, so agent clients can filter noisy candidates consistently.
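The combination formula and the missing-timestamp rule above can be written out as a short sketch. The exact decay expression is an assumption: the text only says "exponential decay" with a half-life, so this example uses the standard half-life form 0.5^(age_days / half_life_days). The function names are hypothetical.

```rust
/// Seconds per day, used to convert epoch-second ages into days.
const SECONDS_PER_DAY: f64 = 86_400.0;

/// Recency score in [0, 1]. `timestamp` and `now` are unix epoch seconds.
/// A missing timestamp is treated as very old and scores 0.
/// Assumed decay: 0.5^(age_days / half_life_days).
fn temporal_score(timestamp: Option<u64>, now: u64, half_life_days: f64) -> f64 {
    match timestamp {
        None => 0.0,
        Some(ts) => {
            // saturating_sub guards against timestamps slightly in the future.
            let age_days = now.saturating_sub(ts) as f64 / SECONDS_PER_DAY;
            0.5_f64.powf(age_days / half_life_days)
        }
    }
}

/// final = (1 - weight) * semantic_score + weight * temporal_score
fn final_score(semantic: f64, temporal: f64, weight: f64) -> f64 {
    (1.0 - weight) * semantic + weight * temporal
}
```

Under these assumptions and the defaults (weight 0.20, half-life 14 days), a commit that is exactly 14 days old with a semantic score of 0.8 gets a temporal score of 0.5 and a final score of 0.8 × 0.8 + 0.2 × 0.5 = 0.74, which still clears the default `--min-score` of 0.7.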
ingest and update always rebuild spectral structures after ingesting turns.
When tuning the spectral structure (clusters and dimensions), consider the following:
- Architectural Match: max_clusters should roughly correspond to the number of top-level modules or logical "areas" in your codebase. For a medium-sized project, 5–15 clusters is usually a good starting point.
- Spectral Resolution: num_spectral_dims (k) defines the dimensionality of the spectral embedding before clustering. It should typically be greater than or equal to max_clusters. Higher values capture more structural nuance but can introduce noise.
- Eigengap Heuristic: The internal algorithm uses the "eigengap" (the largest jump between sorted eigenvalues) to choose the optimal cluster count within your min/max bounds. A strong gap indicates a "natural" partition in the semantic/structural graph.
- Stickiness: Settings used during ingest are stored in the SMG JSON metadata and automatically reused during update unless overridden by CLI flags.
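The eigengap heuristic described above can be sketched in a few lines. This is a generic illustration rather than the project's internal code: it assumes the eigenvalues come from a graph Laplacian and are compared in ascending order, and the function name `choose_cluster_count` is hypothetical.

```rust
/// Pick the cluster count k in [min_clusters, max_clusters] with the largest
/// gap between the k-th and (k+1)-th smallest eigenvalues.
/// Returns None when the bounds leave no valid k for the given spectrum.
fn choose_cluster_count(
    eigenvalues: &[f64],
    min_clusters: usize,
    max_clusters: usize,
) -> Option<usize> {
    let mut sorted = eigenvalues.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Choosing k needs sorted[k], so k can be at most len - 1.
    let upper = max_clusters.min(sorted.len().saturating_sub(1));
    let lower = min_clusters.max(1);
    if lower > upper {
        return None;
    }

    let mut best_k = lower;
    let mut best_gap = f64::NEG_INFINITY;
    for k in lower..=upper {
        let gap = sorted[k] - sorted[k - 1];
        if gap > best_gap {
            best_gap = gap;
            best_k = k; // ties keep the smaller k
        }
    }
    Some(best_k)
}
```

For a spectrum such as 0.0, 0.01, 0.02, 0.6, 0.7, 0.75 with bounds 2 to 8, the largest jump sits between the third and fourth values, so the sketch picks 3 clusters. Raising `min_clusters` to 4 forces it to settle for the next-best gap inside the bounds.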
Use the library if you embed the SMG directly inside an agent process.
Primary types:
- SpectralMemoryGraph
  - new() -> Result<Self>: initializes embedder and structures.
  - ingest_turn(&mut self, turn: &ConversationTurn) -> Result<()>: add a turn.
  - build_spectral_structure(&mut self) -> Result<()>: compute spectral embeddings & clusters.
  - retrieve_with_scores(&self, query: &str, candidate_k: usize) -> Result<Vec<(u64, f32)>>: returns per-turn final scores (semantic + temporal + cluster boosts). Callers may re-rank with a custom TemporalConfig if they prefer different defaults.
- ConversationTurn
  pub struct ConversationTurn {
      pub turn_id: u64,
      pub speaker: String,
      pub content: String,
      pub topic: String,
      pub entities: Vec<String>,
      pub commit_id: Option<String>,
      pub timestamp: u64, // unix epoch seconds
      pub symbol_id: Option<String>,
      pub ast_node_type: Option<String>,
      pub file_path: Option<String>,
  }
- SMGNote
  - Internal note stored per embedded turn; includes:
    - raw_content
    - embedding: Vec<f32>
    - source_turn_ids: Vec<u64>
    - source_commit_ids: Vec<Option<String>>
    - source_timestamps: Vec<u64>
    - symbol_id: Option<String>
    - ast_node_type: Option<String>
    - file_path: Option<String>
    - related_note_links: Vec<(u32, f32)>
The JSON format is strict and versioned (metadata.format_version = "spectral-cortex-v1").
SMG persistence uses a compact JSON representation (see src/lib.rs helpers):
// Save
save_smg_json(&smg, Path::new("smg.json"))?;
// Load
let smg = load_smg_json(Path::new("smg.json"))?;
The persisted structure stores notes in stable sorted order, optional cluster labels, and centroids. Spectral matrices are not persisted (they are recomputable via build_spectral_structure()).
- Retrieval diagnostics: query JSON includes per-result score, turn_id, note_id, related_notes, and where available commit_id and cluster_label. Top-level JSON includes the temporal settings used for the query.
- Re-ranking: you can override the default re-ranker by calling re_rank_with_temporal with a custom TemporalConfig (weight, half-life, mode).
- Incremental ingestion: ingest_turn appends turns, so you can build an ingestion pipeline that streams new commits into a long-running agent process.
- Feedback loop: collect agent judgments (useful/not useful) in a separate store and use those signals to adjust temporal_weight or to implement a learned ranker later.
- Run unit tests:
cargo test -p spectral-cortex
- Use deterministic fake embedder for tests (the project auto-selects a deterministic fake embedder under cfg(test) so CI is reproducible).
- Linting & formatting:
cargo fmt
cargo clippy -- -D warnings
- The embedder bundles MiniLM assets via a companion rust_embed repo; no network fetch is required at runtime.
- Default settings assume agents should prefer recent context; tune via CLI or library TemporalConfig for domain needs (e.g., security audits vs. active feature work).
- If you plan to serve the SMG from a shared service, snapshot smg.json and load it into worker processes to avoid repeated rebuilds.
If you improve retrieval, temporal defaults, or add learning-to-rank, please:
- Fork and create a feature branch.
- Add unit tests and integration tests for retrieval ordering and temporal logic.
- Open a PR describing the change and expected agent behavior.
MIT. See LICENSE for details.