Lerim turns completed agent runs into evidence-backed context records so the next agent starts with trusted operating context instead of another raw transcript.
Website · Docs · Benchmarks · Examples · PyPI · License
Trace to answer: support trace input, Lerim CLI import, MiniMax/BAML extraction, stored context, and cited answer.
Completed session in -> durable context out -> the next agent starts with evidence
Lerim is a source-session context compiler for AI agent workflows.
Observability shows what happened. Lerim decides what was worth learning from it.
Lerim reads completed agent traces, filters noisy execution history into durable signals, and writes compact context records for future runs.
Instead of replaying raw traces or losing what happened after each run, Lerim keeps:
- decisions
- constraints
- preferences
- facts
- handoffs
- evidence linked back to the source session
| Moment | Lerim does | Future agents get |
|---|---|---|
| A completed agent run lands | Imports a source session from an adapter, MCP submit, or clean custom JSONL | A stable source boundary instead of a transcript paste |
| The trace is noisy | Compacts the run and filters for reusable decisions, constraints, facts, preferences, and handoffs | Durable context, not another log index |
| Someone asks later | Retrieves relevant records and answers with citations back to stored evidence | A shorter start with less re-explaining |
pip install lerim
lerim init
lerim connect auto --mode auto
lerim project add .
lerim up
Native adapters let Lerim ingest completed local sessions where a stable trace store exists. MCP setup writes Lerim tool entries for compatible agents; live recall or trace-submit acceptance is claimed only where the integration matrix lists installed-client/tool-call evidence:
lerim connect auto --mode mcp --dry-run
lerim connect auto --mode mcp
Then ask Lerim what a future agent should know:
lerim answer "What context should I know before working in this project?"
AI agents now triage tickets, investigate incidents, research markets, prepare handoffs, review policies, and change software.
Every run leaves a trace. Most traces are too long, too noisy, and too platform-specific for the next agent to reuse directly.
Without a durable context layer:
- decisions get re-debated
- constraints get rediscovered
- preferences get ignored
- every new session starts too close to zero
Lerim fixes that by turning raw traces into reusable context records and making them queryable from agent tools and product workflows.
Lerim is meant for any trace-producing agent workflow. Today, native source adapters are strongest for coding agents, and documented custom-trace paths cover support and incident workflows:
- coding agents: repo conventions, architecture decisions, setup facts, failed paths, test lessons, release handoffs
- support operations: customer constraints, known fixes, failed fixes, escalation reasons, policy evidence, handoffs
- operations and incidents: root causes, mitigations, rejected hypotheses, runbook gaps, incident handoffs, follow-up risks
- Trace-to-context extraction. ingest reads supported sources and custom clean-trace folders, extracts reusable signal, and can archive routine runs without creating noisy durable records.
- Shared context across agents. What one agent learns can become useful context for a different agent or workflow later.
- MCP access for compatible agents. lerim mcp exposes context tools, and lerim connect <agent> --mode mcp writes client config with backups and verification.
- Context curation. Lerim consolidates overlap, archives weak records, and keeps the context layer compact.
- Derived context graph. Lerim links related decisions, constraints, evidence, facts, and handoffs for curation and future/hosted visualization.
- Query and startup context. Agents can ask questions against accumulated context or start from a compact context brief.
- Evidence-backed memory. Useful decisions, constraints, preferences, facts, and handoffs stay linked to the work that produced them.
- Custom source profiles. Coding, support, and incident workflows share one compiler, and teams can register YAML profiles for their own verticals with focus, noise, evidence, and scope rules.
- Not a raw transcript replay tool.
- Not a broad memory_save bucket for agents to write arbitrary memories.
- Not a replacement for observability. Observability keeps the trace; Lerim compiles reusable context from completed source sessions.
- Not a claim that every listed agent has native completed-session ingestion. MCP recall is useful, but it is different from native trace ingestion.
Lerim is intentionally selective:
- Read a completed source session from a native adapter, custom trace folder, or MCP lerim_trace_submit.
- Normalize and compact the trace while preserving source evidence.
- Extract only reusable decisions, constraints, preferences, facts, handoffs, and episodes.
- Make that context available through CLI, MCP tools, context briefs, and retrieval-backed answers.
Most routine traces should produce no durable record. Lerim's value is compact, cited context, not more logs.
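As a rough illustration of what an evidence-backed record might carry, the sketch below models a record with a kind, a short statement, and links back to the source session. The class and field names (`ContextRecord`, `EvidenceRef`, `session_id`, `event_index`) are hypothetical and are not Lerim's actual schema; only the record kinds come from the list above.

```python
from dataclasses import dataclass, field

# Record kinds named in this README; everything else here is illustrative.
RECORD_KINDS = {"decision", "constraint", "preference", "fact", "handoff", "episode"}


@dataclass(frozen=True)
class EvidenceRef:
    """Hypothetical pointer back to one event in a source session."""
    session_id: str
    event_index: int


@dataclass
class ContextRecord:
    """Hypothetical shape of a durable record; not Lerim's real schema."""
    kind: str
    statement: str
    evidence: list[EvidenceRef] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in RECORD_KINDS:
            raise ValueError(f"unknown record kind: {self.kind!r}")
        if not self.evidence:
            # A record without evidence cannot be cited later.
            raise ValueError("a durable record needs at least one evidence link")


record = ContextRecord(
    kind="constraint",
    statement="Renewal approval is required above EUR 500.",
    evidence=[EvidenceRef(session_id="support-agent-run", event_index=1)],
)
```

Rejecting records that have no evidence link mirrors the idea that context should stay cited; how Lerim enforces this internally is not shown here.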
flowchart LR
source[Completed source session] --> adapter[Native adapter or trace submit]
adapter --> normalized[Canonical trace]
normalized --> extractor[BAML / LangGraph extraction]
extractor --> records[Evidence-backed context records]
records --> curate[Curate and link]
curate --> tools[CLI and MCP tools]
tools --> agent[Future agent]
Lerim has two integration layers:
- Native trace adapters read completed local sessions and feed Lerim's compiler.
- MCP support lets compatible agents query Lerim context and submit completed sessions through lerim_trace_submit.
| Support level | Agents and sources |
|---|---|
| Native adapter plus MCP config writer | Claude Code, Codex CLI, Cursor, OpenCode |
| MCP config writer; live recall/submit only where verified | Gemini CLI, Cline, Claude Desktop, OpenClaw, Hermes, Goose, Roo Code, Kilo Code, Windsurf |
| Native adapter, no MCP claim | pi |
| Experimental or user-owned path | OpenHuman, custom JSONL, generic MCP trace submit |
MCP support is not the same as native trace ingestion. Native adapters are best when the agent has a stable local session store. MCP config entries expose Lerim tools; live recall or completed-session submission is claimed only where the matrix lists installed-client/tool-call evidence. See the integration matrix for the exact public support boundary and evidence level per agent.
Install Lerim and register your project:
pip install lerim
lerim init
lerim connect auto
lerim project add .
Install Lerim into an MCP client:
lerim connect gemini-cli --mode mcp --dry-run
lerim connect gemini-cli --mode mcp
Or use a generic MCP client config:
{
  "mcpServers": {
    "lerim": {
      "command": "/absolute/path/to/python",
      "args": ["-m", "lerim.mcp_server"]
    }
  }
}
lerim connect writes the absolute Python command automatically. That avoids
client startup failures when an MCP client launches with a smaller PATH than
your shell.
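For clients that need the generic config written by hand, the following minimal sketch builds the same JSON with the running interpreter's path. The helper name `generic_mcp_config` is hypothetical; `lerim connect` already does this for supported clients, and the sketch assumes Lerim is installed in the interpreter you run it from.

```python
import json
import sys


def generic_mcp_config(python_path: str = sys.executable) -> dict:
    """Build the generic MCP client config with an explicit interpreter path.

    Hypothetical helper for illustration; not part of Lerim.
    """
    return {
        "mcpServers": {
            "lerim": {
                "command": python_path,
                "args": ["-m", "lerim.mcp_server"],
            }
        }
    }


config = generic_mcp_config()
# Paste the printed JSON into the client's MCP settings file.
print(json.dumps(config, indent=2))
```

`sys.executable` is normally an absolute path, which is the property that keeps the client from depending on its own PATH.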
Available MCP tools:
- lerim_context_brief
- lerim_context_answer
- lerim_context_search
- lerim_records_list
- lerim_trace_submit
- lerim_ingest_status
Lerim intentionally does not expose a broad memory_save primitive. Completed sessions go through lerim_trace_submit, then Lerim's extraction pipeline decides what is durable.
Benchmark numbers live in docs, not in a marketing scoreboard inside the README. Start with Benchmark Overview for the map and reporting rules:
- Benchmark Suite: plain-English explanation of each benchmark surface and boundary.
- Lerim Results: first-party raw artifacts, commands, and boundaries.
- Market Comparison: source-backed market rows with provenance for each external number.
Current public artifacts are backed by raw report.json files and were
validated with the clean/tracked public benchmark gate for the v0.3.0 release.
Retrieval and context-budget artifacts are retrieval-only, not official
LongMemEval QA scores. The extraction artifact is an aggregate-only diagnostic
from an internal MiniMax M2.7 run, not a public market-comparison score.
| Surface | Current evidence |
|---|---|
| LongMemEval-S retrieval | Full 500-question hybrid and lexical retrieval-only artifacts |
| Context budget | Full 500-question context-selection artifact with recall beside token reduction |
| Retrieval latency | Local search timing over LongMemEval-S sessions |
| Trace ingestion cost/performance | Small public-trace sample with measured LLM calls and unavailable-cost disclosure |
| MCP integration | Config writers, local stdio tools/context probes, trace-submit idempotency, 0 trace-submit extraction acceptances in the current artifact, and one Gemini CLI live context-tool call |
| Extraction quality | Aggregate-only 47-case diagnostic report; competitors not run on this private eval |
Before publishing a benchmark claim, require the exact command, git commit,
dataset snapshot, raw report.json, generated report, model/provider,
hardware/runtime metadata, and failure count.
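The checklist above can be sketched as a small pre-publication check. The dictionary keys below are hypothetical names chosen for this example, not a Lerim schema or an option of any benchmark script.

```python
# Hypothetical key names for the items listed above; not a Lerim schema.
REQUIRED_CLAIM_FIELDS = (
    "command",
    "git_commit",
    "dataset_snapshot",
    "raw_report_path",
    "generated_report_path",
    "model_provider",
    "runtime_metadata",
    "failure_count",
)


def missing_claim_fields(claim: dict) -> list[str]:
    """Return required fields that are absent or empty in a claim's metadata."""
    missing = []
    for name in REQUIRED_CLAIM_FIELDS:
        value = claim.get(name)
        # A failure count of 0 is a valid value, so only None and "" count as missing.
        if value is None or value == "":
            missing.append(name)
    return missing


draft_claim = {
    "command": "<exact benchmark command>",
    "git_commit": "<commit hash>",
    "failure_count": 0,
}
print(missing_claim_fields(draft_claim))
```

A claim would be ready to publish only when the returned list is empty.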
- Support operations: documented custom-trace path; preserve triage decisions, escalation evidence, policy-backed facts, known fixes, and customer constraints.
- Operations and incidents: documented custom-trace path; preserve root causes, mitigations, rejected hypotheses, runbook gaps, owner decisions, and follow-up risks.
- Coding agents: retain architecture decisions, failed paths, repo conventions, setup facts, release handoffs, and constraints.
Research, revenue, security, and other verticals can use the same custom-trace path today when the user owns export, cleaning, and redaction. The first product wedge and strongest examples are coding plus support and incident operations.
Built-in connect adapters monitor the supported sources available today:
Claude Code, Codex CLI, Cursor, OpenCode, and pi.
For another agent or business workflow, register already-clean Lerim canonical JSONL traces:
python clean_to_lerim_jsonl.py \
--input ./raw-support-agent-traces \
--output ~/lerim-traces/support-clean
lerim project add ~/lerim-traces/support-clean --type custom
lerim ingest --agent custom
Each .jsonl file is one completed source session. Each line must be a
canonical user or assistant event:
{"type":"user","message":{"role":"user","content":"Customer asked for renewal approval."},"timestamp":"2026-05-16T09:00:00Z"}
{"type":"assistant","message":{"role":"assistant","content":"Agent found approval is required above EUR 500."},"timestamp":"2026-05-16T09:02:00Z"}
Custom mode has no Lerim adapter and no compaction step. The source owner owns export, cleaning, redaction, and retention before files enter the custom folder.
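Because custom mode leaves cleaning to the source owner, it can help to check each line before it enters the custom folder. The sketch below validates one line against the shape of the two example events. The function name `check_event_line` is hypothetical, and the rules (role matches type, content and timestamp are strings) are inferred from those examples rather than taken from a published Lerim schema.

```python
import json

ALLOWED_TYPES = {"user", "assistant"}


def check_event_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if none)."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["event is not a JSON object"]

    problems = []
    if event.get("type") not in ALLOWED_TYPES:
        problems.append("type must be 'user' or 'assistant'")
    message = event.get("message")
    if not isinstance(message, dict):
        problems.append("message must be an object")
    else:
        # Assumption: role mirrors the top-level type, as in both examples.
        if message.get("role") != event.get("type"):
            problems.append("message.role must match type")
        if not isinstance(message.get("content"), str):
            problems.append("message.content must be a string")
    if not isinstance(event.get("timestamp"), str):
        problems.append("timestamp must be a string")
    return problems


good = '{"type":"user","message":{"role":"user","content":"Customer asked for renewal approval."},"timestamp":"2026-05-16T09:00:00Z"}'
bad = '{"type":"tool","message":{"role":"tool","content":"..."}}'

print(check_event_line(good))  # []
print(check_event_line(bad))
```

Running a check like this over every file in the export step keeps malformed events out of the folder that Lerim ingests.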
For explicit business traces, import with a source profile and domain scope:
lerim trace import docs/examples/traces/support-agent-run.jsonl \
--source-name support-agent \
--source-profile support \
--scope-type domain \
--scope support-ops
lerim context records --profile support
lerim context records --profile support --type fact
lerim status
lerim status --live
lerim logs --follow
lerim queue
lerim queue --failed
lerim ingest
lerim curate
lerim context-brief show
lerim context-brief status
lerim answer "What decisions exist about caching?"
Setup and management:
lerim connect auto
lerim project list
lerim project remove <name>
lerim skill install
Alternative to the background service:
lerim serve
uv venv && source .venv/bin/activate
uv pip install -e '.[test]'
tests/run_tests.sh unit
tests/run_tests.sh smoke
tests/run_tests.sh integration
tests/run_tests.sh e2e
Before release, verify the affected path with the relevant suites:
- tests/smoke/: short LLM-backed runtime checks; not benchmark evidence
- tests/integration/: LLM-backed extract, curate, and semantic answer coverage
- tests/e2e/: full runtime-cycle checks over ingest, curate, and answer
Release-readiness checks:
- uv run python scripts/release_preflight.py --version <version> after the version and changelog are updated
- uv run pytest tests/unit -q
- uv run mkdocs build --strict
- uv build
- uv run python benchmarks/scripts/validate_public_artifacts.py
- uv run python benchmarks/scripts/validate_public_artifacts.py --require-clean before launch-grade benchmark claims
- uv run python benchmarks/scripts/validate_public_artifacts.py --require-tracked-public-files before release packaging
- clean-environment install and lerim mcp startup check
- README/docs/asset review for unsupported benchmark, support, or comparison claims
Start here if you want to read the codebase:
- src/lerim/README.md
- src/lerim/skills/cli-reference.md
- docs/concepts/source-session-context-compiler.md
- docs/concepts/mcp-vs-native-adapters.md
- docs/concepts/how-it-works.md
- docs/concepts/context-model.md
Lerim core is Apache-2.0. The local CLI, runtime, MCP server, native adapters, context DB schema, benchmark scripts, and integration docs should remain useful without a paid account.
The planned commercial path is hosted/team infrastructure: sync, hosted private MCP, dashboards, review workflows, governance, SSO, audit logs, managed retention, evaluation monitoring, private deployments, and enterprise support.
See COMMERCIAL.md for the open-core boundary.
Contributions are welcome.
Good starting points include:
- trace-source adapters and custom trace-folder examples
- extraction quality
- context curation quality
- context graph link quality
- docs and demo examples
Helpful links:
- Contributing Guide
- Open issues
- Trace-source adapter examples: src/lerim/adapters/

