MemTrace is a state-aware memory runtime and profiler for long-horizon LLM agents. It records agent traces, builds an execution state tree, writes structured memories, retrieves context with state awareness, gates unsafe or stale memories before prompt injection, and reports every retrieval decision.
Vector memory alone can recall the wrong thing at the wrong time: a failed branch, another workspace's preference, stale endpoint guidance, or risky tool evidence. MemTrace treats memory as runtime infrastructure rather than a generic RAG store:
- Trace first: raw runs, steps, and events are persisted before memory extraction.
- State-aware retrieval: active execution paths influence candidate selection and scoring.
- Admission gate: failed/rolled-back, stale, superseded, cross-workspace, secret, and risky memories are rejected or degraded before context packing.
- Replayable observability: access logs, gate logs, profiler events, and replay APIs explain why a memory entered or missed the prompt.
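The admission gate described above can be pictured as a function that returns a verdict plus a reason, so that every decision is loggable. The following is a minimal sketch under stated assumptions, not MemTrace's actual gate: the `Memory` fields, the `Verdict` enum, the reason strings, and the choice to degrade (rather than reject) stale memories are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ADMIT = "admit"
    DEGRADE = "degrade"
    REJECT = "reject"


@dataclass(frozen=True)
class Memory:
    # Hypothetical fields for illustration; the real schema may differ.
    text: str
    workspace_id: str
    branch_status: str = "active"  # "active", "failed", or "rolled_back"
    stale: bool = False
    superseded: bool = False
    contains_secret: bool = False


def gate(memory: Memory, workspace_id: str) -> tuple[Verdict, str]:
    """Return a verdict and a reason so the decision can be logged."""
    if memory.contains_secret:
        return Verdict.REJECT, "secret"
    if memory.workspace_id != workspace_id:
        return Verdict.REJECT, "cross_workspace"
    if memory.branch_status in {"failed", "rolled_back"}:
        return Verdict.REJECT, "failed_branch"
    if memory.superseded:
        return Verdict.REJECT, "superseded"
    if memory.stale:
        # Assumption: stale memories are degraded rather than dropped.
        return Verdict.DEGRADE, "stale"
    return Verdict.ADMIT, "ok"


candidates = [
    Memory("Project uses Bun, not Node.js", "ws_demo"),
    Memory("npm test was attempted", "ws_demo", branch_status="rolled_back"),
    Memory("Prefers yarn", "ws_other"),
    Memory("Old endpoint guidance", "ws_demo", stale=True),
]
# Each decision is (text, verdict, reason), mirroring a gate log entry.
decisions = [(m.text, *gate(m, "ws_demo")) for m in candidates]
admitted = [text for text, verdict, _ in decisions if verdict is Verdict.ADMIT]
```

Returning the reason alongside the verdict is what makes the "why did this memory miss the prompt" question answerable after the fact.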
flowchart TD
Agent[Agent / demo loop] --> Runtime[MemoryRuntime facade]
Runtime --> Trace[Trace Collector]
Runtime --> State[Execution State Tree]
Runtime --> Writer[Rule / LLM Write Pipeline]
Runtime --> Retrieval[Retrieval Controller]
Retrieval --> Gate[Admission Gate]
Gate --> Packer[Context Packer]
Runtime --> Profiler[Profiler]
Trace --> PG[(PostgreSQL + pgvector)]
State --> PG
Writer --> PG
Retrieval --> PG
Profiler --> PG
Runtime --> Reports[JSON / Markdown / HTML Reports]
Prerequisites:
- Python 3.12+
- uv
Install dependencies and generate all deterministic showcase reports:
uv sync --extra dev
./scripts/reproduce.sh
The script runs these entrypoints:
uv run python -m app.demo.run_demo --out reports
uv run python -m app.benchmark.runner --output-dir reports
uv run python -m app.observability.reports --output-dir reports
Generated artifacts are ignored by git and can be regenerated at any time:
- reports/demo_report.md
- reports/demo_report.json
- reports/benchmark_report.md
- reports/benchmark_results.json
- reports/observability_report.json
- reports/observability_report.md
- reports/observability_report.html
The deterministic benchmark passes only when reports/benchmark_results.json contains acceptance.passed=true.
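A CI step or local script could check that acceptance flag directly. The sketch below assumes the JSON file has a top-level `acceptance` object with a boolean `passed` field, which is inferred from the `acceptance.passed=true` notation rather than from a documented schema; `benchmark_passed` is a hypothetical helper name.

```python
import json
from pathlib import Path


def benchmark_passed(results_path: Path) -> bool:
    """Return True only if the file exists and acceptance.passed is true."""
    if not results_path.is_file():
        return False
    data = json.loads(results_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return False
    # Assumed layout: {"acceptance": {"passed": true, ...}, ...}
    acceptance = data.get("acceptance")
    return isinstance(acceptance, dict) and acceptance.get("passed") is True
```

For example, `benchmark_passed(Path("reports/benchmark_results.json"))` would be called after the benchmark runner finishes.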
The canonical demo is Bun vs Node.js with failed-branch isolation:
- The user states that the project uses Bun, not Node.js.
- A failed branch tries npm test and is rolled back.
- A recovery step asks how to run tests.
- baseline_1 recalls the failed npm test evidence and is contaminated.
- variant_2 uses state-aware retrieval plus the gate, rejects the rolled-back branch, and chooses bun test.
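The difference between the two strategies in this demo can be illustrated with a toy execution state tree. This is a simplified sketch, assuming a parent-linked tree and a `status` field; `StateNode` and `active_path_memories` are hypothetical names and do not reflect MemTrace's real data model.

```python
from dataclasses import dataclass, field


@dataclass
class StateNode:
    node_id: str
    parent: "StateNode | None" = None
    status: str = "active"  # hypothetical values: "active" or "rolled_back"
    memories: list[str] = field(default_factory=list)


def active_path_memories(current: StateNode) -> list[str]:
    """Collect memories from the root down to `current`, skipping rolled-back nodes."""
    path = []
    node = current
    while node is not None:
        path.append(node)
        node = node.parent
    collected = []
    for node in reversed(path):  # root first
        if node.status != "rolled_back":
            collected.extend(node.memories)
    return collected


root = StateNode("root", memories=["This project uses Bun, not Node.js"])
failed = StateNode(
    "try_npm", parent=root, status="rolled_back",
    memories=["Ran npm test; it failed"],
)
recovery = StateNode("recovery", parent=root)

# A state-blind baseline sees every memory, including the failed branch.
state_blind = root.memories + failed.memories
# A state-aware lookup from the recovery step only walks its own ancestry.
state_aware = active_path_memories(recovery)
```

Because the failed branch is a sibling of the recovery step rather than an ancestor, the state-aware lookup never visits it, while the state-blind list still carries the npm test evidence.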
Run only the demo:
uv run python -m app.demo.run_demo --out reports
Run only the deterministic benchmark:
uv run python -m app.benchmark.runner --output-dir reports
Strategies:
- baseline_0: no memory.
- long_context: includes every retrievable workspace memory with hard/risk/state policies disabled and an effectively unbounded budget, exposing token bloat and failed-branch contamination while preserving the same trace/gate logging path; non-bypassable quarantine/secret/destructive/tool-sensitive/redaction safety floors still apply.
- baseline_1: vector/lexical memory without state-aware isolation or the full admission gate; non-bypassable quarantine/secret/destructive/tool-sensitive/redaction safety floors still apply.
- variant_1: state-aware retrieval with failed/rolled-back branch rejection relaxed for ablation, while hard/risk safety policy remains enabled.
- variant_2: state-aware retrieval plus admission gate.
- variant_3: state-aware + gate + deterministic reflection-lite / retention-rerank (placeholder for the ROADMAP §3.2 Reflection scheduler).
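One way to read the strategy list is as a small table of policy flags. The sketch below is only an interpretation of the descriptions above: the `StrategyPolicy` class and every flag name are hypothetical, and where the text is silent (for example, whether baseline_1 keeps the hard/risk policy) the flag is left off as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyPolicy:
    # Hypothetical flag names chosen for this sketch.
    use_memory: bool
    state_aware: bool = False
    reject_failed_branches: bool = False
    hard_risk_policy: bool = False
    reflection_rerank: bool = False
    bounded_budget: bool = True
    safety_floor: bool = True  # described as non-bypassable


STRATEGIES = {
    "baseline_0": StrategyPolicy(use_memory=False),
    "long_context": StrategyPolicy(use_memory=True, bounded_budget=False),
    "baseline_1": StrategyPolicy(use_memory=True),
    "variant_1": StrategyPolicy(
        use_memory=True, state_aware=True, hard_risk_policy=True,
    ),
    "variant_2": StrategyPolicy(
        use_memory=True, state_aware=True,
        reject_failed_branches=True, hard_risk_policy=True,
    ),
    "variant_3": StrategyPolicy(
        use_memory=True, state_aware=True, reject_failed_branches=True,
        hard_risk_policy=True, reflection_rerank=True,
    ),
}
```

Laid out this way, variant_1 differs from variant_2 in a single flag, which is what makes it useful as an ablation.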
The benchmark covers project preference, failed-branch isolation, workspace isolation, tool-call safety, explicit correction, completed-run reuse, stale rejection, no-memory failure recovery, over-budget context compaction retention, safe failure learning (case_10), sanitized destructive-failure handling (case_11), and reflection-retention under a tight budget (case_12_reflection_retention).
Generate a static observability report fixture:
uv run python -m app.observability.reports --output-dir reports
The runtime also exposes observability APIs when served through FastAPI:
- GET /health
- POST /v1/context/retrieve
- GET /v1/access/{access_id}
- GET /v1/replay/access/{access_id}
- GET /v1/replay/runs/{run_id}
- GET /v1/observability/summary
- POST /v1/observability/reports
- GET /v1/dashboard/tables
When retrieval exceeds the token budget, MemTrace does not silently discard low-priority context. It emits protected compacted_constraints / compaction_notice blocks, persists each ContextCompactionLog, includes compaction metrics in observability summaries, surfaces retained facts in JSON/Markdown/HTML reports, and lets replay flag compaction_drift if a later rerun would compact differently. The deterministic benchmark includes case_9_over_budget_compaction, which checks constraint retention and unsafe-compaction leakage rather than relying on compression ratio alone.
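The over-budget behavior can be sketched as a packer that summarizes overflow instead of dropping it. This is an illustrative toy, not the real Context Packer: `Block`, `count_tokens`, and `pack` are hypothetical names, the tokenizer is a crude whitespace split, and the protected blocks are appended outside the budget as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    text: str
    priority: int  # higher priority is packed first
    is_constraint: bool = False


def count_tokens(text: str) -> int:
    # Crude whitespace count, standing in for a real tokenizer.
    return len(text.split())


def pack(blocks: list[Block], budget: int) -> tuple[list[str], list[Block]]:
    """Pack by priority; record overflow instead of discarding it silently."""
    packed: list[str] = []
    overflow: list[Block] = []
    used = 0
    for block in sorted(blocks, key=lambda b: b.priority, reverse=True):
        cost = count_tokens(block.text)
        if used + cost <= budget:
            packed.append(block.text)
            used += cost
        else:
            overflow.append(block)
    if overflow:
        # Constraints in the overflow survive in a protected block.
        kept = [b.text for b in overflow if b.is_constraint]
        if kept:
            packed.append("compacted_constraints: " + "; ".join(kept))
        packed.append(f"compaction_notice: {len(overflow)} block(s) compacted")
    return packed, overflow


blocks = [
    Block("Recent tool output with many details about the build", priority=3),
    Block("Use bun test, never npm test", priority=1, is_constraint=True),
    Block("Chit chat about weather", priority=0),
]
packed, overflow = pack(blocks, budget=9)
```

The returned `overflow` list plays the role of a compaction log entry: it records exactly what was compacted, so a later replay could compare its own overflow against it.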
Phase 3.5 adds an installable Python SDK and proves the same runtime behavior is reachable from three entrypoints: an embedded in-process backend, the FastAPI /v1 HTTP API, and the memtrace CLI. All paths go through MemoryRuntime, so state-aware retrieval, admission gating, context compaction, negative evidence, profiler logs, and replay semantics stay shared.
import asyncio

from memtrace_sdk import MemTrace
from memtrace_sdk.types import EventRole, EventType, StartRunRequest, StartStepRequest, WriteEventRequest


async def main() -> None:
    # The SDK calls are coroutines, so they run inside an event loop.
    client = MemTrace.in_memory(default_workspace_id="ws_demo")
    run = await client.start_run(StartRunRequest(session_id="demo-session", task="remember project facts"))
    step = await client.start_step(StartStepRequest(run_id=run.run_id, intent="record preference"))
    await client.write_event(
        WriteEventRequest(
            run_id=run.run_id,
            step_id=step.step_id,
            role=EventRole.user,
            event_type=EventType.message,
            content="This project uses Bun, not Node.js",
        )
    )


asyncio.run(main())
Use MemTrace.in_memory(...) for deterministic local demos/tests, or wrap an existing runtime with MemTrace.in_process(runtime).
Start the API server as shown below, then point the same SDK facade at it:
from memtrace_sdk import MemTrace
client = MemTrace.http("http://localhost:8000", api_key="demo-token-if-auth-enabled")
The HTTP backend mirrors the /v1 surface and maps HTTP 404 and 400 responses to the SDK's NotFoundError and BadRequestError, respectively. api_key is optional unless the server is started with MEMTRACE_AUTH_ENABLED=true, in which case it is sent as a Bearer token. The HTTP backend also keeps list-shaped reads (timeline, state tree, steps, profile, and memories) isomorphic across backends.
MemTraceLangGraphAdapter provides framework-light lifecycle hooks without requiring langgraph at SDK import time:
from memtrace_sdk import MemTrace, MemTraceLangGraphAdapter
client = MemTrace.in_memory(default_workspace_id="ws_graph")
adapter = MemTraceLangGraphAdapter(client, run_id=run.run_id)
step, context = await adapter.before_node("planner", "How should I run tests?")
write_result, finish_result = await adapter.after_node(step.step_id, content="Use bun test")
See examples/langgraph_adapter for a minimal graph that runs when LangGraph is installed and skips cleanly otherwise.
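A common way to "skip cleanly" when an optional dependency is absent is to probe for the module before importing it. The sketch below shows that general technique with the standard library; it is not taken from examples/langgraph_adapter, and `module_available` and `main` are hypothetical names.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def main() -> str:
    if not module_available("langgraph"):
        # Exit early with a message instead of raising ImportError.
        return "skipped: langgraph is not installed"
    return "ran: langgraph is installed"
```

Probing with `find_spec` avoids executing the optional package's import-time code just to learn whether it exists.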
Run the one-shot deterministic CLI demo:
uv run --package memtrace-sdk memtrace demo --in-process
Operational CLI commands require --http because each shell invocation is a new process and cannot share throwaway in-memory state:
uv run --package memtrace-sdk memtrace --http http://localhost:8000 start-run --session-id demo --task "trace my agent"
uv run --package memtrace-sdk memtrace --http http://localhost:8000 retrieve --run-id <run_id> --query "How do I run tests?" --json
For runnable end-to-end examples, start with examples/README.md, examples/simple_agent, and examples/langgraph_adapter.
The deterministic quickstart above does not require Docker. To explore the SQL-backed runtime, start pgvector PostgreSQL with docker-compose.yml:
docker-compose up -d
uv run alembic upgrade head
uv run uvicorn app.main:app --app-dir apps/api --reload
Then check:
curl http://localhost:8000/health
The compose file uses pgvector/pgvector:pg16 on host port 5433. Existing PG15 volumes are not compatible with the PG16 image; switching images may require removing the old volume.
The real LLM bench is manual and opt-in because it requires network access and a live OpenAI-compatible API key:
MEMTRACE_LLM_API_KEY=... \
MEMTRACE_LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3 \
MEMTRACE_LLM_MODEL=deepseek-v4-pro-260425 \
uv run python -m app.benchmark.llm_bench --output-dir reports
It writes reports/llm_bench_report.json and reports/llm_bench_report.md.
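Because the bench is opt-in, a wrapper script might validate the environment before launching it. The sketch below assumes all three variables shown above are required, which the source does not state explicitly; `load_llm_config` is a hypothetical helper and is not part of app.benchmark.llm_bench.

```python
import os
from collections.abc import Mapping

REQUIRED = (
    "MEMTRACE_LLM_API_KEY",
    "MEMTRACE_LLM_BASE_URL",
    "MEMTRACE_LLM_MODEL",
)


def load_llm_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the LLM settings, or fail fast naming what is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("LLM bench is opt-in; set " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Accepting the environment as a parameter keeps the check testable without touching the real process environment.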
Run the full local smoke bundle:
./scripts/smoke.sh
Or run the pieces directly:
uv run pytest -q
./scripts/reproduce.sh
uv run python -m app.benchmark.runner --output-dir reports
The completed MVP, Phase 3-A observability work, Context Compaction C0-C5, Failure-aware Negative Memory Injection I1-I6, Phase 3.5 SDK/LangGraph adapter/CLI work, the completed 6-strategy benchmark/eval-table slice, and future priorities are tracked in docs/design/ROADMAP.md. For a narrative overview of the core idea, read docs/blog/why-agent-memory-is-not-just-rag.md.
ROADMAP §13 Security & Consistency Hardening is complete through H18, including migration policy checks, redacted trace bundle export/validation, and deterministic dogfood harnesses. The next recommended areas are Provider Registry / Controlled Memory Key Ontology (§10/§11), unless deferred I7 compaction-negative retention is explicitly selected first.