Architecture & Design Discussion

Summary
This document proposes a comprehensive architecture and design for lance-context, a high-performance, evolvable context management solution for AI Agents. The design establishes a clean separation between the logical information layer and the physical storage layer (LanceDB), defines a clear set of Agent-facing interfaces, and introduces a layered data model (L0-L2) to balance retrieval effectiveness and cost. It also includes specifications for data governance, multi-tenancy, and a phased implementation roadmap, aiming to provide a robust foundation for building and scaling complex Agent systems.
Motivation / Problem Statement
Current Agent development faces several challenges:
Fragmented Context: Information is often scattered across various systems, leading to inconsistent and incomplete context for the Agent.
Suboptimal Retrieval: Existing retrieval mechanisms lack the sophistication to balance precision, recall, and cost effectively.
Disorderly Information Growth: Without a proper strategy, an Agent's memory and knowledge base can expand indefinitely, leading to performance degradation and increased costs.
Lack of Extensibility: Tightly coupled business logic and storage implementations make it difficult to evolve the system, such as introducing new storage engines or adapting to new Agent capabilities.
The initial version of lance-context provides a solid starting point, but a more systematic architecture is required to address these issues and support the long-term growth of sophisticated AI Agents.
Design Overview
The proposed architecture is centered around three key concepts:
Information Layer on LanceDB: A logical Information Layer is introduced on top of the physical LanceDB storage. This layer provides a stable, semantic view of the data, abstracting away the underlying implementation details. It organizes data into three distinct families: ctx_agent (core Agent business data), ctx_kb (external knowledge), and ctx_meta (internal system metadata).
Layered Data Model (L0-L2): Inspired by industry best practices, we adopt a three-layer data model to optimize retrieval and processing:
L2 (Raw Content): The immutable source of truth.
L1 (Structured & Vector): The primary retrieval target, containing cleaned, chunked, and vectorized data.
L0 (Abstract/Summary): A high-level summary layer for efficient pre-filtering and low-cost relevance assessment.
Task-Oriented Agent Interfaces: A set of high-level interfaces (Add, Search, Explain, Trace, Prune, Archive) are defined to provide Agents with intuitive, task-oriented capabilities for managing their context, memory, and knowledge.
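As a rough sketch, this interface surface could be expressed as plain DTOs. The field names below follow the interface prototypes given later in this document (content, content_type, session_id, metadata, job_id, results); the dataclass shapes themselves are illustrative, not a committed API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

# Illustrative DTOs for the task-oriented interfaces. Field names follow the
# interface prototypes in this document; everything else is an assumption.

@dataclass
class AddRequest:
    content: Union[str, bytes]
    content_type: str
    session_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class AddResponse:
    job_id: str  # async handle; L1/L0 derivation happens in the background

@dataclass
class Chunk:
    chunk_id: str
    content: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class SearchRequest:
    query: str
    filter: Dict[str, Any] = field(default_factory=dict)
    top_k: int = 10
    search_type: str = "hybrid"  # "vector" | "fts" | "hybrid"

@dataclass
class SearchResponse:
    results: List[Chunk] = field(default_factory=list)
```

Keeping the DTOs free of storage details is what allows the physical layer to be swapped out later without touching Agent-facing code.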
Detailed Design
Overall Architecture and Data Layers
We propose an overall architecture that includes an Information Layer, which provides a stable logical view for Agent applications on top of the physical storage (LanceDB).
flowchart TD
subgraph "Agent Application"
Agent["AI Agent"]
end
subgraph "Information Layer (lance-context)"
direction LR
subgraph "Agent Interfaces"
direction TB
Add["Add"]
Search["Search"]
Prune["Prune"]
Archive["Archive"]
end
subgraph "Table Families"
direction TB
ctx_agent["ctx_agent (Core)"]
ctx_kb["ctx_kb (Knowledge)"]
ctx_meta["ctx_meta (Internal)"]
end
subgraph "Data Layers"
direction TB
L0["L0 (Summary)"]
L1["L1 (Structured/Vector)"]
L2["L2 (Raw Content)"]
end
end
subgraph "Physical Storage"
LanceDB["LanceDB"]
end
Agent --> Add
Agent --> Search
Agent --> Prune
Agent --> Archive
Add --> L2
L2 --> L1
L1 --> L0
Search -- "queries" --> L0
Search -- "queries" --> L1
L1 -- "links to" --> L2
ctx_agent --> LanceDB
ctx_kb --> LanceDB
ctx_meta --> LanceDB
The core of this architecture is a governance strategy based on data layering and separation.
Data Layers: L0, L1, and L2
We process and store data in three logical layers to optimize retrieval efficiency and reduce LLM token consumption.
L2 (Raw Content Layer)
Semantics: Unprocessed raw data, serving as the "Source of Truth" for all information. Examples include complete conversation logs, user-uploaded original documents, and full tool-call logs.
Generation Pipeline: Data enters the system via the Add interface and is directly stored in the corresponding L2 table.
Storage Strategy: Stored in tables like ctx_agent.agent_l2_raw or ctx_kb.kb_l2_raw_documents in binary or text format.
Retrieval Strategy: Not directly involved in retrieval by default. It is accessed only for "evidence traceability" or deep analysis, via links from L1/L0.
L1 (Structured & Vector Layer)
Semantics: The result of cleaning, chunking, extracting metadata from, and generating vector embeddings for L2 data. This is the primary retrieval target of the system, balancing information density and contextual granularity.
Generation Pipeline: Triggered by a background task or write pipeline, it processes new L2 data to generate L1 records.
Storage Strategy: Chunked text and metadata are stored in ctx_agent.agent_l1_chunks, and vectors are stored in ctx_agent.agent_l1_embeddings.
Retrieval Strategy: This is the main layer for hybrid retrieval (Scalar + FTS + Vector). An Agent's Search request first retrieves candidates from this layer using vector similarity, keywords, and metadata filtering.
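The L2 → L1 derivation step can be sketched as follows. This is a minimal illustration using a fixed-size character chunker and a stub hash-based embedder; a real pipeline would use a tokenizer-aware splitter and an actual embedding model, and the chunk/overlap sizes here are arbitrary.

```python
import hashlib
from typing import Dict, List, Tuple

def chunk_text(raw_id: str, text: str, size: int = 200, overlap: int = 40) -> List[Dict]:
    """Split one L2 raw record into overlapping L1 chunks (character-based sketch)."""
    chunks, start, idx = [], 0, 0
    step = size - overlap
    while start < len(text):
        chunks.append({
            "chunk_id": f"{raw_id}:{idx}",
            "raw_id": raw_id,  # link back to L2 for evidence traceability
            "content": text[start:start + size],
        })
        idx += 1
        start += step
    return chunks

def embed(text: str, dim: int = 8) -> List[float]:
    """Stub embedding: deterministic hash-derived vector, placeholder for a model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def to_l1(raw_id: str, text: str) -> Tuple[List[Dict], List[Dict]]:
    """Produce agent_l1_chunks rows and agent_l1_embeddings rows from one L2 record."""
    chunks = chunk_text(raw_id, text)
    embeddings = [{"chunk_id": c["chunk_id"], "vector": embed(c["content"])}
                  for c in chunks]
    return chunks, embeddings
```

Note that each L1 row carries its raw_id, which is what makes the later L2 evidence-traceability step possible.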
L0 (Abstract/Summary Layer)
Semantics: A brief summary generated for a group of L1 Chunks or an L2 object (like a session or a document). Its core function is pre-filtering, helping the Agent or retrieval strategy quickly determine if a larger entity (like an entire session) is worth exploring in-depth.
Generation Pipeline: Triggered by a background task or after L1 processing is complete, it calls an LLM to summarize L1 Chunks or L2 content.
Storage Strategy: Stored in the ctx_agent.agent_l0_summaries table and linked to the corresponding L1/L2 entities.
Retrieval Strategy: Acts as the first line of defense in retrieval. For instance, in cross-session retrieval, it can quickly filter L0 summaries of all sessions to locate the most relevant ones before performing a precise search within their L1 Chunks.
Retrieval Chain
The standard retrieval chain follows the L0 → L1 → L2 sequence:
L0 Pre-filtering: Based on the query intent, a quick, low-cost match is first performed on the L0 summary layer to identify highly relevant entities (e.g., sessions, documents).
L1 Main Retrieval: Within the scope of entities filtered by L0, or directly in the global L1 Chunks, a hybrid search is executed to recall the most relevant atomic information blocks.
L2 Evidence Traceability: The L1 retrieval results are presented to the LLM. If more complete context or fact-verification is needed, the LLM or user can trace back to the L2 raw data using the links saved in the L1 records.
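The three-step chain above can be sketched as an orchestration over three store callbacks. The callback signatures and the 0.5 relevance threshold are hypothetical stand-ins for the LanceDB-backed tables, not a fixed contract.

```python
from typing import Callable, Dict, List

def retrieve(
    query: str,
    search_l0: Callable[[str], List[Dict]],            # [{"target_id", "summary", "score"}]
    search_l1: Callable[[str, List[str]], List[Dict]], # hybrid search scoped to entities
    fetch_l2: Callable[[str], str],                    # raw content by raw_id
    l0_threshold: float = 0.5,
    trace_evidence: bool = False,
) -> List[Dict]:
    # 1. L0 pre-filtering: cheaply locate relevant entities (sessions, documents).
    entities = [r["target_id"] for r in search_l0(query) if r["score"] >= l0_threshold]
    # 2. L1 main retrieval: hybrid search within the filtered scope
    #    (an empty scope would fall back to a global L1 search).
    hits = search_l1(query, entities)
    # 3. L2 evidence traceability: optionally attach raw content via raw_id links.
    if trace_evidence:
        for h in hits:
            h["raw"] = fetch_l2(h["raw_id"])
    return hits
```

The key property is that the expensive steps (L1 hybrid search, L2 fetch) only run over the narrow scope the cheap L0 pass selected.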
Table Families and Naming Conventions
To decouple business logic from internal management, we have designed three independent table families (which can be mapped to different databases or directories in LanceDB), each with clear naming conventions and responsibilities.
ctx_agent Family: Agent Business Core
Stores runtime data directly interacting with the Agent.
Table schemas (columns and indexes) for this family:

agent_sessions: session_id (PK), agent_id, user_id, status, start_time, end_time, metadata (JSON). Indexes: agent_id, user_id, start_time.
agent_l2_raw: raw_id (PK), source_id (e.g., session_id), content (bytes/text), content_type, created_at. Indexes: source_id, created_at.
agent_l1_chunks: chunk_id (PK), raw_id, content (text), metadata (JSON), created_at, agent_id. Indexes: FTS on content; B-Tree on agent_id, created_at.
agent_l1_embeddings: chunk_id (FK), vector (fixed_size_list), model_name, created_at. Index: vector.
agent_l0_summaries: target_id (PK), target_type (session/doc), summary (text), updated_at, agent_id. Indexes: FTS on summary; B-Tree on agent_id, target_type.
agent_skills: skill_id (PK), name, schema (JSON), description, agent_id, version. Indexes: agent_id, name.
agent_tool_calls: call_id (PK), session_id, tool_name, params (JSON), result (text), status, timestamp. Indexes: session_id, tool_name, timestamp.
agent_relations: source_id, target_id, relation_type (e.g., 'cites', 'triggers'), agent_id, created_at. Indexes: source_id, target_id, agent_id.
ctx_kb Family: External Knowledge Base
Used for storing relatively static, shareable background knowledge. Its structure is similar to ctx_agent but with an independent lifecycle and management strategy.

kb_l2_raw_documents: doc_id (PK), source_uri, content, metadata (JSON), imported_at. Index: source_uri.
kb_l1_chunks: chunk_id (PK), doc_id, content, metadata (JSON). Index: FTS on content.
kb_l1_embeddings: chunk_id (FK), vector (fixed_size_list), model_name. Index: vector.
ctx_meta Family: Internal Metadata
The system's "Information Schema," used for self-management and internal task scheduling, transparent to the Agent.

meta_tables: table_name, db_name, table_type, version.
meta_columns: table_name, column_name, data_type, is_time, is_vector.
meta_jobs: job_id (PK), job_type (prune/index), payload, status, scheduled_at.
meta_mem_candidates: session_id, chunk_id, score, candidate_level (promote/prune).
Temporal Attributes and Indexing Conventions
All time fields (created_at, timestamp) should use a uniform UTC Timestamp type.
The vector field is the primary target for vector indexing; IVF_PQ is recommended.
content and summary fields should have a Full-Text Search (FTS) index.
Multi-Tenancy, Concurrency, and Data Governance
Multi-Tenancy and Versioning: every table in ctx_agent must include an agent_id, and all reads and writes are scoped by agent_id.
Concurrency and Isolation: writes go through a write queue that serializes Add operations.
Index Maintenance and Archiving: index maintenance jobs are scheduled in meta_jobs during off-peak hours. Archiving, triggered by the Prune interface, performs a soft delete and migrates data to cold storage via a background task.
Agent Interface Mapping
The interfaces provided by lance-context to the Agent should be task-oriented and highly abstract.
flowchart TD
subgraph Agent
direction LR
A[Add]
S[Search]
E[Explain]
T[Trace]
P[Prune]
AR[Archive]
end
subgraph Backend
direction TB
subgraph ctx_agent
agent_l2["agent_l2_raw"]
agent_l1["agent_l1_chunks/embeddings"]
agent_l0["agent_l0_summaries"]
agent_relations["agent_relations"]
agent_tool_calls["agent_tool_calls"]
end
subgraph ctx_meta
meta_jobs["meta_jobs"]
meta_mem_candidates["meta_mem_candidates"]
end
end
A -- "Writes to" --> agent_l2
A -- "Triggers async write to" --> agent_l1
A -- "Triggers async write to" --> agent_l0
S -- "Queries" --> agent_l1
E -- "Traverses" --> agent_relations
T -- "Fetches from" --> agent_l1
T -- "Fetches from" --> agent_tool_calls
P -- "Filters in" --> meta_mem_candidates
P -- "Creates job in" --> meta_jobs
AR -- "Creates job in" --> meta_jobs
Core Interface Prototypes (DTOs)
Add
Inputs: content: Union[str, bytes], content_type: str, session_id: Optional[str], metadata: Dict
Outputs: job_id: str
Logic: 1. Write to agent_l2_raw (L2). 2. Trigger background task: write to agent_l1_chunks/embeddings (L1). 3. (Optional) Trigger LLM to write to agent_l0_summaries (L0).
Search
Inputs: query: str, filter: Dict, top_k: int, search_type: Literal[...]
Outputs: results: List[Chunk]
Logic: 1. Concurrently query agent_l1_chunks (FTS) and agent_l1_embeddings (Vector). 2. Use RRF to fuse results. 3. Perform scalar filtering.
Errors: 400, 404
Explain
Inputs: entity_id: str, entity_type: str
Outputs: graph: Dict
Logic: Recursively trace relationships from the agent_relations table.
Errors: 404
Trace
Inputs: session_id: str
Outputs: events: List[...]
Logic: Fetch all records for session_id from agent_l1_chunks and agent_tool_calls and sort by timestamp.
Errors: 404
Prune
Inputs: policy: Dict
Outputs: job_id: str
Logic: 1. Filter candidates in meta_mem_candidates. 2. Create a prune task in meta_jobs. 3. Background task performs soft delete/archiving.
Errors: 400
Archive
Inputs: session_id: str
Outputs: job_id: str
Logic: Create an archive task in meta_jobs to migrate all session data.
Errors: 404
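The Search logic above fuses the FTS and vector result lists with Reciprocal Rank Fusion (RRF). A minimal stdlib sketch, with the conventional k = 60 smoothing constant:

```python
from collections import defaultdict
from typing import Dict, List

def rrf_fuse(result_lists: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
    where rank is the 1-based position of d in each list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: chunk "c2" ranks near the top of both lists, so it wins overall.
fts_hits = ["c1", "c2", "c3"]      # from agent_l1_chunks (FTS)
vector_hits = ["c2", "c4", "c1"]   # from agent_l1_embeddings (Vector)
fused = rrf_fuse([fts_hits, vector_hits])
```

RRF needs only ranks, not comparable scores, which is why it works across heterogeneous retrievers like FTS and vector search.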
Daily Compaction Mechanism
This mechanism distills incremental conversation data into valuable long-term memory (Episodes and Profiles) and cleans up low-value information.
Scoring and Candidacy
A daily background task scans recent agent_sessions and agent_l1_chunks.
Scoring Dimensions: Activity, Importance, Reusability, Time Decay.
Based on a weighted score, each item is marked in meta_mem_candidates as PROMOTION, RETENTION, or PRUNING.
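The scoring step might look like the following sketch. The weights, thresholds, and 14-day half-life are illustrative placeholders; the document leaves them as open tuning parameters.

```python
# Illustrative weights and thresholds; real values are tuning parameters.
WEIGHTS = {"activity": 0.3, "importance": 0.4, "reusability": 0.3}
PROMOTE_AT, PRUNE_AT = 0.7, 0.3
HALF_LIFE_DAYS = 14.0

def time_decay(age_days: float) -> float:
    """Exponential decay with a configurable half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def score(activity: float, importance: float, reusability: float,
          age_days: float) -> float:
    """Weighted sum of the scoring dimensions, discounted by time decay."""
    base = (WEIGHTS["activity"] * activity
            + WEIGHTS["importance"] * importance
            + WEIGHTS["reusability"] * reusability)
    return base * time_decay(age_days)

def candidate_level(s: float) -> str:
    """Map a score to the meta_mem_candidates marking."""
    if s >= PROMOTE_AT:
        return "PROMOTION"
    if s <= PRUNE_AT:
        return "PRUNING"
    return "RETENTION"
```

A fresh, high-value chunk lands in PROMOTION, while the same chunk several half-lives later decays into PRUNING unless it is touched again.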
Promotion to Episode/Profile
A background task processes records marked for PROMOTION.
It aggregates content, calls an LLM to generate a narrative Episode or update a structured User Profile, and stores the result in a long-term memory table.
The operation is transactional and can be rolled back.
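The promote-then-mark step could be sketched as below, with in-memory stand-ins for the tables and a stubbed LLM summarizer; a real implementation would rely on LanceDB table versions or a job-level undo log for rollback rather than a Python list checkpoint.

```python
from typing import Callable, Dict, List

def promote(
    candidates: List[Dict],            # meta_mem_candidates rows marked PROMOTION
    summarize: Callable[[str], str],   # LLM call (stubbed in tests)
    episodes: List[Dict],              # long-term memory table stand-in
    mark_done: Callable[[Dict], None], # flips the candidate's status
) -> int:
    """Aggregate candidate content, write one Episode, then mark candidates done.
    On any failure, roll back the Episode write so the batch can be retried."""
    if not candidates:
        return 0
    checkpoint = len(episodes)  # undo point for rollback
    try:
        text = "\n".join(c["content"] for c in candidates)
        episodes.append({"kind": "episode", "summary": summarize(text)})
        for c in candidates:
            mark_done(c)
        return len(candidates)
    except Exception:
        del episodes[checkpoint:]  # roll back partial writes
        raise
```

The checkpoint-and-delete pattern keeps the long-term memory table consistent even if the LLM call or the candidate update fails mid-batch.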
Limitations / Open Questions
Performance at Scale: While LanceDB is highly performant, the FTS and complex scalar query capabilities might become a bottleneck under heavy load. The hybrid storage model in Phase 2 is designed to mitigate this.
Write Concurrency Management: The proposed write queue adds a layer of complexity. Its implementation and tuning will be critical for high-throughput scenarios.
Cost of L0 Generation: Generating L0 summaries via LLM calls for every L1/L2 update can be costly. A selective or batch-based strategy for L0 generation might be needed.
Complexity of Query Router: The Query Router in Phase 3 is a significant engineering effort and will require careful design to handle query parsing, distribution, and result fusion correctly.
Rollout Plan / Roadmap
We recommend a phased approach to implement this design.
Phase 1: Unified Information Layer View based on LanceDB
Goal: Implement the complete L0/L1/L2 data model and core APIs (Add, Search) using only LanceDB.
Outcome: A functionally complete but limited-performance context database.
Phase 2: Hybrid Storage and Query Offloading
Goal: Introduce external specialized engines (e.g., Elasticsearch for FTS) where LanceDB's native capabilities are insufficient.
Outcome: A hybrid system with better performance and stronger query capabilities.
Phase 3: Engine Adapter Layer and Query Router
Goal: Evolve lance-context into a universal "context virtualization layer" that supports any combination of backend storage engines.
Outcome: A highly scalable context database platform completely decoupled from the underlying storage.
Checklist
Finalize table schemas for all three families.
Implement Phase 1 Add interface and async processing pipeline.
Implement Phase 1 Search interface with hybrid search capabilities.
Set up meta_jobs and a basic daily compaction framework.
Develop adapters for LangChain and LlamaIndex.
Benchmark performance of the Phase 1 implementation.
Document all public APIs and data models.
Impact Assessment
Performance: The layered data model and hybrid retrieval strategy are expected to significantly improve query performance and reduce LLM context size. Write latency will be managed via asynchronous processing and a write queue.
Cost: While LLM calls for L0/L1 generation introduce costs, the overall architecture aims to reduce token consumption during retrieval, potentially leading to net savings. The phased rollout allows for cost-effective scaling.
Compatibility: The design is framework-agnostic. By providing clean DTOs and adapters, it ensures easy integration with existing and future Agent frameworks. The reliance on LanceDB in Phase 1 simplifies initial deployment and dependencies.