🗄️ ThemisDB

High-Performance Multi-Model Database with Native AI/LLM Integration

ℹ️ Each badge links to a short explanation of what it shows and where to find the source of truth. See docs/en/badges for the full overview.

What is ThemisDB?

ThemisDB is a research-oriented multi-model database that combines relational, graph, vector, and document models in a single system with full ACID transaction support. It is built on RocksDB for high performance and reliability.

"ThemisDB keeps its own llamas." – Optional native LLM integration with llama.cpp for AI workloads directly in your database.

Key Features

🔒 ACID Transactions - Full snapshot isolation with MVCC
🔍 Multi-Model - Relational, Graph, Vector, Document in one database
🚀 High Performance - 45K writes/s, 120K reads/s, CPU-optimized vector search (GPU planned for v2.x)
🛡️ Enterprise Security - TLS 1.3, RBAC, field-level encryption, audit logging
🧠 AI-Ready - Optional LLM engine, vector search, image analysis, voice assistant, autonomous prompt optimization
🌐 Modern Protocols - HTTP/2, WebSocket, gRPC, MQTT, PostgreSQL Wire, GraphQL
🏗️ Modular Architecture (v1.4.0+) - Optional modular build for faster compilation and selective features
🛡️ Production Resilience (v1.4.1+) - Circuit breakers, auto-retry, 99.99% corruption detection, network timeouts
📊 Observability & Automation (v1.4.1+) - Health checks, alerting interface, automated backup scheduling (K8s-ready)

📚 Full Documentation · 🚀 Quick Start · ❓ FAQ · Release Notes · 📁 Project Structure

📚 Module Documentation

ThemisDB documents all 44 modules (139 files in total) to a consistent, production-ready standard:

🏗️ Foundation Layer (7 modules)

Core - ConcernsContext DI framework, ILogger/ITracer/IMetrics/ICache interfaces, adapter implementations (Future Enhancements)
Storage - RocksDB MVCC wrapper, 7 blob backends (S3/Azure/GCS/MinIO/local/memory/hybrid), backup/PITR (Future Enhancements)
Transaction - MVCC concurrency control, SAGA orchestration, deadlock detection (Future Enhancements)
Themis - Core framework, module loading with X.509/GPG signatures, edition management (Future Enhancements)
Base - Base utilities and common infrastructure (Future Enhancements)
Utils - General utility functions and helpers (Future Enhancements)
Config - Backward-compatible config path resolution, LRU caching, JSON Schema validation, Prometheus metrics (Future Enhancements)

🔍 Query & Index Layer (6 modules)

Query - AQL parser/optimizer/executor, 100+ functions, CTE support (Future Enhancements)
AQL - Multi-paradigm query language (based on ArangoDB AQL), LLM integration (INFER/RAG/EMBED), hybrid queries (Future Enhancements)
Index - HNSW GPU vector search, B-tree, graph, spatial, adaptive indexes (Future Enhancements)
Search - Full-text search with BM25 ranking (Future Enhancements)
Temporal - Time-travel queries, AS OF, bitemporal support (Future Enhancements)
TimeSeries - Time-series optimized storage and queries (Future Enhancements)

🔒 Security & Auth (3 modules)

Security - AES-256-GCM encryption, Vault/HSM/PKI integration, RBAC, compliance (SOC 2/NIST/GDPR) (Future Enhancements)
Auth - JWT, Kerberos/GSSAPI, MFA (TOTP), rate limiting (Future Enhancements)
Governance - Data governance policies and compliance frameworks (Future Enhancements)

🌐 Server & Network (4 modules)

Server - 7 protocols (HTTP/1.1, HTTP/2, HTTP/3, WebSocket, MQTT, PostgreSQL, gRPC), 40+ API handlers (Future Enhancements)
Network - Wire protocol, connection pooling, TLS/mTLS, zero-copy I/O (Future Enhancements)
API - REST API layer implementation (Future Enhancements)
Sharding - Horizontal partitioning and distribution (Future Enhancements)

🧠 Intelligence Layer (6 modules)

RAG - 23 components: RAG Judge (faithfulness/relevance/completeness), Knowledge Gap Detector, LLM Bridge, Bias Detector (Future Enhancements)
LLM - LLM integration framework with llama.cpp (Future Enhancements)
Analytics - OLAP (CUBE/ROLLUP), process mining, CEP, SIMD vectorization (4.5x-6.9x speedup) (Future Enhancements)
Voice - NLU pipeline (STT→LLM→TTS), speaker diarization, meeting protocols (Future Enhancements)
Prompt Engineering - Prompt template lifecycle (CRUD, versioning, A/B testing), self-improvement orchestrator, injection detection (Future Enhancements)
Training - Domain-specific LLM fine-tuning: auto-labeling, incremental LoRA adapter training, knowledge graph enrichment (Future Enhancements)

📊 Operations (4 modules)

Performance - Cycle metrics, RCU lock-free reads, LIRS cache, mimalloc, feature flags (Future Enhancements)
Observability - Prometheus integration, profiling, flame graphs, automated issue detection (Future Enhancements)
Updates - Hot-reload (zero-downtime), schema migration, atomic rollback (Future Enhancements)
Scheduler - Cron scheduling, 3-stage hybrid retention (Gorilla→Adaptive→Time-based, 99.9% compression) (Future Enhancements)

🔄 Data Integration (5 modules)

Importers - Data import from various sources (Future Enhancements)
Exporters - Data export to multiple formats (Future Enhancements)
CDC - Change Data Capture for real-time data replication (Future Enhancements)
Plugins - Plugin system for extensibility (Future Enhancements)
Ingestion - Multi-source data intake (filesystem, HuggingFace, REST API), rate limiting, checkpointing, quarantine queue (Future Enhancements)

🌍 Distributed Systems (2 modules)

Replication - Raft consensus, multi-master with vector clocks, WAL shipping, 50K-100K writes/sec (Future Enhancements)
Sharding - Horizontal scaling and data distribution (Future Enhancements)

🎯 Specialized (4 modules)

Graph - 5 traversal algorithms (BFS/DFS/Dijkstra/A*/Bidirectional), 12 constraint types (Future Enhancements)
Chimera - Vendor-neutral CHIMERA benchmark adapter (Future Enhancements)
Geo - Advanced geospatial features and queries (Future Enhancements)
Acceleration - Hardware acceleration (GPU, SIMD, etc.) (Future Enhancements)

🛠️ Utility (4 modules)

Metadata - Schema introspection and system catalog (Future Enhancements)
GPU - GPU utilities and memory management (Future Enhancements)
Cache - Multi-level caching layer (Future Enhancements)
Content - Content management utilities (Future Enhancements)

📖 Documentation Standards

Each module includes enterprise-grade documentation:

✅ Module Purpose & Scope - Clear description with boundaries
✅ Key Components - Main classes, functions, and structures
✅ Architecture - Design patterns with ASCII diagrams
✅ Integration Points - Dependencies and module interactions
✅ API/Usage Examples - 50+ working code examples per major module
✅ Performance Characteristics - Benchmarks and tuning guides
✅ Known Limitations - Current constraints and workarounds
✅ Production Status - Readiness indicators
✅ Future Roadmap - Planned features with target versions
✅ Research Foundation - 100+ peer-reviewed paper citations

Total Documentation: 139 files · 500+ code examples · 80+ architecture diagrams · ~1MB technical content

Quick Start

Request Flow Overview

flowchart LR
    A[Client Request] --> B{Protocol}
    B -->|REST/HTTP| C[HTTP Server]
    B -->|gRPC| D[gRPC Server]
    B -->|WebSocket| E[WebSocket Server]
    
    C & D & E --> F[Authentication]
    F --> G[Rate Limiting]
    G --> H[Query Parser]
    H --> I[Query Optimizer]
    I --> J[Execution Engine]
    
    J --> K{Operation Type}
    K -->|Read| L[MVCC Read]
    K -->|Write| M[Transaction]
    K -->|Query| N[Index Lookup]
    
    L & M & N --> O[Storage Layer]
    O --> P[Response]
    P --> Q[Client]
    
    style A fill:#e1f5ff
    style O fill:#ffe1e1
    style Q fill:#e1ffe1
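The flow above places a rate-limiting stage between authentication and query parsing. The README does not say which algorithm ThemisDB uses; a token bucket is one common choice, and the minimal sketch below (the `TokenBucket` class and its parameters are hypothetical) shows how such a stage admits or rejects requests.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so short bursts are allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # a server would typically answer HTTP 429 here
```

Injecting the clock keeps the limiter deterministic in tests; in a server, one bucket per client or API key is a typical design choice.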

🐳 Docker (Recommended)

# Pull and run the latest version
docker pull themisdb/themisdb:latest

# Run with Docker
docker run -d \
  --name themis \
  -p 8080:8080 \
  -p 18765:18765 \
  -p 4318:4318 \
  -v themis_data:/data \
  themisdb/themisdb:latest

# Verify installation
curl http://localhost:8080/health

Default Ports:

8080 - HTTP/REST API, GraphQL
18765 - Binary Wire Protocol, gRPC
4318 - OpenTelemetry/Prometheus metrics

📖 Complete Port Reference: See docs/de/deployment/PORT_REFERENCE.md

💻 From Source

Quick Build with CMake Presets (Recommended)

# Clone repository
git clone https://github.com/makr-code/ThemisDB.git
cd ThemisDB

# Initialize submodules (vcpkg, llama.cpp)
git submodule update --init --recursive

# Configure with a preset
cmake --preset community-release

# Build
cmake --build --preset community-release

# Start server
./build-community-release/bin/themis_server --config config.yaml

Traditional Build Method

# Clone repository
git clone https://github.com/makr-code/ThemisDB.git
cd ThemisDB

# Setup and build (Linux/macOS)
./scripts/setup.sh
./scripts/build.sh

# Setup and build (Windows)
.\scripts\setup.ps1
.\scripts\build.ps1

# Start server
./build/themis_server --config config.yaml

📖 Build Documentation:

CMake Presets Guide - Use presets for simplified builds

Cross-Compilation Guide - Build for ARM64, ARMv7, Windows

Build Strategy Guide - Detailed build instructions

Edition Comparison - Choose the right edition

🔧 Modular Build (v1.4.0+): Enable the modular architecture to work around Windows COFF symbol limits and improve build times:
cmake -B build -DTHEMIS_BUILD_MODULAR=ON
cmake --build build
See docs/architecture/MODULARIZATION_GUIDE.md for details.

Deployment Architecture

graph TB
    subgraph "Production Deployment"
        subgraph "Edge Layer"
            CDN[CDN/Edge Cache]
            WAF[Web Application Firewall]
        end
        
        subgraph "Application Layer"
            APP1[Client Application 1]
            APP2[Client Application 2]
            APP3[Client Application 3]
        end
        
        subgraph "Database Layer"
            subgraph "ThemisDB Cluster"
                DB1[ThemisDB Node 1<br/>Leader]
                DB2[ThemisDB Node 2<br/>Follower]
                DB3[ThemisDB Node 3<br/>Follower]
            end
        end
        
        subgraph "Monitoring & Observability"
            PROM[Prometheus]
            GRAF[Grafana]
            JAEGER[Jaeger Tracing]
        end
        
        subgraph "Backup & Recovery"
            BACKUP[Backup Storage<br/>S3/Object Store]
        end
    end
    
    CDN --> WAF
    WAF --> APP1 & APP2 & APP3
    APP1 & APP2 & APP3 --> DB1
    DB1 -.Replication.-> DB2 & DB3
    
    DB1 --> PROM
    PROM --> GRAF
    DB1 --> JAEGER
    DB1 -.Backup.-> BACKUP
    
    style DB1 fill:#e1ffe1
    style DB2 fill:#e1ffe1
    style DB3 fill:#e1ffe1
    style PROM fill:#e1f5ff
    style GRAF fill:#e1f5ff

📦 Package Managers

Linux (Debian/Ubuntu):

# Download the latest release from GitHub
wget https://github.com/makr-code/ThemisDB/releases/latest/download/themisdb_amd64.deb
sudo apt install ./themisdb_amd64.deb
sudo systemctl start themisdb

macOS (Homebrew):

brew install themisdb
brew services start themisdb

Windows (Chocolatey):

choco install themisdb

5-Minute Tutorial

Data Models Integration

graph TB
    subgraph "Application Use Cases"
        UC1[User Profiles<br/>Document Model]
        UC2[Social Graph<br/>Graph Model]
        UC3[Recommendations<br/>Vector Search]
        UC4[Metrics<br/>Time-Series]
    end
    
    subgraph "ThemisDB Unified API"
        API[Single API Endpoint]
    end
    
    subgraph "Query Processing"
        PARSER[AQL Parser]
        OPT[Query Optimizer]
    end
    
    subgraph "Execution Layer"
        DOC[Document Engine]
        GRAPH[Graph Engine]
        VECTOR[Vector Engine]
        TS[Time-Series Engine]
    end
    
    subgraph "Storage"
        STORAGE[RocksDB<br/>Unified Key-Value Store]
    end
    
    UC1 --> API
    UC2 --> API
    UC3 --> API
    UC4 --> API
    
    API --> PARSER
    PARSER --> OPT
    
    OPT --> DOC
    OPT --> GRAPH
    OPT --> VECTOR
    OPT --> TS
    
    DOC --> STORAGE
    GRAPH --> STORAGE
    VECTOR --> STORAGE
    TS --> STORAGE
    
    style API fill:#e1f5ff
    style STORAGE fill:#ffe1e1

# 1. Check server health
curl http://localhost:8080/health

# 2. Create an entity
curl -X PUT http://localhost:8080/entities/users:alice \
  -H "Content-Type: application/json" \
  -d '{"blob":"{\"name\":\"Alice\",\"age\":30,\"city\":\"Berlin\"}"}'

# 3. Create an index
curl -X POST http://localhost:8080/index/create \
  -H "Content-Type: application/json" \
  -d '{"table":"users","column":"city"}'

# 4. Query by index
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"table":"users","predicates":[{"column":"city","value":"Berlin"}],"return":"entities"}'

# 5. View metrics
curl http://localhost:8080/metrics
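In step 2 the `blob` field carries the entity as a JSON *string* nested inside the JSON request body, which is why the inner quotes are escaped. A small Python helper can build that body without hand-escaping. It assumes only the `/entities/<key>` path shown in the tutorial; the function names are hypothetical.

```python
import json
import urllib.request


def entity_body(doc: dict) -> bytes:
    """Encode `doc` as a JSON string inside the {"blob": ...} envelope used in step 2."""
    return json.dumps({"blob": json.dumps(doc)}).encode("utf-8")


def put_entity(base_url: str, key: str, doc: dict) -> int:
    """PUT /entities/<key> (e.g. key="users:alice") and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/entities/{key}",
        data=entity_body(doc),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Usage against a running server:
# put_entity("http://localhost:8080", "users:alice",
#            {"name": "Alice", "age": 30, "city": "Berlin"})
```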

💡 Learn More:

🚀 10-Minute Quickstart - Hello World and CRUD operations
📚 Examples Index - Browse 37+ examples by feature
🎓 Learning Paths - Guided paths for different roles

Schema Management API

ThemisDB provides a comprehensive Schema Manager for database introspection and schema customization:

# Get all table schemas
curl http://localhost:8080/api/v1/schema

# Get specific table schema
curl http://localhost:8080/api/v1/schema/tables/users

# Create/update custom schema
curl -X PUT http://localhost:8080/api/v1/schema/products \
  -H "Content-Type: application/json" \
  -d '{
    "name": "products",
    "type": "relational",
    "properties": [
      {"name": "id", "type": "integer", "indexed": true, "nullable": false},
      {"name": "name", "type": "string", "nullable": true},
      {"name": "price", "type": "double", "nullable": false}
    ],
    "indexes": [
      {"name": "id", "type": "regular", "unique": true, "columns": ["id"]}
    ]
  }'

# Partial update (PATCH)
curl -X PATCH http://localhost:8080/api/v1/schema/products \
  -H "Content-Type: application/json" \
  -d '{
    "properties": [
      {"name": "description", "type": "string", "nullable": true}
    ]
  }'

# Get database capabilities
curl http://localhost:8080/api/v1/capabilities

Supported Schema Types:

relational - Traditional table with structured columns
document - Flexible document/JSON storage
graph_node - Graph database nodes
graph_edge - Graph database edges/relationships
vector - Vector embeddings for AI/ML

Supported Property Types:

string, integer, double, boolean, vector, binary, null

Supported Index Types:

regular, range, sparse, geo, ttl, fulltext, composite

Features:

✅ Automatic schema discovery from data
✅ Custom schema definitions with validation
✅ Partial updates (PATCH)
✅ Persistent storage in RocksDB
✅ Thread-safe caching with 60s TTL
✅ Comprehensive validation (names, types, references)
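A client can pre-check a definition against the schema, property, and index types listed above before calling `PUT /api/v1/schema/...`. The sketch below is a hypothetical client-side helper, not the server's validator. Its duplicate-name and index-column checks are modeled on the "names, types, references" validation mentioned above, but the exact server rules are assumptions.

```python
SCHEMA_TYPES = {"relational", "document", "graph_node", "graph_edge", "vector"}
PROPERTY_TYPES = {"string", "integer", "double", "boolean", "vector", "binary", "null"}
INDEX_TYPES = {"regular", "range", "sparse", "geo", "ttl", "fulltext", "composite"}


def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition looks valid."""
    errors = []
    if schema.get("type") not in SCHEMA_TYPES:
        errors.append(f"unknown schema type: {schema.get('type')!r}")
    names = set()
    for prop in schema.get("properties", []):
        name = prop.get("name")
        if name in names:
            errors.append(f"duplicate property: {name!r}")
        names.add(name)
        if prop.get("type") not in PROPERTY_TYPES:
            errors.append(f"property {name!r}: unknown type {prop.get('type')!r}")
    for idx in schema.get("indexes", []):
        if idx.get("type") not in INDEX_TYPES:
            errors.append(f"index {idx.get('name')!r}: unknown type {idx.get('type')!r}")
        # Reference check: every indexed column must be a declared property.
        for col in idx.get("columns", []):
            if col not in names:
                errors.append(f"index {idx.get('name')!r}: unknown column {col!r}")
    return errors
```

Run against the `products` definition from the PUT example, this returns an empty list.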

📖 More Info: Operations Handbook - Schema Management

Core Capabilities

Architecture Overview

graph TB
    subgraph "Client Layer"
        C1[REST API]
        C2[GraphQL]
        C3[gRPC]
        C4[Wire Protocol]
        C5[Native SDKs]
    end
    
    subgraph "API & Server Layer"
        S1[HTTP Server]
        S2[Authentication]
        S3[Rate Limiting]
        S4[Load Shedding]
    end
    
    subgraph "Query Layer"
        Q1[AQL Parser]
        Q2[Query Optimizer]
        Q3[Execution Engine]
        Q4[Function Libraries]
        Q5[CTE Cache]
        Q6[Semantic Cache]
    end
    
    subgraph "Transaction & Concurrency Layer"
        T1[MVCC]
        T2[Transaction Manager]
        T3[SAGA Coordinator]
        T4[Deadlock Detection]
        T5[WAL Management]
    end
    
    subgraph "Index Layer"
        I1[Vector HNSW]
        I2[Graph]
        I3[Secondary]
        I4[Spatial]
        I5[Fulltext]
        I6[GPU Acceleration]
        I7[SIMD Optimization]
    end
    
    subgraph "Storage Layer"
        ST1[RocksDB LSM-tree]
        ST2[Key Schema]
        ST3[Compression]
        ST4[WAL]
        ST5[Snapshot Management]
        ST6[Compaction]
    end
    
    subgraph "Cross-Cutting Concerns"
        X1[Security]
        X2[Replication]
        X3[Sharding]
        X4[Monitoring]
        X5[CDC]
    end
    
    C1 & C2 & C3 & C4 & C5 --> S1
    S1 --> S2 --> S3 --> S4
    S4 --> Q1 --> Q2 --> Q3
    Q3 --> Q4 & Q5 & Q6
    Q3 --> T1
    T1 --> T2 --> T3
    T2 --> T4 & T5
    T3 --> I1 & I2 & I3 & I4 & I5
    I1 & I2 --> I6 & I7
    I1 & I2 & I3 & I4 & I5 --> ST1
    ST1 --> ST2 & ST3 & ST4 & ST5 & ST6
    ST1 -.-> X1 & X2 & X3 & X4 & X5
    
    style I1 fill:#e1f5ff
    style I2 fill:#e1f5ff
    style I3 fill:#e1f5ff
    style I4 fill:#e1f5ff
    style I5 fill:#e1f5ff
    style ST1 fill:#ffe1e1
    style X1 fill:#fff3cd
    style X2 fill:#fff3cd
    style X3 fill:#fff3cd
    style X4 fill:#fff3cd
    style X5 fill:#fff3cd

Multi-Model Database

Relational: SQL-like queries with secondary indexes
Graph: BFS, Dijkstra, A* traversals with path constraints
Vector: HNSW and FAISS for similarity search (CPU-optimized, GPU via FAISS)
Document: JSON storage with flexible schema
Time-Series: Gorilla compression, continuous aggregates

graph LR
    subgraph "Unified Storage"
        LSM[RocksDB LSM-Tree]
    end
    
    subgraph "Data Models"
        REL[Relational Model<br/>Tables & Rows]
        GRAPH[Graph Model<br/>Nodes & Edges]
        VECTOR[Vector Model<br/>Embeddings]
        DOC[Document Model<br/>JSON Documents]
        TS[Time-Series<br/>Metrics & Events]
    end
    
    REL --> LSM
    GRAPH --> LSM
    VECTOR --> LSM
    DOC --> LSM
    TS --> LSM
    
    style LSM fill:#ffe1e1
    style REL fill:#e1ffe1
    style GRAPH fill:#e1ffe1
    style VECTOR fill:#e1ffe1
    style DOC fill:#e1ffe1
    style TS fill:#e1ffe1

Transaction Support

sequenceDiagram
    participant Client
    participant TxManager as Transaction Manager
    participant MVCC as MVCC Engine
    participant Storage as RocksDB Storage
    
    Client->>TxManager: BEGIN TRANSACTION
    TxManager->>MVCC: Get Snapshot (timestamp)
    MVCC-->>TxManager: Snapshot ID
    TxManager-->>Client: Transaction Handle
    
    Client->>TxManager: READ (key)
    TxManager->>MVCC: Read at Snapshot
    MVCC->>Storage: Get versioned data
    Storage-->>MVCC: Data with version
    MVCC-->>TxManager: Consistent read
    TxManager-->>Client: Data
    
    Client->>TxManager: WRITE (key, value)
    TxManager->>MVCC: Check conflicts
    MVCC-->>TxManager: No conflicts
    TxManager->>Storage: Write with version
    Storage-->>TxManager: Written
    TxManager-->>Client: OK
    
    Client->>TxManager: COMMIT
    TxManager->>MVCC: Validate & commit
    MVCC->>Storage: Apply changes atomically
    Storage-->>MVCC: Success
    MVCC-->>TxManager: Committed
    TxManager-->>Client: Transaction Complete

Full ACID guarantees with snapshot isolation
Write-write conflict detection
Atomic updates across all index types
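The sequence above (snapshot at BEGIN, reads at that snapshot, conflict check before COMMIT) is the standard snapshot-isolation pattern. The toy in-memory model below illustrates it with first-committer-wins write-write conflict detection. It is a sketch of the general technique, not ThemisDB's MVCC engine: it is single-threaded, buffers writes, and validates them only at commit, whereas the diagram also checks conflicts at write time. All names are hypothetical.

```python
import itertools


class ConflictError(Exception):
    pass


class MVCCStore:
    """Toy multi-version store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self) -> None:
        self.versions: dict[str, list[tuple[int, object]]] = {}
        self.clock = itertools.count(1)   # monotonically increasing commit timestamps
        self.last_ts = 0

    def begin(self) -> "Txn":
        return Txn(self, snapshot_ts=self.last_ts)

    def read_at(self, key: str, ts: int):
        # Newest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None


class Txn:
    def __init__(self, store: MVCCStore, snapshot_ts: int) -> None:
        self.store, self.snapshot_ts, self.writes = store, snapshot_ts, {}

    def read(self, key: str):
        # Own buffered writes first, otherwise the snapshot view.
        return self.writes.get(key, self.store.read_at(key, self.snapshot_ts))

    def write(self, key: str, value) -> None:
        self.writes[key] = value          # buffered until commit

    def commit(self) -> int:
        # Write-write conflict: another transaction committed this key after our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise ConflictError(key)
        ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((ts, value))
        self.store.last_ts = ts
        return ts
```

Two transactions that start from the same snapshot and write the same key cannot both commit: the second one raises `ConflictError`, and a retry from a fresh snapshot is the usual response.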

Security & Compliance

graph TB
    subgraph "Client Layer"
        CLIENT[Client Application]
    end
    
    subgraph "Transport Security"
        TLS[TLS 1.3<br/>Certificate Validation]
        MTLS[Mutual TLS<br/>Client Certificates]
    end
    
    subgraph "Authentication & Authorization"
        AUTH[Authentication<br/>JWT/OAuth2]
        RBAC[Role-Based Access Control<br/>Permissions Matrix]
        POLICY[Policy Engine<br/>Apache Ranger]
    end
    
    subgraph "Application Security"
        RATELIMIT[Rate Limiting<br/>DDoS Protection]
        AUDIT[Audit Logging<br/>SIEM Integration]
        INPUT[Input Validation<br/>SQL Injection Prevention]
    end
    
    subgraph "Data Security"
        ENCRYPT[Field-Level Encryption<br/>AES-256-GCM]
        HSM[Hardware Security Module<br/>Key Management]
        MASKING[Data Masking<br/>PII Protection]
    end
    
    subgraph "Storage Security"
        STORAGE[Encrypted Storage<br/>At-Rest Encryption]
        BACKUP[Encrypted Backups<br/>Secure Recovery]
    end
    
    CLIENT --> TLS
    TLS --> MTLS
    MTLS --> AUTH
    AUTH --> RBAC
    RBAC --> POLICY
    POLICY --> RATELIMIT
    RATELIMIT --> INPUT
    INPUT --> AUDIT
    AUDIT --> ENCRYPT
    ENCRYPT --> HSM
    HSM --> MASKING
    MASKING --> STORAGE
    STORAGE --> BACKUP
    
    style TLS fill:#ffe1e1
    style AUTH fill:#ffe1e1
    style ENCRYPT fill:#ffe1e1
    style STORAGE fill:#ffe1e1

TLS 1.3 with mTLS support
Role-Based Access Control (RBAC)
Field-level encryption
Audit logging with SIEM integration

🔒 Compliance & Audit Framework (v1.4.1+):

ThemisDB tracks its compliance with international security standards through a structured audit framework:

Standards Coverage: ISO 27001, NIST CSF, OWASP ASVS Level 2, BSI C5, SOC 2, SLSA Level 3
Automated Audits: Continuous SAST/DAST scanning, dependency checks, coverage analysis
Audit Documentation: docs/audit-framework/
- Audit Charter & Planning - Framework governance and methodology
- Audit Gate Template - 113-point checklist for release audits
- Audit Runbook - Step-by-step execution guide
- Compliance Mapping - 400+ controls mapped to ThemisDB features
CI/CD Integration: Automated audit checks on every PR (audit-check.yml)

📋 See also: Security Policy | Compliance Documentation

Distribution & Scaling

graph TB
    subgraph "Client Applications"
        APP[Applications]
    end
    
    subgraph "Routing Layer"
        SR[Shard Router<br/>VCC-URN Partitioning]
        SM[Shard Manager<br/>Metadata & Health]
        REBAL[Auto Rebalancer<br/>Load Distribution]
    end
    
    subgraph "ThemisDB Cluster - RAID Modes"
        subgraph "MIRROR Mode RF=2"
            subgraph "Shard 1"
                S1P[Primary Node]
                S1R[Replica Node]
            end
            
            subgraph "Shard 2"
                S2P[Primary Node]
                S2R[Replica Node]
            end
        end
        
        subgraph "PARITY Mode 4+2"
            S3[Data Shard 1]
            S4[Data Shard 2]
            S5[Data Shard 3]
            S6[Data Shard 4]
            P1[Parity Shard 1]
            P2[Parity Shard 2]
        end
    end
    
    subgraph "Observability"
        MON[Monitoring<br/>Metrics & Health]
    end
    
    APP --> SR
    SR --> SM
    SM --> REBAL
    
    SR --> S1P & S2P
    S1P -.Replication.-> S1R
    S2P -.Replication.-> S2R
    
    SR --> S3 & S4 & S5 & S6
    S3 & S4 & S5 & S6 -.Parity.-> P1 & P2
    
    SM --> MON
    REBAL -.Auto-Balance.-> S1P & S2P & S3 & S4
    
    style SR fill:#e1f5ff
    style S1P fill:#e1ffe1
    style S2P fill:#e1ffe1
    style S3 fill:#e1ffe1
    style S4 fill:#e1ffe1
    style S5 fill:#e1ffe1
    style S6 fill:#e1ffe1
    style P1 fill:#fff3cd
    style P2 fill:#fff3cd

VCC-URN based sharding with consistent hashing (Enterprise)
RAID-like redundancy modes: MIRROR, STRIPE, PARITY, GEO_MIRROR (Enterprise)
Auto-rebalancing with zero-downtime migration (Enterprise)
Multi-region deployment support (Enterprise)
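The sharding bullets above mention consistent hashing. The minimal hash ring below uses virtual nodes and shows why adding a shard moves keys only onto the new shard. The URN-style keys, the hash function (MD5 here), and the virtual-node count are assumptions, not ThemisDB's actual VCC-URN scheme.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # 64-bit ring position derived from MD5 (chosen only because it is in the stdlib).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical shard owns `vnodes` points, which smooths the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self.ring, (_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

Because a new shard only claims arcs of the ring, every key that changes owner after `add()` moves to the new shard. This is what keeps rebalancing traffic proportional to the new shard's share.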

→ View All Features

Production Resilience (v1.4.1+)

ThemisDB includes comprehensive safe-fail mechanisms for production reliability:

🛡️ Circuit Breaker Patterns

GPU/LLM Safe-Fail Manager - Automatic CPU fallback when GPU fails

State machine: HEALTHY → DEGRADED → CIRCUIT_OPEN
Memory pressure monitoring (OOM prevention)
Operation timeouts detect hung kernels
< 1µs overhead per operation
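The HEALTHY → DEGRADED → CIRCUIT_OPEN state machine can be illustrated with a small breaker that falls back to a CPU path. The failure thresholds, the cool-down period, and the rule for closing the circuit again are assumptions for this sketch; the README does not specify them. Memory-pressure monitoring and kernel timeouts are omitted.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    CIRCUIT_OPEN = "CIRCUIT_OPEN"


class SafeFailBreaker:
    """Hypothetical GPU breaker: counts consecutive failures, serves CPU while open."""

    def __init__(self, degrade_after: int = 1, open_after: int = 3,
                 cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.degrade_after, self.open_after = degrade_after, open_after
        self.cooldown_s, self.clock = cooldown_s, clock
        self.failures, self.opened_at = 0, 0.0
        self.state = State.HEALTHY

    def call(self, gpu_op: Callable[[], T], cpu_fallback: Callable[[], T]) -> T:
        if self.state is State.CIRCUIT_OPEN:
            if self.clock() - self.opened_at < self.cooldown_s:
                return cpu_fallback()          # skip the GPU entirely while open
            self.state = State.DEGRADED        # cool-down over: allow one trial call
        try:
            result = gpu_op()
        except Exception:
            self.failures += 1
            if self.failures >= self.open_after:
                self.state, self.opened_at = State.CIRCUIT_OPEN, self.clock()
            elif self.failures >= self.degrade_after:
                self.state = State.DEGRADED
            return cpu_fallback()
        self.failures, self.state = 0, State.HEALTHY   # success resets the breaker
        return result
```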

Database Connection Manager - Connection pooling with health monitoring

2-10 connections (configurable), 40% overhead reduction
Exponential backoff retry (100ms → 30s)
Automatic stale connection removal
~10µs overhead per acquire/release

Network Timeout Handler - Prevents hanging connections

Accept/read/write timeouts (5s/30s/30s defaults)
TCP keepalive & TCP_NODELAY
Protection against Slowloris DoS attacks
~5-10µs overhead per operation

Transaction Auto-Retry - Automatic retry with exponential backoff

Intelligent error classification (retryable vs non-retryable)
Jitter prevents the thundering-herd problem
Circuit breaker integration
~3µs overhead on success path
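Both the connection manager (100ms → 30s) and transaction auto-retry rely on exponential backoff. The sketch below uses the "full jitter" variant, where each delay is drawn uniformly between 0 and the capped exponential bound. The 100 ms base and 30 s cap come from the connection-manager bullet; the jitter strategy, the attempt limit, and the hypothetical `RetryableError` classification are assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures such as write-write conflicts."""


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Full jitter: delay n is uniform in [0, min(cap, base * 2**n)] seconds."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def with_retry(op: Callable[[], T], max_attempts: int = 5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(max_attempts):
        try:
            return op()
        except RetryableError:          # non-retryable errors propagate immediately
            if attempt == max_attempts - 1:
                raise                   # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```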

🔒 Data Integrity

Research-Backed Protection (Based on Bairavasundaram et al. 2008, Bonwick et al. 2010)

Paranoid checks: 99.99% corruption detection (~5% read overhead)
XXH3 checksums: 3x faster than CRC32 (~2% read overhead)
Background verification: During compaction (0% read overhead)
mmap disabled: Prevents hidden I/O errors (< 1% overall impact)
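Checksum protection comes down to storing a checksum next to each block at write time and recomputing it on every read. The sketch below uses `zlib.crc32` from the Python standard library purely as a stand-in for XXH3, which needs a third-party package. The 4-byte trailer layout is an assumption and is not RocksDB's block format.

```python
import struct
import zlib


class CorruptionError(Exception):
    pass


def seal(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 trailer (stand-in for XXH3)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify(block: bytes) -> bytes:
    """Return the payload if the trailer matches, else raise CorruptionError."""
    payload, (stored,) = block[:-4], struct.unpack("<I", block[-4:])
    if zlib.crc32(payload) != stored:
        raise CorruptionError("checksum mismatch")
    return payload
```

A single flipped bit in the payload is always caught by CRC32, so `verify` raises instead of returning silently corrupted data.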

📊 Reliability Metrics

Metric	Before v1.4.1	After v1.4.1	Improvement
Availability	99.5%	99.95%+	+0.45 percentage points
Automatic Recovery	Manual	99.9%	New capability
Corruption Detection	None	99.99%	New capability
Manual Intervention	High	90% lower	-90%
Transaction Success	~95%	99.9%	+4.9 percentage points

Total System Overhead: < 1% (safe-fail) + ~7% read (integrity checks, configurable)
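To put the availability row into concrete terms, an availability of A percent corresponds to (1 - A/100) × 8,760 hours of downtime per year, ignoring leap years:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (99.5, 99.95):
    print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} h/year")
# 99.5% -> 43.80 h/year
# 99.95% -> 4.38 h/year
```

Seen this way, a gain of 0.45 percentage points cuts expected downtime by roughly a factor of ten.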

📚 Documentation:

Safe-Fail Mechanisms - Technical guide
Database File Robustness - Academic research
Network Timeout Handling - Complete guide
Transaction Auto-Retry - Retry strategies
mmap Performance Impact - Detailed analysis

Editions

Edition	License	Features	Use Case
🔹 Minimal	Open Source (MIT)	Core database only	Embedded systems, IoT, edge devices
🆓 Community	Open Source (MIT)	Full-featured single-node	Development, startups, single-server
🔒 Enterprise	Commercial	+ Horizontal scaling, HA, replication	Large-scale production deployments

→ Minimal Edition Details | → Enterprise Edition Details

Edge AI & SoC Deployment

ThemisDB supports native LLM integration with llama.cpp on System-on-Chip (SoC) devices for edge AI deployments.

🎯 Supported Platforms

Raspberry Pi 4/5 - ARM64, NEON-optimized
Orange Pi 5 / Rock 5B - ARM Mali GPU, NPU acceleration
NVIDIA Jetson - CUDA GPU acceleration
AI Accelerators - Coral TPU, Hailo, Intel NCS2

🚀 Quick Setup Example (Raspberry Pi 5)

# config/config-rpi5-llm.yaml
llm:
  enabled: true
  model_path: "/data/models/phi-3-mini-4k-instruct.Q4_K_M.gguf"
  context_size: 4096
  threads: 4
  enable_caching: true

Performance: ~2-3 tokens/second (Phi-3-Mini 3.8B)
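Token throughput translates directly into response time, which is often the deciding factor for edge deployments. Here is a quick estimate based on the ~2-3 tokens/second figure above. The 256-token answer length is an arbitrary example, and only decode time is counted.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Decode time only; prompt processing and model loading are not included."""
    return output_tokens / tokens_per_second


for tps in (2.0, 3.0):
    print(f"256 tokens at {tps:.0f} tok/s: about {generation_seconds(256, tps):.0f} s")
```

At these rates a 256-token answer takes roughly 85 to 128 seconds, so short responses or streaming output are a better fit on this class of hardware.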

📚 Documentation

🌟 Complete SoC Guide - Comprehensive guide (German)
⚡ Quick Reference - Fast configuration reference
🔧 Raspberry Pi Tuning - System optimization

Key Features:

✅ Local AI inference without cloud dependency
✅ Data sovereignty and privacy
✅ 10-50x more energy efficient than desktop GPUs
✅ Models: TinyLlama (1B), Phi-3 (3.8B), Mistral (7B)
✅ RAG, embeddings, chat, and text generation
✅ Autonomous prompt optimization with A/B testing and rollback (learn more)

Documentation

📚 Complete Documentation Hub: https://makr-code.github.io/ThemisDB/

🎯 Documentation Quick Access

Category	Description	Link
📑 Category Index	Browse all docs by category	View Index →
🚀 Quick Start	5-minute setup guide	Get Started →
💡 Use Cases	E-Commerce, IoT, RAG/LLM, SaaS	Browse →
🎓 Tutorials	Hands-on learning paths	Learn →
🏆 Certification	Professional certifications	Get Certified →
📚 Knowledge Base	Troubleshooting & tips	Search →

Documentation Structure

graph TB
    HUB[📚 Documentation Hub]
    
    HUB --> START[🚀 Getting Started]
    HUB --> USECASE[💡 Use Cases]
    HUB --> TUTORIAL[🎓 Tutorials]
    HUB --> CERT[🏆 Certification]
    HUB --> KB[📚 Knowledge Base]
    HUB --> CORE[📖 Core Docs]
    
    START --> QS[Quick Start]
    START --> INSTALL[Installation]
    START --> FIRST[First Steps]
    
    USECASE --> ECOM[E-Commerce]
    USECASE --> IOT[IoT & Sensors]
    USECASE --> RAG[RAG & LLM]
    USECASE --> SAAS[SaaS Multi-Tenancy]
    
    TUTORIAL --> CRUD[CRUD Operations]
    TUTORIAL --> SCHEMA[Schema Design]
    TUTORIAL --> BP[Best Practices]
    TUTORIAL --> VIDEO[Video Tutorials]
    
    CERT --> FUND[Fundamentals]
    CERT --> QUERY[Query Expert]
    CERT --> OPS[Operations]
    CERT --> SEC[Security]
    
    KB --> TROUBLE[Troubleshooting]
    KB --> PERF[Performance Tips]
    KB --> MIG[Migration Guides]
    KB --> BACKUP[Backup & Recovery]
    
    CORE --> ARCH[Architecture]
    CORE --> AQL[AQL Language]
    CORE --> API[API Reference]
    CORE --> SECURITY[Security]
    
    style HUB fill:#e1f5ff
    style USECASE fill:#ffe1e1
    style CERT fill:#e1ffe1
    style KB fill:#fff3cd

📖 Core Documentation Categories

Getting Started:

🚀 Quick Start - Get up and running in 5 minutes
🐳 Docker Deployment - Container-based deployment
🔧 Building from Source - Compile from source code

Core Concepts:

🏗️ Architecture Overview - System design and components
💾 Multi-Model Design - Unified storage architecture
🔄 Transaction Management - ACID and MVCC details
🔍 AQL Query Language - Advanced Query Language syntax
🔀 Git/GitOps Research - Version control concepts comparison

Features:

🎯 Vector Search - Similarity search and embeddings
🕸️ Graph Operations - Graph traversals and algorithms
📈 Time-Series Engine - Time-series data handling
🔐 Security & Compliance - Security features

Operations:

⚙️ Configuration Guide - Server configuration
📊 Monitoring & Metrics - Prometheus and Grafana
💾 Backup & Recovery - Comprehensive data protection guide
⚡ Performance Tuning - Optimization tips

Development:

🤝 Contributing - How to contribute
🌿 Branching Strategy - Git Flow workflow
📖 API Reference - REST and GraphQL APIs
📦 Client SDKs - Available client libraries

LLM/LoRA System:

✅ LLM Core Status (Master) - Single source of truth for implementation status
📊 Comprehensive Audit Report - Detailed code audit findings
🔍 Decision Matrix - Resolution of conflicting documentation
📋 Progress Checklist - Detailed task tracking
📚 Archived Docs - Historical documentation (superseded)
✅ Status: Core 100% production-ready, Integration 95% complete
🎓 NEW: Legal LoRA Training Pipeline - Multi-source ingestion + auto-labeling + knowledge graph enrichment for domain-specific legal AI training
- Multi-source data ingestion (HuggingFace, filesystem, OCR support)
- Auto-labeling with Legal Modality Analyzer (PR #1 integration)
- Knowledge graph enrichment for contextual training
- Incremental training with version management
- Tutorial: Custom Document Ingestion

Audit Reports:

📋 v1.4.1 Audit Reports - Complete audit package for v1.4.1
- Executive Summary - Overall audit opinion: ✅ APPROVED WITH CONDITIONS (89.3/100)
- Code Quality Audit - SAST analysis, TODO inventory, metrics (89/100)
- Security Controls Audit - 58 controls assessed (90/100)
- Test Coverage Audit - Unit 87%, Integration 95%, E2E 72% (88/100)
- Compliance Audit - ISO 27001, NIST, OWASP, BSI C5, SOC 2, GDPR (95/100)
- Findings & Risks - 62 findings: 3 critical, 7 high, 22 medium, 30 low
- Performance Audit - 45K writes/s, 123K reads/s (92/100)
🔒 Audit Framework - Comprehensive audit methodology and tools
📊 Compliance: 95.3% across 428 controls (ISO 27001, NIST, OWASP, BSI C5, SOC 2, GDPR)
🎯 Status: Production-ready, conditional on remediating the 3 critical findings in v1.4.2

Performance

Test Environment: Release build, Windows x64, 20 cores @ 3696 MHz

Operation	Throughput	Latency (avg)
📝 Entity PUT	45,000 ops/s	0.02 ms
📖 Entity GET	120,000 ops/s	0.008 ms
🔍 Indexed Query	3.4M queries/s	0.29 μs
🕸️ Graph Traverse	9.56M ops/s	0.105 μs
🎯 Vector Search	59.7M queries/s	0.017 μs
📊 Vector Insert (384D)	411k vectors/s	2.44 μs

Note: Benchmarks represent optimal conditions. Actual performance varies based on hardware, data size, and workload.
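The latency column matches the reciprocal of the throughput column to within the table's rounding (for example, 1 / 3.4M queries/s ≈ 0.29 μs). This suggests the latencies are amortized per-operation times (total time divided by operation count) rather than end-to-end request latencies, although the README does not say how they were measured. The check below recomputes the reciprocals from the table:

```python
# (throughput in ops/s, reported latency in microseconds), copied from the table above
ROWS = {
    "Entity PUT": (45_000, 20.0),          # 0.02 ms
    "Entity GET": (120_000, 8.0),          # 0.008 ms
    "Indexed Query": (3.4e6, 0.29),
    "Graph Traverse": (9.56e6, 0.105),
    "Vector Search": (59.7e6, 0.017),
    "Vector Insert (384D)": (411e3, 2.44),
}

for name, (ops_per_s, reported_us) in ROWS.items():
    implied_us = 1e6 / ops_per_s           # microseconds per operation if fully serial
    print(f"{name:22} implied {implied_us:8.3f} μs   reported {reported_us} μs")
```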

CHIMERA Suite - Scientific Benchmark Framework

ThemisDB performance is evaluated using the CHIMERA Suite (Comprehensive Hybrid Inferencing & Multi-model Evaluation Resource Assessment), a vendor-neutral benchmark framework for multi-model databases with AI integration.

Key Features:

🔬 IEEE/ACM compliant scientific methodology
🎯 Multi-model workload testing (Graph, Vector, Relational, Document)
🤖 Native AI/LLM benchmark support (inference, LoRA, RAG)
🌐 Vendor-neutral, color-blind friendly reporting
📊 Statistical rigor with confidence intervals

📊 CHIMERA Suite Documentation | Complete Benchmark Results
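To illustrate "statistical rigor with confidence intervals": a benchmark harness typically reports a mean with an interval instead of a single run. The sketch below uses the normal approximation from Python's `statistics` module for brevity; a Student-t interval is more appropriate for small samples. The run values are hypothetical and are not CHIMERA results.

```python
import statistics


def mean_ci(samples: list[float], confidence: float = 0.95) -> tuple[float, float, float]:
    """Mean and normal-approximation confidence interval of repeated measurements."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5   # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


runs = [44_100.0, 45_300.0, 44_800.0, 45_900.0, 44_600.0]   # hypothetical ops/s
m, lo, hi = mean_ci(runs)
print(f"{m:,.0f} ops/s (95% CI {lo:,.0f} to {hi:,.0f})")
```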

Independent Benchmarking

ThemisDB performance can be independently evaluated using the CHIMERA Suite - a vendor-neutral, IEEE-compliant benchmarking framework that supports fair comparison across multiple database systems.

CHIMERA Suite features:

Vendor-neutral reporting and visualization
Statistical rigor (IEEE Std 2807-2022 compliant)
Color-blind friendly design
Support for multiple database systems (PostgreSQL, MongoDB, Neo4j, ThemisDB, and more)

Learn more: CHIMERA Suite Documentation

Performance Dashboard & Monitoring

ThemisDB includes a comprehensive Performance Dashboard for visualizing benchmark trends, detecting regressions, and monitoring performance across releases and branches.

Features:

📊 Real-time Grafana Dashboard - Throughput, latency, error rates
🔍 Automatic Regression Detection - CI/CD integration with configurable thresholds
📈 Historical Tracking - Performance trends over time
🌿 Branch Comparisons - Compare main, develop, and feature branches
🏷️ Release Tracking - Performance evolution across versions
🖥️ Hardware Comparison - Compare results across hardware configurations
🚨 Alerts & Notifications - Slack and email alerts for regressions
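A threshold-based regression check like the one the dashboard describes can be sketched in a few lines of Python. The `Metric` type, the `find_regressions` and `ci_exit_code` functions, the 5% default threshold, and the sample numbers are hypothetical. They do not reflect ThemisDB's actual CI configuration.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    baseline: float          # value from the reference run (e.g. the main branch)
    current: float           # value from the run under test
    higher_is_better: bool   # True for throughput, False for latency


def regression_pct(m: Metric) -> float:
    """How much worse `current` is than `baseline`, in percent (negative means improved)."""
    if m.baseline == 0:
        raise ValueError(f"baseline for {m.name!r} is zero")
    change = (m.current - m.baseline) / m.baseline * 100.0
    return -change if m.higher_is_better else change


def find_regressions(metrics, threshold_pct=5.0):
    """Return (name, pct) for every metric that degraded by more than threshold_pct."""
    flagged = []
    for m in metrics:
        pct = regression_pct(m)
        if pct > threshold_pct:
            flagged.append((m.name, pct))
    return flagged


def ci_exit_code(metrics, threshold_pct=5.0) -> int:
    """Print each regression and return a process exit code: 1 if any were found, else 0."""
    regressions = find_regressions(metrics, threshold_pct)
    for name, pct in regressions:
        print(f"REGRESSION {name}: {pct:.1f}% worse than baseline")
    return 1 if regressions else 0


# Hypothetical numbers for illustration only.
SAMPLE_METRICS = [
    Metric("entity_put_ops_per_s", 45_000, 41_000, higher_is_better=True),
    Metric("entity_get_ops_per_s", 120_000, 118_000, higher_is_better=True),
    Metric("entity_get_latency_us", 8.0, 8.9, higher_is_better=False),
]

if __name__ == "__main__":
    print("exit code:", ci_exit_code(SAMPLE_METRICS))
```

In a CI step, passing the result to `sys.exit(ci_exit_code(metrics))` makes the job fail whenever a metric degrades past the threshold. The `higher_is_better` flag lets one threshold cover both throughput drops and latency increases.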

Quick Start:

# Start dashboard
cd grafana && docker-compose up -d

# Open http://localhost:3000 (default Grafana login: admin/admin)

📊 Performance Dashboard Documentation | Quick Start Guide | Example Charts

Community & Support

Resource	Description	Link
📚 Documentation	Complete guides and API reference	Docs Site
🚀 Production Ops	Deployment, monitoring, troubleshooting	Operations Guide
🐛 Issues	Report bugs or request features	GitHub Issues
💬 Discussions	Community Q&A and discussions	GitHub Discussions
🤝 Contributing	How to contribute to ThemisDB	Contributing Guide
🔒 Security	Responsible disclosure policy	Security Policy

License

Community Edition: Released under the MIT License - Free to use, modify, and distribute.

Enterprise Edition: Available under a commercial license with additional features (horizontal sharding, advanced analytics, HA/replication).

Enterprise Inquiries: sales@themisdb.com

Acknowledgments

ThemisDB builds upon excellent open-source projects:

RocksDB - High-performance LSM-Tree storage engine
FAISS - Efficient similarity search library
llama.cpp - LLM inference engine (optional)
ArangoDB - Multi-model architecture inspiration
CozoDB - Hybrid relational-graph-vector design inspiration

→ Complete Attribution & Dependencies
→ Implementation Origins & Code Attribution (Historical)

Contributing & Community

We welcome contributions! Please see our:

🤝 Contributing Guide - Development workflow and guidelines
📋 Code of Conduct - Community standards
💬 Support - How to get help
🔒 Security Policy - Reporting security issues

CI/CD Architecture

As of February 2026, ThemisDB uses a consolidated CI/CD architecture:

20 workflows (down from 53, a 62% reduction)
12 entry workflows for PR validation, releases, security, testing
7 reusable workflows for shared functionality
8 composite actions for common steps

Key Workflows:

ci-pull-request.yml - Fast PR validation (~15-30 min)
ci-release.yml - Complete release pipeline
security.yml - Comprehensive security scanning
nightly.yml - Extended test suite

Documentation:

📖 CI/CD Architecture - Complete architecture guide
🔧 Workflow README - All workflows documented
📁 Archived Workflows - Historical workflows (51 archived)

All changes are automatically validated through CI/CD pipelines ensuring code quality, security, and performance standards.

Built with ❤️ for the database community

⭐ Star us on GitHub · 📖 Read the Docs · 🤝 Contribute
