Master the art of designing optimal database schemas for ThemisDB. Learn multi-model design patterns, normalization strategies, and real-world examples.
- ✅ Multi-model schema design principles
- ✅ Normalization vs. denormalization trade-offs
- ✅ Index strategy and placement
- ✅ Relationship modeling patterns
- ✅ Real-world schema examples
- ✅ Performance optimization techniques
Prerequisites: CRUD Tutorial, Batch Operations
Time Required: 45 minutes
Difficulty: Intermediate to Advanced
- Design Principles
- Entity-Attribute Model
- Normalization Strategies
- Relationship Patterns
- Index Design
- Real-World Examples
- Migration Patterns
Before designing, answer:
- How will data be queried most often?
- What are the read/write ratios?
- What are the latency requirements?
- What consistency guarantees are needed?
Example:
E-commerce scenario:
- Products: 90% reads, 10% writes → Optimize for reads
- Orders: 50% reads, 50% writes → Balance both
- Analytics: 100% reads → Aggressive denormalization OK
ThemisDB supports multiple models. Choose based on use case:
| Model | Best For | Example |
|---|---|---|
| Key-Value | Simple lookups | User sessions, cache |
| Document | Hierarchical data | Product catalogs, CMS |
| Graph | Relationships | Social networks, recommendations |
| Vector | Similarity search | Image search, semantic search |
✅ Good approach:
1. Start with simple schema
2. Monitor performance
3. Add indexes where needed
4. Denormalize hot paths
5. Split large entities
❌ Bad approach:
1. Over-engineer from day 1
2. Premature optimization
3. Complex schema before understanding access patterns
In ThemisDB, an entity is a unique identifier with attributes:
Format: namespace:key
Example: users:alice, products:12345, orders:ord-2024-001
Best Practices:
- Use meaningful namespaces
- Keep keys human-readable when possible
- Use consistent naming conventions
Two approaches:
1. Blob Storage (Simple)
{
"entity_id": "users:alice",
"blob": "{\"name\":\"Alice\",\"email\":\"alice@example.com\",\"age\":30}"
}- Pros: Simple, flexible
- Cons: Can't query individual fields efficiently
2. Structured Attributes (Advanced)
{
"entity_id": "users:alice",
"attributes": {
"name": "Alice",
"email": "alice@example.com",
"age": 30,
"created_at": "2025-01-24T10:00:00Z"
}
}- Pros: Query individual fields, selective indexing
- Cons: Slightly more complex
💡 Pro Tip: Start with blobs, migrate to structured attributes when you need to query specific fields.
Example: E-commerce
// users:alice
{
"entity_id": "users:alice",
"attributes": {
"name": "Alice Johnson",
"email": "alice@example.com"
}
}
// orders:order-001
{
"entity_id": "orders:order-001",
"attributes": {
"user_id": "users:alice",
"total": 129.99,
"status": "shipped"
}
}
// order_items:item-001
{
"entity_id": "order_items:item-001",
"attributes": {
"order_id": "orders:order-001",
"product_id": "products:laptop-001",
"quantity": 1,
"price": 129.99
}
}Pros:
- ✅ No data duplication
- ✅ Easy to update
- ✅ Consistent data
Cons:
- ❌ Multiple queries to fetch related data
- ❌ Slower read performance
// orders:order-001 (with embedded data)
{
"entity_id": "orders:order-001",
"attributes": {
"user": {
"id": "users:alice",
"name": "Alice Johnson",
"email": "alice@example.com"
},
"items": [
{
"product_id": "products:laptop-001",
"product_name": "ThinkPad X1",
"quantity": 1,
"price": 129.99
}
],
"total": 129.99,
"status": "shipped",
"created_at": "2025-01-24T10:00:00Z"
}
}Pros:
- ✅ Single query to fetch everything
- ✅ Fast reads
- ✅ Perfect for display
Cons:
- ❌ Data duplication
- ❌ Harder to update (must update multiple places)
- ❌ Possible inconsistency
Normalize by default, denormalize read-heavy paths:
// orders:order-001 (hybrid)
{
"entity_id": "orders:order-001",
"attributes": {
"user_id": "users:alice",
"user_name": "Alice Johnson", // Denormalized for display
"item_ids": ["order_items:item-001"],
"item_summary": "1 item(s)", // Denormalized for quick display
"total": 129.99,
"status": "shipped"
}
}When to denormalize:
- Read-heavy data (90%+ reads)
- Display-only fields (names, summaries)
- Performance-critical paths
When to normalize:
- Frequently updated data
- Source of truth
- Data integrity is critical
Example: User → Profile
// users:alice
{
"entity_id": "users:alice",
"attributes": {
"name": "Alice",
"email": "alice@example.com",
"profile_id": "profiles:alice"
}
}
// profiles:alice
{
"entity_id": "profiles:alice",
"attributes": {
"bio": "Software Engineer",
"avatar": "https://...",
"website": "https://alice.dev"
}
}Alternative (Embedded):
// users:alice
{
"entity_id": "users:alice",
"attributes": {
"name": "Alice",
"email": "alice@example.com",
"profile": {
"bio": "Software Engineer",
"avatar": "https://...",
"website": "https://alice.dev"
}
}
}Example: User → Orders
Approach 1: Reference in child (Normalized)
// users:alice
{
"entity_id": "users:alice",
"attributes": {"name": "Alice"}
}
// orders:order-001
{
"entity_id": "orders:order-001",
"attributes": {
"user_id": "users:alice",
"total": 99.99
}
}
// Query: Find all orders for Alice
POST /query
{
"table": "orders",
"predicates": [{"column": "user_id", "value": "users:alice"}]
}Approach 2: Array of references in parent (Denormalized)
// users:alice
{
"entity_id": "users:alice",
"attributes": {
"name": "Alice",
"order_ids": ["orders:order-001", "orders:order-002"]
}
}Example: Students ↔ Courses
Approach: Junction entity
// students:alice
{
"entity_id": "students:alice",
"attributes": {"name": "Alice"}
}
// courses:cs101
{
"entity_id": "courses:cs101",
"attributes": {"name": "Computer Science 101"}
}
// enrollments:alice-cs101
{
"entity_id": "enrollments:alice-cs101",
"attributes": {
"student_id": "students:alice",
"course_id": "courses:cs101",
"enrolled_at": "2025-01-01",
"grade": "A"
}
}Query: Find all courses for Alice
curl -X POST http://localhost:8080/query \
-d '{
"table": "enrollments",
"predicates": [{"column": "student_id", "value": "students:alice"}],
"return": "entities"
}'For complex relationships, use graph model:
# Create edge between entities
curl -X POST http://localhost:8080/graph/edge \
-d '{
"from": "users:alice",
"to": "users:bob",
"type": "follows",
"properties": {"since": "2025-01-01"}
}'
# Query: Find all users Alice follows
curl -X POST http://localhost:8080/graph/traverse \
-d '{
"start": "users:alice",
"direction": "outbound",
"edge_type": "follows",
"depth": 1
}'Golden Rules:
- Index columns used in WHERE clauses
- Index columns used in ORDER BY
- Index foreign key columns
- Don't over-index (slows writes)
1. B-Tree Index (General Purpose)
# Create B-Tree index on age for range queries
curl -X POST http://localhost:8080/index/create \
-d '{
"table": "users",
"column": "age",
"type": "btree"
}'
# Supports: =, !=, <, >, <=, >=, BETWEEN2. Hash Index (Exact Match)
# Create Hash index on email for fast lookups
curl -X POST http://localhost:8080/index/create \
-d '{
"table": "users",
"column": "email",
"type": "hash"
}'
# Supports: = only (but VERY fast)3. Composite Index
# Create composite index for multi-column queries
curl -X POST http://localhost:8080/index/create \
-d '{
"table": "orders",
"columns": ["user_id", "status", "created_at"],
"type": "btree"
}'
# Optimizes: WHERE user_id=? AND status=? ORDER BY created_at4. Vector Index (Similarity Search)
# Create vector index for embeddings
curl -X POST http://localhost:8080/index/create \
-d '{
"table": "documents",
"column": "embedding",
"type": "vector",
"dimension": 768,
"metric": "cosine"
}'Example: E-commerce queries
// Common queries:
1. Find product by ID → Hash index on product_id
2. Search products by category → B-Tree index on category
3. Filter by price range → B-Tree index on price
4. Sort by popularity → B-Tree index on sales_count
5. Similar products → Vector index on embedding
Indexes to create:
✅ products.product_id (hash)
✅ products.category (btree)
✅ products.price (btree)
✅ products.(category, price) (composite)
✅ products.embedding (vector)
❌ Bad: Too many indexes
users.id (hash)
users.email (hash)
users.name (btree)
users.age (btree)
users.city (btree)
users.country (btree)
users.created_at (btree)
users.updated_at (btree)
// 8 indexes = slow writes!
✅ Good: Strategic indexes
users.id (hash) ← Primary key lookup
users.email (hash) ← Login authentication
users.(city, age) (composite) ← Common filter
// 3 indexes = fast reads + fast writes
Entities:
// users:alice
{
"entity_id": "users:alice",
"attributes": {
"username": "alice",
"display_name": "Alice Johnson",
"bio": "Software Engineer",
"follower_count": 1520, // Denormalized for performance
"following_count": 342
}
}
// posts:post-12345
{
"entity_id": "posts:post-12345",
"attributes": {
"author_id": "users:alice",
"author_name": "Alice Johnson", // Denormalized
"content": "Just deployed ThemisDB!",
"like_count": 45, // Denormalized
"comment_count": 12, // Denormalized
"created_at": "2025-01-24T10:00:00Z"
}
}
// follows:alice-bob (graph edge)
{
"entity_id": "follows:alice-bob",
"attributes": {
"from_user": "users:alice",
"to_user": "users:bob",
"created_at": "2025-01-20T08:00:00Z"
}
}Indexes:
# User lookups
POST /index/create {"table": "users", "column": "username", "type": "hash"}
# Timeline queries (posts by followed users)
POST /index/create {"table": "posts", "columns": ["author_id", "created_at"], "type": "btree"}
# Graph traversal (followers/following)
POST /index/create {"table": "follows", "column": "from_user", "type": "btree"}
POST /index/create {"table": "follows", "column": "to_user", "type": "btree"}Why this design?
- Denormalized counts for fast display
- Graph edges for relationship queries
- Composite index for timeline optimization
// products:laptop-001
{
"entity_id": "products:laptop-001",
"attributes": {
"sku": "LAPTOP-001",
"name": "ThinkPad X1 Carbon",
"description": "High-performance laptop",
"price": 1299.99,
"original_price": 1499.99,
"discount_percent": 13,
"category": "Electronics > Laptops",
"tags": ["business", "ultralight", "premium"],
"stock": 15,
"rating_avg": 4.7, // Denormalized
"review_count": 234, // Denormalized
"embedding": [0.1, 0.2, ...] // For similar products
}
}
// orders:order-2024-001
{
"entity_id": "orders:order-2024-001",
"attributes": {
"user_id": "users:alice",
"user_name": "Alice Johnson", // Denormalized
"status": "shipped",
"total": 1299.99,
"shipping_address": {...},
"items": [ // Embedded for performance
{
"product_id": "products:laptop-001",
"product_name": "ThinkPad X1 Carbon",
"quantity": 1,
"price": 1299.99
}
],
"created_at": "2025-01-24T10:00:00Z"
}
}
// reviews:review-001
{
"entity_id": "reviews:review-001",
"attributes": {
"product_id": "products:laptop-001",
"user_id": "users:alice",
"user_name": "Alice J.", // Partial denormalization
"rating": 5,
"title": "Excellent laptop!",
"content": "Best purchase ever...",
"verified_purchase": true,
"created_at": "2025-01-25T14:30:00Z"
}
}Indexes:
# Product search and filtering
POST /index/create {"table": "products", "column": "category", "type": "btree"}
POST /index/create {"table": "products", "columns": ["category", "price"], "type": "btree"}
POST /index/create {"table": "products", "column": "embedding", "type": "vector", "dimension": 768}
# Order management
POST /index/create {"table": "orders", "columns": ["user_id", "status"], "type": "btree"}
POST /index/create {"table": "orders", "columns": ["status", "created_at"], "type": "btree"}
# Reviews
POST /index/create {"table": "reviews", "column": "product_id", "type": "btree"}
POST /index/create {"table": "reviews", "columns": ["product_id", "created_at"], "type": "btree"}// sensors:temp-sensor-001
{
"entity_id": "sensors:temp-sensor-001",
"attributes": {
"name": "Temperature Sensor 1",
"location": "Building A, Floor 3",
"type": "temperature",
"unit": "celsius",
"last_reading": 22.5, // Denormalized for quick check
"last_update": "2025-01-24T15:30:00Z",
"status": "online"
}
}
// readings:temp-sensor-001-2025-01-24-15-30-00
{
"entity_id": "readings:temp-sensor-001-2025-01-24-15-30-00",
"attributes": {
"sensor_id": "sensors:temp-sensor-001",
"sensor_type": "temperature", // Denormalized for queries
"value": 22.5,
"timestamp": "2025-01-24T15:30:00Z",
"quality": "good"
}
}Time-Series Optimization:
# Partition readings by time (hourly buckets)
# readings:temp-sensor-001-2025-01-24-15
# Contains all readings for that hour
# Index for time-range queries
POST /index/create {"table": "readings", "columns": ["sensor_id", "timestamp"], "type": "btree"}Adding a Field:
# Old entities don't need migration - just use default
curl -X GET http://localhost:8080/entities/users:alice
# Check if 'bio' exists, use default if notChanging Field Type:
# Migrate string ID to integer
def migrate_user_ids():
users = query_all_users()
for user in users:
old_id = user['attributes']['id'] # String
new_id = int(old_id) # Integer
user['attributes']['id'] = new_id
update_entity(user)Splitting an Entity:
# Split user entity into user + profile
def split_user_profile():
users = query_all_users()
for user in users:
# Create new profile entity
profile = {
"entity_id": f"profiles:{user['entity_id'].split(':')[1]}",
"attributes": {
"bio": user['attributes'].get('bio'),
"avatar": user['attributes'].get('avatar')
}
}
create_entity(profile)
# Remove profile fields from user
del user['attributes']['bio']
del user['attributes']['avatar']
update_entity(user)-
Design for queries, not structure
- Understand access patterns first
- Optimize for common queries
-
Use namespaces consistently
users:* products:* orders:* -
Denormalize read-heavy data
- Display names
- Counts and aggregates
- Frequently accessed relationships
-
Version your schema
{ "entity_id": "users:alice", "schema_version": 2, "attributes": {...} } -
Monitor and iterate
- Start simple
- Add indexes as needed
- Refactor based on metrics
-
Don't over-normalize
- Causes excessive joins
- Poor read performance
-
Don't create indexes blindly
- Each index slows writes
- Only index queried columns
-
Don't store large BLOBs in entities
- Keep entities < 1MB
- Store large files separately
-
Don't ignore data growth
- Plan for partitioning
- Archive old data
Before going to production:
- Analyzed access patterns
- Created indexes on queried columns
- Tested with realistic data volumes
- Monitored query performance
- Implemented caching strategy
- Planned for data growth
- Documented schema design decisions
- ✅ Multi-model design principles
- ✅ Normalization vs. denormalization trade-offs
- ✅ Index strategy and placement
- ✅ Real-world schema patterns
- ✅ Migration strategies
- ✅ Performance optimization
- Best Practices Guide - Production patterns
- Performance Guide - Optimization techniques
- Try Examples: Build a CRM or E-Commerce
Questions? Ask in GitHub Discussions