Zeph

The AI agent that respects your resources.

Single binary. Minimal hardware. Maximum context efficiency.
Every token counts — Zeph makes sure none are wasted.

CI codecov Trivy MSRV License: MIT

Quick Start · Efficiency · Security · Docs · Architecture


The Problem

Most AI agent frameworks are token furnaces. They dump every tool description, every skill, every raw command output into the context window — and bill you for it. They need beefy servers, Python runtimes, container orchestrators, and a generous API budget just to get started.

Zeph takes the opposite approach: automated context engineering. Only relevant data enters the context. Everything else is filtered, compressed, or retrieved on demand. The result — dramatically lower costs, faster responses, and an agent that runs on hardware you already have.

Installation

Tip

curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
Other installation methods
# From source
cargo install --git https://github.com/bug-ops/zeph

# Docker
docker pull ghcr.io/bug-ops/zeph:latest

Pre-built binaries for Linux, macOS, and Windows: GitHub Releases · Docker

Quick Start

# Interactive setup wizard — configures vault backend, provider, memory, and channel settings
zeph init

# Run the agent
zeph

# Or with TUI dashboard (requires `tui` feature)
zeph --tui

Manual configuration is also supported:

# Local models — no API costs
ollama pull mistral:7b && ollama pull qwen3-embedding
zeph

# Cloud providers
ZEPH_LLM_PROVIDER=claude ZEPH_CLAUDE_API_KEY=sk-ant-... zeph
ZEPH_LLM_PROVIDER=openai ZEPH_OPENAI_API_KEY=sk-... zeph

# Multi-model routing — primary Claude, fallback Ollama
ZEPH_LLM_PROVIDER=orchestrator zeph  # configure via `zeph init`

# Any OpenAI-compatible API (Together AI, Groq, Fireworks, etc.)
ZEPH_LLM_PROVIDER=compatible ZEPH_COMPATIBLE_BASE_URL=https://api.together.xyz/v1 \
  ZEPH_COMPATIBLE_API_KEY=... zeph

Tip

Full setup walkthrough: Installation · Configuration · Secrets management

CLI Usage

zeph                     Run the agent (default)
zeph init                Interactive configuration wizard
zeph init -o path.toml   Write generated config to a specific path
zeph --tui               Run with TUI dashboard
zeph --config <path>     Use a custom config file
zeph --vault <backend>   Secrets backend: env or age
zeph --vault-key <path>  Path to age identity key file
zeph --vault-path <path> Path to age-encrypted vault file
zeph --version           Print version
zeph --help              Show help

zeph vault init          Generate age keypair and empty encrypted vault
zeph vault set KEY VAL   Encrypt and store a secret
zeph vault get KEY       Decrypt and print a secret value
zeph vault list          List stored secret keys (no values)
zeph vault rm KEY        Remove a secret from the vault

Automated Context Engineering

This is the core idea behind Zeph. Every byte that enters the LLM context window is there because it's useful for the model — not because the framework was too lazy to filter it.

Semantic Skill Selection — O(K), Not O(N)

Most frameworks inject all tool descriptions into every prompt. 50 tools installed? 50 descriptions in every request.

Zeph embeds skills and MCP tools as vectors at startup (concurrent embedding via buffer_unordered), then retrieves only the top-K most relevant per query via cosine similarity. Install 500 skills — the prompt sees only the 5 that matter.

When two candidates score within a configurable threshold of each other, structured intent classification resolves the ambiguity: the agent calls the LLM with a typed IntentClassification schema and reorders candidates accordingly — no hallucination, no guessing. How skills work →
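
A minimal sketch of the selection step, assuming a plain cosine-similarity ranking over precomputed embeddings (the types and function names here are illustrative, not zeph-skills' actual API):

```rust
/// Illustrative only: top-K skill selection by cosine similarity.
struct Skill {
    name: String,
    embedding: Vec<f32>, // computed once at startup
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Only the K highest-scoring skills are injected into the prompt,
/// regardless of how many are installed.
fn top_k<'a>(skills: &'a [Skill], query: &[f32], k: usize) -> Vec<(&'a Skill, f32)> {
    let mut scored: Vec<_> = skills
        .iter()
        .map(|s| (s, cosine(&s.embedding, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}
```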

Smart Output Filtering — 70-99% Token Savings

Raw tool output is the #1 context window polluter. A cargo test run produces 300+ lines; the model needs 3. Zeph applies command-aware filters before context injection:

| Filter | What It Does | Typical Savings |
|---|---|---|
| Test | Cargo test/nextest — failures-only mode | 94-99% |
| Git | Compact status/diff/log/push | 80-99% |
| Clippy | Group warnings by lint rule | 70-90% |
| Directory | Hide noise dirs (target, node_modules, .git) | 60-80% |
| Log dedup | Normalize timestamps/UUIDs, count repeats | 70-85% |

Per-command stats are shown inline, so you can see exactly what was saved:

[shell] `cargo test` 342 lines → 28 lines, 91.8% filtered
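
For a sense of how little code a filter needs, here is a hypothetical failures-only filter for test output; it illustrates the idea, it is not Zeph's actual filter implementation:

```rust
/// Illustrative sketch of a failures-only test filter: keep failure markers
/// and the summary line, drop the hundreds of passing-test lines.
fn filter_test_output(raw: &str) -> String {
    raw.lines()
        .filter(|line| {
            line.contains("FAILED")
                || line.contains("panicked at")
                || line.contains("test result:")
                || line.starts_with("error")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The inline savings stat is then just a comparison of line counts before and after filtering.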

Filter architecture →

Two-Tier Context Pruning

When the context window fills up, Zeph doesn't just truncate from the top.

Tier 1 — Selective eviction. Old tool output bodies are cleared from context (persisted to SQLite for recall), keeping message structure intact. No LLM call needed.

Tier 2 — LLM compaction. Only when Tier 1 isn't enough, a summarization call compresses older exchanges. A token-based protection zone shields recent messages from pruning.

Result: fewer compaction calls, lower costs, better memory of what happened. Context engineering →
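
A rough sketch of the two-tier decision, with illustrative types and a crude byte-based token estimate (the real logic lives in zeph-core and differs in detail):

```rust
/// Illustrative only. Returns true when Tier 2 (LLM compaction) is still needed.
struct Message {
    tool_output: Option<String>,
    tokens: usize,
}

fn prune(history: &mut [Message], budget: usize, protected_tail: usize) -> bool {
    let mut total: usize = history.iter().map(|m| m.tokens).sum();
    let evictable = history.len().saturating_sub(protected_tail);

    // Tier 1: clear old tool-output bodies (Zeph persists them to SQLite first),
    // keeping the message structure intact. No LLM call.
    for msg in history.iter_mut().take(evictable) {
        if total <= budget {
            break;
        }
        if let Some(body) = msg.tool_output.take() {
            let freed = body.len() / 4; // crude token estimate from byte length
            msg.tokens = msg.tokens.saturating_sub(freed);
            total = total.saturating_sub(freed);
        }
    }

    // Tier 2: only if eviction alone did not fit the budget does the caller
    // run an LLM summarization pass over older exchanges.
    total > budget
}
```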

Proportional Budget Allocation

Context window space is allocated by purpose, not by arrival order:

| Budget Slice | Allocation | Purpose |
|---|---|---|
| Recent history | 50% | Current conversation flow |
| Code context | 30% | Project-relevant code via tree-sitter indexing |
| Summaries | 8% | Compressed prior exchanges |
| Semantic recall | 8% | Vector-retrieved relevant memories |
| Cross-session | 4% | Knowledge transferred from past conversations |
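
As a sketch, the split amounts to simple percentage arithmetic over the model's context window (the struct and function names below are illustrative, not Zeph's API):

```rust
/// Illustrative only: divide a context window into purpose-based token budgets.
struct ContextBudget {
    recent_history: usize,
    code_context: usize,
    summaries: usize,
    semantic_recall: usize,
    cross_session: usize,
}

fn allocate(window_tokens: usize) -> ContextBudget {
    let pct = |p: usize| window_tokens * p / 100;
    ContextBudget {
        recent_history: pct(50),
        code_context: pct(30),
        summaries: pct(8),
        semantic_recall: pct(8),
        cross_session: pct(4),
    }
}
```

For a 128K-token window this yields 64K tokens for recent history, roughly 38K for code context, and so on.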

Prompt Caching

Automatic prompt caching for Anthropic and OpenAI providers. Repeated system prompts and context blocks are served from cache — reducing latency and API costs on every turn after the first.
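
For Anthropic, caching works by marking stable content blocks with cache_control; Zeph sets this up automatically, but a hand-built request body illustrates the mechanism (the model id and prompt text below are placeholders):

```rust
// Illustration of Anthropic-style prompt caching, not Zeph's internal code:
// the long, stable system block is marked cacheable so later turns reuse it.
fn main() {
    let request_body = serde_json::json!({
        "model": "claude-...",          // placeholder model id
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": "You are Zeph ...", // long, stable system prompt
            "cache_control": { "type": "ephemeral" }
        }],
        "messages": [{ "role": "user", "content": "..." }]
    });
    println!("{request_body}");
}
```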

Additional Efficiency Measures

  • Tool output truncation at 30K chars with head+tail split and optional LLM summarization
  • Doom-loop detection breaks runaway tool cycles after 3 identical outputs (see the sketch after this list)
  • Parallel context preparation via try_join! — skills, memory, code context fetched concurrently
  • Byte-length token estimation — fast approximation without tokenizer overhead
  • Config hot-reload — change runtime parameters without restarting the agent
  • Auto-update check — optional daily check against GitHub releases; notification delivered to the active channel (ZEPH_AUTO_UPDATE_CHECK=false to disable)
  • Pipeline API — composable, type-safe step chains for LLM calls, vector retrieval, JSON extraction, and parallel execution
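
A minimal sketch of the doom-loop check from the list above, assuming a simple hash of each tool output (the names are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative doom-loop detector: break the tool loop once the same
/// output has been produced three times in a row.
struct DoomLoopDetector {
    last_hash: Option<u64>,
    repeats: u32,
}

impl DoomLoopDetector {
    fn new() -> Self {
        Self { last_hash: None, repeats: 0 }
    }

    /// Returns true when the loop should be broken.
    fn observe(&mut self, tool_output: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        tool_output.hash(&mut hasher);
        let h = hasher.finish();

        if self.last_hash == Some(h) {
            self.repeats += 1;
        } else {
            self.last_hash = Some(h);
            self.repeats = 1;
        }
        self.repeats >= 3
    }
}
```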

Token efficiency deep dive →

Defense-in-Depth Security

Security isn't a feature flag — it's the default. Every layer has its own protection:

flowchart TD
    Input[User Input] --> Sandbox
    Sandbox --> Permissions
    Permissions --> Confirmation
    Confirmation --> Execution
    Execution --> Redaction
    Redaction --> Output[Safe Output]

    Sandbox[Shell Sandbox<br><i>path restrictions, traversal detection</i>]
    Permissions[Tool Permissions<br><i>allow / ask / deny per tool pattern</i>]
    Confirmation[Destructive Command Gate<br><i>rm, drop, truncate require approval</i>]
    Execution[Sandboxed Execution<br><i>file sandbox, overflow-to-file, rate limiter</i>]
    Redaction[Secret Redaction<br><i>AWS, OpenAI, Anthropic, Google, GitLab</i>]

    style Sandbox fill:#e74c3c,color:#fff
    style Permissions fill:#e67e22,color:#fff
    style Confirmation fill:#f39c12,color:#fff
    style Execution fill:#27ae60,color:#fff
    style Redaction fill:#2980b9,color:#fff
| Layer | What It Protects Against |
|---|---|
| Shell sandbox | Path traversal, unauthorized directory access |
| File sandbox | Writes outside allowed paths |
| Tool permissions | Glob-based allow/ask/deny policy per tool |
| Destructive command gate | Accidental rm -rf, DROP TABLE, etc. |
| Secret redaction | API keys leaking into context or logs (6 provider patterns) |
| SSRF protection | Agent and MCP client requests to internal networks |
| Audit logging | Full tool execution trace for forensics |
| Rate limiter | TTL-based eviction, per-IP limits on gateway |
| Doom-loop detection | Runaway tool cycles (3 identical outputs = break) |
| Skill trust quarantine | 4-tier model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity |
| Container scanning | Trivy in CI — 0 HIGH/CRITICAL CVEs |
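
As a flavor of what the redaction layer does, here is a simplified sketch that masks strings shaped like well-known provider keys; the patterns are examples only, not Zeph's actual rule set:

```rust
use regex::Regex;

/// Illustrative secret redaction: mask strings that look like provider API keys
/// before tool output reaches the context window or the logs.
fn redact(text: &str) -> String {
    let patterns = [
        r"sk-ant-[A-Za-z0-9_-]{20,}", // Anthropic-style keys
        r"sk-[A-Za-z0-9]{20,}",       // OpenAI-style keys
        r"AKIA[0-9A-Z]{16}",          // AWS access key IDs
    ];
    let mut out = text.to_string();
    for p in patterns {
        let re = Regex::new(p).expect("valid pattern");
        out = re.replace_all(&out, "[REDACTED]").into_owned();
    }
    out
}
```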

Security model → · MCP security →

Lightweight by Design

No Expensive Hardware Required

Zeph compiles to a single static binary (~15 MB). No Python interpreter, no Node.js runtime, no JVM, no container orchestrator. Run it on a $5/month VPS, a Raspberry Pi, or your laptop.

With Ollama, you can run local models on consumer hardware — no cloud API needed. With Candle (GGUF), run models directly in-process with Metal (macOS) or CUDA (Linux) acceleration.

Rust, Not Python

| | Zeph | Typical Python Agent |
|---|---|---|
| Startup time | ~50ms | 2-5s (import torch, langchain, ...) |
| Memory at idle | ~20 MB | 200-500 MB |
| Dependencies | 0 system deps (rustls, no OpenSSL) | Python + pip + venv + system libs |
| Deployment | Copy one binary | Dockerfile + requirements.txt + runtime |
| Type safety | Compile-time (Rust Edition 2024) | Runtime exceptions |
| Async | Native async traits, zero-cost | GIL contention, asyncio quirks |

Hybrid Inference — Use What You Have

Run local models when you want privacy and zero cost. Use cloud APIs when you need capability. Mix them with the orchestrator for automatic fallback chains.

| Provider | Type | When to Use |
|---|---|---|
| Ollama | Local | Privacy, no API costs, air-gapped environments |
| Candle | Local (in-process) | Embedded inference, Metal/CUDA acceleration |
| Claude | Cloud | Complex reasoning, tool_use |
| OpenAI | Cloud | GPT-4o, function calling, embeddings |
| Compatible | Cloud | Together AI, Groq, Fireworks — any OpenAI-compatible API |
| Orchestrator | Multi-model | Fallback chains across providers |
| Router | Multi-model | Prompt-based model selection |

OpenAI guide → · Candle guide → · Orchestrator →

Skills, Not Hardcoded Prompts

Capabilities live in SKILL.md files — YAML frontmatter + markdown body. Drop a file into skills/, and the agent picks it up on the next query via semantic matching. No code changes. No redeployment.
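
For illustration only, a SKILL.md might look like the string below, with its frontmatter modeled as a small struct; the field names and content are hypothetical, not Zeph's actual schema:

```rust
use serde::Deserialize;

/// Hypothetical frontmatter shape; Zeph's actual SKILL.md schema may differ.
#[derive(Debug, Deserialize)]
struct SkillFrontmatter {
    name: String,
    description: String,
    #[serde(default)]
    tools: Vec<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let skill_md = r#"---
name: release-notes
description: Draft release notes from merged pull requests
tools: [shell, git]
---
When asked for release notes, collect merged PRs since the last tag and
summarize them grouped by area.
"#;

    // Split the YAML frontmatter from the markdown body at the `---` fences.
    let rest = skill_md.trim_start_matches("---\n");
    let (frontmatter, body) = rest
        .split_once("\n---\n")
        .ok_or("missing frontmatter fence")?;

    let meta: SkillFrontmatter = serde_yaml::from_str(frontmatter)?;
    println!("skill `{}` loaded; body is {} bytes", meta.name, body.len());
    Ok(())
}
```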

Skills evolve: failure detection triggers self-reflection, and the agent generates improved versions — with optional manual approval before activation. A 4-tier trust model (Trusted → Verified → Quarantined → Blocked) with blake3 integrity hashing ensures that only verified skills execute privileged operations.

Self-learning → · Skill trust →

Connect Everything

| Protocol | What It Does |
|---|---|
| MCP | Connect external tool servers (stdio + HTTP) with SSRF protection |
| A2A | Agent-to-agent communication via JSON-RPC 2.0 with SSE streaming |
| Audio input | Speech-to-text via OpenAI Whisper API or local Candle Whisper (offline, feature-gated); Telegram and Slack audio files transcribed automatically |
| Vision | Image input via CLI (/image), TUI (/image), and Telegram photo messages; supported by Claude, OpenAI, and Ollama providers (20 MB max, automatic MIME detection) |
| Channels | CLI (with persistent input history), Telegram (text + voice), Discord, Slack, TUI — all with streaming support |
| Gateway | HTTP webhook ingestion with bearer auth and rate limiting |
| Native tool_use | Structured tool calling via Claude/OpenAI APIs; text fallback for local models |

MCP → · A2A → · Channels → · Gateway →

Built-In TUI Dashboard

A full terminal UI powered by ratatui — not a separate monitoring tool, but an integrated experience:

  • Tree-sitter syntax highlighting and markdown rendering with clickable hyperlinks (OSC 8)
  • Syntax-highlighted diff view for file edits (compact/expanded toggle)
  • @-triggered fuzzy file picker with real-time filtering (nucleo-matcher)
  • Command palette for quick access to agent actions
  • Live metrics: token usage, filter savings, cost tracking, confidence distribution
  • Conversation history with message queueing
  • Responsive input handling during streaming with render cache and event batching
  • Deferred model warmup with progress indicator
cargo build --release --features tui
./target/release/zeph --tui

TUI guide →

TUI Testing

The TUI crate uses snapshot testing (insta) for widget rendering, property-based testing (proptest) for layout constraints, and E2E terminal testing (expectrl) for interactive flows. Run snapshot tests with cargo insta test -p zeph-tui and review changes with cargo insta review. See the TUI testing docs for details.
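
In general, an insta snapshot test is a one-liner around whatever the widget renders; here is a self-contained toy example (the metrics formatter below is made up, not a zeph-tui widget):

```rust
/// Hypothetical stand-in for a TUI widget's text rendering.
fn render_metrics_line(tokens_used: u64, tokens_saved: u64) -> String {
    let pct = 100.0 * tokens_saved as f64 / (tokens_used + tokens_saved) as f64;
    format!("tokens: {tokens_used}  saved: {tokens_saved} ({pct:.1}%)")
}

#[test]
fn metrics_line_snapshot() {
    // The first run records the snapshot; later runs fail on any change,
    // which is then accepted or rejected with `cargo insta review`.
    insta::assert_snapshot!(render_metrics_line(1_200, 10_300));
}
```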

Architecture

Agent Loop

flowchart LR
    User((User)) -->|query| Channel
    Channel -->|message| Agent
    Agent -->|context + skills| LLM
    LLM -->|tool_use / text| Agent
    Agent -->|execute| Tools
    Tools -->|filtered output| Agent
    Agent -->|recall| Memory[(Memory)]
    Agent -->|response| Channel
    Channel -->|stream| User

    subgraph Providers
        LLM
    end

    subgraph Execution
        Tools
        MCP[MCP Servers]
        A2A[A2A Agents]
        Tools -.-> MCP
        Tools -.-> A2A
    end

    style User fill:#4a9eff,color:#fff
    style Memory fill:#f5a623,color:#fff
    style LLM fill:#7b61ff,color:#fff

Crate Graph

graph TD
    ZEPH[zeph binary] --> CORE[zeph-core]
    ZEPH --> CHANNELS[zeph-channels]
    ZEPH --> TUI[zeph-tui]

    CORE --> LLM[zeph-llm]
    CORE --> SKILLS[zeph-skills]
    CORE --> MEMORY[zeph-memory]
    CORE --> TOOLS[zeph-tools]
    CORE --> MCP[zeph-mcp]
    CORE --> INDEX[zeph-index]

    CHANNELS --> CORE
    TUI --> CORE

    MCP --> TOOLS
    CORE --> A2A[zeph-a2a]
    ZEPH --> GATEWAY[zeph-gateway]
    ZEPH --> SCHEDULER[zeph-scheduler]

    SKILLS -.->|embeddings| LLM
    MEMORY -.->|embeddings| LLM
    INDEX -.->|embeddings| LLM

    classDef always fill:#2d8cf0,color:#fff,stroke:none
    classDef optional fill:#19be6b,color:#fff,stroke:none

    class ZEPH,CORE,LLM,SKILLS,MEMORY,TOOLS,CHANNELS,MCP always
    class TUI,A2A,INDEX,GATEWAY,SCHEDULER optional

Blue = always compiled  ·  Green = feature-gated

12 crates. Typed errors throughout (thiserror). Native async traits (Edition 2024). rustls everywhere — no OpenSSL dependency. zeph-core includes a Pipeline API — composable, type-safe step chains (LlmStep, RetrievalStep, ExtractStep, MapStep, ParallelStep) for building multi-stage data processing workflows.
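
The following sketch shows the composable-step idea in general terms; it is not zeph-core's actual Pipeline API, whose step types and signatures differ:

```rust
/// Illustrative only: a pipeline as a chain of typed steps, checked at compile time.
trait Step {
    type In;
    type Out;
    fn run(&self, input: Self::In) -> Self::Out;
}

/// Run `A`, then feed its output into `B`; mismatched types fail to compile.
struct Then<A, B>(A, B);

impl<A, B> Step for Then<A, B>
where
    A: Step,
    B: Step<In = A::Out>,
{
    type In = A::In;
    type Out = B::Out;
    fn run(&self, input: Self::In) -> Self::Out {
        self.1.run(self.0.run(input))
    }
}

/// Two toy steps: tokenize a prompt, then count the tokens.
struct Tokenize;
impl Step for Tokenize {
    type In = String;
    type Out = Vec<String>;
    fn run(&self, input: String) -> Vec<String> {
        input.split_whitespace().map(str::to_owned).collect()
    }
}

struct Count;
impl Step for Count {
    type In = Vec<String>;
    type Out = usize;
    fn run(&self, input: Vec<String>) -> usize {
        input.len()
    }
}

fn main() {
    let pipeline = Then(Tokenize, Count);
    assert_eq!(pipeline.run("every token counts".to_string()), 3);
}
```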

Important

Requires Rust 1.88+. See the full architecture overview and crate reference.

Feature Flags

Always compiled in: openai, compatible, orchestrator, router, self-learning, qdrant, vault-age, mcp.

| Flag | What It Adds |
|---|---|
| tui | Terminal dashboard with live metrics |
| candle | Local HuggingFace inference (GGUF) |
| metal / cuda | GPU acceleration (macOS / Linux) |
| discord / slack | Bot adapters |
| a2a | Agent-to-agent protocol |
| index | AST-based code indexing |
| gateway | HTTP webhook ingestion |
| daemon | Component supervisor |
| pdf | PDF document loading for RAG |
| stt | Speech-to-text via OpenAI Whisper API |
| scheduler | Cron-based periodic tasks; auto-update check runs daily at 09:00 |
| otel | OpenTelemetry OTLP export |
| full | Everything above |
cargo build --release                     # default (always-on features)
cargo build --release --features full     # everything
cargo build --release --features tui      # with dashboard

Feature flags reference →

Documentation

bug-ops.github.io/zeph — installation, configuration, guides, and API reference.

Contributing

See CONTRIBUTING.md for development workflow and guidelines.

Security

Found a vulnerability? Please use GitHub Security Advisories for responsible disclosure.

License

MIT
