Sentinel-LLM is a production-grade Retrieval-Augmented Generation (RAG) framework. Designed for enterprise scalability and strict data privacy, it delivers a fully self-contained AI stack that eliminates dependence on external APIs while keeping model performance and hallucination control fully observable.
This project showcases a complete, modern LLMOps lifecycle: from automated document ingestion and vectorization to real-time semantic monitoring and unified lifecycle management.
- ⚡ High-Performance RAG: Sub-second context retrieval using Qdrant and semantic chunking.
- 🛡️ Hallucination Guardrails: Automated "Faithfulness" and "Answer Relevancy" scoring integrated into the inference pipeline via RAGAS.
- 📊 Real-time Observability: Comprehensive Grafana dashboards monitoring latency, token throughput, and retrieval drift.
- 🔄 Autonomous Lifecycles: Apache Airflow DAGs automate the ingestion of massive document silos without manual intervention.
- 📜 Prompt Governance: Version-controlled prompt engineering using MLflow, ensuring reproducibility across deployments.
- 🔒 Local-First Privacy: Powered by Ollama, ensuring all data remains within your infrastructure.
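The "Recursive Splitting" step used during ingestion can be sketched in plain Python. This is a simplified stand-in, not the actual implementation in `ingestion/`: it tries coarse separators first (paragraphs, then sentences, then words) and recurses until every chunk fits the size budget.

```python
def recursive_split(text: str, max_chars: int = 500,
                    seps=("\n\n", ". ", " ")) -> list[str]:
    """Split text on the coarsest separator that works, recursing as needed."""
    if len(text) <= max_chars:
        return [text]
    for sep in seps:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buf = [], ""
            for part in parts:
                piece = part + sep
                # flush the buffer before it would exceed the budget
                if len(buf) + len(piece) > max_chars and buf:
                    chunks.append(buf.strip())
                    buf = ""
                buf += piece
            if buf.strip():
                chunks.append(buf.strip())
            # recurse into any chunk still over budget (finer separators)
            return [c for chunk in chunks
                    for c in recursive_split(chunk, max_chars, seps)]
    # no separator produced a split: hard cut as a last resort
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Real deployments typically also overlap adjacent chunks so retrieval does not lose context at chunk boundaries.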
Sentinel-LLM doesn't just respond; it observes. Every query is tracked for accuracy, latency, and system health.
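As a rough illustration of what a faithfulness guardrail measures, here is a token-overlap approximation: the fraction of answer tokens grounded in the retrieved context. This is a hypothetical simplification; the actual pipeline uses RAGAS, which judges individual claims with an LLM rather than word overlap.

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that appear in the retrieved context."""
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    grounded = sum(t in context_tokens for t in answer_tokens)
    return grounded / len(answer_tokens)
```

A low score flags answers that drift away from the retrieved evidence, which is exactly the signal the guardrail layer alerts on.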
Sentinel-LLM follows an event-driven, multi-layered architecture where every step of the document lifecycle and inference process is tracked and validated.
```mermaid
graph TD
    %% Define Styles
    classDef data fill:#e1f5fe,stroke:#01579b,stroke-width:1px;
    classDef logic fill:#f3e5f5,stroke:#4a148c,stroke-width:1px;
    classDef monitor fill:#fff3e0,stroke:#e65100,stroke-width:1px;
    classDef target fill:#f1f8e9,stroke:#33691e,stroke-width:2px,stroke-dasharray: 5 5;

    %% 📁 DATA INGESTION WORKFLOW
    subgraph Data_Pipe ["📥 Knowledge Ingestion (Airflow)"]
        RAW["Raw PDFs/Docs"] -->|Monitor| AF["Airflow DAG"]
        AF -->|Chunk| CHUNK["Recursive Splitting"]
        CHUNK -->|Embed| VEC["Llama-3 Vectors"]
        VEC -->|Index| QD[("Qdrant DB")]
    end

    %% 🧠 INFERENCE WORKFLOW
    subgraph Inference_Engine ["🧠 RAG Inference (FastAPI)"]
        USER["User Query"] -->|POST /chat| API["Sentinel Server"]
        API -->|Search| QD
        QD -->|Context| API
        API -->|Augment| LLM["Ollama Core"]
        LLM -->|Stream| API
        API -->|Final Response| USER
    end

    %% ⚖️ GUARDRAILS & OPS
    subgraph Observability_Layer ["⚖️ LLMOps & Guardrails"]
        API -->|Audit| RAGAS["RAGAS Evaluator"]
        RAGAS -->|Score| METRICS["Faithfulness/Relevancy"]
        API -->|Trace| PROM["Prometheus"]
        PROM -->|Visual| GRAF["Grafana Dashboards"]
        API -->|Register| MLF[("MLflow Registry")]
    end

    %% Node Styles
    class RAW,AF,CHUNK,VEC,QD data;
    class USER,API,LLM logic;
    class RAGAS,METRICS,PROM,GRAF,MLF monitor;
```
- Docker & Docker Compose
- 16GB+ RAM recommended for local LLM inference.
```bash
# Clone the repository
git clone https://github.com/your-username/sentinel-llm.git
cd sentinel-llm

# Initialize environment
cp .env.example .env

# Launch infrastructure
docker compose up -d

# Pull the local model into the Ollama container
docker exec -it sentinel_ollama ollama pull llama3

# Query the RAG endpoint
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How does Sentinel-LLM handle hallucinations?"}'
```

```
sentinel-llm/
├── airflow/             # Ingestion Workflows (DAGs)
├── assets/              # README Visuals (Hero/Dashboards)
├── data/                # Document Ingestion Source
├── ingestion/           # Vectorization & Processing Logic
├── k8s/                 # Production Kubernetes Manifests
├── monitoring/          # Prometheus & Grafana Configs
├── server/              # FastAPI RAG Engine
├── .env.example         # Environment Template
├── docker-compose.yml   # Full Stack Orchestration
└── README.md            # Project Documentation
```
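Prompt governance comes down to pinning an exact, reproducible template version per deployment. A minimal way to picture this is content-addressed versioning; this sketch is illustrative only, since the project registers prompts in MLflow rather than an in-memory dict.

```python
import hashlib

class PromptRegistry:
    """Content-address prompt templates so deployments can pin exact versions."""

    def __init__(self) -> None:
        self._versions: dict[str, str] = {}

    def register(self, template: str) -> str:
        # identical templates always hash to the same version id
        version = hashlib.sha256(template.encode()).hexdigest()[:12]
        self._versions[version] = template
        return version

    def get(self, version: str) -> str:
        return self._versions[version]
```

Because the version id is derived from the template's content, any edit to a prompt yields a new id, which is what makes runs reproducible across deployments.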
Contributions are welcome! Please feel free to submit a Pull Request.
Sentinel-LLM is released under the MIT License. See LICENSE for details.
Built with 💙 for the MLOps Community

