An interactive Enterprise AI tool to learn, compare, visualize, and evaluate chunking strategies for Retrieval-Augmented Generation (RAG).
Chunking is one of the most important, and most often overlooked, steps in a RAG pipeline.
Different document types require different chunking strategies to maximize retrieval quality and LLM response accuracy.
This simulator allows developers, AI engineers, and architects to:
- Compare multiple chunking strategies
- Visualize chunk boundaries
- Evaluate chunk quality
- Explore metadata generated during chunking
- Understand when to use each strategy
- Generate LangChain code snippets
- Benchmark chunking approaches for enterprise AI systems
- Fixed Character
- Fixed Word
- Fixed Token
- Sliding Window
- Paragraph
- Sentence
- Recursive Character Splitter
- Markdown Header Splitter
- HTML Header Splitter
- Recursive JSON Splitter
- Python Code Splitter
- JavaScript Code Splitter
- Semantic Similarity Chunking
  - Sentence Transformers
  - Configurable similarity threshold
- LLM-based chunking
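To make the similarity threshold concrete, here is a minimal sketch of threshold-based semantic chunking. It is not the simulator's implementation: word-count vectors stand in for Sentence Transformers embeddings so the example runs with the standard library only, and the names `embed`, `cosine`, and `semantic_chunks` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Start a new chunk when adjacent sentences score below the threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            groups.append([cur])  # topic shift: open a new chunk
        else:
            groups[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(group) for group in groups]


text = (
    "Cats sleep most of the day. Cats also sleep at night. "
    "Stock markets fell sharply. Markets fell again on Friday."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

Raising `threshold` produces more, smaller chunks; lowering it merges more sentences together. With real embeddings the useful range of the threshold will differ from this toy example.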
✅ TXT
✅ Markdown
✅ DOCX
✅ HTML
✅ JSON
✅ CSV
✅ Python
✅ JavaScript
✅ TypeScript
✅ JSX / TSX
After processing a document, the simulator displays:
- Total Chunks
- Average Characters
- Minimum Chunk Size
- Maximum Chunk Size
- Average Words
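The statistics above are simple to compute. The following is a rough sketch, assuming chunks are plain strings and words are whitespace-separated; the function name `chunk_stats` and the dictionary keys are illustrative, not the simulator's actual response fields.

```python
def chunk_stats(chunks: list[str]) -> dict[str, float]:
    """Summarize a list of chunks with the five statistics listed above."""
    if not chunks:
        return {
            "total_chunks": 0,
            "avg_chars": 0.0,
            "min_chars": 0,
            "max_chars": 0,
            "avg_words": 0.0,
        }
    sizes = [len(c) for c in chunks]
    words = [len(c.split()) for c in chunks]  # whitespace-separated words
    return {
        "total_chunks": len(chunks),
        "avg_chars": sum(sizes) / len(sizes),
        "min_chars": min(sizes),
        "max_chars": max(sizes),
        "avg_words": sum(words) / len(words),
    }


print(chunk_stats(["alpha beta", "gamma", "delta epsilon zeta"]))
```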
The simulator automatically evaluates every chunking strategy.
It evaluates:
- Chunk size consistency
- Boundary quality
- Readability
Measures how well context is preserved across chunk boundaries.
Shows how much useful metadata has been generated.
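The exact scoring formulas are not documented here, so the sketch below only illustrates how two of these signals could be measured. Both formulas are assumptions: size consistency as one minus the coefficient of variation of chunk lengths, and boundary quality as the share of chunks that end on sentence-final punctuation.

```python
from statistics import mean, pstdev


def size_consistency(chunks: list[str]) -> float:
    """1.0 when all chunks have equal length, lower as sizes vary (assumed formula)."""
    sizes = [len(c) for c in chunks]
    if not sizes or mean(sizes) == 0:
        return 0.0
    cv = pstdev(sizes) / mean(sizes)  # coefficient of variation
    return max(0.0, 1.0 - cv)


def boundary_quality(chunks: list[str]) -> float:
    """Share of chunks ending on '.', '!' or '?' (assumed heuristic)."""
    if not chunks:
        return 0.0
    clean = sum(1 for c in chunks if c.rstrip().endswith((".", "!", "?")))
    return clean / len(chunks)


chunks = ["First sentence.", "This one is cut off mid"]
print(size_consistency(chunks), boundary_quality(chunks))
```

Scores like these are only comparable between strategies run on the same document, since document structure strongly affects both.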
Each chunk can include metadata such as:
- Chunk Type
- Break Reason
- Similarity Score
- Language
- Section Information
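One way to model this metadata is a small dataclass that serializes cleanly to JSON. This is a sketch under assumptions: the class names, field names, and example values below are hypothetical and may not match the simulator's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChunkMetadata:
    chunk_type: str  # e.g. the strategy that produced the chunk
    break_reason: str  # why the chunk ended where it did
    similarity_score: Optional[float] = None  # only set by semantic chunking
    language: Optional[str] = None  # e.g. "python" for code chunks
    section: Optional[str] = None  # e.g. the enclosing Markdown header


@dataclass
class Chunk:
    text: str
    metadata: ChunkMetadata


chunk = Chunk(
    text="## Setup\nInstall the dependencies.",
    metadata=ChunkMetadata(
        chunk_type="markdown_header",
        break_reason="header_boundary",
        language="markdown",
        section="Setup",
    ),
)
print(json.dumps(asdict(chunk), indent=2))
```

Optional fields default to `None`, so strategies that do not produce a similarity score or section can omit them.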
The simulator analyzes the document and recommends the most suitable strategy.
| Document | Recommendation |
|---|---|
| Plain Text | Recursive |
| Markdown | Markdown Header |
| HTML | HTML Header |
| JSON | Recursive JSON |
| Python | Python Code |
| JavaScript | JavaScript Code |
| Multi-topic Text | Semantic Similarity |
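The table can be read as a lookup from document type to strategy. The sketch below is a simplification that infers the type from the file extension only; the simulator analyzes the document itself, and detecting multi-topic text would need content analysis that is not shown. The names `RECOMMENDATIONS`, `EXTENSIONS`, and `recommend` are hypothetical.

```python
from pathlib import PurePath

# Document type -> recommended strategy, copied from the table above.
RECOMMENDATIONS = {
    "Plain Text": "Recursive",
    "Markdown": "Markdown Header",
    "HTML": "HTML Header",
    "JSON": "Recursive JSON",
    "Python": "Python Code",
    "JavaScript": "JavaScript Code",
}

# Assumed mapping from file extension to document type.
EXTENSIONS = {
    ".txt": "Plain Text",
    ".md": "Markdown",
    ".html": "HTML",
    ".json": "JSON",
    ".py": "Python",
    ".js": "JavaScript",
}


def recommend(filename: str) -> str:
    """Return a strategy name, falling back to the plain-text recommendation."""
    suffix = PurePath(filename).suffix.lower()
    doc_type = EXTENSIONS.get(suffix, "Plain Text")
    return RECOMMENDATIONS[doc_type]


print(recommend("notes.md"))
```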
Compare every chunking strategy using:
- Total Chunks
- Average Chunk Size
- Minimum Size
- Maximum Size
- Average Words
Visual side-by-side comparison of:
- Recursive Character Splitter
- Semantic Similarity Chunker
Compare:
- Chunk boundaries
- Metadata
- Similarity
- Chunk sizes
- Export JSON
- Copy Chunks
- Generate LangChain Code
- Copy LangChain Code
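As an illustration of LangChain code generation, the sketch below builds a snippet string from the chosen parameters. The generator function `langchain_snippet` is hypothetical and the simulator's real output may differ. The generated code uses `RecursiveCharacterTextSplitter` from the `langchain_text_splitters` package, and `document_text` is a placeholder the user is expected to define.

```python
def langchain_snippet(chunk_size: int = 500, chunk_overlap: int = 50) -> str:
    """Return LangChain code for a recursive splitter with the given settings."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    return (
        "from langchain_text_splitters import RecursiveCharacterTextSplitter\n"
        "\n"
        "splitter = RecursiveCharacterTextSplitter(\n"
        f"    chunk_size={chunk_size},\n"
        f"    chunk_overlap={chunk_overlap},\n"
        ")\n"
        "chunks = splitter.split_text(document_text)\n"
    )


print(langchain_snippet(800, 100))
```

Generating the snippet as text keeps the simulator itself independent of the LangChain version the user has installed.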
Unit tests cover:
- Fixed Character Chunker
- Fixed Word Chunker
- Sliding Window Chunker
- Paragraph Chunker
- Sentence Chunker
- Recursive Chunker
- Markdown Chunker
- HTML Chunker
- JSON Chunker
- Python Chunker
- JavaScript Chunker
- Semantic Chunker
- Adaptive Chunker
- Metadata Chunker
- Parent Child Chunker
- Summary Attached Chunker
- LLM-assisted Chunker
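For a sense of what such a test could look like, here is a self-contained sketch for a sliding window chunker. The function `sliding_window_chunks` is a hypothetical stand-in, not the project's chunker, and the tests are plain `test_` functions of the kind pytest collects.

```python
def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reaches the end of the text
        start += step
    return chunks


def test_windows_overlap_by_two_characters():
    chunks = sliding_window_chunks("abcdefghij", size=4, overlap=2)
    assert chunks == ["abcd", "cdef", "efgh", "ghij"]


def test_rejects_overlap_not_smaller_than_size():
    try:
        sliding_window_chunks("abc", size=2, overlap=2)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```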
Run all tests:
cd backend
source venv/bin/activate
pytest -v

                          +---------------------+
                          |     Next.js UI      |
                          +----------+----------+
                                     |
                                     |
                            REST API (FastAPI)
                                     |
              +----------------------+----------------------+
              |                                             |
              |                                             |
        Chunk Service                                Upload Service
              |
              |
+---------------------------+
|  Chunking Strategy Layer  |
+---------------------------+
| Fixed                     |
| Recursive                 |
| Sliding                   |
| Paragraph                 |
| Sentence                  |
| Markdown                  |
| HTML                      |
| JSON                      |
| Code                      |
| Semantic                  |
+---------------------------+
              |
              |
      Evaluation Engine
              |
              |
    Metadata + Statistics
              |
              |
        JSON Response
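The flow from the Chunk Service through the strategy layer to a JSON response can be sketched as a registry of chunker functions. This is an illustrative, standard-library-only sketch rather than the project's FastAPI code; the names `STRATEGIES`, `handle_chunk_request`, and the two example chunkers are hypothetical.

```python
import json
from typing import Callable

Chunker = Callable[[str], list[str]]


def fixed_chunker(text: str, size: int = 20) -> list[str]:
    """Cut the text every `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunker(text: str) -> list[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Strategy layer: strategy name -> chunker function.
STRATEGIES: dict[str, Chunker] = {
    "fixed": fixed_chunker,
    "paragraph": paragraph_chunker,
}


def handle_chunk_request(text: str, strategy: str) -> str:
    """Chunk the text, attach basic statistics, and return a JSON string."""
    chunker = STRATEGIES.get(strategy)
    if chunker is None:
        raise KeyError(f"unknown strategy: {strategy}")
    chunks = chunker(text)
    stats = {
        "total_chunks": len(chunks),
        "max_chars": max((len(c) for c in chunks), default=0),
    }
    return json.dumps({"strategy": strategy, "chunks": chunks, "statistics": stats})


print(handle_chunk_request("First paragraph.\n\nSecond paragraph.", "paragraph"))
```

With a registry like this, adding a strategy means registering one more function, and the request handler stays unchanged.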
rag-chunking-simulator/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── chunkers/
│   │   ├── services/
│   │   ├── utils/
│   │   └── main.py
│   │
│   ├── tests/
│   ├── requirements.txt
│   └── pytest.ini
│
├── frontend/
│   ├── app/
│   ├── components/
│   ├── public/
│   └── types/
│
└── README.md
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload

Backend
http://localhost:8000

Swagger
http://localhost:8000/docs

cd frontend
npm install
npm run dev

Frontend
http://localhost:3000
- Multiple chunking strategies
- Semantic chunking
- File upload
- Strategy comparison
- Recursive vs Semantic comparison
- Recommendation engine
- Metadata visualization
- Chunk quality metrics
- Export JSON
- LangChain code generation
- Unit tests
- Retrieval Simulator
- Embedding Visualization
- RAPTOR Chunking
- OCR-aware Chunking
- Multi-document Benchmarking
- Docker
- GitHub Actions CI/CD
- Vector Database Integration
- RAG Evaluation Dashboard
Contributions, ideas, and feature requests are welcome.
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a Pull Request
MIT License
Jay Ram Singh
AI Engineer | Enterprise AI Architect | RAG & LLM Systems
- GitHub: https://github.com/code-jay
- LinkedIn: https://www.linkedin.com/in/jayram/
⭐ If you find this project useful, consider giving it a Star!






