CodeDocGen is a high-quality developer tool built with FastAPI (Backend) and Streamlit (Frontend) that automatically generates comprehensive, professional Markdown documentation for Python codebases using LangChain and Google Gemini (Gemini 2.5 Flash / Flash Lite).
It handles both single Python files and entire directories (via ZIP upload). It recursively parses code structure, checks the generated output against the source code with a custom Hallucination Detection Engine, benchmarks results against standard metrics, and visualizes quality scores.
- AST-Based Python Parsing: Recursively extracts module-level info, classes, functions, argument lists, default parameters, decorators, docstrings, and exact source snippets using Python's native Abstract Syntax Tree (`ast`) module.
- AI-Powered Documentation Generation: Translates code syntax into clean, well-formatted Markdown documentation, including parameter tables, return types, and context-aware usage snippets.
- Hallucination Detection Engine: Analyzes overlap between code identifiers (variables, functions, classes) and generated documentation to score output authenticity and flag low-confidence notes.
- Project-Wide ZIP Documentation: Upload an entire Python repository as a `.zip` archive. The backend generates individual `.md` files mirroring the folder layout alongside a consolidated project-level `README.md`, then bundles them back into a downloadable ZIP.
- Evaluation & Benchmarking Suite: Automated evaluation (`evaluate_docs.py`) tracking:
  - Flesch Reading Ease (readability)
  - BLEU score (against reference human-written documentation)
  - Response latency and how it scales with file size
  - LLM-as-a-Judge scoring of Accuracy, Completeness, Clarity, Hallucination, and Style
- Rich Quality Visualization: Generates scatter plots, histograms, and polar radar charts representing documentation quality scores across modules.
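The parsing step can be sketched with the standard-library `ast` module alone. This is an illustrative approximation, not the contents of `code_parser.py`: the function name `summarize_module` and the shape of the returned dictionary are hypothetical, and for brevity it only inspects top-level functions, top-level classes and their direct methods rather than walking every nested definition. It needs Python 3.9+ for `ast.unparse`.

```python
import ast


def summarize_module(source: str) -> dict:
    """Hypothetical sketch: extract module, class and function metadata via `ast`."""
    tree = ast.parse(source)

    def describe_function(node):
        args = node.args
        positional = args.posonlyargs + args.args
        # `defaults` lines up with the *last* len(defaults) positional arguments.
        padding = [None] * (len(positional) - len(args.defaults))
        defaults = padding + [ast.unparse(d) for d in args.defaults]
        return {
            "name": node.name,
            "args": [{"name": a.arg, "default": d} for a, d in zip(positional, defaults)],
            "decorators": [ast.unparse(d) for d in node.decorator_list],
            "docstring": ast.get_docstring(node),
            # Source text of the definition itself (decorator lines are not included).
            "source": ast.get_source_segment(source, node),
        }

    functions, classes = [], []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(describe_function(node))
        elif isinstance(node, ast.ClassDef):
            methods = [
                describe_function(n)
                for n in node.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            classes.append({
                "name": node.name,
                "docstring": ast.get_docstring(node),
                "methods": methods,
            })
    return {"docstring": ast.get_docstring(tree), "classes": classes, "functions": functions}
```

Keyword-only arguments (`kw_defaults`) and `*args`/`**kwargs` are left out to keep the sketch short; a fuller parser would record them the same way.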
```
CodeDocGen/
│
├── backend/
│ ├── main.py # FastAPI server endpoints (/generate-docs4/, /generate-project-docs/)
│ ├── code_parser.py # AST Visitor to extract classes, methods, functions, and metadata
│ ├── doc_generator.py # Prompt engineering & LangChain/Gemini integration
│ ├── project_parser.py # Project crawler to parse directory modules recursively
│ └── utils/
│ └── save_docs.py # Utility to structure and save generated markdown outputs
│
├── frontend/
│ └── streamlit_app.py # Web dashboard UI for single-file/ZIP uploads & downloads
│
├── evaluate_docs.py # Runs benchmarking suite against test samples
├── llm_judge.py # Implements LLM-as-a-Judge quality metrics
├── plot_evaluation.py # Creates performance & metric visualization graphs
├── .gitignore # Ignore configurations (e.g. backend/.env, venv)
└── README.md # Project documentation (this file)
```
Ensure you are in the project root directory.
Create a `.env` file inside the `backend/` directory:
```
# Path: backend/.env
GEMINI_API_KEY=your_gemini_api_key_here
```
Install the required packages in your Python environment:
```bash
pip install fastapi uvicorn streamlit pydantic python-dotenv requests langchain-google-genai langchain-huggingface langchain-core nltk textstat pandas matplotlib numpy aiofiles
```
To run the interactive web application, start both the backend API server and the frontend UI:
From the project root, start the Uvicorn server:
```bash
uvicorn backend.main:app --reload
```
The API documentation will be available at http://127.0.0.1:8000/docs.
In a new terminal window, launch the Streamlit app:
```bash
streamlit run frontend/streamlit_app.py
```
Open your browser to http://localhost:8501 to use the dashboard.
The project contains a built-in suite to benchmark the quality of the generated documentation:
`evaluate_docs.py` processes the modules in your project, compares generated outputs to human-written references, runs LLM-as-a-Judge evaluations, and saves results to `evaluation_results.csv`:
```bash
python evaluate_docs.py
```
Run `plot_evaluation.py` to process the CSV results and generate analytical plots:
```bash
python plot_evaluation.py
```
This will produce the following files in the project root:
- `response_time_vs_size.png`: Shows how processing speed scales with file size.
- `hallucination_vs_identifiers.png`: Correlates identifier counts with hallucination rates.
- `readability_histogram.png`: Shows the distribution of readability scores.
- `llm_radar_<filename>.png`: Radar charts visualizing LLM-as-a-Judge scores (Accuracy, Completeness, Clarity, Low Hallucination, Style) for individual files.
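`llm_judge.py` is not reproduced in this README, so the following only sketches one way a judge reply could be turned into the five scores that the radar charts plot. It assumes, hypothetically, that the judge is prompted to answer with a JSON object keyed by criterion name, holding integer scores from 0 to 10; `CRITERIA` and `parse_judge_reply` are illustrative names.

```python
import json
import re

# Criterion names as listed in this README; the JSON reply format is an assumption.
CRITERIA = ("Accuracy", "Completeness", "Clarity", "Hallucination", "Style")


def parse_judge_reply(reply: str) -> dict[str, int]:
    """Pull the outermost {...} span out of a judge reply and validate its scores."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in judge reply")
    raw = json.loads(match.group(0))
    scores = {}
    for name in CRITERIA:
        value = int(raw[name])  # raises KeyError if a criterion is missing
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} outside 0-10")
        scores[name] = value
    return scores
```

Rejecting malformed or out-of-range replies up front keeps a single bad judge response from silently skewing the CSV and the charts built from it.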
- Endpoint: `/generate-docs4/`
- Description: Takes a single Python file upload, parses it, runs the Gemini generator, runs the hallucination check, and returns a JSON response.
- Payload: `file` (multipart file)
- Returns:
```json
{
  "filename": "sample.py",
  "parsed": { ... },
  "documentation_md": "# Markdown string...",
  "hallucination_check": { "score": 0.85, "status": "PASS", "missing_terms": [] },
  "readme_path": "generated_docs/README.md"
}
```
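A minimal client for the single-file endpoint might look like this. It assumes the server is running locally as described above and that the route accepts a multipart `POST` at `/generate-docs4/` (the path comes from the project tree; the HTTP method and the timeout value are assumptions). It uses `requests`, which is in the install list; `request_docs` and `summarize_response` are hypothetical helper names.

```python
import os


def request_docs(path: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """Upload one .py file to the (assumed) single-file endpoint and return its JSON."""
    import requests  # imported lazily so summarize_response() works without it

    with open(path, "rb") as fh:
        files = {"file": (os.path.basename(path), fh, "text/x-python")}
        # Generous, arbitrary timeout: the server waits on an LLM call.
        resp = requests.post(f"{base_url}/generate-docs4/", files=files, timeout=300)
    resp.raise_for_status()
    return resp.json()


def summarize_response(payload: dict) -> str:
    """One-line summary of the response's hallucination_check block."""
    check = payload.get("hallucination_check", {})
    summary = f"{payload.get('filename')}: {check.get('status')} (score {check.get('score')})"
    missing = check.get("missing_terms", [])
    if missing:
        summary += "; missing: " + ", ".join(missing)
    return summary
```

Typical use would be `print(summarize_response(request_docs("sample.py")))`.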
- Endpoint: `/generate-project-docs/`
- Description: Processes an entire uploaded ZIP codebase, generates module-level Markdown files, builds a global `README.md`, and returns a downloadable `.zip` archive containing the `docs/` folder.
- Payload: `zip_file` (multipart ZIP archive)
- Returns: Binary ZIP file response (`project_docs_<timestamp>.zip`).
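The folder-mirroring step can be sketched with the standard-library `zipfile` module. This is an illustration under stated assumptions rather than the backend's code: `document_zip` is a hypothetical name, the `generate_markdown` callback stands in for the Gemini call, and the `docs/README.md` it writes is a plain link index rather than the consolidated README the real endpoint produces.

```python
import io
import zipfile
from pathlib import PurePosixPath


def document_zip(zip_bytes: bytes, generate_markdown) -> bytes:
    """Build a ZIP holding docs/<module path>.md per .py file plus docs/README.md."""
    index = ["# Project Documentation", ""]
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in sorted(src.namelist()):
                path = PurePosixPath(name)
                # Skip non-Python entries and any path that could escape docs/.
                if path.suffix != ".py" or path.is_absolute() or ".." in path.parts:
                    continue
                source = src.read(name).decode("utf-8")
                rel_md = path.with_suffix(".md")  # pkg/mod.py -> pkg/mod.md
                dst.writestr(f"docs/{rel_md}", generate_markdown(name, source))
                index.append(f"- [{name}]({rel_md})")
            dst.writestr("docs/README.md", "\n".join(index) + "\n")
    return out.getvalue()
```

Working entirely in memory with `BytesIO` avoids temporary directories, which suits an endpoint that only needs to stream the finished archive back.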
- Readability (Flesch Reading Ease): Uses `textstat` to evaluate the linguistic simplicity of the generated docs.
- BLEU Score: Computes n-gram precision overlap with expert-written reference documentation to measure vocabulary match.
- Hallucination Rate: Computes the fraction of syntax identifiers (found in the AST) that are missing from the generated documentation:

  $$\text{Hallucination Rate} = 1.0 - \left( \frac{\text{Identifiers in Doc}}{\text{Total Identifiers in AST}} \right)$$
- LLM-as-a-Judge: Query-driven scoring (0-10) using a zero-temperature model that assesses Accuracy, Completeness, Clarity, Hallucination, and Style.
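For reference, the standard English Flesch Reading Ease formula that `textstat` is based on is shown below, followed by a worked example for a hypothetical passage of 120 words, 8 sentences and 180 syllables. Higher scores mean easier text; scores of roughly 60 to 70 are conventionally read as plain English.

$$\text{FRE} = 206.835 - 1.015\left(\frac{\text{words}}{\text{sentences}}\right) - 84.6\left(\frac{\text{syllables}}{\text{words}}\right)$$

$$\text{FRE} = 206.835 - 1.015 \cdot \frac{120}{8} - 84.6 \cdot \frac{180}{120} = 206.835 - 15.225 - 126.9 = 64.71$$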
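The Hallucination Rate formula can be illustrated with the sketch below. The README says only that variables, functions and classes are counted, so this version makes its own hypothetical choices: it collects function and class names, argument names and assigned variable names from the AST, ignores `self`/`cls`, and treats an identifier as present when it appears as a whole word anywhere in the Markdown. `collect_identifiers` and `hallucination_rate` are illustrative names, not the engine's API.

```python
import ast
import re


def collect_identifiers(source: str) -> set[str]:
    """Function/class names, argument names and assigned variable names."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names - {"self", "cls"}  # conventional receivers are not worth documenting


def hallucination_rate(source: str, doc_markdown: str) -> float:
    """1.0 - (identifiers found in the doc / total identifiers in the AST)."""
    identifiers = collect_identifiers(source)
    if not identifiers:
        return 0.0
    # Whole-word match so that e.g. "x" is not counted as found inside "index".
    found = {n for n in identifiers if re.search(rf"\b{re.escape(n)}\b", doc_markdown)}
    return 1.0 - len(found) / len(identifiers)
```

For example, a `Cart` class whose method `add(self, item, qty=1)` assigns a local `total`, documented without ever mentioning `total`, scores 1 - 4/5 = 0.2. Note that this metric penalizes identifiers that are legitimately left out of user-facing docs, such as short-lived locals.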