📘 CodeDocGen: Automated Code Document Generator

CodeDocGen is a developer tool built with FastAPI (backend) and Streamlit (frontend) that automatically generates Markdown documentation for Python codebases using LangChain and Google Gemini (Gemini 2.5 Flash / Flash Lite).

It handles both single Python files and entire directories (via ZIP upload). For each input it recursively parses the code structure, checks the generated output with a custom Hallucination Detection Engine, benchmarks the results against standard metrics, and visualizes the quality scores.

🚀 Key Features

- AST-Based Python Parsing: Recursively extracts module-level info, classes, functions, argument lists, default parameters, decorators, docstrings, and exact source snippets using Python's native Abstract Syntax Tree module (ast).
- AI-Powered Documentation Generation: Turns the parsed code into well-formatted Markdown documentation, including parameter tables, return types, and context-aware usage snippets.
- Hallucination Detection Engine: Analyzes the overlap between code identifiers (variables, functions, classes) and the generated documentation to score output authenticity and flag low-confidence notes.
- Project-Wide ZIP Documentation: Upload an entire Python repository as a .zip archive. The backend generates individual .md files that mirror the folder layout, plus a consolidated project-level README.md, and bundles them into a downloadable ZIP.
- Evaluation & Benchmarking Suite: Automated evaluation (evaluate_docs.py) tracking:
  - Flesch Reading Ease (readability)
  - BLEU score (against human-written reference documentation)
  - Response latency and how it scales with code size
  - LLM-as-a-Judge scoring of Accuracy, Completeness, Clarity, Hallucination, and Style
- Rich Quality Visualization: Generates scatter plots, histograms, and polar radar charts of documentation quality scores across modules.
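
To make the AST-based parsing feature concrete, here is a minimal sketch of how function metadata can be pulled out with Python's standard ast module. It is not the code in backend/code_parser.py; the describe_functions helper, the dictionary keys, and the sample source are assumptions chosen for illustration.

```python
import ast

# Hypothetical input; any Python source string works here.
SOURCE = '''
import functools

class Greeter:
    """Says hello."""

    @functools.lru_cache
    def greet(self, name, punctuation="!"):
        """Return a greeting."""
        return "Hello, " + name + punctuation
'''


def describe_functions(source: str) -> list[dict]:
    """Collect basic metadata for every function or method in `source`."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                # Default values and decorators are turned back into source text.
                "defaults": [ast.unparse(d) for d in node.args.defaults],
                "decorators": [ast.unparse(d) for d in node.decorator_list],
                "docstring": ast.get_docstring(node),
                # Exact source snippet of the definition (starts at `def`).
                "source": ast.get_source_segment(source, node),
            })
    return found


functions = describe_functions(SOURCE)
for info in functions:
    print(info["name"], info["args"], info["defaults"], info["decorators"])
```

This sketch needs Python 3.9 or newer because it relies on ast.unparse. A fuller parser would also visit ast.ClassDef nodes and module-level docstrings, which the feature list mentions.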

📁 Project Structure

CodeDocGen/
│
├── backend/
│   ├── main.py                  # FastAPI server endpoints (/generate-docs4/, /generate-project-docs/)
│   ├── code_parser.py           # AST Visitor to extract classes, methods, functions, and metadata
│   ├── doc_generator.py         # Prompt engineering & LangChain/Gemini integration
│   ├── project_parser.py        # Project crawler to parse directory modules recursively
│   └── utils/
│       └── save_docs.py         # Utility to structure and save generated markdown outputs
│
├── frontend/
│   └── streamlit_app.py         # Web dashboard UI for single-file/ZIP uploads & downloads
│
├── evaluate_docs.py             # Runs benchmarking suite against test samples
├── llm_judge.py                 # Implements LLM-as-a-Judge quality metrics
├── plot_evaluation.py           # Creates performance & metric visualization graphs
├── .gitignore                   # Ignore configurations (e.g. backend/.env, venv)
└── README.md                    # Project documentation (this file)

⚙️ Installation & Setup

1. Clone & Set Up Directory

Ensure you are in the project root directory.

2. Configure Environment Variables

Create a .env file inside the backend/ directory:

# Path: backend/.env
GEMINI_API_KEY=your_gemini_api_key_here
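
Before starting the server, it can help to confirm that the file is in place and the placeholder has been replaced. The following is a standard-library sketch of such a check; parse_env_text and has_real_key are hypothetical helpers, not part of the project, and the backend itself presumably loads the file through python-dotenv (listed in the dependencies).

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def has_real_key(env: dict[str, str], name: str = "GEMINI_API_KEY") -> bool:
    """True when the key is set and is not the placeholder from the template."""
    value = env.get(name, "")
    return bool(value) and value != "your_gemini_api_key_here"


# Run from the project root; a missing file is treated as "not configured".
env_path = Path("backend/.env")
env = parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
print("GEMINI_API_KEY configured:", has_real_key(env))
```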

3. Install Dependencies

Install the required packages in your Python environment:

pip install fastapi uvicorn streamlit pydantic python-dotenv requests langchain-google-genai langchain-huggingface langchain-core nltk textstat pandas matplotlib numpy aiofiles

🏃 Run the Application

To run the interactive web application, start both the backend API server and the frontend UI:

1. Run FastAPI Backend

From the project root, start the Uvicorn server:

uvicorn backend.main:app --reload

The API documentation will be available at http://127.0.0.1:8000/docs.

2. Run Streamlit Frontend

In a new terminal window, launch the Streamlit app:

streamlit run frontend/streamlit_app.py

Open your browser to http://localhost:8501 to use the dashboard!

📊 Evaluation & Benchmarking

The project contains a built-in suite to benchmark the quality of the generated documentation:

1. Run Benchmarks

evaluate_docs.py processes the modules in your project, compares generated outputs to human-written references, runs LLM-as-a-Judge evaluations, and saves results to evaluation_results.csv:

python evaluate_docs.py

2. Generate Performance Visualizations

Run plot_evaluation.py to process the CSV results and generate analytical plots:

python plot_evaluation.py

This will produce the following files in your root folder:

- response_time_vs_size.png: Shows how processing speed scales with file size.
- hallucination_vs_identifiers.png: Correlates identifier counts with hallucination rates.
- readability_histogram.png: Maps readability score distribution.
- llm_radar_<filename>.png: Radar charts visualizing LLM-as-a-Judge scores (Accuracy, Completeness, Clarity, Low Hallucination, Style) for individual files.

🔗 Core API Endpoints

`POST /generate-docs4/`

Description: Takes a single Python file upload, parses it, runs the Gemini generator, runs the hallucination check, and returns a JSON response.
Payload: file (Multipart file)

Returns:

{
  "filename": "sample.py",
  "parsed": { ... },
  "documentation_md": "# Markdown string...",
  "hallucination_check": {
    "score": 0.85,
    "status": "PASS",
    "missing_terms": []
  },
  "readme_path": "generated_docs/README.md"
}
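
A client will usually want only a few fields from this response. The sketch below shows one way to read them, using the example payload above as input. The summarize helper is hypothetical, and "parsed" is left as an empty object because its contents are elided in this README.

```python
import json

# Example payload following the response shape documented above.
RAW_RESPONSE = """
{
  "filename": "sample.py",
  "parsed": {},
  "documentation_md": "# Markdown string...",
  "hallucination_check": {"score": 0.85, "status": "PASS", "missing_terms": []},
  "readme_path": "generated_docs/README.md"
}
"""


def summarize(response: dict) -> str:
    """Build a one-line summary of the hallucination check for a response."""
    check = response["hallucination_check"]
    missing = ", ".join(check["missing_terms"]) or "none"
    return (
        f'{response["filename"]}: {check["status"]} '
        f'(score {check["score"]:.2f}, missing terms: {missing})'
    )


payload = json.loads(RAW_RESPONSE)
summary = summarize(payload)
print(summary)  # sample.py: PASS (score 0.85, missing terms: none)
```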

`POST /generate-project-docs/`

Description: Processes an entire uploaded ZIP codebase, generates module-level Markdown files, builds a global README.md, and returns a downloadable .zip archive containing the docs/ folder.
Payload: zip_file (Multipart ZIP archive)
Returns: Binary ZIP file response (project_docs_<timestamp>.zip).
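
Once the archive has been downloaded, its contents can be inspected with the standard zipfile module. This is a sketch only: the archive is built in memory as a stand-in for the response bytes, and the file names inside it are made up rather than taken from real output.

```python
import io
import zipfile


def list_markdown_files(zip_bytes: bytes) -> list[str]:
    """Return the sorted paths of all .md files inside a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".md"))


# Stand-in for the bytes of the HTTP response body (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/README.md", "# Project overview")
    archive.writestr("docs/backend/main.md", "# main.py")
    archive.writestr("docs/notes.txt", "not markdown")

markdown_files = list_markdown_files(buffer.getvalue())
print(markdown_files)
```

With a real response, the same function can be given the raw body bytes, or zipfile.ZipFile.extractall can unpack the archive to disk.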

🧠 Evaluation Metrics Details

- Readability (Flesch Reading Ease): Uses textstat to evaluate the linguistic simplicity of the generated docs.
- BLEU Score: Computes n-gram precision overlap against human-written reference documentation to evaluate vocabulary match.
- Hallucination Rate: Computes the fraction of identifiers found in the AST that are missing from the generated documentation: $$\text{Hallucination Rate} = 1.0 - \left( \frac{\text{Identifiers in Doc}}{\text{Total Identifiers in AST}} \right)$$
- LLM-as-a-Judge: Query-driven scoring (0-10) using a zero-temperature model that assesses Accuracy, Completeness, Clarity, Hallucination, and Style.
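
The hallucination-rate formula can be sketched in a few lines of standard-library Python. Which AST nodes count as identifiers, and how documentation text is tokenized, are assumptions made here; the project's engine may differ on both.

```python
import ast
import re


def collect_identifiers(source: str) -> set[str]:
    """Gather function, class, argument, and variable names from the AST."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name):
            names.add(node.id)
    return names


def hallucination_rate(source: str, documentation: str) -> float:
    """1.0 minus the share of AST identifiers that appear in the documentation."""
    identifiers = collect_identifiers(source)
    if not identifiers:
        return 0.0  # nothing to check (assumed convention)
    doc_words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", documentation))
    found = identifiers & doc_words
    return 1.0 - len(found) / len(identifiers)


code = "def area(width, height):\n    result = width * height\n    return result\n"
doc = "`area(width, height)` multiplies width by height."
rate = hallucination_rate(code, doc)
print(rate)  # 0.25: `result` is the one identifier of four missing from the doc
```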
