Kartik-Burele/CodeDocGen

📘 CodeDocGen: Automated Code Document Generator

CodeDocGen is a developer tool built with FastAPI (backend) and Streamlit (frontend) that automatically generates Markdown documentation for Python codebases using LangChain and Google Gemini (Gemini 2.5 Flash / Flash Lite).

It handles both single Python files and entire directories (via ZIP upload): it recursively parses the code structure, checks the generated output with a custom Hallucination Detection Engine, benchmarks results against standard metrics, and visualizes quality scores.


🚀 Key Features

  • AST-Based Python Parsing: Recursively extracts module-level info, classes, functions, argument lists, default parameters, decorators, docstrings, and exact source snippets using Python's native Abstract Syntax Tree (ast).
  • AI-Powered Documentation Generation: Turns the parsed code into well-formatted Markdown documentation, including parameter tables, return types, and context-aware usage snippets.
  • Hallucination Detection Engine: Analyzes overlap between code identifiers (variables, functions, classes) and generated documentation to score output authenticity and flag low-confidence notes.
  • Project-Wide ZIP Documentation: Upload an entire Python repository as a .zip archive. The backend generates individual .md files that mirror the folder layout, adds a consolidated project-level README.md, and bundles everything into a downloadable ZIP.
  • Evaluation & Benchmarking Suite: Automated evaluation (evaluate_docs.py) tracking:
    • Flesch Reading Ease (Readability)
    • BLEU Score (against reference human-written documentation)
    • Response latency and how it scales with code size
    • LLM-as-a-Judge scoring evaluating Accuracy, Completeness, Clarity, Hallucination, and Style.
  • Rich Quality Visualization: Generates scatter plots, histograms, and polar radar charts representing documentation quality scores across modules.
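The AST-based extraction described in the feature list can be illustrated with a minimal sketch. This is not the code in code_parser.py; the function name describe_functions and the SAMPLE module are hypothetical, and only standard-library ast calls (Python 3.9+) are used:

```python
import ast

# Hypothetical sample module, used only for this illustration.
SAMPLE = '''
import functools

class Greeter:
    """Says hello."""

    @functools.lru_cache(maxsize=None)
    def greet(self, name, punctuation="!"):
        """Return a greeting."""
        return f"Hello, {name}{punctuation}"
'''


def describe_functions(source: str) -> list[dict]:
    """Collect name, arguments, defaults, decorators, docstring, and source per function."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            results.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                # ast.unparse turns default/decorator nodes back into source text.
                "defaults": [ast.unparse(d) for d in node.args.defaults],
                "decorators": [ast.unparse(d) for d in node.decorator_list],
                "docstring": ast.get_docstring(node),
                # Exact source snippet of the function (without its decorators).
                "source": ast.get_source_segment(source, node),
            })
    return results


functions = describe_functions(SAMPLE)
print(functions[0]["name"], functions[0]["args"])
```

A dictionary like this is the kind of structured input that could then be placed into a prompt for the documentation model.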

📁 Project Structure

CodeDocGen/
│
├── backend/
│   ├── main.py                  # FastAPI server endpoints (/generate-docs4/, /generate-project-docs/)
│   ├── code_parser.py           # AST Visitor to extract classes, methods, functions, and metadata
│   ├── doc_generator.py         # Prompt engineering & LangChain/Gemini integration
│   ├── project_parser.py        # Project crawler to parse directory modules recursively
│   └── utils/
│       └── save_docs.py         # Utility to structure and save generated markdown outputs
│
├── frontend/
│   └── streamlit_app.py         # Web dashboard UI for single-file/ZIP uploads & downloads
│
├── evaluate_docs.py             # Runs benchmarking suite against test samples
├── llm_judge.py                 # Implements LLM-as-a-Judge quality metrics
├── plot_evaluation.py           # Creates performance & metric visualization graphs
├── .gitignore                   # Ignore configurations (e.g. backend/.env, venv)
└── README.md                    # Project documentation (this file)

⚙️ Installation & Setup

1. Clone & Set Up Directory

Clone the repository and make sure you are in the project root directory (CodeDocGen/).

2. Configure Environment Variables

Create a .env file inside the backend/ directory:

# Path: backend/.env
GEMINI_API_KEY=your_gemini_api_key_here

3. Install Dependencies

Install the required packages in your Python environment:

pip install fastapi uvicorn streamlit pydantic python-dotenv requests langchain-google-genai langchain-huggingface langchain-core nltk textstat pandas matplotlib numpy aiofiles

🏃 Run the Application

To run the interactive web application, start both the backend API server and the frontend UI:

1. Run FastAPI Backend

From the project root, start the Uvicorn server:

uvicorn backend.main:app --reload

The API documentation will be available at http://127.0.0.1:8000/docs.

2. Run Streamlit Frontend

In a new terminal window, launch the Streamlit app:

streamlit run frontend/streamlit_app.py

Open your browser to http://localhost:8501 to use the dashboard!


📊 Evaluation & Benchmarking

The project contains a built-in suite to benchmark the quality of the generated documentation:

1. Run Benchmarks

evaluate_docs.py processes the modules in your project, compares generated outputs to human-written references, runs LLM-as-a-Judge evaluations, and saves results to evaluation_results.csv:

python evaluate_docs.py

2. Generate Performance Visualizations

Run plot_evaluation.py to process the CSV results and generate analytical plots:

python plot_evaluation.py

This produces the following files in the project root folder:

  • response_time_vs_size.png: Shows how response time scales with file size.
  • hallucination_vs_identifiers.png: Correlates identifier counts with hallucination rates.
  • readability_histogram.png: Shows the distribution of readability scores.
  • llm_radar_<filename>.png: Radar charts visualizing LLM-as-a-Judge scores (Accuracy, Completeness, Clarity, Low Hallucination, Style) for individual files.

🔗 Core API Endpoints

POST /generate-docs4/

  • Description: Takes a single Python file upload, parses it, runs the Gemini generator, runs the hallucination check, and returns a JSON response.
  • Payload: file (Multipart file)
  • Returns:
    {
      "filename": "sample.py",
      "parsed": { ... },
      "documentation_md": "# Markdown string...",
      "hallucination_check": {
        "score": 0.85,
        "status": "PASS",
        "missing_terms": []
      },
      "readme_path": "generated_docs/README.md"
    }
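A client might consume this response roughly as follows. This is a sketch under assumptions: the payload simply mirrors the example above, "parsed" is left empty because its contents are not shown here, and summarize is a hypothetical helper, not part of the project:

```python
import json

# Example payload shaped like the response above. "parsed" is left empty on
# purpose: its real structure is not documented in this README.
raw = """
{
  "filename": "sample.py",
  "parsed": {},
  "documentation_md": "# Markdown string...",
  "hallucination_check": {"score": 0.85, "status": "PASS", "missing_terms": []},
  "readme_path": "generated_docs/README.md"
}
"""


def summarize(response_text: str) -> str:
    """Build a one-line summary of the hallucination check (hypothetical helper)."""
    data = json.loads(response_text)
    check = data["hallucination_check"]
    line = f'{data["filename"]}: {check["status"]} (score {check["score"]:.2f})'
    if check["missing_terms"]:
        # List identifiers the check reported as missing from the docs.
        line += " missing: " + ", ".join(check["missing_terms"])
    return line


summary = summarize(raw)
print(summary)
```

Reading status and missing_terms first gives a quick signal about whether the generated Markdown deserves a manual review.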

POST /generate-project-docs/

  • Description: Processes an entire uploaded ZIP codebase, generates module-level Markdown files, builds a global README.md, and returns a downloadable .zip archive containing the docs/ folder.
  • Payload: zip_file (Multipart ZIP archive)
  • Returns: Binary ZIP file response (project_docs_<timestamp>.zip).
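The bundling step can be sketched with the standard library. This is an illustration only, not the backend's implementation: bundle_docs is a hypothetical name, and placing README.md directly inside docs/ is an assumption about the archive layout:

```python
import io
import zipfile
from pathlib import PurePosixPath


def bundle_docs(module_docs: dict[str, str], readme_md: str) -> bytes:
    """Pack per-module Markdown files and a project README into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumed location of the consolidated README inside the archive.
        archive.writestr("docs/README.md", readme_md)
        for source_path, markdown in module_docs.items():
            # Mirror the source layout: pkg/utils/helpers.py -> docs/pkg/utils/helpers.md
            doc_path = PurePosixPath("docs") / PurePosixPath(source_path).with_suffix(".md")
            archive.writestr(str(doc_path), markdown)
    return buffer.getvalue()


payload = bundle_docs(
    {"pkg/core.py": "# core", "pkg/utils/helpers.py": "# helpers"},
    "# Project",
)
names = sorted(zipfile.ZipFile(io.BytesIO(payload)).namelist())
print(names)
```

Building the archive in memory avoids temporary files; the resulting bytes could be returned as the body of a binary response.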

🧠 Evaluation Metrics Details

  • Readability (Flesch Reading Ease): Uses textstat to evaluate the linguistic simplicity of the generated docs.
  • BLEU Score: Computes n-gram precision overlap against human-written reference documentation to measure how closely the vocabulary matches.
  • Hallucination Rate: Computes the fraction of syntax identifiers (found in the AST) missing from the generated documentation: $$\text{Hallucination Rate} = 1.0 - \left( \frac{\text{Identifiers in Doc}}{\text{Total Identifiers in AST}} \right)$$
  • LLM-as-a-Judge: Prompt-based scoring (0-10) using a model run at temperature 0, assessing Accuracy, Completeness, Clarity, Hallucination, and Style.
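The Hallucination Rate formula above can be illustrated with a short sketch. Which AST nodes count as identifiers and how documentation text is tokenized are assumptions here, and collect_identifiers and hallucination_rate are hypothetical names rather than the project's functions:

```python
import ast
import re


def collect_identifiers(source: str) -> set[str]:
    """Gather function, class, argument, and variable names from the AST."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name):
            names.add(node.id)
    return names


def hallucination_rate(source: str, documentation: str) -> tuple[float, list[str]]:
    """Return 1 - (identifiers found in doc / total identifiers) and the missing names."""
    identifiers = collect_identifiers(source)
    if not identifiers:
        return 0.0, []
    # Assumed tokenization: identifier-like words appearing anywhere in the doc text.
    doc_words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", documentation))
    found = identifiers & doc_words
    missing = sorted(identifiers - doc_words)
    return 1.0 - len(found) / len(identifiers), missing


code = "def area(width, height):\n    result = width * height\n    return result\n"
doc = "`area(width, height)` multiplies width by height."
rate, missing = hallucination_rate(code, doc)
print(rate, missing)
```

In this example three of the four identifiers (area, width, height) appear in the documentation, so the rate is 1 - 3/4 = 0.25 and result is reported as missing.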

About

This is a mini project for the 2nd semester of the M.Tech in Applied AI and Communications.