Analyze, visualize, and predict software defects using PROMISE research benchmarks, Git repository mining, static code metrics, and machine learning.
- PROMISE Benchmark Explorer — Load standard defect-prediction datasets (Ant, Camel, jEdit, Lucene, Tomcat), explore CK metrics, and train classifiers.
- Git Repository Analyzer — Mine commit history for code churn and bug-fix labels, extract static metrics from Python/Java files, and predict defect hotspots.
- Single File Inspector — Paste or upload source code for real-time static analysis, defect probability scoring, and refactoring recommendations.
| Layer | Tools |
|---|---|
| Data collection | Python, GitPython |
| Static analysis | Radon (Python), regex-based parser (Java) |
| ML | Scikit-learn, XGBoost |
| Data / viz | Pandas, NumPy, Plotly, Matplotlib |
| UI | Streamlit |
.
├── app.py # Streamlit dashboard
├── test_analyzer.py # Integration tests
├── requirements.txt
├── data/ # Cached PROMISE datasets (auto-downloaded)
└── src/
    ├── dataset_manager.py    # PROMISE dataset loading & preprocessing
    ├── git_miner.py          # Git history & churn mining
    ├── metrics_extractor.py  # LOC, complexity, coupling metrics
    └── model_trainer.py      # LR / RF / XGBoost training & evaluation
- Install dependencies: `pip install -r requirements.txt`
- Run the integration tests: `python test_analyzer.py`
- Launch the dashboard: `python -m streamlit run app.py`

Then open http://localhost:8501 in your browser.

Tab 1 (PROMISE Benchmark Explorer):

- Select a dataset (e.g. Apache Ant 1.7).
- Explore defect distribution, metric scatter plots, and correlation heatmaps.
- Choose a model (Logistic Regression, Random Forest, or XGBoost) and click Train Model Now.
- Review ROC-AUC, confusion matrix, and feature importances.
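The evaluation step above reports ROC-AUC and a confusion matrix. As a rough illustration of what those two numbers mean, the sketch below computes both in plain Python for a handful of made-up labels and scores. The function names are hypothetical and this is not the project's evaluation code, which presumably relies on Scikit-learn.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for binary labels and predicted probabilities."""
    tn = fp = fn = tp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, y_prob):
    """ROC-AUC as the chance that a defective file outscores a clean one.

    Ties between a defective and a clean score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data: 1 = defective, 0 = clean.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

print(confusion_matrix(labels, scores))  # (tn, fp, fn, tp)
print(round(roc_auc(labels, scores), 3))
```

The confusion matrix depends on the chosen threshold, while ROC-AUC depends only on how the scores rank defective files against clean ones, which is why the two are usually read together.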

Tab 2 (Git Repository Analyzer):

- Enter a local repo path or remote Git URL (presets include Flask and Requests).
- Click Start Git & Code Analysis to mine commits and extract static metrics.
- Optionally train a repo-specific defect predictor.
- View defect hotspot predictions on a churn vs. complexity chart.
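The mining step above turns commit history into per-file churn and bug-fix counts. A minimal sketch of that aggregation follows, using hand-written commit records so it runs without a repository. The keyword pattern for bug-fix commits and the `summarize_history` helper are assumptions for illustration, not the logic in `git_miner.py`; with GitPython, comparable per-file line counts are available from each commit's `stats.files`.

```python
import re
from collections import defaultdict

# Assumed keyword heuristic for spotting bug-fix commits; the project's rule may differ.
BUGFIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|defect|patch)\b", re.IGNORECASE)


def summarize_history(commits):
    """Aggregate per-file Git metrics.

    `commits` is an iterable of (author, message, {path: (added, deleted)}).
    """
    stats = defaultdict(
        lambda: {"commits": 0, "churn": 0, "bug_fixes": 0, "authors": set()}
    )
    for author, message, files in commits:
        is_fix = bool(BUGFIX_PATTERN.search(message))
        for path, (added, deleted) in files.items():
            entry = stats[path]
            entry["commits"] += 1
            entry["churn"] += added + deleted  # churn = lines added + lines deleted
            entry["bug_fixes"] += int(is_fix)
            entry["authors"].add(author)
    # Replace each author set with its size so the result is plain numbers.
    return {
        path: {**entry, "authors": len(entry["authors"])}
        for path, entry in stats.items()
    }


# Made-up history for illustration.
history = [
    ("alice", "Add config loader", {"config.py": (40, 0)}),
    ("bob", "Fix crash on empty config", {"config.py": (5, 2), "cli.py": (1, 1)}),
    ("alice", "Refactor CLI parsing", {"cli.py": (30, 25)}),
]

for path, metrics in sorted(summarize_history(history).items()):
    print(path, metrics)
```

Keyword matching on commit messages is a common but noisy way to label bug fixes, so the resulting labels are best treated as approximate.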

Tab 3 (Single File Inspector):

- Paste Python or Java code (or select a file mined in Tab 2).
- Click Inspect Code Quality to see LOC, complexity, and coupling metrics.
- If a model was trained in Tab 1 or Tab 2, see defect probability and refactoring advice.
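To give a feel for the kind of numbers the inspection step produces, here is a small sketch that derives file-level metrics from pasted Python source using only the standard-library `ast` module. It is a stand-in, not the project's extractor (which uses Radon for Python): the `inspect_python` name, the set of nodes counted as branches, and the use of distinct imported modules as a coupling proxy are all assumptions.

```python
import ast

# Node types counted as decision points in this approximation.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def inspect_python(source):
    """Return rough LOC, function count, import count, and complexity."""
    tree = ast.parse(source)
    # LOC here means non-blank lines that are not pure comments.
    loc = sum(1 for line in source.splitlines()
              if line.strip() and not line.strip().startswith("#"))
    functions = 0
    imports = set()
    complexity = 1  # one path through the code before any branching
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions += 1
        elif isinstance(node, ast.Import):
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module.split(".")[0])
        elif isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1  # each extra and/or adds a path
    return {"loc": loc, "functions": functions,
            "imports": len(imports), "complexity": complexity}


sample = '''
import os
from collections import Counter

def count_big(paths, limit):
    # count files above a size limit
    sizes = Counter()
    for p in paths:
        if os.path.exists(p) and os.path.getsize(p) > limit:
            sizes[p] += 1
    return sizes
'''

print(inspect_python(sample))
```

This counts complexity for the whole file rather than per function, so values will not match Radon's per-block figures exactly.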
PROMISE datasets are downloaded automatically on first use and cached in data/:
| Key | Project | Source |
|---|---|---|
| ant | Apache Ant 1.7 | PROMISE-backup |
| camel | Apache Camel 1.6 | PROMISE-backup |
| jedit | jEdit 4.3 | PROMISE-backup |
| lucene | Apache Lucene 2.4 | PROMISE-backup |
| tomcat | Apache Tomcat 6.0 | DefectData |
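The download-once-then-cache behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `dataset_manager.py`: the `load_cached` helper, the `<key>.csv` file naming, and the sample CSV columns are hypothetical, and a fake fetcher replaces the real download so the example runs offline.

```python
import tempfile
from pathlib import Path


def load_cached(key, fetch, cache_dir="data"):
    """Return CSV text for `key`, calling `fetch(key)` only if no cached copy exists."""
    cache_file = Path(cache_dir) / f"{key}.csv"  # assumed naming scheme
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = fetch(key)  # in a real setup this would be an HTTP download
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text


calls = []


def fake_fetch(key):
    # Stand-in for a real download; records each call so caching is visible.
    calls.append(key)
    return "name,wmc,bug\nFoo,3,0\n"  # made-up rows


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("ant", fake_fetch, cache_dir=tmp)
    second = load_cached("ant", fake_fetch, cache_dir=tmp)  # served from cache

print(first == second, calls)
```

Passing the fetcher in as a parameter keeps the caching logic testable without network access.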
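Static (CK-style): LOC (lines of code), WMC (weighted methods per class), DIT (depth of inheritance tree), NOC (number of children), CBO (coupling between objects), cyclomatic complexity, method count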
Git-derived: commit frequency, code churn, bug-fix count, author count
- Python 3.10+
- Git (for repository mining features)
Made by KAVYA RAJ