
aashishkumar-tech/mlproject


End-to-End ML Deployment with CI/CD Pipeline

A complete MLOps project showcasing automated deployment of a Machine Learning application to AWS EC2 using Docker, GitHub Actions, and ECR.

This project demonstrates production-grade ML deployment with continuous integration and delivery. It features a student performance prediction model deployed as a Flask web application with a fully automated CI/CD pipeline.


🔗 Live Production Deployment: http://34.228.159.84/


🎯 Project Highlights

This project focuses on MLOps and Cloud Deployment, featuring:

  • ✅ Complete CI/CD Pipeline: Automated testing → Docker build → ECR push → EC2 deployment
  • ✅ AWS Cloud Infrastructure: EC2, ECR, IAM, Security Groups, and VPC configuration
  • ✅ Containerization: Docker multi-stage builds with optimized image size
  • ✅ Pipeline as Code: GitHub Actions workflow for automated deployments
  • ✅ Production Best Practices: Gunicorn WSGI server, health checks, logging, monitoring
  • ✅ Health-Verified Deployment: Automated container replacement with health verification
  • ✅ Cost-Optimized: Runs on AWS Free Tier (t2.micro)



🏗️ Deployment Architecture

GitHub Repository (Push to main)
          ↓
GitHub Actions Workflow
          ↓
    ┌─────┴─────┐
    │           │
   CI Job    Build & Push to ECR
    │           │
    └─────┬─────┘
          ↓
    Deploy to EC2
          ↓
    Docker Container
          ↓
  Flask App (Production)

Deployment Flow:

  1. A push to the main branch triggers GitHub Actions
  2. The CI job runs tests and linting
  3. The Docker image is built and pushed to AWS ECR with version tags
  4. The deploy job connects to EC2 over SSH and pulls the latest image
  5. The old container is stopped and the new container is started, followed by a health check
  6. The application is live at the public IP once the health check passes
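
The health check in step 5 can be sketched in Python as a small polling loop. This is a minimal illustration under stated assumptions, not the project's actual deploy script: the names `http_status` and `wait_until_healthy`, the retry count, and the delay are hypothetical, and the README does not say which URL path the real check requests.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, malformed response.
        return 0


def wait_until_healthy(
    url: str,
    attempts: int = 10,          # assumed retry budget
    delay_seconds: float = 3.0,  # assumed pause between probes
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it answers 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Exit non-zero so a calling pipeline step would be marked as failed.
    raise SystemExit(0 if wait_until_healthy("http://localhost/") else 1)
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a network or real delays.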

✨ Features

Deployment & DevOps

  • Automated CI/CD: Push code → auto-deploy to production in ~3 minutes
  • Docker Containerization: Consistent environments from dev to production
  • AWS ECR Integration: Private container registry with image versioning
  • Container Replacement: Old container stopped and replaced by the new image, with a brief restart window
  • Health Monitoring: Automated health checks post-deployment
  • Security Best Practices: IAM roles, Security Groups, encrypted secrets

Application Features

  • ML Prediction API: Student math score prediction based on 7 features
  • Web Interface: Responsive Flask UI with form validation
  • RESTful Architecture: Clean API design for future integrations
  • Production Logging: Comprehensive logging for debugging and audit trails
  • Error Handling: Custom exception framework with detailed error reporting

🛠️ Tech Stack

DevOps & Cloud Infrastructure (Primary Focus)

| Component | Technology | Purpose |
|---|---|---|
| Container Platform | Docker | Application containerization |
| CI/CD | GitHub Actions | Automated build, test, and deployment |
| Container Registry | AWS ECR | Private Docker image storage |
| Compute | AWS EC2 (t2.micro) | Production application hosting |
| Networking | AWS VPC, Security Groups | Network isolation and security |
| IAM | AWS IAM | Access management and permissions |
| WSGI Server | Gunicorn | Production-grade Python app server |
| Version Control | Git/GitHub | Source code management |

Application Stack

| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.10 | Core programming language |
| Web Framework | Flask | HTTP server and routing |
| ML Framework | scikit-learn | Model training and preprocessing |
| Data Processing | pandas, numpy | Data manipulation |
| Model Storage | dill, pickle | Model serialization |

📊 Model Performance

Measured on the held-out test split from artifacts/test.csv with the same preprocessing pipeline used in training.

| Model | R² Score | MAE | RMSE |
|---|---|---|---|
| Ridge ✅ | 0.8806 | 4.2126 | 5.3910 |
| Linear Regression | 0.8795 | 4.2434 | 5.4146 |
| CatBoost | 0.8511 | 4.5752 | 6.0203 |
| Random Forest | 0.8488 | 4.6858 | 6.0652 |
| XGBoost | 0.8231 | 5.0907 | 6.5612 |
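
For reference, the three metric columns follow the standard definitions of R², MAE, and RMSE. The sketch below computes them in plain Python for any pair of target and prediction lists. The function name `regression_metrics` is hypothetical, and the sample numbers are made up for illustration; the project itself presumably computes its scores with scikit-learn.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Compute R^2, MAE, and RMSE from targets and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from artifacts/test.csv.
    print(regression_metrics([3, 5, 7, 9], [2, 5, 8, 9]))
```

A higher R² and lower MAE/RMSE indicate a better fit, which is why Ridge is marked as the selected model in the table.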

🔄 CI/CD Pipeline

Pipeline Overview

Trigger: Push to main branch
  ↓
┌─────────────────────────────────────────────────────────┐
│ Job 1: Continuous Integration (CI)                      │
│ • Checkout code                                         │
│ • Setup Python 3.10                                     │
│ • Install dependencies                                  │
│ • Run tests & linting                                   │
│ • Validate code quality                                 │
└─────────────────────────────────────────────────────────┘
  ↓ (Only if CI passes)
┌─────────────────────────────────────────────────────────┐
│ Job 2: Build & Push to ECR                              │
│ • Configure AWS credentials                             │
│ • Login to Amazon ECR                                   │
│ • Build Docker image                                    │
│ • Tag with git SHA & latest                             │
│ • Push to ECR registry                                  │
│ • Time: ~1m 43s                                         │
└─────────────────────────────────────────────────────────┘
  ↓ (After successful push)
┌─────────────────────────────────────────────────────────┐
│ Job 3: Deploy to EC2                                    │
│ • SSH to EC2 instance                                   │
│ • Pull latest image from ECR                            │
│ • Stop old container gracefully                         │
│ • Start new container on port 80                        │
│ • Run health check                                      │
│ • Clean up old images                                   │
│ • Time: ~9s                                             │
└─────────────────────────────────────────────────────────┘
  ↓
✅ Deployed to Production: http://34.228.159.84/

Pipeline Features

  • Branch Gating: CI runs on every branch; the build and deploy jobs run only on main
  • Automated Rollback: Deployment fails if health check doesn't pass
  • Version Tagging: Each deployment tagged with git commit SHA
  • Secret Management: AWS credentials stored as GitHub encrypted secrets
  • Monitoring: Build status visible via GitHub Actions badge
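
The version tagging scheme can be illustrated with a short Python helper that builds the two image references pushed per deployment. This is a sketch under assumptions: the function name `image_references`, the sample registry host, and the short SHA are placeholders, and the README does not say whether the workflow uses the full or abbreviated commit SHA.

```python
def image_references(registry: str, repository: str, commit_sha: str) -> list[str]:
    """Build the SHA-tagged and 'latest' references for one image."""
    if not commit_sha:
        raise ValueError("commit_sha must not be empty")
    # Normalise stray slashes so the pieces join cleanly.
    base = f"{registry.rstrip('/')}/{repository.strip('/')}"
    return [f"{base}:{commit_sha}", f"{base}:latest"]


if __name__ == "__main__":
    # Placeholder account ID, region, and SHA for illustration only.
    for ref in image_references(
        "123456789012.dkr.ecr.us-east-1.amazonaws.com", "mlproject", "abc1234"
    ):
        print(ref)
```

Keeping the SHA tag alongside `latest` means any earlier build can still be pulled by its commit if a deployment needs to be reverted by hand.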

⚡ CI/CD Pipeline Performance

| Stage | Time | Description |
|---|---|---|
| CI (tests + lint) | ~45s | Python setup and quality validation |
| Docker Build + ECR Push | ~1m 43s | Build container image and push to ECR |
| EC2 Deploy | ~9s | Pull latest image, replace container, and health check |
| Total: Push to Production | ~2m 37s (~3 min) | End-to-end automated deployment |
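
The total is simply the sum of the three stage timings. A small Python check of that arithmetic, using the approximate figures from the table (the names `STAGE_SECONDS` and `format_duration` are illustrative):

```python
# Approximate stage durations from the table, in seconds.
STAGE_SECONDS = {
    "CI (tests + lint)": 45,
    "Docker Build + ECR Push": 1 * 60 + 43,
    "EC2 Deploy": 9,
}


def format_duration(seconds: int) -> str:
    """Render a number of seconds as 'Xm Ys'."""
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes}m {remainder}s"


total = sum(STAGE_SECONDS.values())
print(format_duration(total))  # 2m 37s
```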

🚀 Quick Start

Local Development

Prerequisites

  • Python 3.10+
  • Git
  • Docker (optional, for local container testing)

1️⃣ Clone the Repository

git clone https://github.com/aashishkumar-tech/mlproject.git
cd mlproject

2️⃣ Create Virtual Environment

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1

macOS / Linux:

python3 -m venv venv
source venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the Application Locally

python app.py

Visit: http://localhost:8080


☁️ AWS Deployment Guide

Complete Setup from Scratch

This project includes step-by-step AWS setup instructions. Follow the guide to deploy your own instance:

📖 AWS-SETUP-GUIDE.md - Complete AWS deployment walkthrough

What you'll create:

  1. ✅ IAM user with ECR permissions
  2. ✅ ECR repository for Docker images
  3. ✅ EC2 Security Group with proper rules
  4. ✅ EC2 Key Pair for SSH access
  5. ✅ EC2 t2.micro instance (Free Tier)
  6. ✅ Docker and AWS CLI installation
  7. ✅ GitHub Secrets configuration
  8. ✅ Automated deployment pipeline

Time to deploy: ~30 minutes for first-time setup

Quick Deploy (If AWS is already configured)

# 1. Configure GitHub Secrets (in repo settings)
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
ECR_REGISTRY
ECR_REPOSITORY
EC2_HOST
EC2_USER
EC2_SSH_KEY

# 2. Push to main branch
git add .
git commit -m "deploy: initial deployment"
git push origin main

# 3. GitHub Actions automatically deploys to EC2
# Check workflow: https://github.com/aashishkumar-tech/mlproject/actions
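
A missing secret is an easy way for the first deployment to fail, so a pre-flight check can help. The sketch below lists which of the eight names above are unset or blank. It assumes the secrets have been mapped to environment variables of the same name in a workflow step, which the README does not confirm; `REQUIRED_SECRETS` and `missing_secrets` are hypothetical names.

```python
import os
from typing import Mapping

# Secret names as listed in the Quick Deploy steps.
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "ECR_REGISTRY",
    "ECR_REPOSITORY",
    "EC2_HOST",
    "EC2_USER",
    "EC2_SSH_KEY",
)


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required names that are absent or blank in env."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing secrets: " + ", ".join(missing))
    print("All required secrets are set.")
```

Only names are reported, never values, so the output is safe to show in workflow logs.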

🐳 Docker

Local Docker Testing

# Build image
docker build -t mlproject:local .

# Run container
docker run -d -p 80:8080 --name mlproject mlproject:local

# Test
curl http://localhost

# View logs
docker logs mlproject

# Stop container
docker stop mlproject
docker rm mlproject

Docker Configuration

Dockerfile highlights:

  • Base image: python:3.10-slim
  • Multi-stage optimization
  • 2 Gunicorn workers (optimized for 1GB RAM)
  • 120s timeout (for model loading)
  • Port 8080 exposed
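
The worker, timeout, and port settings listed above could equally be expressed as a Gunicorn configuration file, which is plain Python. The sketch below is an assumption about layout, not a copy of the project's Dockerfile: the file name `gunicorn.conf.py` is hypothetical, while `bind`, `workers`, and `timeout` are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file; the project sets these in its Dockerfile)

# Listen on all interfaces on the port the container exposes.
bind = "0.0.0.0:8080"

# Two worker processes, matching the README's sizing for 1GB of RAM.
workers = 2

# Allow up to 120 seconds before a silent worker is restarted,
# leaving room for model loading.
timeout = 120
```

It would be used as `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object in app.py is named `app`.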

📁 Project Structure

Focus: DevOps & Deployment Components

mlproject/
├── .github/
│   └── workflows/
│       └── ec2-deploy.yml          # ⭐ CI/CD Pipeline (3 jobs)
├── Dockerfile                      # ⭐ Container configuration
├── docker-compose.yml              # ⭐ Local development setup
├── .dockerignore                   # Build optimization
├── AWS-SETUP-GUIDE.md              # ⭐ Complete AWS setup guide
├── .markdownlint.json              # Markdown linting config
├── requirements.txt                # Python dependencies
├── app.py                          # Flask application
├── src/                            # ML pipeline code
│   ├── components/                 # Training components
│   ├── pipeline/                   # Inference pipeline
│   └── utils.py                    # Helper functions
├── templates/                      # HTML templates
├── static/                         # CSS files
├── artifacts/                      # Model files (included in Docker)
└── logs/                           # Application logs

📊 CI/CD Workflow Visualization

File: .github/workflows/ec2-deploy.yml

graph LR
    A[Git Push] --> B{GitHub Actions}
    B --> C[CI Job<br/>Tests & Lint]
    C -->|Pass| D[Build Job<br/>Docker Build]
    D --> E[Push to ECR<br/>Tag: SHA & latest]
    E --> F[Deploy Job<br/>SSH to EC2]
    F --> G[Pull Image]
    G --> H[Stop Old Container]
    H --> I[Start New Container]
    I --> J[Health Check]
    J -->|Pass| K[✅ Live Production]
    J -->|Fail| L[❌ Rollback]

Total Pipeline Time: ~3 minutes from push to production


📚 Documentation

Comprehensive documentation for different audiences:

| Document | Audience | Description |
|---|---|---|
| README.md | Everyone | Project overview and quick start |
| AWS-SETUP-GUIDE.md | DevOps Engineers | Complete AWS deployment walkthrough |
| HLD.md | Architects | High-level system design and architecture |
| TECHNICAL_DOC.md | Developers | Code structure and implementation details |
| API_DOCS.md | API Users | API endpoints and usage examples |
| CONTRIBUTING.md | Contributors | Contribution guidelines and coding standards |

🎯 Project Goals & Learning Outcomes

DevOps Skills Demonstrated

✅ CI/CD Pipeline Design: Multi-stage GitHub Actions workflow
✅ Containerization: Docker best practices and optimization
✅ Cloud Infrastructure: AWS EC2, ECR, IAM, VPC configuration
✅ Automation: Automated testing, building, and deployment
✅ Security: IAM roles, encrypted secrets, security groups
✅ Monitoring: Health checks, logging, and error tracking
✅ Documentation: Comprehensive technical documentation
✅ Version Control: Git workflows and branching strategies

Why This Project Stands Out

  1. Production-Grade: Not just code that works, but code that deploys automatically
  2. End-to-End: From local development to production deployment
  3. Best Practices: Follows industry standards for DevOps and MLOps
  4. Documented: Every step explained with detailed documentation
  5. Cost-Effective: Runs on AWS Free Tier
  6. Portfolio-Ready: Demonstrates real-world deployment skills

🔧 Troubleshooting

Common Deployment Issues

Worker Timeout Errors:

# Check Docker logs on EC2
ssh ec2-user@YOUR_EC2_IP
docker logs mlproject --tail 100

Solution: The Gunicorn timeout in the Dockerfile is already set to 120s; raise it further if workers still time out while loading the model.

Missing Model Files:

  • Ensure artifacts/ is NOT in .dockerignore
  • Check that model.pkl and preprocessor.pkl exist
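
The second check can be scripted before building the image. The sketch below reports which of the two files named above are absent. It assumes both live directly under artifacts/, as the project structure suggests; `EXPECTED_ARTIFACTS` and `missing_artifacts` are hypothetical names.

```python
from pathlib import Path

# File names taken from the troubleshooting note above.
EXPECTED_ARTIFACTS = ("model.pkl", "preprocessor.pkl")


def missing_artifacts(artifacts_dir: str = "artifacts") -> list[str]:
    """Return the expected artifact files that do not exist in artifacts_dir."""
    root = Path(artifacts_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        raise SystemExit("Missing model files: " + ", ".join(missing))
    print("All model files are present.")
```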

SSH Authentication Failed:

  • Verify EC2_SSH_KEY secret contains full .pem content
  • Include BEGIN and END lines

AWS Credentials Error:

  • Rotate IAM access keys
  • Update GitHub Secrets

📖 Full troubleshooting guide: See TECHNICAL_DOC.md


💰 Cost Breakdown

AWS Free Tier (First 12 months)

  • EC2 t2.micro: 750 hours/month - $0.00
  • ECR Storage: 500MB/month - $0.00
  • Data Transfer: 1GB/month - $0.00

After Free Tier

  • EC2 t2.micro 24/7: ~$8.50/month
  • ECR Storage (5GB): ~$0.50/month
  • Data Transfer (10GB): ~$0.90/month
  • Total: ~$10/month
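
The monthly total is the sum of the three line items, rounded up. A quick Python check using the approximate figures listed above (the dictionary name is illustrative):

```python
# Approximate post-Free-Tier monthly costs from the list above, in USD.
MONTHLY_COSTS_USD = {
    "EC2 t2.micro (24/7)": 8.50,
    "ECR storage (5GB)": 0.50,
    "Data transfer (10GB)": 0.90,
}

total = sum(MONTHLY_COSTS_USD.values())
print(f"Estimated total: ${total:.2f}/month")  # Estimated total: $9.90/month
```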

Cost Optimization Tips:

  • Stop EC2 when not in use (~$0.01/hour)
  • Use reserved instances for 30-70% savings
  • Clean up old Docker images regularly

🚀 Future Enhancements

Phase 1: Infrastructure

  • Add Application Load Balancer
  • Implement Auto Scaling Groups
  • Set up CloudWatch monitoring and alarms
  • Enable HTTPS with SSL certificate
  • Configure custom domain with Route 53

Phase 2: MLOps

  • Integrate MLflow for experiment tracking
  • Add model versioning and A/B testing
  • Implement model retraining pipeline
  • Set up model performance monitoring
  • Add a feature store (Amazon SageMaker Feature Store)

Phase 3: Advanced Features

  • Kubernetes deployment (EKS)
  • Terraform IaC implementation
  • Multi-region deployment
  • Blue-green deployment strategy
  • Canary releases

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for:

  • Code of conduct
  • Development setup
  • Coding standards
  • Pull request process

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


👨‍💻 Author

Aashish Kumar

Skills Demonstrated: Python • Flask • Docker • AWS (EC2, ECR, IAM) • GitHub Actions • CI/CD • MLOps • DevOps


🌟 Acknowledgments

  • AWS Free Tier for hosting infrastructure
  • GitHub Actions for CI/CD platform
  • Open-source community for tools and libraries



⭐ If you find this project useful for learning MLOps and DevOps, please star it on GitHub!


🎬 Demo

Live Application: http://34.228.159.84/

Usage

  1. Visit the live application
  2. Click "Get Started" to access prediction form
  3. Fill in student information
  4. Get instant math score prediction

📖 How to Use This Project

For Learning DevOps/MLOps

  1. Clone and explore the .github/workflows/ec2-deploy.yml file
  2. Review Dockerfile and .dockerignore for containerization best practices
  3. Follow AWS-SETUP-GUIDE.md to deploy your own instance
  4. Modify and push code to see CI/CD in action

For Portfolio/Resume

  • Showcase automated deployment skills
  • Demonstrate AWS cloud infrastructure knowledge
  • Highlight CI/CD pipeline design
  • Show Docker containerization expertise

For Job Interviews

Talk about:

  • How you designed the 3-stage CI/CD pipeline
  • Why you chose specific AWS services (EC2 vs Lambda, ECR vs Docker Hub)
  • How you optimized Docker images and Gunicorn configuration
  • Security considerations (IAM, Security Groups, encrypted secrets)
  • Cost optimization strategies for production deployment

Logging

Logs are written to the logs/ directory with timestamps.
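
One way such logging might be set up with Python's standard `logging` module is sketched below. It is an assumption, not the project's actual logger: the function name `configure_logging`, the timestamped file name pattern, and the record format are all hypothetical. It timestamps both the file name and each log entry, since the README does not say which of the two is meant.

```python
import logging
from datetime import datetime
from pathlib import Path


def configure_logging(log_dir: str = "logs") -> Path:
    """Send INFO-level logs to a new timestamped file inside log_dir."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # One file per process start, e.g. logs/2024_01_31_12_00_00.log
    log_path = directory / f"{datetime.now():%Y_%m_%d_%H_%M_%S}.log"

    logging.basicConfig(
        filename=str(log_path),
        format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
        level=logging.INFO,
        force=True,  # replace any handlers configured earlier
    )
    return log_path


if __name__ == "__main__":
    path = configure_logging()
    logging.info("Logging initialised")
    print(f"Writing logs to {path}")
```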


🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request



🙏 Acknowledgments

  • Dataset: Student Performance Dataset
  • Inspiration: End-to-end ML project deployment
  • Cloud Platform: AWS Free Tier



🔮 Future Enhancements

  • Add authentication and user management
  • Implement model versioning with MLflow
  • Add unit tests and integration tests
  • Set up monitoring with Prometheus/Grafana
  • Add custom domain with HTTPS
  • Implement A/B testing for model comparison
  • Create REST API with FastAPI
  • Add real-time prediction streaming

