A complete end-to-end Machine Learning project that predicts a student's Performance Index based on academic and lifestyle factors using Linear Regression.
This project analyzes how these factors relate to student performance and builds a predictive model on a realistic dataset.
It takes user inputs such as study hours, sleep, and previous scores, and predicts the expected performance index.
The dataset contains 10,000 entries with the following features:
- 📘 Hours Studied (Numeric)
- 📝 Previous Scores (Numeric)
- 🎯 Extracurricular Activities (Categorical: Yes/No)
- 😴 Sleep Hours (Numeric)
- 📄 Sample Question Papers Practiced (Numeric)
- 🎯 Performance Index (Target Variable)
- Data Loading
- Data Cleaning
- Encoding Categorical Variables
- Feature Scaling (StandardScaler)
- Train-Test Split
- Model Training (Linear Regression)
- Model Evaluation
- User Input Prediction System
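The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `main.py`: the synthetic data generator stands in for `Student_Performance.csv`, its coefficients are invented, and the cleaning step (dropping missing and duplicate rows), the 80/20 split, and the Yes/No to 1/0 mapping are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "Hours Studied",
    "Previous Scores",
    "Extracurricular Activities",
    "Sleep Hours",
    "Sample Question Papers Practiced",
]
TARGET = "Performance Index"


def make_synthetic_data(n_rows=500, seed=0):
    """Hypothetical stand-in for Student_Performance.csv (coefficients are invented)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Hours Studied": rng.integers(1, 10, n_rows),
        "Previous Scores": rng.integers(40, 100, n_rows),
        "Extracurricular Activities": rng.choice(["Yes", "No"], n_rows),
        "Sleep Hours": rng.integers(4, 10, n_rows),
        "Sample Question Papers Practiced": rng.integers(0, 10, n_rows),
    })
    df[TARGET] = (
        2.8 * df["Hours Studied"]
        + 1.0 * df["Previous Scores"]
        + 0.5 * df["Sleep Hours"]
        + 0.2 * df["Sample Question Papers Practiced"]
        - 34.0
        + rng.normal(0.0, 2.0, n_rows)
    )
    return df


def train(df):
    # Data cleaning (assumed): drop rows with missing values and exact duplicates.
    df = df.dropna().drop_duplicates()
    # Encoding (assumed mapping): Yes -> 1, No -> 0.
    encoded = df["Extracurricular Activities"].map({"Yes": 1, "No": 0})
    df = df.assign(**{"Extracurricular Activities": encoded})
    # Train-test split (assumed 80/20).
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.2, random_state=42
    )
    # Fit the scaler on the training split only, then reuse it for the test split.
    scaler = StandardScaler().fit(X_train)
    model = LinearRegression().fit(scaler.transform(X_train), y_train)
    # Evaluate on the held-out split.
    y_pred = model.predict(scaler.transform(X_test))
    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return scaler, model, metrics


scaler, model, metrics = train(make_synthetic_data())
```

With the real dataset, `make_synthetic_data()` would be replaced by `pd.read_csv("Student_Performance.csv")`. Fitting the scaler on the training split alone keeps test-set statistics out of training.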
- 🔹 Mean Absolute Error (MAE): 1.61
- 🔹 R² Score: 0.9889
👉 The model fits the data very well, explaining about 99% of the variance in the Performance Index, with predictions off by about 1.6 index points on average.
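To make the two metrics concrete, here is a small sketch that computes them from their definitions on a made-up set of four values (the numbers are illustrative, not taken from the project). MAE is the mean of the absolute errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: share of target variance explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
r2 = r2_score(y_true, y_pred)              # 1 - 1.5 / 20 = 0.925
```

These mirror what `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.r2_score` compute for a single target.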
After training, the program allows real-time predictions:
    Enter student details:
    Hours Studied: 5
    Previous Scores: 85
    Extracurricular Activities (Yes/No): yes
    Sleep Hours: 8
    Sample Papers Practiced: 0
    Predicted Performance Index: 71.21
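One way the prompts in this session could be collected and encoded is sketched below. The function names `encode_yes_no` and `read_student_details` are hypothetical, as is the Yes/No to 1/0 mapping; the prompts are copied from the session above. The `ask` parameter defaults to `input` and exists only so the function can be exercised without a terminal.

```python
def encode_yes_no(answer):
    """Map a Yes/No answer to 1/0, ignoring case and surrounding spaces."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "y"}:
        return 1
    if normalized in {"no", "n"}:
        return 0
    raise ValueError(f"Expected Yes or No, got {answer!r}")


def read_student_details(ask=input):
    """Prompt for each feature and return a dict keyed by dataset column name."""
    print("Enter student details:")
    return {
        "Hours Studied": float(ask("Hours Studied: ")),
        "Previous Scores": float(ask("Previous Scores: ")),
        "Extracurricular Activities": encode_yes_no(
            ask("Extracurricular Activities (Yes/No): ")
        ),
        "Sleep Hours": float(ask("Sleep Hours: ")),
        "Sample Question Papers Practiced": float(ask("Sample Papers Practiced: ")),
    }


# Replay the answers from the session above instead of reading from a terminal.
answers = iter(["5", "85", "yes", "8", "0"])
details = read_student_details(ask=lambda prompt: next(answers))
```

Normalizing the Yes/No answer means lowercase input such as `yes` is accepted, while anything else fails loudly instead of being silently encoded.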
You may see this warning:
UserWarning: X does not have valid feature names
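👉 This happens because the scaler or model was fitted on a DataFrame with column names, but the prediction input is a plain list or NumPy array without them.
To avoid it, build a one-row DataFrame with the same column names as the training data and pass that for prediction instead of a list.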
- 🐍 Python
- 📊 NumPy
- 📁 Pandas
- 🤖 Scikit-learn
    Student_Performance_Predictor/
    │── main.py
    │── Student_Performance.csv
    │── README.md
- 🔹 Add Polynomial Regression
- 🔹 Try Advanced Models (Random Forest, XGBoost)
- 🔹 Hyperparameter Tuning
- 🔹 Build Web App (Streamlit)
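As a rough idea of what the first item could look like, the sketch below chains `PolynomialFeatures` with scaling and linear regression in a scikit-learn pipeline. It is not part of the current project: `build_polynomial_model` is a hypothetical helper, and the data is synthetic with a deliberately quadratic relationship.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def build_polynomial_model(degree=2):
    """Expand features to polynomial terms, scale them, then fit a linear model."""
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )


# Synthetic data with a quadratic term that a plain linear model cannot capture.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = X[:, 0] ** 2 + 3.0 * X[:, 1]

poly_model = build_polynomial_model().fit(X, y)
linear_model = LinearRegression().fit(X, y)

poly_r2 = poly_model.score(X, y)      # R² of the degree-2 pipeline
linear_r2 = linear_model.score(X, y)  # R² of the plain linear baseline
```

Whether this helps on the student dataset is an open question, since the linear model already reports an R² near 0.99.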
- Real-world ML pipeline
- Feature scaling importance
- Handling categorical data
- Model evaluation metrics
- Building interactive ML systems
This is a solid beginner-to-intermediate ML project that demonstrates:
✔ End-to-end pipeline
✔ Real-world data handling
✔ Model deployment logic (CLI-based)