Mall Customers Clustering

This repository contains a clustering analysis of the Mall Customers dataset. The main work is in code/clustering_project.ipynb, where the data is explored, preprocessed, and segmented using unsupervised learning.

Dataset

The project uses dataset/Mall_Customers.csv, which contains 200 mall customers with the following attributes:

CustomerID
Genre (the customer's gender, as the column is named in the CSV)
Age
Annual Income (k$)
Spending Score (1-100)

The dataset has no missing values.
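A minimal sketch of how the columns and the missing-value claim could be checked with pandas. The helper names `load_customers` and `missing_value_report` are hypothetical, and the three sample rows are made up for illustration so the snippet runs without the CSV; they are not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in rows with the same columns as Mall_Customers.csv.
# The values are invented for illustration and are NOT from the real dataset.
SAMPLE_CSV = """CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,20,30
2,Female,40,60,80
3,Female,33,90,15
"""


def load_customers(source):
    """Read the customer table from a file path or a file-like object."""
    return pd.read_csv(source)


def missing_value_report(df):
    """Return the number of missing values in each column."""
    return df.isna().sum()


df = load_customers(io.StringIO(SAMPLE_CSV))
report = missing_value_report(df)

print(df.shape)   # (rows, columns) of the loaded table
print(report)     # per-column count of missing values
```

With the real file, the same helpers would be called as `load_customers("dataset/Mall_Customers.csv")`, adjusting the path to wherever the notebook is run from.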

Goal

The objective is to identify meaningful customer groups based on income and spending behavior, then compare two clustering approaches:

K-Means clustering
Agglomerative hierarchical clustering

The notebook uses Annual Income (k$) and Spending Score (1-100) as the main features for clustering after scaling.
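As an illustration of the scaling step, the sketch below standardizes the two clustering features. It assumes scikit-learn's `StandardScaler`; the README does not say which scaler the notebook uses, and the helper `scale_features` and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def scale_features(df, columns=FEATURES):
    """Standardize the chosen columns to zero mean and unit variance.

    StandardScaler is an assumption; the notebook may use another scaler.
    """
    scaler = StandardScaler()
    scaled = scaler.fit_transform(df[columns])
    return scaled, scaler


# Made-up values for illustration only, not rows from the dataset.
demo = pd.DataFrame(
    {
        FEATURES[0]: [15, 40, 60, 85, 120],
        FEATURES[1]: [39, 81, 50, 10, 90],
    }
)

X_scaled, scaler = scale_features(demo)
print(np.round(X_scaled, 2))
```

Scaling matters here because both K-Means and Ward-style hierarchical clustering rely on Euclidean distances, so a feature with a wider numeric range would otherwise dominate the result.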

Workflow

The analysis follows these steps:

Load and inspect the dataset.
Explore feature distributions and pairwise relationships.
Drop CustomerID, an arbitrary identifier that carries no behavioral information.
Encode Genre and scale the numeric features.
Estimate the number of clusters with the inertia (elbow) curve and silhouette analysis.
Fit K-Means with the selected K.
Compare K-Means with hierarchical clustering.
Visualize the final segments and summarize the customer profiles.
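The model-selection and comparison steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the data are synthetic blobs from `make_blobs` rather than the mall customers, Ward linkage is assumed for the hierarchical model, the adjusted Rand index is used here as one possible way to measure agreement between the two labelings, and the helpers `scan_k` and `compare_models` are hypothetical.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score


def scan_k(X, k_values, seed=0):
    """Fit K-Means for each k and collect (k, inertia, silhouette)."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        rows.append((k, model.inertia_, silhouette_score(X, model.labels_)))
    return rows


def compare_models(X, k, seed=0):
    """Cluster with K-Means and agglomerative clustering, then score agreement."""
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    # Ward linkage is an assumption; the README does not name the linkage used.
    hc_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    return km_labels, hc_labels, adjusted_rand_score(km_labels, hc_labels)


# Synthetic, well-separated 2-D blobs standing in for the scaled features.
centers = [[0, 0], [6, 0], [0, 6], [6, 6], [3, 3]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.4, random_state=42)

rows = scan_k(X, range(2, 9))
best_k = max(rows, key=lambda row: row[2])[0]  # k with the highest silhouette
km_labels, hc_labels, ari = compare_models(X, best_k)

for k, inertia, sil in rows:
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
print("selected k:", best_k, " agreement (ARI):", round(ari, 3))
```

Inertia always tends to fall as k grows, so it is read for an elbow rather than a minimum, while the silhouette score gives a second signal that can be maximized directly.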

Main Result

The notebook selects K = 5 clusters based on the inertia and silhouette analysis. The resulting groups are interpreted as customer segments such as average shoppers, premium shoppers, impulsive shoppers, careful spenders, and sensible shoppers.
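One way such segment interpretations can be derived is by averaging the two features within each cluster. The sketch below assumes a `cluster` column holding the fitted labels; that column name, the helper `profile_clusters`, and the demo rows are hypothetical. It deliberately does not map cluster numbers to the segment names above, since that mapping depends on the notebook's fitted model.

```python
import pandas as pd

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def profile_clusters(df, label_col="cluster"):
    """Return mean income, mean spending score, and size for each cluster."""
    grouped = df.groupby(label_col)
    profile = grouped[FEATURES].mean()
    profile["size"] = grouped.size()
    return profile


# Made-up rows for illustration only, not results from the notebook.
demo = pd.DataFrame(
    {
        "Annual Income (k$)": [20, 30, 80, 90],
        "Spending Score (1-100)": [80, 70, 10, 20],
        "cluster": [0, 0, 1, 1],
    }
)

profile = profile_clusters(demo)
print(profile)
```

In this toy table, cluster 0 combines low income with high spending and cluster 1 the reverse, which is the kind of contrast the segment names are meant to describe.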

Repository Structure

code/ - notebooks for the clustering analysis
dataset/ - Mall Customers CSV file
paper/ - space for report material

Requirements

The notebook was developed with Python and common data science libraries, including:

pandas
numpy
matplotlib
seaborn
scikit-learn
scipy

How To Run

Open code/clustering_project.ipynb in Jupyter or VS Code.
Make sure the relative dataset path points to dataset/Mall_Customers.csv.
Run the notebook cells from top to bottom.
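If the dataset path does not resolve, a small helper like the one below can locate the CSV regardless of whether the notebook is started from the repository root or from code/. The function `find_dataset` is a hypothetical convenience, not part of the notebook, and the temporary folder only mimics the repository layout so the sketch runs on its own.

```python
import tempfile
from pathlib import Path


def find_dataset(start=None, relative="dataset/Mall_Customers.csv"):
    """Walk upward from `start` until `relative` exists, then return its path."""
    start = Path(start or Path.cwd()).resolve()
    for folder in [start, *start.parents]:
        candidate = folder / relative
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{relative} not found above {start}")


# Demo: build a throwaway copy of the repository layout and search from code/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "code").mkdir()
    (root / "dataset").mkdir()
    (root / "dataset" / "Mall_Customers.csv").write_text("CustomerID\n")

    found = find_dataset(start=root / "code")
    found_relative = found.relative_to(root).as_posix()

print(found_relative)
```

Inside the notebook, the returned path could be passed straight to `pandas.read_csv`.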

Reference

The notebook compares its approach against the scikit-learn clustering documentation and cites the Kaggle page for the dataset (https://www.kaggle.com/datasets/abdallahwagih/mall-customers-segmentation/data).
