📊 Udemy Course Analysis with ELK Stack

This project analyzes 98,000+ Udemy courses using the ELK stack (Elasticsearch, Logstash, Kibana), orchestrated with Docker. The goal is to gain insights into online courses through data wrangling, querying, and visualization.


🚀 Tech Stack

  • Elasticsearch → Data indexing & queries
  • Logstash → Data extraction & transformation
  • Kibana → Interactive dashboards & visualizations
  • Docker → Containerized setup for easy deployment

📂 Project Structure

📂 Other files/
   ├── docker-compose.yaml      # ELK Stack container setup
   ├── index.json               # Elasticsearch index mapping
   ├── logstash/                # Logstash configuration files
   │   └── logstash.conf        # Logstash configuration file
   └── kibanaDashboard.ndjson   # Kibana dashboard import file

πŸ› οΈ Setup & Usage

1️⃣ Start Elasticsearch and Kibana

cd "Other files"
docker-compose up es01 kibana
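
Elasticsearch can take a while to accept requests after the containers start, so the index-creation step may fail if run too early. The sketch below is one possible way to wait for it: it polls the cluster health API until the status is at least yellow. The helper name wait_for_es and the ES_URL / ES_AUTH variables are hypothetical; their defaults simply mirror the host and credentials used elsewhere in this README.

```shell
#!/bin/sh
# Hypothetical helper: poll Elasticsearch until the cluster reports at least "yellow" health.
# ES_URL and ES_AUTH are assumptions; the defaults mirror the commands in this README.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:admin}"

wait_for_es() {
    max_attempts="${1:-30}"   # how many times to poll before giving up
    delay="${2:-5}"           # seconds to sleep between attempts
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -f makes curl exit non-zero on HTTP errors (including a health-check timeout)
        if curl -sf --max-time 10 -u "$ES_AUTH" \
            "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=5s" >/dev/null; then
            echo "Elasticsearch is ready"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "Elasticsearch did not become ready" >&2
    return 1
}

# Usage (run before creating the index):
#   wait_for_es 30 5
```

Since the docker-compose command above runs in the foreground, the helper would typically be run from a second terminal.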

2️⃣ Create the Index in Elasticsearch

curl -u elastic:admin -X PUT "localhost:9200/udemy_courses"

3️⃣ Apply the Index Mapping

curl -u elastic:admin -X PUT "localhost:9200/udemy_courses/_mapping" \
    -H "Content-Type: application/json" -d @index.json
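
For reference, the update-mapping endpoint used above expects a request body with "properties" at the top level, rather than wrapped in a "mappings" object. The sketch below writes an illustrative body to a separate file, index.example.json, so the project's own index.json is left untouched. Every field name and type in it is an assumption made for illustration and may not match the actual schema.

```shell
# Write an illustrative mapping body to a separate file (does not touch index.json).
# All field names and types below are assumptions, not the project's actual schema.
cat > index.example.json <<'EOF'
{
  "properties": {
    "title":       { "type": "text" },
    "category":    { "type": "keyword" },
    "level":       { "type": "keyword" },
    "is_paid":     { "type": "boolean" },
    "rating":      { "type": "float" },
    "num_reviews": { "type": "integer" }
  }
}
EOF

# Quick structural check: "properties" must appear in the body sent to _mapping.
grep -q '"properties"' index.example.json && echo "mapping body looks well-formed"
```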

4️⃣ Start Logstash

docker-compose up -d logstash

5️⃣ Import CSV Data

cp /path/to/dataset.csv logstash/csvData/

6️⃣ Restart Logstash to Load Data

docker-compose restart logstash
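
To check that the data actually arrived, one option is to ask the count API how many documents the index holds. The sketch below defines a small hypothetical helper, extract_count, that pulls the number out of the response; the sample response and its value are made up so the helper can be tried without a running cluster.

```shell
# Hypothetical helper: read a _count API response on stdin and print the numeric "count".
extract_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage against a running cluster (credentials and index name as in the steps above):
#   curl -s -u elastic:admin "localhost:9200/udemy_courses/_count" | extract_count

# Offline demonstration with a sample response body (the number is made up):
printf '%s\n' '{"count":123,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}' | extract_count
```

A count that stays at 0 after the restart would suggest that Logstash has not picked up the CSV file yet.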

📊 Example Queries

  • Top 5 free Python courses based on ratings and reviews
  • Course distribution by category and level
  • Comparison of ratings for free vs. paid courses
  • Most popular business courses
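
As a rough illustration of the first query, the sketch below builds a search body for the top 5 free Python courses and shows how it could be sent with curl. The field names (title, is_paid, rating, num_reviews) and the helper name top_free_python_query are assumptions; they would need to be adapted to the fields defined in index.json.

```shell
# Hypothetical helper: print a search body for "top 5 free Python courses".
# Field names (title, is_paid, rating, num_reviews) are assumptions about the mapping.
top_free_python_query() {
    cat <<'EOF'
{
  "size": 5,
  "query": {
    "bool": {
      "must":   [ { "match": { "title": "python" } } ],
      "filter": [ { "term":  { "is_paid": false } } ]
    }
  },
  "sort": [
    { "rating":      { "order": "desc" } },
    { "num_reviews": { "order": "desc" } }
  ]
}
EOF
}

# Usage against a running cluster (curl reads the body from stdin via -d @-):
#   top_free_python_query | curl -s -u elastic:admin -X POST \
#       "localhost:9200/udemy_courses/_search?pretty" \
#       -H "Content-Type: application/json" -d @-

# Print the body so it can be inspected or pasted into Kibana Dev Tools.
top_free_python_query
```

The filter clause restricts results to free courses without affecting scoring, and the explicit sort ranks by rating first and review count second.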

📜 Credits

  • Author: Marco Minaudo
  • Course: Systems and Methods for Big and Unstructured Data (SMBUD)
  • Academic Year: 2024-2025

🌟 If you find this project useful, feel free to star ⭐ the repository!