[FOSSOVERFLOW-25] feat: Automatic Scraping & Parsing of Academic Calendar #199

@amaydixit11

Feature Request: Automatic Scraping & Parsing of Academic Calendar


Problem Statement

Current Issue:
The academic calendar is published on the IIT Bhilai website as a PDF or HTML table, and admins currently have to copy the dates into our Calendar component by hand. This is time-consuming and error-prone, and it has to be repeated every semester or year.


Proposed Solution

✅ Automatic Academic Calendar Scraping

✅ Integration with Existing Calendar UI

  • New events should automatically appear in the existing Calendar component with category-appropriate icons and color coding.
  • Admins should have the ability to review and edit scraped data before publishing (optional improvement).

Technical Notes

| Layer | Requirement |
| --- | --- |
| Scraper | A cron-triggered script |
| Data Parsing | Match patterns like: "holiday", "exam", "commencement", "registration" |
| Database | Add relevant fields to calendar events table |
| Retry & Fail-Safe | If scraping fails, continue using last known data |
| Optional | Cache PDF locally with versioning for historical comparison |
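
The Data Parsing and Retry & Fail-Safe requirements above could be sketched roughly as follows. This is only an illustration under assumptions: the category names, the `categorize` and `load_events` helpers, and the JSON cache file are all hypothetical and not part of the existing codebase. The keywords are the ones listed in the notes.

```python
import json
import re
from pathlib import Path

# Keywords come from the Technical Notes; the category labels are assumptions.
CATEGORY_PATTERNS = {
    "holiday": re.compile(r"\bholiday\b", re.IGNORECASE),
    "exam": re.compile(r"\bexam(ination)?s?\b", re.IGNORECASE),
    "commencement": re.compile(r"\bcommencement\b", re.IGNORECASE),
    "registration": re.compile(r"\bregistration\b", re.IGNORECASE),
}


def categorize(title: str) -> str:
    """Return the first category whose keyword appears in the title."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(title):
            return category
    return "other"


def load_events(scrape, cache_path: Path) -> list:
    """Run scrape(); if it raises, fall back to the last cached result."""
    try:
        events = scrape()
    except Exception:
        # Scraping failed: keep serving the last known data, if any exists.
        if cache_path.exists():
            return json.loads(cache_path.read_text(encoding="utf-8"))
        return []
    # Scraping succeeded: refresh the cache for future fallbacks.
    cache_path.write_text(json.dumps(events), encoding="utf-8")
    return events
```

Because the first matching keyword wins, a title such as "Commencement of registration" would be tagged `commencement`. Whether that ordering is right depends on how the real calendar words its entries.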

Possible Libraries

  • BeautifulSoup4 / lxml (for HTML scraping)
  • PyMuPDF or pdfplumber (if the format changes back to PDF)
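
For the HTML case, a minimal BeautifulSoup4 sketch might look like the following. It assumes the calendar is the first `<table>` on the page and that each data row holds a date cell followed by an event cell. The real page layout has not been checked, and `parse_calendar_table` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def parse_calendar_table(html: str) -> list:
    """Extract {"date", "title"} dicts from the first table in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    events = []
    for row in table.find_all("tr"):
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) >= 2:
            events.append({"date": cells[0], "title": cells[1]})
    return events
```

The dates are left as raw strings here; converting them to real date objects would depend on the format the institute actually uses.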

Alternatives Considered

  1. Manual Upload of Events
    ❌ Still requires regular admin effort
  2. Direct API Feed from Institute Website
    ❌ No such API currently exists

Mockups & Visual Examples

  • Check out the Figma design mockups in the README
