💼 SwipeHire

AI-Powered Job Discovery Platform with Semantic Matching & Conversational RAG

SwipeHire is a job discovery platform that matches candidates with opportunities by comparing semantic embeddings of resumes and job descriptions, and adds an LLM-backed chat assistant for job-specific questions.

Architecture Overview

System Architecture

The platform follows a microservices-inspired architecture deployed on serverless infrastructure (Modal) with a modern frontend (Next.js on Vercel).

Key Components:

Component	Technology	Description
Client Layer	Next.js 16, Vercel Edge	React-based SSR frontend with CDN distribution
API Gateway	Firebase Auth, WebSocket	JWT authentication and real-time bidirectional streaming
Serverless Compute	Modal (ASGI)	FastAPI backend with Parser Agent, Semantic Ranker, RAG Engine
NLP Pipeline	Tesseract, Poppler, Gemini	OCR extraction, PDF rendering, LLM reasoning, vector embeddings
Persistence	Firestore, Vector Store	NoSQL database with semantic embedding index
Ingestion	JobSpy Scraper	Multi-source job aggregation from LinkedIn, Indeed, Glassdoor

Data Pipeline Flow

The 7-stage pipeline processes data from ingestion through real-time delivery.

Pipeline Stages Explained:

Stage	Component	Technical Description
1. Data Acquisition	JobSpy Scraper	Multi-source job aggregation using web scraping from LinkedIn, Indeed, Glassdoor
2. Document Processing	Tesseract OCR + Poppler	PDF rendering via Poppler, text extraction via Tesseract OCR engine
3. NLP & Embeddings	Parser Agent + Embedder	Gemini 3 Flash Preview for entity extraction, text-embedding-004 for semantic vectors
4. Storage & Indexing	Firestore + Vector Store	NoSQL persistence with approximate nearest neighbor (ANN) indexing
5. Retrieval & Ranking	Semantic Retriever + Ranker	Pure cosine similarity scoring between resume and job embeddings
6. RAG & Response	RAG Engine + LLM	In-memory context retrieval with streaming SSE responses
7. Real-time Delivery	WebSocket Stream	Bidirectional WSS for instant job card updates

Core Pipeline Details

1. Resume Parsing & Information Extraction

Multi-stage parsing pipeline to extract structured data from unstructured resume documents:

Document Ingestion: PDF/DOCX support via Poppler and python-docx
OCR Processing: Tesseract for scanned document text extraction
LLM-Powered Extraction: Gemini 3 Flash Preview extracts entities (name, skills, education, experience)
Schema Validation: Structured JSON output conforming to defined schemas
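
The schema validation step could look roughly like the sketch below. It assumes the LLM returns a JSON string containing the four entity fields named above; the `ParsedResume` class and the `validate_resume_json` helper are hypothetical names used for illustration, not the project's actual code.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedResume:
    """Hypothetical target schema; field names mirror the entities listed above."""
    name: str
    skills: list[str] = field(default_factory=list)
    education: list[str] = field(default_factory=list)
    experience: list[str] = field(default_factory=list)


def validate_resume_json(raw: str) -> ParsedResume:
    """Parse LLM output and reject payloads that do not match the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("'name' must be a non-empty string")

    lists = {}
    for key in ("skills", "education", "experience"):
        value = data.get(key, [])  # missing list fields default to empty
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{key}' must be a list of strings")
        lists[key] = value

    return ParsedResume(name=name.strip(), **lists)
```

Rejecting malformed output at this boundary keeps downstream stages (embedding, ranking, chat) from having to defend against partially parsed resumes.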

2. Semantic Embedding Generation

Dense vector representations for similarity computation:

Embedding Model: Google's text-embedding-004
Dual Encoding: Separate embeddings for resumes and job descriptions
Batch Processing: Efficient corpus embedding for 3000+ jobs
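
Batch processing of a job corpus can be sketched as follows. The embedding call itself is passed in as `embed_fn`, a hypothetical callable standing in for whatever client wraps text-embedding-004; the default batch size of 100 is an assumption, not a documented limit.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]
# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[Sequence[str]], list[Vector]]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_corpus(texts: Sequence[str], embed_fn: EmbedFn, batch_size: int = 100) -> list[Vector]:
    """Embed every text with one embed_fn call per batch; output order matches input order."""
    vectors: list[Vector] = []
    for batch in batched(texts, batch_size):
        result = embed_fn(batch)
        if len(result) != len(batch):
            raise RuntimeError("embedding backend returned a mismatched batch")
        vectors.extend(result)
    return vectors
```

Keeping the backend behind a callable makes the batching logic testable with a stub embedder and independent of any particular SDK version.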

3. Recommendation & Ranking Engine

Pure semantic similarity ranking using cosine similarity:

Semantic Similarity: Cosine similarity between resume and job embeddings
Real-time Ranking: Jobs ranked and streamed via WebSocket in descending order
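
As an illustration of the ranking step, the sketch below computes cosine similarity in plain Python and sorts jobs in descending order of score. The function names and the `job_id -> vector` dictionary layout are assumptions for the example; a production system would more likely use a vectorized library or the vector store's own search.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(
    resume_vec: Sequence[float],
    job_vecs: dict[str, Sequence[float]],
) -> list[tuple[str, float]]:
    """Score every job against the resume and return (job_id, score), best first."""
    scored = [(job_id, cosine_similarity(resume_vec, vec)) for job_id, vec in job_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The sorted list can then be emitted one job at a time over the WebSocket, which preserves the descending order described above.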

4. Conversational RAG (Retrieval-Augmented Generation)

Context-aware conversational AI for job-specific queries:

Context Injection: User resume + job posting injected into prompts
Streaming Responses: Real-time token streaming via Server-Sent Events (SSE)
Use Cases: Fit analysis, interview prep, cover letter generation, skills gap identification
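
Context injection and SSE framing might be sketched like this. The prompt wording, the `{"token": ...}` payload shape, and the `[DONE]` end-of-stream sentinel are all assumptions made for the example; the deployed system loads its own system prompt from `system_prompt.txt`.

```python
import json
from typing import Iterable, Iterator


def build_prompt(resume: dict, job: dict, question: str) -> str:
    """Inject the resume and job posting as context ahead of the user's question."""
    return (
        "You are a career assistant. Answer using only the context below.\n\n"
        f"RESUME:\n{json.dumps(resume, indent=2)}\n\n"
        f"JOB POSTING:\n{json.dumps(job, indent=2)}\n\n"
        f"QUESTION: {question}"
    )


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE `data:` frame terminated by a blank line."""
    for token in tokens:
        # JSON-encode so that newlines inside a token stay on a single data line.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"  # assumed sentinel; not part of the SSE specification
```

With FastAPI, a generator like `sse_events` can be returned through `StreamingResponse` with `media_type="text/event-stream"`.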

Tech Stack

Layer	Technology
Frontend	Next.js 16, React, TypeScript, Tailwind CSS, Framer Motion
Backend	Python 3.11+, FastAPI (ASGI), Modal Serverless
AI/ML	Google Gemini 3 Flash Preview, text-embedding-004
Database	Firebase Firestore, Vector Store
Auth	Firebase Authentication (JWT)
Real-time	WebSocket (WSS), Server-Sent Events (SSE)
OCR	Tesseract, Poppler
Deployment	Vercel (Frontend), Modal (Backend)

API Endpoints

Endpoint	Method	Description
`/parse-resume`	POST	OCR + LLM parsing of resume documents
`/save-profile`	GET	Fetch ranked job recommendations
`/ws/jobs`	WS	Real-time streaming of ranked jobs
`/match`	POST	Save job to user's matches
`/matches`	GET	Retrieve saved matches
`/match/{id}`	DELETE	Remove a saved match
`/matches`	DELETE	Clear all matches
`/chat`	POST	RAG-powered contextual chat
`/health`	GET	Health check endpoint

Getting Started

Prerequisites

Node.js 20.9+ (minimum version supported by Next.js 16)
Python 3.11+
Firebase project with Firestore enabled
Google AI API key (Gemini)
Tesseract OCR installed
Poppler installed

Running Locally

Backend Setup

cd backend

# Install dependencies
pip install -r requirements.txt

# Set environment variables (Windows cmd syntax shown;
# on macOS/Linux use: export GEMINI_API_KEY=your-gemini-api-key)
set GEMINI_API_KEY=your-gemini-api-key

# Run the local server
uvicorn server:app --reload --port 8000
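
Once the server is running, a quick way to confirm it is reachable is to poll the `/health` endpoint. The helper below is a minimal sketch; it assumes the endpoint answers HTTP 200 when healthy, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def backend_is_healthy(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx responses (HTTPError).
        return False
```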

Frontend Setup

cd frontend

# Install dependencies
npm install

# Create .env.local file
echo NEXT_PUBLIC_API_URL=http://localhost:8000 > .env.local

# Run development server
npm run dev

The app will be available at http://localhost:3000

Production Deployment

Backend (Modal)

cd backend

# Create volume for credentials
modal volume create tfj-data
modal volume put tfj-data firebase-credentials.json /firebase-credentials.json
modal volume put tfj-data system_prompt.txt /system_prompt.txt

# Create secret for API key
modal secret create gemini-secret GEMINI_API_KEY=your_key

# Deploy to Modal
modal deploy modal_server.py

Frontend (Vercel)

cd frontend
vercel --prod -e NEXT_PUBLIC_API_URL=https://your-modal-url.modal.run

Features

✅ Resume parsing with OCR + LLM extraction
✅ Semantic job matching with vector embeddings
✅ Real-time job recommendations via WebSocket
✅ Swipe-based discovery interface (Tinder-style UX)
✅ RAG-powered job-specific chatbot
✅ Match persistence and management
✅ Responsive, animated UI with Framer Motion
✅ Serverless deployment on Modal + Vercel

Deployed Link

https://swipehire-lime.vercel.app/

License

MIT License - See LICENSE file for details.
