AI-Powered Job Discovery Platform with Semantic Matching & Conversational RAG
SwipeHire is a job discovery platform that matches candidates with opportunities by comparing semantic embeddings of resumes and job descriptions, rather than relying on keyword overlap.
The platform follows a microservices-inspired architecture deployed on serverless infrastructure (Modal) with a modern frontend (Next.js on Vercel).
Key Components:
| Component | Technology | Description |
|---|---|---|
| Client Layer | Next.js 16, Vercel Edge | React-based SSR frontend with CDN distribution |
| API Gateway | Firebase Auth, WebSocket | JWT authentication and real-time bidirectional streaming |
| Serverless Compute | Modal (ASGI) | FastAPI backend with Parser Agent, Semantic Ranker, RAG Engine |
| NLP Pipeline | Tesseract, Poppler, Gemini | OCR extraction, PDF rendering, LLM reasoning, vector embeddings |
| Persistence | Firestore, Vector Store | NoSQL database with semantic embedding index |
| Ingestion | JobSpy Scraper | Multi-source job aggregation from LinkedIn, Indeed, Glassdoor |
The 7-stage pipeline processes data from ingestion through real-time delivery.
Pipeline Stages Explained:
| Stage | Component | Technical Description |
|---|---|---|
| 1. Data Acquisition | JobSpy Scraper | Multi-source job aggregation using web scraping from LinkedIn, Indeed, Glassdoor |
| 2. Document Processing | Tesseract OCR + Poppler | PDF rendering via Poppler, text extraction via Tesseract OCR engine |
| 3. NLP & Embeddings | Parser Agent + Embedder | Gemini 3 Flash Preview for entity extraction, text-embedding-004 for semantic vectors |
| 4. Storage & Indexing | Firestore + Vector Store | NoSQL persistence with approximate nearest neighbor (ANN) indexing |
| 5. Retrieval & Ranking | Semantic Retriever + Ranker | Pure cosine similarity scoring between resume and job embeddings |
| 6. RAG & Response | RAG Engine + LLM | In-memory context retrieval with streaming SSE responses |
| 7. Real-time Delivery | WebSocket Stream | Bidirectional WSS for instant job card updates |
Multi-stage parsing pipeline to extract structured data from unstructured resume documents:
- Document Ingestion: PDF/DOCX support via Poppler and python-docx
- OCR Processing: Tesseract for scanned document text extraction
- LLM-Powered Extraction: Gemini 3 Flash Preview extracts entities (name, skills, education, experience)
- Schema Validation: Structured JSON output conforming to defined schemas
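The schema validation step can be illustrated with a minimal sketch using only the standard library. The `ParsedResume` class, its field types, and the `validate_resume_json` helper are hypothetical names chosen for illustration; the project's actual schema may differ and may well use a library such as Pydantic instead.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedResume:
    """Hypothetical target schema; fields mirror the entities listed above."""
    name: str
    skills: list[str] = field(default_factory=list)
    education: list[str] = field(default_factory=list)
    experience: list[str] = field(default_factory=list)


def validate_resume_json(raw: str) -> ParsedResume:
    """Parse raw LLM output and reject anything that does not fit the schema."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("'name' must be a non-empty string")

    lists: dict[str, list[str]] = {}
    for key in ("skills", "education", "experience"):
        value = data.get(key, [])  # missing list fields default to empty
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{key}' must be a list of strings")
        lists[key] = value

    return ParsedResume(name=name.strip(), **lists)
```

Validating before persisting keeps malformed LLM output out of the database; a failed validation is a natural point to retry the extraction call.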
Dense vector representations for similarity computation:
- Embedding Model: Google's `text-embedding-004`
- Dual Encoding: Separate embeddings for resumes and job descriptions
- Batch Processing: Efficient corpus embedding for 3000+ jobs
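Batch embedding of a large corpus might look roughly like the sketch below. The `embed_batch` callable is a stand-in for whatever function calls the embedding model, and the batch size of 100 is an assumption, not a documented limit; both are hypothetical.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_corpus(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[Vector]],  # hypothetical backend call
    batch_size: int = 100,  # assumed value; tune to the provider's limits
) -> list[Vector]:
    """Embed every text, one backend call per batch, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: list[Vector] = []
    for chunk in batched(texts, batch_size):
        result = embed_batch(chunk)
        if len(result) != len(chunk):
            raise ValueError("embedding backend returned a mismatched batch")
        vectors.extend(result)
    return vectors
```

Passing the backend as a callable keeps the batching logic testable without network access.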
Pure semantic similarity ranking using cosine similarity:
- Semantic Similarity: Cosine similarity between resume and job embeddings
- Real-time Ranking: Jobs ranked and streamed via WebSocket in descending order
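As a minimal sketch of this ranking step, the snippet below scores each job by cosine similarity against the resume embedding and sorts in descending order. The function names and the `dict` of job IDs to vectors are illustrative assumptions; a production system would more likely use NumPy or the vector store's own similarity search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return cos(theta) between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(
    resume_vec: list[float],
    job_vecs: dict[str, list[float]],  # hypothetical mapping: job ID -> embedding
) -> list[tuple[str, float]]:
    """Score every job against the resume and sort best match first."""
    scored = [
        (job_id, cosine_similarity(resume_vec, vec))
        for job_id, vec in job_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because cosine similarity ignores vector magnitude, a long job description does not outrank a short one on length alone.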
Context-aware conversational AI for job-specific queries:
- Context Injection: User resume + job posting injected into prompts
- Streaming Responses: Real-time token streaming via Server-Sent Events (SSE)
- Use Cases: Fit analysis, interview prep, cover letter generation, skills gap identification
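Context injection and SSE framing can be sketched as follows. The prompt template, the function names, and the closing `done` event are assumptions made for illustration; only the `data:` line framing and blank-line terminator come from the Server-Sent Events format itself.

```python
from typing import Iterable, Iterator


def build_prompt(resume_text: str, job_text: str, question: str) -> str:
    """Inject the resume and job posting as context ahead of the user's question."""
    return (
        "You are a career assistant. Answer using only the context below.\n\n"
        f"### Resume\n{resume_text.strip()}\n\n"
        f"### Job posting\n{job_text.strip()}\n\n"
        f"### Question\n{question.strip()}\n"
    )


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each streamed token as one SSE message."""
    for token in tokens:
        # A payload containing newlines must be split across several data: lines.
        payload = "\n".join(f"data: {line}" for line in token.split("\n"))
        yield payload + "\n\n"  # a blank line terminates the event
    # Hypothetical end-of-stream marker so the client knows when to stop reading.
    yield "event: done\ndata: [DONE]\n\n"
```

A generator like `sse_events` can be handed to a streaming HTTP response served with the `text/event-stream` media type.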
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React, TypeScript, Tailwind CSS, Framer Motion |
| Backend | Python 3.11+, FastAPI (ASGI), Modal Serverless |
| AI/ML | Google Gemini 3 Flash Preview, text-embedding-004 |
| Database | Firebase Firestore, Vector Store |
| Auth | Firebase Authentication (JWT) |
| Real-time | WebSocket (WSS), Server-Sent Events (SSE) |
| OCR | Tesseract, Poppler |
| Deployment | Vercel (Frontend), Modal (Backend) |
| Endpoint | Method | Description |
|---|---|---|
| `/parse-resume` | POST | OCR + LLM parsing of resume documents |
| `/save-profile` | GET | Fetch ranked job recommendations |
| `/ws/jobs` | WS | Real-time streaming of ranked jobs |
| `/match` | POST | Save job to user's matches |
| `/matches` | GET | Retrieve saved matches |
| `/match/{id}` | DELETE | Remove a saved match |
| `/matches` | DELETE | Clear all matches |
| `/chat` | POST | RAG-powered contextual chat |
| `/health` | GET | Health check endpoint |
- Node.js 18+
- Python 3.11+
- Firebase project with Firestore enabled
- Google AI API key (Gemini)
- Tesseract OCR installed
- Poppler installed
cd backend
# Install dependencies
pip install -r requirements.txt
# Set environment variables (Windows cmd syntax; on macOS/Linux use: export GEMINI_API_KEY=your-gemini-api-key)
set GEMINI_API_KEY=your-gemini-api-key
# Run the local server
uvicorn server:app --reload --port 8000
cd frontend
# Install dependencies
npm install
# Create .env.local file
echo NEXT_PUBLIC_API_URL=http://localhost:8000 > .env.local
# Run development server
npm run dev
The app will be available at http://localhost:3000
cd backend
# Create volume for credentials
modal volume create tfj-data
modal volume put tfj-data firebase-credentials.json /firebase-credentials.json
modal volume put tfj-data system_prompt.txt /system_prompt.txt
# Create secret for API key
modal secret create gemini-secret GEMINI_API_KEY=your_key
# Deploy to Modal
modal deploy modal_server.py
cd frontend
vercel --prod -e NEXT_PUBLIC_API_URL=https://your-modal-url.modal.run
- ✅ Resume parsing with OCR + LLM extraction
- ✅ Semantic job matching with vector embeddings
- ✅ Real-time job recommendations via WebSocket
- ✅ Swipe-based discovery interface (Tinder-style UX)
- ✅ RAG-powered job-specific chatbot
- ✅ Match persistence and management
- ✅ Responsive, animated UI with Framer Motion
- ✅ Serverless deployment on Modal + Vercel
https://swipehire-lime.vercel.app/
MIT License - See LICENSE file for details.