This guide covers the minimum setup needed to run MajorMatch locally, including PostgreSQL storage for the course index.
- Python 3.10+.
- A virtual environment for the project.
- PostgreSQL 14+.
- Ollama installed locally and running if you want the chat assistant and tool-calling flow.
Create and activate a virtual environment, then install the requirements:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
MajorMatch stores course records, embeddings, and projection coordinates in PostgreSQL.
Create a database and user if you do not already have one:
CREATE DATABASE semantic_search;
CREATE USER postgres WITH PASSWORD 'postgres';
GRANT ALL PRIVILEGES ON DATABASE semantic_search TO postgres;
You can also use your own database name and credentials. The app reads the connection string from DATABASE_URL.
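To illustrate how a connection string of this kind breaks down into its parts, here is a minimal sketch that reads DATABASE_URL from the environment and splits it with Python's standard library. The helper name `describe_database_url` and the fallback default are assumptions made for illustration; they are not taken from the MajorMatch code.

```python
import os
from urllib.parse import urlsplit

# Assumed default for illustration only; it mirrors the local example in this guide.
DEFAULT_URL = "postgresql+psycopg2://postgres:postgres@localhost:5432/semantic_search"


def describe_database_url(env=None):
    """Split DATABASE_URL into its parts (hypothetical helper, not part of MajorMatch)."""
    env = os.environ if env is None else env
    parts = urlsplit(env.get("DATABASE_URL", DEFAULT_URL))
    return {
        "driver": parts.scheme,                  # dialect+driver prefix of the URL
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Print the parts without the password so the output is safe to share.
    print(describe_database_url())
```

A check like this can help catch a mistyped host, port, or database name before the indexer or the app tries to connect.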
Example local connection string:
$env:DATABASE_URL = "postgresql+psycopg2://postgres:postgres@localhost:5432/semantic_search"
MajorMatch tries to create the vector extension automatically if it is available.
- If the extension exists, you can keep using it.
- If the extension is not available, the app still works by storing embeddings as float[] and using a portable fallback search path.
- Because of that fallback, pgvector is helpful but not required for the app to run.
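The fallback search path itself is not shown in this guide. As a rough sketch of how a search over plain float[] embeddings could work without pgvector, the example below ranks rows by brute-force cosine similarity in Python. The function names and the sample rows are hypothetical, and the real implementation may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fallback_search(query_vec, rows, top_k=3):
    """Rank (title, embedding) rows by similarity to query_vec, best match first.

    Hypothetical helper: it scans every row, which is fine for a small corpus
    but slower than an indexed pgvector query on a large one.
    """
    scored = [(cosine_similarity(query_vec, emb), title) for title, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up two-dimensional embeddings, purely for illustration.
    sample_rows = [
        ("Linear Algebra", [1.0, 0.0]),
        ("Poetry Workshop", [0.0, 1.0]),
        ("Statistics", [0.8, 0.6]),
    ]
    for score, title in fallback_search([1.0, 0.1], sample_rows):
        print(f"{score:.3f}  {title}")
```

The trade-off is the usual one: a full scan needs no extension and runs anywhere, while pgvector can use an index and scale to larger corpora.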
If you want to enable it manually, run:
CREATE EXTENSION IF NOT EXISTS vector;
Start Ollama before using the chat assistant:
ollama serve
Optional environment variables:
- OLLAMA_BASE_URL: defaults to http://localhost:11434.
- OLLAMA_MODEL: defaults to llama2:latest, with fallback model selection when tools are requested.
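The defaults described above could be resolved along the following lines. This is only a sketch: `ollama_settings` is a hypothetical helper rather than MajorMatch's actual code, and it leaves out the fallback model selection. The `/api/tags` path is Ollama's endpoint for listing locally available models, included here as a convenient URL to check that the server is reachable.

```python
import os

DEFAULT_BASE_URL = "http://localhost:11434"
DEFAULT_MODEL = "llama2:latest"


def ollama_settings(env=None):
    """Resolve Ollama settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    # Strip a trailing slash so joining paths never produces "//".
    base_url = env.get("OLLAMA_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    model = env.get("OLLAMA_MODEL", DEFAULT_MODEL)
    return {
        "base_url": base_url,
        "model": model,
        "tags_url": base_url + "/api/tags",  # lists models pulled locally
    }


if __name__ == "__main__":
    print(ollama_settings())
```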
If you want a model that is more likely to support tools, pull one of the local fallback models first, for example:
ollama pull llama3.2:1b
If you want live job-market data, set Adzuna credentials:
$env:ADZUNA_APP_ID = "your_app_id"
$env:ADZUNA_APP_KEY = "your_app_key"
Without these values, the career-context tool falls back gracefully.
MajorMatch reads course data from the data/ folder and stores the indexed corpus in PostgreSQL.
Run the indexer after the database is ready:
python scripts/embed.py
You can also point it at a specific CSV file or folder:
python scripts/embed.py data\courses.csv
The CSV files must include title and description columns. Rows without those fields are skipped.
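The skip rule can be pictured with a short sketch. The code below is not the logic in scripts/embed.py; it is an assumed, standard-library version of the same idea, with a hypothetical function name. Raising an error when a required column is missing entirely is also an assumption.

```python
import csv
import io

REQUIRED_COLUMNS = ("title", "description")


def load_course_rows(handle):
    """Return rows that have non-empty title and description (hypothetical helper)."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        # Assumed behavior: fail early when a required column is absent.
        raise ValueError(f"CSV is missing required columns: {missing}")
    rows = []
    for row in reader:
        # Skip rows where either required field is blank or whitespace only.
        if all((row.get(c) or "").strip() for c in REQUIRED_COLUMNS):
            rows.append({c: row[c].strip() for c in REQUIRED_COLUMNS})
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        "title,description\n"
        "Intro to AI,Search and learning\n"
        ",Row with no title\n"
        "Databases,\n"
    )
    print(load_course_rows(sample))
```

Running a check like this on a new CSV before indexing shows how many rows would actually reach the database.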
Start the Streamlit app after the database and index are ready:
streamlit run streamlit_app.py
Run the main test target:
$env:PYTHONPATH='.'; .\venv\Scripts\python -m pytest tests/test_orchestrator.py -q
For each indexed course, the PostgreSQL table stores:
- Course title and description.
- Embedding vector as a float[] column.
- 2D projection coordinates for PCA, UMAP, and t-SNE.
That database table is what powers semantic search and the projection plot in the UI.
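One way to picture a single stored row is as a small record type. The sketch below is illustrative only: the class name and field names are assumptions, and the actual table schema and column names in MajorMatch may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CourseRecord:
    """Hypothetical in-memory view of one indexed course row."""

    title: str
    description: str
    embedding: List[float] = field(default_factory=list)  # stored as float[]
    pca_xy: Tuple[float, float] = (0.0, 0.0)    # 2D PCA coordinates
    umap_xy: Tuple[float, float] = (0.0, 0.0)   # 2D UMAP coordinates
    tsne_xy: Tuple[float, float] = (0.0, 0.0)   # 2D t-SNE coordinates


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    record = CourseRecord(
        title="Intro to AI",
        description="Search and learning",
        embedding=[0.12, -0.05, 0.33],
        pca_xy=(1.5, -0.2),
    )
    print(record)
```

Semantic search would read the embedding field, while the projection plot would read whichever pair of coordinates matches the selected method.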