ML Pipeline

MLPipeline is a modular machine learning framework that integrates with Weights & Biases (W&B) to manage the full experiment lifecycle: dataset versioning, model training, hyperparameter tuning, evaluation, and artifact storage. It supports graph-based physics simulations (Graph Neural Simulator and Mesh Graph Net via PyTorch Geometric) as well as NLP text classification (the BERT-based Tagifai model), with distributed training through Ray. The pipeline provides a CLI for uploading versioned datasets as W&B artifacts, launching training runs with automatic metric logging and checkpoint management, querying the best-performing runs for evaluation, and generating visualizations of model predictions. It is designed for scalable research workflows, including execution on HPC clusters via SLURM scripts.

Virtual Environment - Do this if you only want to run a model or hit endpoints

make venv

Create or open the .env file and add your W&B API key: WEIGHT_AND_BIASES_API_KEY=...

Log in to Weights & Biases

wandb login
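
Before launching anything, it can help to confirm that the key from the .env step is actually in place. The following is a minimal sketch, assuming the .env file uses plain KEY=value lines; `parse_env` and `has_wandb_key` are hypothetical helpers written for this example and are not part of the repo.

```python
from pathlib import Path

# Variable name as given in this README's .env step.
KEY_NAME = "WEIGHT_AND_BIASES_API_KEY"


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_wandb_key(env_path: str = ".env") -> bool:
    """Return True if the file exists and holds a non-empty W&B key."""
    path = Path(env_path)
    if not path.is_file():
        return False
    return bool(parse_env(path.read_text()).get(KEY_NAME))
```

Calling `has_wandb_key()` from the repo root returns False when the file is missing or the key is empty, which is a cheap check to run before a long training job. The same check applies to the Dev Setup steps.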

Dev Setup - Do this if you want to develop within the repo

make dev

Create or open the .env file and add your W&B API key: WEIGHT_AND_BIASES_API_KEY=...

Log in to Weights & Biases

wandb login

Docs Setup

make docs
python3 -m mkdocs new .
python3 -m mkdocs serve

Training

python pipeline/train.py gns-train-model \
    --dataset-loc "./data/complex_physics/WaterDropSmall" \
    --train-loop-config "./config/complex_physics_gns.json" \
    --num-workers 2

python pipeline/train.py mesh-train-model \
    --dataset-loc "./data/mesh_graph_net/meshgraphnets_miniset5traj_vis.pt" \
    --train-loop-config "./config/mesh_graph_net.json" \
    --num-workers 2
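
Both commands above take a dataset location and a JSON train-loop config, so a missing path or malformed config only surfaces once the run starts. The sketch below checks those inputs first and then assembles the same command line. It uses only the flags shown above; `build_train_command`, `preflight`, and `launch` are hypothetical helpers for illustration, not functions from this repo.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_train_command(subcommand, dataset_loc, config_path, num_workers=2):
    """Return the argv list for one of the training commands shown above."""
    return [
        sys.executable, "pipeline/train.py", subcommand,
        "--dataset-loc", str(dataset_loc),
        "--train-loop-config", str(config_path),
        "--num-workers", str(num_workers),
    ]


def preflight(dataset_loc, config_path):
    """Return a list of problems found; an empty list means inputs look usable."""
    problems = []
    if not Path(dataset_loc).exists():
        problems.append(f"dataset not found: {dataset_loc}")
    config = Path(config_path)
    if not config.is_file():
        problems.append(f"config not found: {config_path}")
    else:
        try:
            json.loads(config.read_text())
        except json.JSONDecodeError as err:
            problems.append(f"config is not valid JSON: {err}")
    return problems


def launch(subcommand, dataset_loc, config_path, num_workers=2):
    """Run the training command only if the preflight checks pass."""
    problems = preflight(dataset_loc, config_path)
    if problems:
        raise FileNotFoundError("; ".join(problems))
    command = build_train_command(subcommand, dataset_loc, config_path, num_workers)
    subprocess.run(command, check=True)
```

For example, `launch("gns-train-model", "./data/complex_physics/WaterDropSmall", "./config/complex_physics_gns.json")` run from the repo root is equivalent to the first command above, with the checks performed beforehand.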

Data Artifact Processing

python pipeline/artifacts.py process-dataset \
    --dataset-loc "./data/labeled_projects.csv" \
    --data-type "raw_data" \
    --data-for-model-id "Tagifai_LLM_Model"

python pipeline/artifacts.py process-dataset \
    --dataset-loc "./data/test" \
    --data-type "raw_data" \
    --data-for-model-id "Tagifai_LLM_Model"
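
For orientation, uploading a dataset as a versioned W&B artifact generally looks like the sketch below. This is not the implementation in pipeline/artifacts.py: the project name, the `artifact_name` convention, and the `upload_dataset` helper are assumptions made for illustration. Only the `wandb` calls themselves (`wandb.init`, `wandb.Artifact`, `add_file`, `add_dir`, `log_artifact`, `finish`) are the library's public API.

```python
from pathlib import Path


def artifact_name(model_id: str, data_type: str) -> str:
    """Hypothetical naming scheme joining the model id and data type."""
    return f"{model_id}-{data_type}"


def upload_dataset(dataset_loc: str, data_type: str, model_id: str,
                   project: str = "mlpipeline-example") -> None:
    """Log a file or directory as a W&B artifact (project name is assumed)."""
    import wandb  # imported here so the naming helper works without wandb installed

    run = wandb.init(project=project, job_type="upload-dataset")
    try:
        artifact = wandb.Artifact(name=artifact_name(model_id, data_type),
                                  type=data_type)
        path = Path(dataset_loc)
        if path.is_dir():
            artifact.add_dir(str(path))   # e.g. ./data/test
        else:
            artifact.add_file(str(path))  # e.g. ./data/labeled_projects.csv
        # W&B assigns a new version (v0, v1, ...) when the contents change.
        run.log_artifact(artifact)
    finally:
        run.finish()
```

Logging the same artifact name again with changed contents creates a new version rather than overwriting the old one, which is what makes the datasets versioned.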


MitchellGRead/MLPipeline
