YuZh98/README.md

Hi, I'm Hugh 👋

Statistics PhD by training, tool builder by compulsion. My papers propose scalable algorithms and prove theorems, and my side projects are tools I build to help other people get their work done faster. I find it hard to leave a solvable problem alone, and whenever I run into repetitive work, I'd rather automate it than let it eat into my time.


Tools I built because I needed them

latex2arxiv: Submit to arXiv without the headache. One command cleans your LaTeX project, catches rejection-causing errors, and walks you through the upload.

Takes any LaTeX project (zip, directory, or git URL) and outputs a submission-ready zip. It prunes unreachable files, strips draft markup and revision commands, normalizes BibTeX, and runs pre-flight checks that surface the errors arXiv silently fails on. Pass --guide and it writes a step-by-step upload walkthrough with copy-paste title, authors, and abstract. Run it with --dry-run in CI to gate your paper repo on compliance. The same pipeline runs on five surfaces: terminal, Chrome extension for Overleaf (zero install, in Web Store review), VS Code, MCP server (Claude/Cursor/Copilot/Windsurf/Zed), and GitHub Action.

Python · CLI · PyPI · Homebrew · Chrome extension · VS Code · GitHub Actions · pre-commit · MCP
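
To make the "prunes unreachable files" idea concrete, here is a minimal sketch of one way such a step could work. It is not latex2arxiv's actual implementation: the function name `reachable_files`, the in-memory `project` dictionary, and the choice to follow only \input and \include are all assumptions made for illustration.

```python
import re
from collections import deque

# Hypothetical pattern: follows only \input{...} and \include{...}.
# A real tool would also need to handle comments, \includegraphics,
# bibliography files, and custom macros.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def reachable_files(files, main):
    """Return the set of file names reachable from `main`.

    `files` maps a file name to its LaTeX source text (an assumption
    made so the sketch runs without touching the filesystem).
    """
    seen = {main}
    queue = deque([main])
    while queue:
        current = queue.popleft()
        for target in INCLUDE_RE.findall(files.get(current, "")):
            name = target.strip()
            # LaTeX allows the .tex extension to be omitted.
            if not name.endswith(".tex"):
                name += ".tex"
            if name in files and name not in seen:
                seen.add(name)
                queue.append(name)
    return seen


# Illustrative project: old_draft.tex is never included from main.tex.
project = {
    "main.tex": r"\documentclass{article}\input{intro}\include{sections/method.tex}",
    "intro.tex": r"Some text. \input{macros}",
    "macros.tex": r"\newcommand{\R}{\mathbb{R}}",
    "sections/method.tex": "Method.",
    "old_draft.tex": r"\input{intro}",
}

keep = reachable_files(project, "main.tex")
unreachable = sorted(set(project) - keep)
print(unreachable)
```

A breadth-first walk from the main file keeps the logic simple and guarantees termination even if two files include each other, since each file is visited at most once.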


academic-application-tracker: Local Streamlit dashboard that answers "what do I do today?" for academics juggling dozens of applications, deadlines, and recommendation letters.

Academic job searching is chaos: overlapping deadlines, three recommenders per position, and every institution wanting the materials checklist in a different shape. Halfway through last cycle I gave up on spreadsheets and built the Streamlit dashboard I actually needed, with urgency-banded deadlines, recommender state per position, an interview log, and a daily action list auto-computed from what's still open. Try the live demo: no install, and each session gets its own sandbox. It has 1000+ tests at 95% coverage, because I'm running it against my own applications and can't afford to have it eat a deadline.

Python · Streamlit · SQLite · pytest · Plotly
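
As a rough illustration of what "urgency-banded deadlines" and an auto-computed daily action list might look like, here is a minimal sketch. It is not the tracker's actual code: the band names, the 7-day and 30-day thresholds, the record fields, and the sample applications are all hypothetical.

```python
from datetime import date


def urgency_band(deadline, today, urgent_days=7, soon_days=30):
    """Map a deadline to a coarse urgency band (thresholds are assumptions)."""
    days_left = (deadline - today).days
    if days_left < 0:
        return "overdue"
    if days_left <= urgent_days:
        return "urgent"
    if days_left <= soon_days:
        return "soon"
    return "later"


def daily_actions(applications, today):
    """List open applications, earliest deadline first, with their band."""
    open_apps = [a for a in applications if not a["submitted"]]
    open_apps.sort(key=lambda a: a["deadline"])
    return [(a["position"], urgency_band(a["deadline"], today)) for a in open_apps]


# Made-up records, for illustration only.
today = date(2025, 11, 1)
applications = [
    {"position": "Postdoc, Lab B", "deadline": date(2025, 12, 15), "submitted": False},
    {"position": "Assistant Professor, University A", "deadline": date(2025, 11, 5), "submitted": False},
    {"position": "Lecturer, College C", "deadline": date(2025, 11, 20), "submitted": True},
]

actions = daily_actions(applications, today)
for position, band in actions:
    print(f"[{band}] {position}")
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the banding deterministic and easy to test.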


python-project-scaffold: Skip the 30-minute setup ritual and start at your first feature commit.

Every new Python project starts with the same 30-minute ritual: wire up ruff, pyright, pytest, CI matrix, coverage gate, pre-commit, Dependabot, ADRs... I automated all of it. Click "Use this template", run python3 scripts/init-project.py once, and you have a green-CI repo ready for your first feature. It ships with a /new-project Claude Code skill that creates the GitHub repo and sets up branch protection, because even the setup should be one command.

Python · GitHub Actions · Claude Code · pre-commit


Research

I've always enjoyed working on statistical problems that are mathematically challenging and scientifically motivated. I am particularly intrigued by problems where the data or the quantity of interest is combinatorial: some or all coordinates of the observation take values in a discrete, structured set rather than in Euclidean space. The loss of Euclidean geometry brings simultaneous challenges in probabilistic modeling, mathematical theory, and scalable computation. The central question organizing my research is: How can we develop Bayesian methodology with theoretical guarantees for problems that are combinatorial in structure, and how do those methods behave on real scientific data?

Three first-author papers:

  • JCGS 2025 (published): blocked Gibbs sampler with anti-correlation Gaussian data augmentation; 23 to 67 times faster than NUTS (the industry-standard sampler) with a geometric ergodicity proof. Code: Anti-correlation-Gaussian.
  • JASA (revision submitted): Bayesian regression over combinatorial response data via integer programming duality. Code: combinatorial-regression, a multi-language reproducibility pipeline (R + Rcpp inner loops, JAX/NumPyro baselines, Makefile-orchestrated).
  • Bernoulli (revision submitted): first consistency guarantee for graph-based clustering under model misspecification.

More research code: VAE-fMRI-Alzheimer, a 3D-convolutional VAE for Alzheimer's fMRI. CUDA training on HiPerGator, 36 unit tests, 18 tutorial notebooks.


Stack

Python · R · C++ · Rust · JAX · PyTorch · GitHub Actions · PyPI · Homebrew · Linux · Slurm/HPC


📫 hugh.stats@gmail.com · Google Scholar · ORCID · LinkedIn · Website

Pinned

  1. academic-application-tracker (Python): Local Streamlit dashboard that answers "what do I do today?" for academics juggling dozens of applications, deadlines, and recommendation letters.

  2. latex2arxiv (Python): Submit to arXiv without the headache. One command cleans your LaTeX project, catches rejection-causing errors, and walks you through the upload.

  3. VAE-fMRI-Alzheimer (Jupyter Notebook): Developing models to extract information about Alzheimer's disease from fMRIs.

  4. combinatorial-regression (Jupyter Notebook): Statistical Modeling for Combinatorial Response Data.

  5. finetune-gsm8k (Jupyter Notebook): QLoRA fine-tuning of Qwen2.5-3B-Instruct on GSM8K, with LoRA-knob ablation. Learning project, frozen on completion.

  6. latex2ufdissertation (Python): A safety-net validator for UF doctoral dissertations.