
Feature: Support downloading model weights on-the-fly from HuggingFace (#166)#167

Merged
XkunW merged 24 commits into VectorInstitute:main from Center-for-AI-Innovation:hf_download
Mar 30, 2026

Conversation

@rohan-uiuc
Contributor

@rohan-uiuc commented Nov 12, 2025

PR Type

Feature

Short Description

Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This allows users to launch models without manually downloading and mounting weight directories.

The code now checks whether the model weights directory exists before attempting to bind mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from --model in vllm_args (or falls back to model_name). Users must pass the full HuggingFace model identifier (e.g., Qwen/Qwen2.5-7B-Instruct) via --model in vllm_args for automatic downloads to work.
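The fallback logic described above can be sketched roughly as follows. This is an illustrative sketch, not the actual vec_inf code; the function and parameter names (`resolve_model_source`, `model_weights_dir`, etc.) are hypothetical:

```python
from pathlib import Path


def resolve_model_source(
    model_weights_dir: str,
    model_name: str,
    vllm_args: dict,
) -> tuple:
    """Decide what to pass to `vllm serve` and whether to bind mount.

    Returns (model_source, mount_weights). Hypothetical helper
    mirroring the behavior described in the PR, not vec_inf internals.
    """
    weights_path = Path(model_weights_dir)
    if weights_path.is_dir():
        # Local weights exist: bind mount the directory and serve from disk.
        return str(weights_path), True
    # No local weights: skip the bind mount and use the HF identifier
    # from --model in vllm_args, falling back to the model name, so
    # vLLM downloads the weights at runtime.
    hf_id = vllm_args.get("--model", model_name)
    return hf_id, False
```

With a missing weights directory and `--model Qwen/Qwen2.5-7B-Instruct` in `vllm_args`, this returns the HF identifier and disables the mount, which is the path exercised by the end-to-end verification above.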

Fixes #166

Tests Added

  • test_generate_server_setup_singularity_no_weights: Verifies server setup doesn't include model weights path when directory doesn't exist
  • test_generate_launch_cmd_singularity_no_local_weights: Verifies launch command uses HF model identifier when local weights are missing
  • test_generate_model_launch_script_singularity_no_weights: Verifies batch mode correctly handles missing model weights
  • All existing tests pass (28 tests in test_slurm_script_generator.py, 116+ total tests)
  • Verified end-to-end: model downloads and serves successfully from HuggingFace when local weights don't exist and --model is specified in vllm_args

@codecov-commenter

codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 27.58621% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.97%. Comparing base (6c2e558) to head (8e66e26).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vec_inf/client/_utils.py | 8.69% | 21 Missing ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
- Coverage   91.23%   89.97%   -1.27%     
==========================================
  Files          14       14              
  Lines        1438     1466      +28     
==========================================
+ Hits         1312     1319       +7     
- Misses        126      147      +21     
| Files with missing lines | Coverage Δ |
|---|---|
| vec_inf/cli/_cli.py | 88.06% <100.00%> (+0.06%) ⬆️ |
| vec_inf/client/_helper.py | 92.95% <100.00%> (+0.03%) ⬆️ |
| vec_inf/client/_slurm_script_generator.py | 96.17% <100.00%> (ø) |
| vec_inf/client/config.py | 100.00% <100.00%> (ø) |
| vec_inf/client/models.py | 100.00% <100.00%> (ø) |
| vec_inf/client/_utils.py | 72.04% <8.69%> (-8.94%) ⬇️ |



Contributor

@XkunW left a comment


Hi @rohan-uiuc, thanks for opening this. I left a few comments. Another thing worth considering is adding a check in the API to see whether a model needs to be downloaded, and if so, only allowing the download when the HF cache directory environment variable is set, so that users wouldn't accidentally download a model to their home directory and use up all their quota.
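
The suggested guard could look something like the sketch below. It is only an illustration of the reviewer's idea, not code from this PR; the function name `ensure_hf_cache_configured` is hypothetical, while `HF_HOME` and `HF_HUB_CACHE` are the standard Hugging Face cache environment variables:

```python
import os


def ensure_hf_cache_configured() -> None:
    """Refuse an on-the-fly download unless an HF cache dir is set.

    Without HF_HOME or HF_HUB_CACHE, Hugging Face downloads default to
    ~/.cache/huggingface, which can exhaust a user's home-directory quota
    on a cluster.
    """
    if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
        raise RuntimeError(
            "Model weights not found locally and no HF cache directory is "
            "configured. Set HF_HOME (or HF_HUB_CACHE) to a location with "
            "sufficient quota before launching with on-the-fly downloads."
        )
```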

Adds hf_model field to ModelConfig and LaunchOptions to specify
a HuggingFace model id for vLLM to download at runtime.
Updates SlurmScriptGenerator and BatchSlurmScriptGenerator to use
hf_model for vllm serve when local weights don't exist.
Priority: local weights > hf_model > model name.
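
The priority order stated in the commit message can be expressed as a small sketch. The function and parameter names here are illustrative, not the actual vec_inf API:

```python
from typing import Optional


def pick_model_source(
    weights_exist: bool,
    weights_path: str,
    hf_model: Optional[str],
    model_name: str,
) -> str:
    """Resolve the argument for `vllm serve`.

    Priority per the commit message: local weights > hf_model > model name.
    """
    if weights_exist:
        # Existing local weights always win; hf_model is ignored.
        return weights_path
    # No local weights: prefer the configured HF model id, else the name.
    return hf_model or model_name
```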
@XkunW XkunW merged commit 3ca1438 into VectorInstitute:main Mar 30, 2026
7 checks passed


Development

Successfully merging this pull request may close these issues.

  • Add support for on-the-fly model downloads to llm-inference package
  • Support downloading model weights on the fly from HF

3 participants