
Feature: Support downloading model weights on-the-fly from HuggingFace (#166)#167

Merged
XkunW merged 24 commits into VectorInstitute:main from Center-for-AI-Innovation:hf_download
Mar 30, 2026

Conversation

@rohan-uiuc
Contributor

@rohan-uiuc commented Nov 12, 2025

PR Type

Feature

Short Description

Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This allows users to launch models without manually downloading and mounting weight directories.

The code now checks whether the model weights directory exists before attempting to bind mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from --model in vllm_args (or falls back to model_name). Users must pass the full HuggingFace model identifier (e.g., Qwen/Qwen2.5-7B-Instruct) via --model in vllm_args for automatic downloads to work.
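The fallback logic described above can be sketched roughly as follows. This is an illustrative sketch, not the actual vec_inf code; the function and parameter names (`resolve_model_source`, `model_weights_dir`, etc.) are hypothetical:

```python
from pathlib import Path


def resolve_model_source(
    model_weights_dir: str,
    model_name: str,
    vllm_args: dict,
) -> tuple:
    """Decide what to pass to `vllm serve` and whether to bind mount.

    Returns (model_source, mount_weights). Hypothetical helper
    mirroring the behavior described in the PR, not vec_inf internals.
    """
    weights_path = Path(model_weights_dir)
    if weights_path.is_dir():
        # Local weights exist: bind mount the directory and serve from disk.
        return str(weights_path), True
    # No local weights: skip the bind mount and use the HF identifier
    # from --model in vllm_args, falling back to the model name, so
    # vLLM downloads the weights at runtime.
    hf_id = vllm_args.get("--model", model_name)
    return hf_id, False
```

With a missing weights directory and `--model Qwen/Qwen2.5-7B-Instruct` in `vllm_args`, this returns the HF identifier and disables the mount, which is the path exercised by the end-to-end verification above.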

Fixes #166

Tests Added

  • test_generate_server_setup_singularity_no_weights: Verifies server setup doesn't include model weights path when directory doesn't exist
  • test_generate_launch_cmd_singularity_no_local_weights: Verifies launch command uses HF model identifier when local weights are missing
  • test_generate_model_launch_script_singularity_no_weights: Verifies batch mode correctly handles missing model weights
  • All existing tests pass (28 tests in test_slurm_script_generator.py, 116+ total tests)
  • Verified end-to-end: model downloads and serves successfully from HuggingFace when local weights don't exist and --model is specified in vllm_args

@codecov-commenter

codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 27.58621% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.97%. Comparing base (6c2e558) to head (8e66e26).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vec_inf/client/_utils.py | 8.69% | 21 Missing ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
- Coverage   91.23%   89.97%   -1.27%     
==========================================
  Files          14       14              
  Lines        1438     1466      +28     
==========================================
+ Hits         1312     1319       +7     
- Misses        126      147      +21     
| Files with missing lines | Coverage Δ |
|---|---|
| vec_inf/cli/_cli.py | 88.06% <100.00%> (+0.06%) ⬆️ |
| vec_inf/client/_helper.py | 92.95% <100.00%> (+0.03%) ⬆️ |
| vec_inf/client/_slurm_script_generator.py | 96.17% <100.00%> (ø) |
| vec_inf/client/config.py | 100.00% <100.00%> (ø) |
| vec_inf/client/models.py | 100.00% <100.00%> (ø) |
| vec_inf/client/_utils.py | 72.04% <8.69%> (-8.94%) ⬇️ |



Contributor

@XkunW left a comment


Hi @rohan-uiuc, thanks for opening this. I left a few comments. Another thing worth considering is adding a check in the API to see whether a model needs to be downloaded, and if so, only allowing the download when the HF cache directory environment variable is set, so that users wouldn't accidentally download a model to their home directory and use up all their quota.
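
The suggested guard could look something like the sketch below. It is only an illustration of the reviewer's idea, not code from this PR; the function name `ensure_hf_cache_configured` is hypothetical, while `HF_HOME` and `HF_HUB_CACHE` are the standard Hugging Face cache environment variables:

```python
import os


def ensure_hf_cache_configured() -> None:
    """Refuse an on-the-fly download unless an HF cache dir is set.

    Without HF_HOME or HF_HUB_CACHE, Hugging Face downloads default to
    ~/.cache/huggingface, which can exhaust a user's home-directory quota
    on a cluster.
    """
    if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
        raise RuntimeError(
            "Model weights not found locally and no HF cache directory is "
            "configured. Set HF_HOME (or HF_HUB_CACHE) to a location with "
            "sufficient quota before launching with on-the-fly downloads."
        )
```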

Adds hf_model field to ModelConfig and LaunchOptions to specify
a HuggingFace model id for vLLM to download at runtime.
Updates SlurmScriptGenerator and BatchSlurmScriptGenerator to use
hf_model for vllm serve when local weights don't exist.
Priority: local weights > hf_model > model name.
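
The priority order stated in the commit message can be expressed as a small sketch. The function and parameter names here are illustrative, not the actual vec_inf API:

```python
from typing import Optional


def pick_model_source(
    weights_exist: bool,
    weights_path: str,
    hf_model: Optional[str],
    model_name: str,
) -> str:
    """Resolve the argument for `vllm serve`.

    Priority per the commit message: local weights > hf_model > model name.
    """
    if weights_exist:
        # Existing local weights always win; hf_model is ignored.
        return weights_path
    # No local weights: prefer the configured HF model id, else the name.
    return hf_model or model_name
```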
@XkunW XkunW merged commit 3ca1438 into VectorInstitute:main Mar 30, 2026
7 checks passed


Development

Successfully merging this pull request may close these issues.

  • Add support for on-the-fly model downloads to llm-inference package
  • Support downloading model weights on the fly from HF

3 participants