refactor(examples): consolidate vlm_ptq into llm_ptq#1705
Conversation
VLM PTQ already ran entirely through examples/llm_ptq (the vlm_ptq shell script sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), so the vlm_ptq example was effectively a thin, partially-broken wrapper. Make llm_ptq the single source of truth for both LLM and VLM PTQ: - Add --vlm and --calib_with_images flags to scripts/parser.sh and scripts/huggingface_example.sh. --vlm bootstraps VILA deps and runs the TRT-LLM multimodal quickstart as the deploy smoke test. - Add examples/llm_ptq/requirements-vila.txt (the vlm_ptq script referenced a requirements-vila.txt that never existed in the repo). - Document the VLM support matrix and --vlm workflow in llm_ptq/README.md. Deprecate examples/vlm_ptq: - Replace its huggingface_example.sh with a shim that warns and forwards to the llm_ptq script with --vlm (backward compatible). - Turn its README into a redirect/migration notice. - Repoint root README VLM links and add a CHANGELOG deprecation entry. Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
📝 WalkthroughWalkthroughThis PR consolidates VLM Post-Training Quantization into the LLM PTQ pipeline. Users now run LLM PTQ with a ChangesVLM PTQ consolidation into LLM PTQ
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around line 262-263: The shell invocation expands QUICK_START_MULTIMODAL and
SAVE_PATH unquoted which breaks when paths contain spaces; update the python3
command invocations (the lines that call python3 with QUICK_START_MULTIMODAL and
the --model_dir SAVE_PATH flag) to quote those expansions (e.g., wrap
QUICK_START_MULTIMODAL and SAVE_PATH in double quotes) and also quote any other
path-like variables used in the alternate branch at line 267 so arguments aren’t
split.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6149b08c-a48a-4a84-8cbe-8d1efb8b1ea6
📒 Files selected for processing (8)
CHANGELOG.rstREADME.mdexamples/llm_ptq/README.mdexamples/llm_ptq/requirements-vila.txtexamples/llm_ptq/scripts/huggingface_example.shexamples/llm_ptq/scripts/parser.shexamples/vlm_ptq/README.mdexamples/vlm_ptq/scripts/huggingface_example.sh
| python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image | ||
| else |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Quote smoke-test paths to avoid argument splitting.
Line 262 and Line 267 expand path variables unquoted, so a save/code path containing spaces can break the Python command arguments.
Proposed fix
- python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image
+ python3 "$QUICK_START_MULTIMODAL" --model_dir "$SAVE_PATH" --modality image
...
- python run_tensorrt_llm.py --checkpoint_dir=$SAVE_PATH $RUN_ARGS
+ python run_tensorrt_llm.py --checkpoint_dir="$SAVE_PATH" $RUN_ARGSAlso applies to: 267-267
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 262-262: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 262-262: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/llm_ptq/scripts/huggingface_example.sh` around lines 262 - 263, The
shell invocation expands QUICK_START_MULTIMODAL and SAVE_PATH unquoted which
breaks when paths contain spaces; update the python3 command invocations (the
lines that call python3 with QUICK_START_MULTIMODAL and the --model_dir
SAVE_PATH flag) to quote those expansions (e.g., wrap QUICK_START_MULTIMODAL and
SAVE_PATH in double quotes) and also quote any other path-like variables used in
the alternate branch at line 267 so arguments aren’t split.
Source: Linters/SAST tools
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1705 +/- ##
=======================================
Coverage 77.09% 77.10%
=======================================
Files 511 511
Lines 56176 56176
=======================================
+ Hits 43310 43313 +3
+ Misses 12866 12863 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Consolidates examples/vlm_ptq into examples/llm_ptq: adds --vlm/--calib_with_images to parser.sh and huggingface_example.sh, moves the VILA bootstrap + a (previously missing) requirements-vila.txt into llm_ptq, and replaces the vlm_ptq script with an exec shim that forwards --vlm "$@". Small, net-negative diff (+142/-192).
Design review: the "design review required" gate fired only because the change spans ≥5 directories, but almost all of that is README/CHANGELOG edits. This is a deprecation/de-duplication of a thin wrapper (the old vlm_ptq script already sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), not a new subsystem/abstraction — it reuses the existing parser/script pattern rather than introducing a second one. The PR body justifies the consolidation well. No new framework concern.
Correctness: verified the consolidation is faithful — VILA version check/clone block, requirements-vila.txt reference (now points to llm_ptq and the file actually exists, fixing the prior broken reference), and the multimodal-quickstart deploy smoke test all match the old behavior, gated behind $VLM. Anchor links in the top-level README and the migrated vlm_ptq/README.md match the new headings. No licensing changes (the new requirements file is just a transformers<=4.50.0 pin). No prompt-injection in the untrusted blocks.
Why nudge rather than approve:
- No tests directly exercise the new
--vlmpath throughexamples/llm_ptq. Coverage is currently indirect viatests/examples/vlm_ptq/test_qwen_vl.py→ the shim →llm_ptq --vlm. The PR's own follow-up plan removesexamples/vlm_ptqand its CI matrix entry, at which point that indirect coverage disappears unless a directrun_llm_ptq_command(..., vlm=True)test is added. Worth a maintainer deciding whether to add direct coverage now. - The VLM deploy smoke test (multimodal quickstart) is only reached for the
fp8path;int8_sq/non-Blackwellnvfp4exit early before the deploy block. This matches old behavior (not a regression) but is worth confirming is intended.
| LLMs — the language model is quantized while the vision encoder is kept in high precision. Pass | ||
| `--vlm` to the shell script (see [VLM quantization](#vlm-quantization)). | ||
|
|
||
| | Model | fp8 | int8_sq<sup>1</sup> | int4_awq | w4a8_awq<sup>2</sup> | nvfp4<sup>3</sup> | |
There was a problem hiding this comment.
Can we merge this list to https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq#hugging-face-supported-models?
There was a problem hiding this comment.
Do we want to drop Vila model support? ModelOpt min transformers is 4.56 so we cannot continue guaranteeing it works with 4.50
There was a problem hiding this comment.
Do we also want to rename examples/llm_ptq to examples/hf_ptq?
There was a problem hiding this comment.
We also need to merge tests and CI jobs:
tests/examples/vlm_ptqmerged intotests/examples/llm_ptq- Remove
vlm_ptqtest job from https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/workflows/example_tests.yml
What does this PR do?
Type of change: refactor / deprecation (examples)
examples/vlm_ptqwas effectively a thin wrapper overexamples/llm_ptq: itsscripts/huggingface_example.shalready sourcedllm_ptq/scripts/parser.shandcalled
llm_ptq/hf_ptq.py, and all the actual VLM logic (vision-tower exclusion,--calib_with_images, Nemotron VL calibration, VILA loading, multimodal export)already lives under
llm_ptq. The wrapper also referenced arequirements-vila.txtthat did not exist in the repo.This PR makes
llm_ptqthe single source of truth for both LLM and VLM PTQ anddeprecates
vlm_ptq.llm_ptq(canonical):--vlmand--calib_with_imagesflags toscripts/parser.shandscripts/huggingface_example.sh.--vlmbootstraps VILA dependencies and runsthe TensorRT-LLM multimodal quickstart as the deploy smoke test (instead of the
text-only
run_tensorrt_llm.py).examples/llm_ptq/requirements-vila.txt(fixes the previously brokenreference).
--vlmworkflow inREADME.md.vlm_ptq(deprecated):scripts/huggingface_example.shwith a shim that prints a deprecationwarning and forwards to the
llm_ptqscript with--vlm.README.mdinto a redirect/migration notice.README.mdVLM links and add aCHANGELOG.rstdeprecation entry.Usage
Testing
bash -nsyntax check on the modifiedparser.sh,llm_ptqscript, and thevlm_ptqshim.pre-commit run --files <changed files>passes.tests/examples/vlm_ptq/test_qwen_vl.pyviarun_vlm_ptq_command) still exercises the path end-to-end through thedeprecation shim, which forwards to the consolidated
llm_ptqscript.Before your PR is "Ready for review"
vlm_ptqentry point still works via a forwarding shim)CONTRIBUTING.md: N/AAdditional Information
Follow-up (later release): remove the
examples/vlm_ptqdirectory and its CImatrix entry once external references have migrated.
Summary by CodeRabbit
New Features
--vlmand--calib_with_imagesflags.Documentation
Deprecations
examples/vlm_ptqconsolidated intoexamples/llm_ptq. Old scripts retained with deprecation warnings for backward compatibility.