refactor(examples): consolidate vlm_ptq into llm_ptq by Edwardf0t1 · Pull Request #1705 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-06-12T22:47:53Z

What does this PR do?

Type of change: refactor / deprecation (examples)

examples/vlm_ptq was effectively a thin wrapper over examples/llm_ptq: its
scripts/huggingface_example.sh already sourced llm_ptq/scripts/parser.sh and
called llm_ptq/hf_ptq.py, and all the actual VLM logic (vision-tower exclusion,
--calib_with_images, Nemotron VL calibration, VILA loading, multimodal export)
already lives under llm_ptq. The wrapper also referenced a
requirements-vila.txt that did not exist in the repo.

This PR makes llm_ptq the single source of truth for both LLM and VLM PTQ and
deprecates vlm_ptq.

llm_ptq (canonical):

Add --vlm and --calib_with_images flags to scripts/parser.sh and
scripts/huggingface_example.sh. --vlm bootstraps VILA dependencies and runs
the TensorRT-LLM multimodal quickstart as the deploy smoke test (instead of the
text-only run_tensorrt_llm.py).
Add examples/llm_ptq/requirements-vila.txt (fixes the previously broken
reference).
Document the VLM support matrix and the --vlm workflow in README.md.

vlm_ptq (deprecated):

Replace scripts/huggingface_example.sh with a shim that prints a deprecation
warning and forwards to the llm_ptq script with --vlm.
Convert README.md into a redirect/migration notice.
Repoint root README.md VLM links and add a CHANGELOG.rst deprecation entry.

Usage

cd examples/llm_ptq
# VLM PTQ (was: examples/vlm_ptq/scripts/huggingface_example.sh)
scripts/huggingface_example.sh --model <hf_model> --quant fp8 --vlm

# VLM image-text calibration
scripts/huggingface_example.sh --model <hf_model> --quant nvfp4 --vlm --calib_with_images --trust_remote_code

Testing

bash -n syntax check on the modified parser.sh, llm_ptq script, and the
vlm_ptq shim.
pre-commit run --files <changed files> passes.
The existing VLM example test (tests/examples/vlm_ptq/test_qwen_vl.py via
run_vlm_ptq_command) still exercises the path end-to-end through the
deprecation shim, which forwards to the consolidated llm_ptq script.

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ (old vlm_ptq entry point still works via a forwarding shim)
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A (existing VLM test still covers the consolidated path)
Did you update Changelog?: ✅
Did you get Claude approval on this PR?: ❌

Additional Information

Follow-up (later release): remove the examples/vlm_ptq directory and its CI
matrix entry once external references have migrated.

Summary by CodeRabbit

New Features
- Vision Language Model (VLM) quantization now integrated into the main LLM PTQ example with --vlm and --calib_with_images flags.
Documentation
- Added comprehensive VLM quantization guidance, including model support matrix and usage examples.
Deprecations
- examples/vlm_ptq consolidated into examples/llm_ptq. Old scripts retained with deprecation warnings for backward compatibility.

VLM PTQ already ran entirely through examples/llm_ptq (the vlm_ptq shell script sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), so the vlm_ptq example was effectively a thin, partially-broken wrapper. Make llm_ptq the single source of truth for both LLM and VLM PTQ: - Add --vlm and --calib_with_images flags to scripts/parser.sh and scripts/huggingface_example.sh. --vlm bootstraps VILA deps and runs the TRT-LLM multimodal quickstart as the deploy smoke test. - Add examples/llm_ptq/requirements-vila.txt (the vlm_ptq script referenced a requirements-vila.txt that never existed in the repo). - Document the VLM support matrix and --vlm workflow in llm_ptq/README.md. Deprecate examples/vlm_ptq: - Replace its huggingface_example.sh with a shim that warns and forwards to the llm_ptq script with --vlm (backward compatible). - Turn its README into a redirect/migration notice. - Repoint root README VLM links and add a CHANGELOG deprecation entry. Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai · 2026-06-12T22:48:06Z

📝 Walkthrough

Walkthrough

This PR consolidates VLM Post-Training Quantization into the LLM PTQ pipeline. Users now run LLM PTQ with a --vlm flag instead of a separate vlm_ptq example. The old vlm_ptq entry point remains as a deprecated shim that forwards calls with backward compatibility. VLM-specific setup (VILA repo cloning, transformer version constraints) and smoke tests are now integrated into the LLM PTQ scripts.

Changes

VLM PTQ consolidation into LLM PTQ

Layer / File(s)	Summary
Command-line option parsing for VLM support `examples/llm_ptq/scripts/parser.sh`	`--vlm` and `--calib_with_images` flags are added to option parsing, initialized in variable scope, and included in configuration output.
VILA setup and VLM-specific execution logic `examples/llm_ptq/scripts/huggingface_example.sh`	When `--vlm` is enabled and model contains "vila", the script enforces transformers version constraints, installs VILA requirements, clones VILA repository at a pinned commit, and uses TRT-LLM multimodal quickstart for deploy smoke tests instead of the standard `run_tensorrt_llm.py`.
VLM dependencies and integrated PTQ documentation `examples/llm_ptq/requirements-vila.txt`, `examples/llm_ptq/README.md`	New VILA requirements file specifies transformers `<=4.50.0` constraint. LLM PTQ README adds VLM model support table with format constraints, usage examples with `--vlm`, and image-text calibration command examples.
Deprecated VLM PTQ forwarding and migration guide `examples/vlm_ptq/scripts/huggingface_example.sh`, `examples/vlm_ptq/README.md`	The `vlm_ptq` huggingface example script becomes a deprecated shim that prints warnings and forwards to `llm_ptq` with `--vlm`. The README is condensed into a deprecation notice with a migration table and reduced resources section.
Root documentation and changelog updates `README.md`, `CHANGELOG.rst`	Main README updates VLM quantization links from `vlm_ptq` paths to `llm_ptq` README sections. CHANGELOG documents the consolidation and deprecation plan.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'refactor(examples): consolidate vlm_ptq into llm_ptq' accurately summarizes the main change: consolidating the VLM PTQ example into the LLM PTQ example as a single source of truth.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No tracked file diffs detected. Scanned VLM PTQ-related Python files (examples/llm_ptq/hf_ptq.py, example_utils.py, modelopt/torch/utils/vlm_dataset_utils.py) for torch.load weights_only=False, num...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch consolidate-vlm-ptq-into-llm-ptq

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around line 262-263: The shell invocation expands QUICK_START_MULTIMODAL and
SAVE_PATH unquoted which breaks when paths contain spaces; update the python3
command invocations (the lines that call python3 with QUICK_START_MULTIMODAL and
the --model_dir SAVE_PATH flag) to quote those expansions (e.g., wrap
QUICK_START_MULTIMODAL and SAVE_PATH in double quotes) and also quote any other
path-like variables used in the alternate branch at line 267 so arguments aren’t
split.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6149b08c-a48a-4a84-8cbe-8d1efb8b1ea6

📥 Commits

Reviewing files that changed from the base of the PR and between d26c8af and 3d42843.

📒 Files selected for processing (8)

CHANGELOG.rst
README.md
examples/llm_ptq/README.md
examples/llm_ptq/requirements-vila.txt
examples/llm_ptq/scripts/huggingface_example.sh
examples/llm_ptq/scripts/parser.sh
examples/vlm_ptq/README.md
examples/vlm_ptq/scripts/huggingface_example.sh

coderabbitai · 2026-06-12T22:54:41Z

+                python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image
+            else


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Quote smoke-test paths to avoid argument splitting.

Line 262 and Line 267 expand path variables unquoted, so a save/code path containing spaces can break the Python command arguments.

Proposed fix

- python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image + python3 "$QUICK_START_MULTIMODAL" --model_dir "$SAVE_PATH" --modality image ... - python run_tensorrt_llm.py --checkpoint_dir=$SAVE_PATH $RUN_ARGS + python run_tensorrt_llm.py --checkpoint_dir="$SAVE_PATH" $RUN_ARGS

Also applies to: 267-267

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 262-262: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 262-262: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/scripts/huggingface_example.sh` around lines 262 - 263, The shell invocation expands QUICK_START_MULTIMODAL and SAVE_PATH unquoted which breaks when paths contain spaces; update the python3 command invocations (the lines that call python3 with QUICK_START_MULTIMODAL and the --model_dir SAVE_PATH flag) to quote those expansions (e.g., wrap QUICK_START_MULTIMODAL and SAVE_PATH in double quotes) and also quote any other path-like variables used in the alternate branch at line 267 so arguments aren’t split.

Source: Linters/SAST tools

codecov · 2026-06-12T22:56:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.10%. Comparing base (ddc0a8e) to head (3d42843).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1705   +/-   ##
=======================================
  Coverage   77.09%   77.10%           
=======================================
  Files         511      511           
  Lines       56176    56176           
=======================================
+ Hits        43310    43313    +3     
+ Misses      12866    12863    -3

Flag	Coverage Δ
examples	`41.83% <ø> (-0.12%)`	⬇️
unit	`54.34% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

cjluo-nv

Bot review — DM the bot to share feedback.

Consolidates examples/vlm_ptq into examples/llm_ptq: adds --vlm/--calib_with_images to parser.sh and huggingface_example.sh, moves the VILA bootstrap + a (previously missing) requirements-vila.txt into llm_ptq, and replaces the vlm_ptq script with an exec shim that forwards --vlm "$@". Small, net-negative diff (+142/-192).

Design review: the "design review required" gate fired only because the change spans ≥5 directories, but almost all of that is README/CHANGELOG edits. This is a deprecation/de-duplication of a thin wrapper (the old vlm_ptq script already sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), not a new subsystem/abstraction — it reuses the existing parser/script pattern rather than introducing a second one. The PR body justifies the consolidation well. No new framework concern.

Correctness: verified the consolidation is faithful — VILA version check/clone block, requirements-vila.txt reference (now points to llm_ptq and the file actually exists, fixing the prior broken reference), and the multimodal-quickstart deploy smoke test all match the old behavior, gated behind $VLM. Anchor links in the top-level README and the migrated vlm_ptq/README.md match the new headings. No licensing changes (the new requirements file is just a transformers<=4.50.0 pin). No prompt-injection in the untrusted blocks.

Why nudge rather than approve:

No tests directly exercise the new --vlm path through examples/llm_ptq. Coverage is currently indirect via tests/examples/vlm_ptq/test_qwen_vl.py → the shim → llm_ptq --vlm. The PR's own follow-up plan removes examples/vlm_ptq and its CI matrix entry, at which point that indirect coverage disappears unless a direct run_llm_ptq_command(..., vlm=True) test is added. Worth a maintainer deciding whether to add direct coverage now.
The VLM deploy smoke test (multimodal quickstart) is only reached for the fp8 path; int8_sq/non-Blackwell nvfp4 exit early before the deploy block. This matches old behavior (not a regression) but is worth confirming is intended.

meenchen

Is the refactor functionally equivalent?

meenchen · 2026-06-12T23:45:46Z

+LLMs — the language model is quantized while the vision encoder is kept in high precision. Pass
+`--vlm` to the shell script (see [VLM quantization](#vlm-quantization)).
+
+| Model | fp8 | int8_sq<sup>1</sup> | int4_awq | w4a8_awq<sup>2</sup> | nvfp4<sup>3</sup> |


Can we merge this list to https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq#hugging-face-supported-models?

shengliangxu

LGTM

kevalmorabia97 · 2026-06-13T05:52:03Z

Do we want to drop Vila model support? ModelOpt min transformers is 4.56 so we cannot continue guaranteeing it works with 4.50

kevalmorabia97 · 2026-06-13T05:53:54Z

Do we also want to rename examples/llm_ptq to examples/hf_ptq?

kevalmorabia97 · 2026-06-13T05:56:40Z

We also need to merge tests and CI jobs:

tests/examples/vlm_ptq merged into tests/examples/llm_ptq

Remove vlm_ptq test job from https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/workflows/example_tests.yml

Edwardf0t1 requested review from a team as code owners June 12, 2026 22:47

Edwardf0t1 requested review from ajrasane, kevalmorabia97 and meenchen June 12, 2026 22:47

Edwardf0t1 requested review from chadvoegele, cjluo-nv, shengliangxu and yueshen2016 June 12, 2026 22:52

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

cjluo-nv reviewed Jun 12, 2026

View reviewed changes

meenchen reviewed Jun 12, 2026

View reviewed changes

shengliangxu approved these changes Jun 13, 2026

View reviewed changes

kevalmorabia97 reviewed Jun 13, 2026

View reviewed changes

Comment thread examples/llm_ptq/README.md

kevalmorabia97 Jun 13, 2026

Copy link
Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to rename examples/llm_ptq to examples/hf_ptq?

kevalmorabia97 reviewed Jun 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(examples): consolidate vlm_ptq into llm_ptq#1705

refactor(examples): consolidate vlm_ptq into llm_ptq#1705
Edwardf0t1 wants to merge 1 commit into
mainfrom
consolidate-vlm-ptq-into-llm-ptq

Edwardf0t1 commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

codecov Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

cjluo-nv left a comment

Uh oh!

meenchen left a comment •

edited

Loading

Uh oh!

meenchen Jun 12, 2026

Uh oh!

shengliangxu left a comment

Uh oh!

kevalmorabia97 Jun 13, 2026

Uh oh!

kevalmorabia97 Jun 13, 2026

Uh oh!

kevalmorabia97 Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image
		else

Conversation

Edwardf0t1 commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

meenchen left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

meenchen Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

shengliangxu left a comment

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Edwardf0t1 commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

codecov Bot commented Jun 12, 2026 •

edited

Loading

meenchen left a comment •

edited

Loading