Skip to content

refactor(examples): consolidate vlm_ptq into llm_ptq#1705

Open
Edwardf0t1 wants to merge 1 commit into
mainfrom
consolidate-vlm-ptq-into-llm-ptq
Open

refactor(examples): consolidate vlm_ptq into llm_ptq#1705
Edwardf0t1 wants to merge 1 commit into
mainfrom
consolidate-vlm-ptq-into-llm-ptq

Conversation

@Edwardf0t1

@Edwardf0t1 Edwardf0t1 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: refactor / deprecation (examples)

examples/vlm_ptq was effectively a thin wrapper over examples/llm_ptq: its
scripts/huggingface_example.sh already sourced llm_ptq/scripts/parser.sh and
called llm_ptq/hf_ptq.py, and all the actual VLM logic (vision-tower exclusion,
--calib_with_images, Nemotron VL calibration, VILA loading, multimodal export)
already lives under llm_ptq. The wrapper also referenced a
requirements-vila.txt that did not exist in the repo.

This PR makes llm_ptq the single source of truth for both LLM and VLM PTQ and
deprecates vlm_ptq.

llm_ptq (canonical):

  • Add --vlm and --calib_with_images flags to scripts/parser.sh and
    scripts/huggingface_example.sh. --vlm bootstraps VILA dependencies and runs
    the TensorRT-LLM multimodal quickstart as the deploy smoke test (instead of the
    text-only run_tensorrt_llm.py).
  • Add examples/llm_ptq/requirements-vila.txt (fixes the previously broken
    reference).
  • Document the VLM support matrix and the --vlm workflow in README.md.

vlm_ptq (deprecated):

  • Replace scripts/huggingface_example.sh with a shim that prints a deprecation
    warning and forwards to the llm_ptq script with --vlm.
  • Convert README.md into a redirect/migration notice.
  • Repoint root README.md VLM links and add a CHANGELOG.rst deprecation entry.

Usage

cd examples/llm_ptq
# VLM PTQ (was: examples/vlm_ptq/scripts/huggingface_example.sh)
scripts/huggingface_example.sh --model <hf_model> --quant fp8 --vlm

# VLM image-text calibration
scripts/huggingface_example.sh --model <hf_model> --quant nvfp4 --vlm --calib_with_images --trust_remote_code

Testing

  • bash -n syntax check on the modified parser.sh, llm_ptq script, and the
    vlm_ptq shim.
  • pre-commit run --files <changed files> passes.
  • The existing VLM example test (tests/examples/vlm_ptq/test_qwen_vl.py via
    run_vlm_ptq_command) still exercises the path end-to-end through the
    deprecation shim, which forwards to the consolidated llm_ptq script.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ (old vlm_ptq entry point still works via a forwarding shim)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A (existing VLM test still covers the consolidated path)
  • Did you update Changelog?: ✅
  • Did you get Claude approval on this PR?: ❌

Additional Information

Follow-up (later release): remove the examples/vlm_ptq directory and its CI
matrix entry once external references have migrated.

Summary by CodeRabbit

  • New Features

    • Vision Language Model (VLM) quantization now integrated into the main LLM PTQ example with --vlm and --calib_with_images flags.
  • Documentation

    • Added comprehensive VLM quantization guidance, including model support matrix and usage examples.
  • Deprecations

    • examples/vlm_ptq consolidated into examples/llm_ptq. Old scripts retained with deprecation warnings for backward compatibility.

VLM PTQ already ran entirely through examples/llm_ptq (the vlm_ptq shell
script sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), so the
vlm_ptq example was effectively a thin, partially-broken wrapper.

Make llm_ptq the single source of truth for both LLM and VLM PTQ:
- Add --vlm and --calib_with_images flags to scripts/parser.sh and
  scripts/huggingface_example.sh. --vlm bootstraps VILA deps and runs the
  TRT-LLM multimodal quickstart as the deploy smoke test.
- Add examples/llm_ptq/requirements-vila.txt (the vlm_ptq script referenced
  a requirements-vila.txt that never existed in the repo).
- Document the VLM support matrix and --vlm workflow in llm_ptq/README.md.

Deprecate examples/vlm_ptq:
- Replace its huggingface_example.sh with a shim that warns and forwards to
  the llm_ptq script with --vlm (backward compatible).
- Turn its README into a redirect/migration notice.
- Repoint root README VLM links and add a CHANGELOG deprecation entry.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@Edwardf0t1 Edwardf0t1 requested review from a team as code owners June 12, 2026 22:47
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR consolidates VLM Post-Training Quantization into the LLM PTQ pipeline. Users now run LLM PTQ with a --vlm flag instead of a separate vlm_ptq example. The old vlm_ptq entry point remains as a deprecated shim that forwards calls with backward compatibility. VLM-specific setup (VILA repo cloning, transformer version constraints) and smoke tests are now integrated into the LLM PTQ scripts.

Changes

VLM PTQ consolidation into LLM PTQ

Layer / File(s) Summary
Command-line option parsing for VLM support
examples/llm_ptq/scripts/parser.sh
--vlm and --calib_with_images flags are added to option parsing, initialized in variable scope, and included in configuration output.
VILA setup and VLM-specific execution logic
examples/llm_ptq/scripts/huggingface_example.sh
When --vlm is enabled and model contains "vila", the script enforces transformers version constraints, installs VILA requirements, clones VILA repository at a pinned commit, and uses TRT-LLM multimodal quickstart for deploy smoke tests instead of the standard run_tensorrt_llm.py.
VLM dependencies and integrated PTQ documentation
examples/llm_ptq/requirements-vila.txt, examples/llm_ptq/README.md
New VILA requirements file specifies transformers <=4.50.0 constraint. LLM PTQ README adds VLM model support table with format constraints, usage examples with --vlm, and image-text calibration command examples.
Deprecated VLM PTQ forwarding and migration guide
examples/vlm_ptq/scripts/huggingface_example.sh, examples/vlm_ptq/README.md
The vlm_ptq huggingface example script becomes a deprecated shim that prints warnings and forwards to llm_ptq with --vlm. The README is condensed into a deprecation notice with a migration table and reduced resources section.
Root documentation and changelog updates
README.md, CHANGELOG.rst
Main README updates VLM quantization links from vlm_ptq paths to llm_ptq README sections. CHANGELOG documents the consolidation and deprecation plan.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor(examples): consolidate vlm_ptq into llm_ptq' accurately summarizes the main change: consolidating the VLM PTQ example into the LLM PTQ example as a single source of truth.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No tracked file diffs detected. Scanned VLM PTQ-related Python files (examples/llm_ptq/hf_ptq.py, example_utils.py, modelopt/torch/utils/vlm_dataset_utils.py) for torch.load weights_only=False, num...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch consolidate-vlm-ptq-into-llm-ptq

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around line 262-263: The shell invocation expands QUICK_START_MULTIMODAL and
SAVE_PATH unquoted which breaks when paths contain spaces; update the python3
command invocations (the lines that call python3 with QUICK_START_MULTIMODAL and
the --model_dir SAVE_PATH flag) to quote those expansions (e.g., wrap
QUICK_START_MULTIMODAL and SAVE_PATH in double quotes) and also quote any other
path-like variables used in the alternate branch at line 267 so arguments aren’t
split.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6149b08c-a48a-4a84-8cbe-8d1efb8b1ea6

📥 Commits

Reviewing files that changed from the base of the PR and between d26c8af and 3d42843.

📒 Files selected for processing (8)
  • CHANGELOG.rst
  • README.md
  • examples/llm_ptq/README.md
  • examples/llm_ptq/requirements-vila.txt
  • examples/llm_ptq/scripts/huggingface_example.sh
  • examples/llm_ptq/scripts/parser.sh
  • examples/vlm_ptq/README.md
  • examples/vlm_ptq/scripts/huggingface_example.sh

Comment on lines +262 to +263
python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image
else

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Quote smoke-test paths to avoid argument splitting.

Line 262 and Line 267 expand path variables unquoted, so a save/code path containing spaces can break the Python command arguments.

Proposed fix
-                python3 $QUICK_START_MULTIMODAL --model_dir $SAVE_PATH --modality image
+                python3 "$QUICK_START_MULTIMODAL" --model_dir "$SAVE_PATH" --modality image
...
-            python run_tensorrt_llm.py --checkpoint_dir=$SAVE_PATH $RUN_ARGS
+            python run_tensorrt_llm.py --checkpoint_dir="$SAVE_PATH" $RUN_ARGS

Also applies to: 267-267

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 262-262: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 262-262: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/scripts/huggingface_example.sh` around lines 262 - 263, The
shell invocation expands QUICK_START_MULTIMODAL and SAVE_PATH unquoted which
breaks when paths contain spaces; update the python3 command invocations (the
lines that call python3 with QUICK_START_MULTIMODAL and the --model_dir
SAVE_PATH flag) to quote those expansions (e.g., wrap QUICK_START_MULTIMODAL and
SAVE_PATH in double quotes) and also quote any other path-like variables used in
the alternate branch at line 267 so arguments aren’t split.

Source: Linters/SAST tools

@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.10%. Comparing base (ddc0a8e) to head (3d42843).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1705   +/-   ##
=======================================
  Coverage   77.09%   77.10%           
=======================================
  Files         511      511           
  Lines       56176    56176           
=======================================
+ Hits        43310    43313    +3     
+ Misses      12866    12863    -3     
Flag Coverage Δ
examples 41.83% <ø> (-0.12%) ⬇️
unit 54.34% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Consolidates examples/vlm_ptq into examples/llm_ptq: adds --vlm/--calib_with_images to parser.sh and huggingface_example.sh, moves the VILA bootstrap + a (previously missing) requirements-vila.txt into llm_ptq, and replaces the vlm_ptq script with an exec shim that forwards --vlm "$@". Small, net-negative diff (+142/-192).

Design review: the "design review required" gate fired only because the change spans ≥5 directories, but almost all of that is README/CHANGELOG edits. This is a deprecation/de-duplication of a thin wrapper (the old vlm_ptq script already sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), not a new subsystem/abstraction — it reuses the existing parser/script pattern rather than introducing a second one. The PR body justifies the consolidation well. No new framework concern.

Correctness: verified the consolidation is faithful — VILA version check/clone block, requirements-vila.txt reference (now points to llm_ptq and the file actually exists, fixing the prior broken reference), and the multimodal-quickstart deploy smoke test all match the old behavior, gated behind $VLM. Anchor links in the top-level README and the migrated vlm_ptq/README.md match the new headings. No licensing changes (the new requirements file is just a transformers<=4.50.0 pin). No prompt-injection in the untrusted blocks.

Why nudge rather than approve:

  • No tests directly exercise the new --vlm path through examples/llm_ptq. Coverage is currently indirect via tests/examples/vlm_ptq/test_qwen_vl.py → the shim → llm_ptq --vlm. The PR's own follow-up plan removes examples/vlm_ptq and its CI matrix entry, at which point that indirect coverage disappears unless a direct run_llm_ptq_command(..., vlm=True) test is added. Worth a maintainer deciding whether to add direct coverage now.
  • The VLM deploy smoke test (multimodal quickstart) is only reached for the fp8 path; int8_sq/non-Blackwell nvfp4 exit early before the deploy block. This matches old behavior (not a regression) but is worth confirming is intended.

@meenchen meenchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the refactor functionally equivalent?

LLMs — the language model is quantized while the vision encoder is kept in high precision. Pass
`--vlm` to the shell script (see [VLM quantization](#vlm-quantization)).

| Model | fp8 | int8_sq<sup>1</sup> | int4_awq | w4a8_awq<sup>2</sup> | nvfp4<sup>3</sup> |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shengliangxu shengliangxu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to drop Vila model support? ModelOpt min transformers is 4.56 so we cannot continue guaranteeing it works with 4.50

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to rename examples/llm_ptq to examples/hf_ptq?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to merge tests and CI jobs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants