[Refactor] Replace --skip-mm-profiling with --deploy-modality text by kevincheng2 · Pull Request #7048 · PaddlePaddle/FastDeploy

kevincheng2 · 2026-03-27T07:57:32Z

Motivation

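The original --skip-mm-profiling argument overlaps semantically with the existing --deploy-modality argument:
when deploying in text-only mode (--deploy-modality text), there is no need to reserve GPU memory for multimodal tokens in the first place.
A separate argument only adds configuration complexity; reusing deploy_modality is more intuitive and consistent.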

Modifications

fastdeploy/engine/args_utils.py: removed the EngineArgs.skip_mm_profiling field and the --skip-mm-profiling launch argument
fastdeploy/config.py: removed self.skip_mm_profiling = False from ModelConfig.__init__; in FDConfig.get_max_chunk_tokens, changed the condition to
self.deploy_modality != DeployModality.TEXT,
so that when deploy_modality is text the method returns max_num_batched_tokens directly and skips adding the mm tokens
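A minimal Python sketch of the new branch in FDConfig.get_max_chunk_tokens, assuming the mm overhead is a plain addition on top of max_num_batched_tokens. The names max_chunk_tokens_sketch and mm_tokens, the MULTIMODAL member, and the enum's string values are hypothetical; only the condition quoted above is modeled, and any other checks the real method performs are left out.

```python
from enum import Enum


class DeployModality(str, Enum):
    """Hypothetical stand-in for FastDeploy's DeployModality enum.

    Only TEXT is named in this PR; its "text" value is assumed from the
    `--deploy-modality text` CLI usage, and MULTIMODAL is a placeholder.
    """

    TEXT = "text"
    MULTIMODAL = "multimodal"  # hypothetical placeholder, not a confirmed value


def max_chunk_tokens_sketch(
    max_num_batched_tokens: int,
    mm_tokens: int,
    deploy_modality: DeployModality,
) -> int:
    """Model only the condition quoted in this PR; other checks are omitted."""
    if deploy_modality != DeployModality.TEXT:
        # Non-text deployment: add the mm tokens on top of the base budget
        # so that profiling reserves memory for them.
        return max_num_batched_tokens + mm_tokens
    # Text-only deployment: there are no multimodal tokens to reserve for.
    return max_num_batched_tokens


# The CLI string maps onto the enum member (assumed value "text").
modality = DeployModality("text")
print(max_chunk_tokens_sketch(8192, 1024, modality))  # 8192
print(max_chunk_tokens_sketch(8192, 1024, DeployModality.MULTIMODAL))  # 9216
```

Tying the skip to the deployment modality means a text-only deployment never pays the mm reservation, without a second flag that could disagree with it.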

Usage or Command

# Deploy in text mode to skip the mm token profiling overhead (replaces the former --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
  --deploy-modality text \
  --model /path/to/model \
  ...

Checklist

Add at least a tag in the PR title.
Format your code, run pre-commit before commit.
Add unit tests. This is a parameter refactor with a logically equivalent replacement; it is covered by the existing config unit tests.

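…ad in profiling

## Motivation

When deploying multimodal models (such as Qwen2.5-VL and ERNIE4.5-VL), `get_max_chunk_tokens` adds the number of mm tokens on top of the base token count to reserve GPU memory during the profiling stage. In some scenarios (for example, when the number of image tokens is known to be small, or when GPU memory needs to be conserved), users want to skip this extra multimodal token overhead and profile directly with the text token count.

## Modifications

- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False` field to `EngineArgs` and a `--skip-mm-profiling` launch argument to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`; add a `not self.model_config.skip_mm_profiling` check in `FDConfig.get_max_chunk_tokens`, so that when the flag is enabled the mm token addition is skipped and the base `num_tokens` is returned directly

## Usage or Command

Add the following argument when launching the service:

```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature only passes a configuration parameter through and the logic is simple; it is covered by existing config unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>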
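For comparison, a hypothetical Python model of the branch this first commit describes. The function name and the is_multimodal parameter are assumptions: is_multimodal stands in for whatever existing check gated the mm overhead, since the commit message only states that a `not self.model_config.skip_mm_profiling` check was added.

```python
def max_chunk_tokens_v1_sketch(
    num_tokens: int,
    mm_tokens: int,
    is_multimodal: bool,
    skip_mm_profiling: bool = False,
) -> int:
    """Hypothetical model of the first commit's get_max_chunk_tokens branch."""
    # is_multimodal is an assumed stand-in for the pre-existing condition;
    # the commit only added the `not skip_mm_profiling` part.
    if is_multimodal and not skip_mm_profiling:
        # Default: reserve room for mm tokens during profiling.
        return num_tokens + mm_tokens
    # --skip-mm-profiling set (or a text-only model): base count only.
    return num_tokens


print(max_chunk_tokens_v1_sketch(8192, 1024, is_multimodal=True))  # 9216
print(max_chunk_tokens_v1_sketch(8192, 1024, is_multimodal=True, skip_mm_profiling=True))  # 8192
```

In this version the flag skips the overhead independently of the deployment modality; the follow-up commit removes the flag and ties the skip to `--deploy-modality text` instead.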

paddle-bot · 2026-03-27T07:57:39Z

Thanks for your contribution!

codecov-commenter · 2026-03-27T10:16:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@6693bcd).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7048   +/-   ##
==========================================
  Coverage           ?   73.70%           
==========================================
  Files              ?      399           
  Lines              ?    56412           
  Branches           ?     8919           
==========================================
  Hits               ?    41577           
  Misses             ?    11886           
  Partials           ?     2949
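The figures in this coverage diff are internally consistent. Assuming Codecov's documented definition, coverage is hits / (hits + misses + partials), so partially covered lines do not count toward the percentage. A quick check in Python (codecov_coverage is an illustrative helper, not part of any Codecov API):

```python
def codecov_coverage(hits: int, misses: int, partials: int) -> float:
    """Coverage percentage where partial lines count as not covered."""
    total = hits + misses + partials
    return 100.0 * hits / total


hits, misses, partials = 41577, 11886, 2949
print(hits + misses + partials)  # 56412, the "Lines" row
print(f"{codecov_coverage(hits, misses, partials):.2f}%")  # 73.70%
```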

Flag	Coverage Δ
GPU	`73.70% <100.00%> (?)`

Flags with carried forward coverage won't be shown.

kevincheng2 temporarily deployed to Metax_ci on March 27, 2026 07:57 with GitHub Actions (inactive)

kevincheng2 added the cherry-pick: release/2.4 and cherry-pick: release/2.5 labels on Mar 27, 2026

kevincheng2 temporarily deployed to Metax_ci on March 27, 2026 10:40 with GitHub Actions (inactive)

kevincheng2 changed the title from "[Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling" to "[Refactor] Replace --skip-mm-profiling with --deploy-modality text" on Mar 27, 2026

kevincheng2 wants to merge 2 commits into PaddlePaddle:develop from kevincheng2:feature/skip-mm-profiling

kevincheng2 commented Mar 27, 2026 •

edited

Loading

Uh oh!

paddle-bot bot commented Mar 27, 2026

Uh oh!

codecov-commenter commented Mar 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kevincheng2 commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Checklist

Uh oh!

paddle-bot bot commented Mar 27, 2026

Uh oh!

codecov-commenter commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kevincheng2 commented Mar 27, 2026 •

edited

Loading

codecov-commenter commented Mar 27, 2026 •

edited

Loading