
[Refactor] Replace --skip-mm-profiling with --deploy-modality text#7048

Open
kevincheng2 wants to merge 2 commits into PaddlePaddle:develop from kevincheng2:feature/skip-mm-profiling

Conversation

kevincheng2 (Collaborator) commented Mar 27, 2026

Motivation

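The --skip-mm-profiling flag overlaps semantically with the existing --deploy-modality flag:
when deploying in text-only mode (--deploy-modality text), there is no need to reserve GPU memory for multimodal tokens in the first place.
A separate flag adds configuration complexity; reusing deploy_modality is more intuitive and consistent.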

Modifications

  • fastdeploy/engine/args_utils.py: remove the EngineArgs.skip_mm_profiling field and the
    --skip-mm-profiling launch argument
  • fastdeploy/config.py: remove self.skip_mm_profiling = False from ModelConfig.__init__;
    in FDConfig.get_max_chunk_tokens, change the condition to
    self.deploy_modality != DeployModality.TEXT,
    so that when deploy_modality is text the method returns max_num_batched_tokens directly and skips adding the mm tokens
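The condition change described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual FastDeploy code: `SketchConfig`, the `max_mm_tokens` field, and the `MIXED` enum member are assumptions introduced here, and the real `FDConfig.get_max_chunk_tokens` may compute the multimodal budget differently.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):
    """Stand-in for FastDeploy's DeployModality (members assumed for illustration)."""

    MIXED = "mixed"
    TEXT = "text"


@dataclass
class SketchConfig:
    """Hypothetical, flattened stand-in for FDConfig."""

    max_num_batched_tokens: int
    max_mm_tokens: int  # assumed extra budget reserved for multimodal tokens
    deploy_modality: DeployModality = DeployModality.MIXED

    def get_max_chunk_tokens(self) -> int:
        num_tokens = self.max_num_batched_tokens
        # Only add the multimodal budget when the deployment is not text-only.
        if self.deploy_modality != DeployModality.TEXT:
            num_tokens += self.max_mm_tokens
        return num_tokens


mixed = SketchConfig(max_num_batched_tokens=2048, max_mm_tokens=512)
text_only = SketchConfig(2048, 512, deploy_modality=DeployModality.TEXT)
print(mixed.get_max_chunk_tokens(), text_only.get_max_chunk_tokens())
```

In this sketch the text-only configuration profiles with the base token count alone, which is the behavior the removed flag used to provide.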

Usage or Command

# Deploy in text mode, skipping the mm token profiling overhead (replaces the former --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
  --deploy-modality text \
  --model /path/to/model \
  ...

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. This is a parameter refactor with a logically equivalent replacement; existing config unit tests cover it.

…ad in profiling

## Motivation

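When deploying multimodal models (e.g. Qwen2.5-VL, ERNIE4.5-VL), `get_max_chunk_tokens` adds
the mm token count on top of the base token count to reserve GPU memory during the profiling stage.

In some scenarios (e.g. the image token count is known to be small, or the user wants to save GPU memory),
users want to skip this extra multimodal token overhead and profile with the text token count alone.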

## Modifications

- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False` field to `EngineArgs`
  and a `--skip-mm-profiling` launch argument to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`;
  add a `not self.model_config.skip_mm_profiling` check in `FDConfig.get_max_chunk_tokens`,
  so that enabling the flag skips adding the mm tokens and returns the base `num_tokens` directly
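As a rough illustration of how such a boolean flag could be wired from the command line into the token calculation, here is a minimal sketch. It uses the standard `argparse` module rather than FastDeploy's own parser, and `build_parser`, the free-standing `get_max_chunk_tokens` function, and the `max_mm_tokens` parameter are hypothetical names chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the flag discussed in this commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--skip-mm-profiling",
        action="store_true",
        default=False,
        help="Skip the multimodal token overhead when profiling.",
    )
    return parser


def get_max_chunk_tokens(num_tokens: int, max_mm_tokens: int, skip_mm_profiling: bool) -> int:
    """Simplified stand-in for the check added to FDConfig.get_max_chunk_tokens."""
    if not skip_mm_profiling:
        num_tokens += max_mm_tokens  # reserve room for multimodal tokens
    return num_tokens


args = build_parser().parse_args(["--skip-mm-profiling"])
print(get_max_chunk_tokens(2048, 512, args.skip_mm_profiling))
```

With the flag omitted, `args.skip_mm_profiling` defaults to `False` and the multimodal budget is added as before.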

## Usage or Command

Add the following argument when starting the service:
```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature only passes a configuration parameter through; the logic is simple and existing config unit tests cover it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddle-bot bot commented Mar 27, 2026

Thanks for your contribution!


codecov-commenter commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@6693bcd).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7048   +/-   ##
==========================================
  Coverage           ?   73.70%           
==========================================
  Files              ?      399           
  Lines              ?    56412           
  Branches           ?     8919           
==========================================
  Hits               ?    41577           
  Misses             ?    11886           
  Partials           ?     2949           
Flag Coverage Δ
GPU 73.70% <100.00%> (?)


…p mm profiling

## Motivation

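The original `--skip-mm-profiling` flag overlaps semantically with the existing `deploy_modality` parameter:
when deploying in text-only mode (`deploy_modality=text`), there is no need to reserve GPU memory for multimodal tokens in the first place.
A separate flag adds configuration complexity; reusing `deploy_modality` is more intuitive and consistent.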

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and the
  `--skip-mm-profiling` launch argument
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`;
  in `FDConfig.get_max_chunk_tokens`, change the condition to
  `self.deploy_modality != DeployModality.TEXT`,
  so that when deploy_modality is text the method returns `max_num_batched_tokens` directly and skips adding the mm tokens

## Usage or Command

```bash
# Deploy in text mode, skipping the mm token profiling overhead (replaces the former --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
  --deploy-modality text \
  --model /path/to/model \
  ...
```
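For launch scripts that still pass the removed flag, a small migration helper along these lines could rewrite the argument list. This is only a sketch: `migrate_args` is a hypothetical helper, not part of FastDeploy, and it deliberately leaves any explicit `--deploy-modality` value untouched rather than overriding it.

```python
OLD_FLAG = "--skip-mm-profiling"
NEW_FLAG = "--deploy-modality"


def migrate_args(argv: list[str]) -> list[str]:
    """Drop the removed flag and, if no modality is set, request text mode instead."""
    has_modality = any(a == NEW_FLAG or a.startswith(NEW_FLAG + "=") for a in argv)
    migrated = [a for a in argv if a != OLD_FLAG]
    if OLD_FLAG in argv and not has_modality:
        migrated += [NEW_FLAG, "text"]
    return migrated


print(migrate_args(["--model", "/path/to/model", "--skip-mm-profiling"]))
```

If the original command combined the old flag with a non-text modality, the helper keeps that modality, so the multimodal overhead would be included again; such cases are better reviewed by hand.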

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This is a parameter refactor with a logically equivalent replacement; existing config unit tests cover it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kevincheng2 changed the title from "[Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling" to "[Refactor] Replace --skip-mm-profiling with --deploy-modality text" on Mar 27, 2026