[Refactor] Replace --skip-mm-profiling with --deploy-modality text #7048
Open
kevincheng2 wants to merge 2 commits into PaddlePaddle:develop
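…ad in profiling

## Motivation

When deploying multimodal models (such as Qwen2.5-VL or ERNIE4.5-VL), `get_max_chunk_tokens` adds the mm token count on top of the base token count so that the profiling stage reserves GPU memory for it. In some scenarios (for example, when the image token count is known to be small, or when the goal is to save GPU memory), users want to skip this extra multimodal token overhead and profile with the text token count alone.

## Modifications

- `fastdeploy/engine/args_utils.py`: add the field `skip_mm_profiling: bool = False` to `EngineArgs`, and add the `--skip-mm-profiling` launch argument to the parser.
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`; add a `not self.model_config.skip_mm_profiling` check to `FDConfig.get_max_chunk_tokens`, so that when the flag is enabled the mm token addition is skipped and the base `num_tokens` is returned directly.

## Usage or Command

Add the argument when starting the service:

```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests.

This feature only passes a configuration parameter through and the logic is simple; it is covered by the existing config unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>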
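To make the gating in this first commit concrete, here is a minimal sketch of a boolean flag that switches the profiling budget between "base plus mm tokens" and "base only". Only the `skip_mm_profiling` field and the `--skip-mm-profiling` flag come from the commit message; `SketchModelConfig`, `chunk_token_budget`, `build_parser`, and the token numbers are hypothetical stand-ins, not FastDeploy's actual classes or values.

```python
import argparse
from dataclasses import dataclass


@dataclass
class SketchModelConfig:
    """Hypothetical stand-in for the relevant part of ModelConfig."""

    skip_mm_profiling: bool = False  # field name taken from the commit message


def chunk_token_budget(cfg: SketchModelConfig, num_tokens: int, mm_tokens: int) -> int:
    """Return the token count used for profiling (illustrative only)."""
    if not cfg.skip_mm_profiling:
        # Default: reserve extra room for multimodal tokens on top of the base count.
        return num_tokens + mm_tokens
    # Flag set: profile with the base text token count alone.
    return num_tokens


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # store_true defaults to False, matching `skip_mm_profiling: bool = False`.
    parser.add_argument("--skip-mm-profiling", action="store_true")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--skip-mm-profiling"])
cfg = SketchModelConfig(skip_mm_profiling=args.skip_mm_profiling)
print(chunk_token_budget(cfg, num_tokens=2048, mm_tokens=1024))
```

With the flag set, the sketch returns the base count; without it, the two counts are summed, which mirrors the behavior the commit message describes.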
Thanks for your contribution!
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7048 +/- ##
==========================================
Coverage ? 73.70%
==========================================
Files ? 399
Lines ? 56412
Branches ? 8919
==========================================
Hits ? 41577
Misses ? 11886
Partials ? 2949
…p mm profiling

## Motivation

The original `--skip-mm-profiling` argument overlaps semantically with the existing `deploy_modality` argument: when deploying in text-only mode (`deploy_modality=text`), there is no need to reserve GPU memory for multimodal tokens in the first place. A separate argument adds configuration complexity, whereas reusing `deploy_modality` is more intuitive and consistent.

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and the `--skip-mm-profiling` launch argument.
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`; change the condition in `FDConfig.get_max_chunk_tokens` to `self.deploy_modality != DeployModality.TEXT`, so that when `deploy_modality` is text, `max_num_batched_tokens` is returned directly and the mm token addition is skipped.

## Usage or Command

```bash
# Deploy in text mode, skipping the mm token profiling overhead (replaces the former --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
    --deploy-modality text \
    --model /path/to/model \
    ...
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests.

This change is a parameter refactor with a logically equivalent replacement; it is covered by the existing config unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
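The replacement condition can be sketched as an enum comparison. The version below is a simplified model, not the code in `fastdeploy/config.py`: the commit message names only `DeployModality.TEXT`, so the `MIXED` member, the function `get_max_chunk_tokens_sketch`, and the `mm_max_tokens` parameter are assumptions made for illustration.

```python
from enum import Enum


class DeployModality(str, Enum):
    """Hypothetical mirror of the enum; only TEXT is named in the commit message."""

    MIXED = "mixed"  # assumed member, for illustration only
    TEXT = "text"


def get_max_chunk_tokens_sketch(
    deploy_modality: DeployModality,
    max_num_batched_tokens: int,
    mm_max_tokens: int,
) -> int:
    """Illustrative version of the condition described in the commit message."""
    if deploy_modality != DeployModality.TEXT:
        # Non-text deployment: add the multimodal token overhead for profiling.
        return max_num_batched_tokens + mm_max_tokens
    # Text deployment: return the base budget directly, with no mm overhead.
    return max_num_batched_tokens


print(get_max_chunk_tokens_sketch(DeployModality.TEXT, 2048, 1024))
print(get_max_chunk_tokens_sketch(DeployModality.MIXED, 2048, 1024))
```

Under these assumptions, text mode yields the same result the removed flag produced, which is the sense in which the commit calls the refactor an equivalent replacement.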
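## Motivation

The original `--skip-mm-profiling` argument overlaps semantically with the existing `--deploy-modality` argument: when deploying in text-only mode (`--deploy-modality text`), there is no need to reserve GPU memory for multimodal tokens in the first place. A separate argument adds configuration complexity, whereas reusing `deploy_modality` is more intuitive and consistent.

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and the `--skip-mm-profiling` launch argument.
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`; change the condition in `FDConfig.get_max_chunk_tokens` to `self.deploy_modality != DeployModality.TEXT`, so that when `deploy_modality` is text, `max_num_batched_tokens` is returned directly and the mm token addition is skipped.

## Usage or Command

    # Deploy in text mode, skipping the mm token profiling overhead (replaces the former --skip-mm-profiling)
    python -m fastdeploy.entrypoints.openai.api_server \
        --deploy-modality text \
        --model /path/to/model \
        ...

## Checklist

- Format your code, run `pre-commit` before commit.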
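For readers wondering how a `--deploy-modality text` argument might reach an enum comparison, the sketch below parses the flag with `argparse` and converts it to an enum value. This is an assumption-laden illustration rather than FastDeploy's parser: the `MIXED` member, its use as the default, and the helper `parse_modality` are all hypothetical.

```python
import argparse
from enum import Enum


class DeployModality(Enum):
    """Hypothetical enum; only the text value appears in the PR description."""

    MIXED = "mixed"  # assumed member and assumed default
    TEXT = "text"


def parse_modality(argv: list[str]) -> DeployModality:
    """Parse --deploy-modality from an explicit argument list (illustrative only)."""
    parser = argparse.ArgumentParser()
    # type=DeployModality looks the string up by enum value; an unknown value
    # raises ValueError, which argparse reports as a usage error.
    parser.add_argument(
        "--deploy-modality",
        type=DeployModality,
        default=DeployModality.MIXED,
    )
    return parser.parse_args(argv).deploy_modality


print(parse_modality(["--deploy-modality", "text"]))
print(parse_modality([]))
```

Converting at parse time means later code can compare against `DeployModality.TEXT` directly instead of matching raw strings.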