[Models] Add Qwen3-VL Model Support #5763
base: develop
Conversation
Thanks for your contribution!
Pull request overview
This PR adds comprehensive support for the Qwen3-VL multimodal model to the FastDeploy framework. The implementation includes model architecture, input processing pipelines, and integration with existing training/inference infrastructure.
Key Changes:
- Complete Qwen3-VL model implementation with vision transformer and language model components
- Multimodal input processors for images and videos with 3D rotary position embeddings
- Reinforcement learning support through rollout model integration
- Updated worker infrastructure to support Qwen3-VL's rope_scaling configuration
Reviewed changes
Copilot reviewed 29 out of 31 changed files in this pull request and generated 15 comments.
Summary per file
| File | Description |
|---|---|
| fastdeploy/model_executor/models/qwen3_vl/ | Core model architecture including vision transformer with DFNROPE and language model integration |
| fastdeploy/input/qwen3_vl_processor/ | Input processing pipeline for tokenization, image/video preprocessing, and position encoding |
| fastdeploy/rl/rollout_model.py | RL support class for Qwen3-VL with weight mapping utilities |
| fastdeploy/worker/*_model_runner.py | Added rope_scaling parameter support for 3D RoPE computation |
| fastdeploy/config.py | Configuration updates to handle Qwen3-VL's nested text_config and rope_scaling |
| fastdeploy/input/mm_data_processor.py | New abstract base class for multimodal data processors |
| tests/input/test_qwen3_vl_processor.py | Comprehensive unit tests for the Qwen3-VL processor |
| tests/e2e/Qwen3VL_RL/test_rollout_model.py | End-to-end test for RL model rollout |
Motivation
Add Qwen3-VL Model Support
Modifications
1. Network composition: the text model, ViT, and processor. Interleaved 3D rotary position embeddings are used, so `rope_scaling` is passed through to the position-encoding code (see the sketch below).
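A minimal sketch of the idea, assuming a helper that builds the per-axis (temporal, height, width) position ids that interleaved 3D RoPE consumes. The function name and the `mrope_section` values are illustrative, not the PR's actual code:

```python
import numpy as np

def build_vision_position_ids(t: int, h: int, w: int) -> np.ndarray:
    """Return a (3, t*h*w) array of (temporal, height, width) indices."""
    t_idx = np.repeat(np.arange(t), h * w)           # same frame index for every patch in a frame
    h_idx = np.tile(np.repeat(np.arange(h), w), t)   # row index, repeated per frame
    w_idx = np.tile(np.arange(w), t * h)             # column index, repeated per row
    return np.stack([t_idx, h_idx, w_idx])

# rope_scaling is passed through so the attention layers know how to split the
# rotary dimensions across the three axes; the values here are illustrative.
rope_scaling = {"rope_type": "default", "mrope_section": [24, 20, 20]}
print(build_vision_position_ids(t=2, h=4, w=4).shape)  # (3, 32)
```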
2. In current multimodal models, config.json keeps the text and multimodal settings under separate top-level keys; for this model the text settings are nested under `text_config`. This PR is compatible with that layout (sketched below).
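A hypothetical helper showing the shape of that compatibility handling: fall back to the top level when `text_config` is absent, so text-only and multimodal checkpoints load through the same path:

```python
import json

def load_text_config(config_path: str) -> dict:
    """Read a config.json, tolerating both nested and flat layouts."""
    with open(config_path) as f:
        cfg = json.load(f)
    # Multimodal checkpoints nest the language-model fields under text_config;
    # text-only checkpoints keep them at the top level.
    text_cfg = cfg.get("text_config", cfg)
    # Surface rope_scaling too, since the 3D RoPE path needs it later.
    text_cfg.setdefault("rope_scaling", cfg.get("rope_scaling"))
    return text_cfg
```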
3. For video requests, Qwen3-VL computes the multimodal token count as `t * h * w // 4`, unlike ERNIE-VL's `t * h * w // 4 // 2`. The Processor therefore wraps the token-count computation in a function that is passed to the worker and called whenever the count is needed, keeping the worker model-agnostic (see the sketch below).
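A hedged sketch of that design, with hypothetical names: each processor supplies its own counting function and the worker simply invokes whichever it was given, so no per-model formula is hard-coded in the worker:

```python
from typing import Callable

def qwen3_vl_num_tokens(t: int, h: int, w: int) -> int:
    return t * h * w // 4

def ernie_vl_num_tokens(t: int, h: int, w: int) -> int:
    return t * h * w // 4 // 2

def worker_count_tokens(count_fn: Callable[[int, int, int], int],
                        t: int, h: int, w: int) -> int:
    # The worker stays model-agnostic: it just calls the injected function.
    return count_fn(t, h, w)

assert worker_count_tokens(qwen3_vl_num_tokens, 4, 16, 16) == 256
assert worker_count_tokens(ernie_vl_num_tokens, 4, 16, 16) == 128
```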
4. `get_img_boundaries` had an array out-of-bounds bug (see the image below). This PR fixes it and generalizes the function to take the multimodal token count as a parameter instead of hard-coding `h*w // 4`; a simplified sketch follows the image.

Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model your/path/Qwen3-VL-4B-Instruct \
    --port 8801 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --cache-queue-port 8183 \
    --max-num-seqs 32
```
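An example request against the server started above, assuming FastDeploy's OpenAI-compatible `/v1/chat/completions` endpoint; the image URL and model name are placeholders:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8801/v1/chat/completions",
    json={
        "model": "Qwen3-VL-4B-Instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```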
Accuracy Tests
result
Checklist
- PR title tag (one of): [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release`-branch PR, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.