[Models] Add Qwen3-VL Model Support #5763

CSWYF3634076 · 2025-12-25T07:42:35Z

Motivation

Add Qwen3-VL Model Support

Modifications

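1. Model construction: text, ViT, and processor.
The model uses interleaved 3D position encoding, so rope_scaling is passed through to the position-encoding code.

2. In current multimodal models' config.json, the text and multimodal settings live under separate top-level keys; for this model the text settings are in text_config. This PR is compatible with that layout.

3. For video requests, Qwen3VL computes the number of multimodal tokens as t * h * w // 4, unlike ERNIE VL's t * h * w // 4 // 2. The Processor therefore wraps the token-count calculation in a function and passes it to the worker, which calls it when needed, so the logic stays generic.

4. get_img_boundaries had an array out-of-bounds bug (see the figure below). This PR fixes it and changes the function to take the multimodal token count as an input instead of assuming a fixed h*w // 4, so it stays generic.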
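
The difference between the two token-count formulas in item 3 can be sketched as below. This is a minimal illustration, not the PR's code: the function names, the `TOKEN_COUNTERS` table, and the `count_mm_tokens` dispatcher are hypothetical, and only the two formulas are taken from the description above.

```python
from typing import Callable, Dict

# Hypothetical helpers for illustration; only the formulas come from the PR text.


def qwen3_vl_video_tokens(t: int, h: int, w: int) -> int:
    """Qwen3VL formula from the PR description: t * h * w // 4."""
    return t * h * w // 4


def ernie_vl_video_tokens(t: int, h: int, w: int) -> int:
    """ERNIE VL formula from the PR description: t * h * w // 4 // 2."""
    return t * h * w // 4 // 2


# A processor could hand one of these callables to the worker, so the worker
# never hard-codes a model-specific formula. The keys are assumed names.
TOKEN_COUNTERS: Dict[str, Callable[[int, int, int], int]] = {
    "qwen3_vl": qwen3_vl_video_tokens,
    "ernie_vl": ernie_vl_video_tokens,
}


def count_mm_tokens(model_type: str, t: int, h: int, w: int) -> int:
    """Look up the model's counter and apply it to a (t, h, w) grid."""
    return TOKEN_COUNTERS[model_type](t, h, w)
```

For the same grid, the ERNIE VL count is half the Qwen3VL count (rounded down), which is why a fixed formula on the worker side cannot serve both models.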

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
       --model your/path/Qwen3-VL-4B-Instruct \
       --port 8801 --metrics-port 8181 --engine-worker-queue-port 8182 --cache-queue-port 8183 \
       --max-num-seqs 32

Accuracy Tests

curl --location --request POST 'http://10.57.151.140:8801/v1/chat/completions' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "ERNIE-45-VL-28B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe the content of the image"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 1,
  "top_p": 1,
  "max_tokens": 1024,
  "skip_special_tokens": false,
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}'

Result

This image displays a detailed stone Buddha statue, likely from a traditional East Asian Buddhist context.\n\nThe central figure is a Buddha seated in the Dhyana mudra (meditation pose), with legs crossed and hands resting in his lap. He has a serene, smiling expression, closed eyes, and a prominent, curly topknot (ushnisha). His robes are intricately carved to show folds and drapery, and some areas, particularly on the hands and head, show remnants of gold leaf or paint.\n\nBehind the Buddha is a large, ornate, arched mandorla (radiant halo). The mandorla is a complex structure of concentric rings filled with detailed carvings. The innermost ring contains a series of smaller, distinct figures, likely representing bodhisattvas or other enlightened beings. The outer parts of the mandorla feature elaborate swirling, floral, and foliate patterns.\n\nOn either side of the central Buddha, at the base of the arch, stand two smaller, standing figures. These are likely bodhisattvas or other celestial beings, each with their own small circular halo. Their postures are formal and respectful, flanking the central figure.\n\nThe entire sculpture is set on a rectangular base. The stone is a greyish color, showing signs of age and wear, and the craftsmanship is highly detailed, suggesting it is a significant religious artifact from a period of Buddhist art. The statue is photographed against a dark, neutral background, which makes the details of the carving stand out.
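
The curl request above can also be sent from Python. The sketch below uses only the standard library and mirrors the request body shown in this PR; the helper names `build_chat_payload` and `send_chat_request` are hypothetical, and the base URL, model name, and API key are values the caller must supply.

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_payload(model: str, image_url: str, prompt: str) -> Dict[str, Any]:
    """Build the same JSON body as the curl example (text plus one image URL)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": 1,
        "top_p": 1,
        "max_tokens": 1024,
        "skip_special_tokens": False,
        "chat_template_kwargs": {"enable_thinking": False},
    }


def send_chat_request(base_url: str, payload: Dict[str, Any], api_key: str = "") -> Dict[str, Any]:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send_chat_request("http://<host>:8801", payload)` against a running server should return the JSON response; assuming the server follows the OpenAI chat-completions response shape, the generated text would be under `choices[0]["message"]["content"]`.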

Checklist

- Add at least one tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-12-25T07:42:57Z

Thanks for your contribution!

codecov-commenter · 2025-12-25T09:50:26Z

Codecov Report

❌ Patch coverage is 67.90987% with 413 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@0410c42).

Files with missing lines	Patch %	Lines
fastdeploy/input/qwen3_vl_processor/process.py	56.45%	118 Missing and 17 partials ⚠️
...tdeploy/model_executor/models/qwen3_vl/qwen3_vl.py	70.92%	55 Missing and 11 partials ⚠️
...model_executor/models/qwen3_vl/dfnrope/modeling.py	74.40%	59 Missing and 5 partials ⚠️
...loy/input/qwen3_vl_processor/qwen3_vl_processor.py	69.40%	25 Missing and 16 partials ⚠️
...deploy/input/qwen3_vl_processor/image_processor.py	75.42%	17 Missing and 12 partials ⚠️
...del_executor/models/qwen3_vl/dfnrope/activation.py	73.77%	15 Missing and 1 partial ⚠️
..._executor/models/qwen3_vl/dfnrope/configuration.py	28.57%	15 Missing ⚠️
...astdeploy/input/qwen_vl_processor/process_video.py	18.18%	9 Missing ⚠️
fastdeploy/input/preprocess.py	0.00%	8 Missing ⚠️
fastdeploy/input/ernie4_5_vl_processor/process.py	61.11%	3 Missing and 4 partials ⚠️
... and 6 more

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5763   +/-   ##
==========================================
  Coverage           ?   65.56%           
==========================================
  Files              ?      346           
  Lines              ?    44337           
  Branches           ?     6810           
==========================================
  Hits               ?    29071           
  Misses             ?    13088           
  Partials           ?     2178

Flag	Coverage Δ
GPU	`65.56% <67.90%> (?)`


Copilot

Pull request overview

This PR adds comprehensive support for the Qwen3-VL multimodal model to the FastDeploy framework. The implementation includes model architecture, input processing pipelines, and integration with existing training/inference infrastructure.

Key Changes:

Complete Qwen3-VL model implementation with vision transformer and language model components
Multimodal input processors for images and videos with 3D rotary position embeddings
Reinforcement learning support through rollout model integration
Updated worker infrastructure to support Qwen3-VL's rope_scaling configuration

Reviewed changes

Copilot reviewed 29 out of 31 changed files in this pull request and generated 15 comments.


File	Description
fastdeploy/model_executor/models/qwen3_vl/	Core model architecture including vision transformer with DFNROPE and language model integration
fastdeploy/input/qwen3_vl_processor/	Input processing pipeline for tokenization, image/video preprocessing, and position encoding
fastdeploy/rl/rollout_model.py	RL support class for Qwen3-VL with weight mapping utilities
fastdeploy/worker/*_model_runner.py	Added rope_scaling parameter support for 3D RoPE computation
fastdeploy/config.py	Configuration updates to handle Qwen3-VL's nested text_config and rope_scaling
fastdeploy/input/mm_data_processor.py	New abstract base class for multimodal data processors
tests/input/test_qwen3_vl_processor.py	Comprehensive unit tests for the Qwen3-VL processor
tests/e2e/Qwen3VL_RL/test_rollout_model.py	End-to-end test for RL model rollout
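
The nested `text_config` handling listed for fastdeploy/config.py could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: `resolve_text_config` and `get_rope_scaling` are hypothetical names, and the sample values in the usage note are made up.

```python
from typing import Any, Dict, Optional


def resolve_text_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return text-model settings, letting a nested text_config override top-level keys.

    Flat configs (no text_config key) are returned unchanged apart from the copy.
    """
    merged = dict(config)
    nested = merged.pop("text_config", None) or {}
    merged.update(nested)
    return merged


def get_rope_scaling(config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Fetch rope_scaling from wherever it lives, or None if it is absent."""
    return resolve_text_config(config).get("rope_scaling")
```

With a config such as `{"text_config": {"hidden_size": 8, "rope_scaling": {"rope_type": "default"}}}`, `get_rope_scaling` returns the nested dictionary, while a flat config without the key yields `None`.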


aquagull and others added 10 commits November 21, 2025 17:05

support v1 loader

3cf395c

remove useless code

9083b4c

remove useless

87af261

[Model] support Qwen3VL images success

4a54034

[Model] support Qwen3VL rope_3d

09ad396

[Model] support Qwen3VL remove log

7b4f1e7

[Model] support Qwen3VL RL

8e6ea57

Merge branch 'develop' into qwen3vl

76dc3f7

[Model] support Qwen3VL tp

1cb8b2b

[Model] support Qwen3VL video

f54c2b9

CSWYF3634076 requested review from kevincheng2, ming1753 and yuanlehome December 25, 2025 07:42

CSWYF3634076 temporarily deployed to Metax_ci December 25, 2025 07:42 — with GitHub Actions Inactive

[Model] support Qwen3VL fix ernievl

5fb37d1

CSWYF3634076 had a problem deploying to Metax_ci December 25, 2025 08:11 — with GitHub Actions Failure

CSWYF3634076 requested a review from xiaoxiaohehe001 December 25, 2025 08:27

yuanlehome requested a review from Copilot December 25, 2025 10:21

Copilot started reviewing on behalf of yuanlehome December 25, 2025 10:21

Copilot AI reviewed Dec 25, 2025


[Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

9bb9bce

CSWYF3634076 temporarily deployed to Metax_ci December 25, 2025 13:17 — with GitHub Actions Inactive

CSWYF3634076 changed the title ~~[Model] Add Qwen3-VL Model Support~~ [Models] Add Qwen3-VL Model Support Dec 25, 2025

[Model] support Qwen3VL fix multi card

fe462c2

CSWYF3634076 had a problem deploying to Metax_ci December 26, 2025 07:33 — with GitHub Actions Error

[Model] support Qwen3VL file close

9150d83

CSWYF3634076 had a problem deploying to Metax_ci December 26, 2025 07:45 — with GitHub Actions Failure

[Model] support Qwen3VL fix ce

99f57bb

CSWYF3634076 had a problem deploying to Metax_ci December 26, 2025 08:31 — with GitHub Actions Failure

[Model] support Qwen3VL fix unittest

a605c92

CSWYF3634076 temporarily deployed to Metax_ci December 26, 2025 11:18 — with GitHub Actions Inactive
