Collaborator

@CSWYF3634076 CSWYF3634076 commented Dec 25, 2025

Motivation

Add Qwen3-VL Model Support

Modifications

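1. Model construction: text, ViT, and processor.
The model uses interleaved 3D position encoding, so the position-encoding code passes rope_scaling through.

2. In current multimodal models, config.json keeps the text and multimodal settings under separate top-level keys; for this model the text settings are under text_config. This PR is compatible with that layout.

3. For video requests, Qwen3VL computes the number of multimodal tokens as t * h * w // 4, unlike ernie vl, which uses t * h * w // 4 // 2. The Processor therefore wraps the token-count calculation in a function and passes it to the worker, which calls it when needed, so the logic stays generic.

4. get_img_boundaries had an array out-of-bounds bug (image below). This PR fixes it and changes the function to take the multimodal token count as an argument instead of hard-coding h*w // 4, so it stays generic.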
[Image: 3d1c062002d28ea44f71ca57051b1c5d]
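
The token-count point in item 3 can be illustrated with a minimal sketch. The function names below (`qwen3_vl_video_tokens`, `ernie_vl_video_tokens`, `total_mm_tokens`) are hypothetical and are not taken from this PR; only the two formulas come from the description above. The consumer receives the counting function as an argument instead of hard-coding one model's formula.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical type alias: (t, h, w) grid -> number of multimodal tokens.
TokenCounter = Callable[[int, int, int], int]


def qwen3_vl_video_tokens(t: int, h: int, w: int) -> int:
    """Formula stated in the PR description for Qwen3VL: t * h * w // 4."""
    return t * h * w // 4


def ernie_vl_video_tokens(t: int, h: int, w: int) -> int:
    """Formula stated in the PR description for ernie vl: t * h * w // 4 // 2."""
    return t * h * w // 4 // 2


def total_mm_tokens(grids: Iterable[Tuple[int, int, int]], counter: TokenCounter) -> int:
    """Sum token counts over video grids using the model-specific counter.

    Assumption: the caller (a worker, in the PR's wording) is handed
    `counter` by the processor and never computes the formula itself.
    """
    return sum(counter(t, h, w) for t, h, w in grids)
```

For a single video grid of t=4, h=16, w=16, the first formula gives 256 tokens and the second gives 128, so any code that assumes one formula will miscount for the other model.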

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
       --model your/path/Qwen3-VL-4B-Instruct \
       --port 8801  --metrics-port 8181  --engine-worker-queue-port 8182  --cache-queue-port 8183 \
       --max-num-seqs 32

Accuracy Tests

curl --location --request POST 'http://10.57.151.140:8801/v1/chat/completions' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "ERNIE-45-VL-28B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe the content of the image"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 1,
  "top_p": 1,
  "max_tokens": 1024,
  "skip_special_tokens": false,
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}'
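
The same request can be built from Python. This is a minimal sketch using only the standard library; the helper name `build_chat_request` is hypothetical, and the base URL, API key, and model name are parameters the caller supplies rather than values confirmed by this PR. The payload fields mirror the curl body above.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, image_url, prompt):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": 1,
        "top_p": 1,
        "max_tokens": 1024,
        "skip_special_tokens": False,
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON response body.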

result

This image displays a detailed stone Buddha statue, likely from a traditional East Asian Buddhist context.\n\nThe central figure is a Buddha seated in the Dhyana mudra (meditation pose), with legs crossed and hands resting in his lap. He has a serene, smiling expression, closed eyes, and a prominent, curly topknot (ushnisha). His robes are intricately carved to show folds and drapery, and some areas, particularly on the hands and head, show remnants of gold leaf or paint.\n\nBehind the Buddha is a large, ornate, arched mandorla (radiant halo). The mandorla is a complex structure of concentric rings filled with detailed carvings. The innermost ring contains a series of smaller, distinct figures, likely representing bodhisattvas or other enlightened beings. The outer parts of the mandorla feature elaborate swirling, floral, and foliate patterns.\n\nOn either side of the central Buddha, at the base of the arch, stand two smaller, standing figures. These are likely bodhisattvas or other celestial beings, each with their own small circular halo. Their postures are formal and respectful, flanking the central figure.\n\nThe entire sculpture is set on a rectangular base. The stone is a greyish color, showing signs of age and wear, and the craftsmanship is highly detailed, suggesting it is a significant religious artifact from a period of Buddhist art. The statue is photographed against a dark, neutral background, which makes the details of the carving stand out.

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets the release branch, make sure it has already been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Dec 25, 2025

Thanks for your contribution!

codecov-commenter commented Dec 25, 2025

Codecov Report

❌ Patch coverage is 67.90987% with 413 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@0410c42). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/input/qwen3_vl_processor/process.py | 56.45% | 118 Missing and 17 partials ⚠️ |
| ...tdeploy/model_executor/models/qwen3_vl/qwen3_vl.py | 70.92% | 55 Missing and 11 partials ⚠️ |
| ...model_executor/models/qwen3_vl/dfnrope/modeling.py | 74.40% | 59 Missing and 5 partials ⚠️ |
| ...loy/input/qwen3_vl_processor/qwen3_vl_processor.py | 69.40% | 25 Missing and 16 partials ⚠️ |
| ...deploy/input/qwen3_vl_processor/image_processor.py | 75.42% | 17 Missing and 12 partials ⚠️ |
| ...del_executor/models/qwen3_vl/dfnrope/activation.py | 73.77% | 15 Missing and 1 partial ⚠️ |
| ..._executor/models/qwen3_vl/dfnrope/configuration.py | 28.57% | 15 Missing ⚠️ |
| ...astdeploy/input/qwen_vl_processor/process_video.py | 18.18% | 9 Missing ⚠️ |
| fastdeploy/input/preprocess.py | 0.00% | 8 Missing ⚠️ |
| fastdeploy/input/ernie4_5_vl_processor/process.py | 61.11% | 3 Missing and 4 partials ⚠️ |
| ... and 6 more | | |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5763   +/-   ##
==========================================
  Coverage           ?   65.56%           
==========================================
  Files              ?      346           
  Lines              ?    44337           
  Branches           ?     6810           
==========================================
  Hits               ?    29071           
  Misses             ?    13088           
  Partials           ?     2178           
| Flag | Coverage Δ |
|---|---|
| GPU | 65.56% <67.90%> (?) |

Contributor

Copilot AI left a comment

Pull request overview

This PR adds comprehensive support for the Qwen3-VL multimodal model to the FastDeploy framework. The implementation includes model architecture, input processing pipelines, and integration with existing training/inference infrastructure.

Key Changes:

  • Complete Qwen3-VL model implementation with vision transformer and language model components
  • Multimodal input processors for images and videos with 3D rotary position embeddings
  • Reinforcement learning support through rollout model integration
  • Updated worker infrastructure to support Qwen3-VL's rope_scaling configuration

Reviewed changes

Copilot reviewed 29 out of 31 changed files in this pull request and generated 15 comments.

| File | Description |
|---|---|
| fastdeploy/model_executor/models/qwen3_vl/ | Core model architecture including vision transformer with DFNROPE and language model integration |
| fastdeploy/input/qwen3_vl_processor/ | Input processing pipeline for tokenization, image/video preprocessing, and position encoding |
| fastdeploy/rl/rollout_model.py | RL support class for Qwen3-VL with weight mapping utilities |
| fastdeploy/worker/*_model_runner.py | Added rope_scaling parameter support for 3D RoPE computation |
| fastdeploy/config.py | Configuration updates to handle Qwen3-VL's nested text_config and rope_scaling |
| fastdeploy/input/mm_data_processor.py | New abstract base class for multimodal data processors |
| tests/input/test_qwen3_vl_processor.py | Comprehensive unit tests for the Qwen3-VL processor |
| tests/e2e/Qwen3VL_RL/test_rollout_model.py | End-to-end test for RL model rollout |
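
The nested text_config handling mentioned for fastdeploy/config.py can be pictured with a small sketch. This is not the code from the PR: `resolve_text_config` and `get_rope_scaling` are hypothetical helpers, and the merge rule (values under text_config override top-level values) is an assumption made for illustration.

```python
from typing import Any, Dict, Optional


def resolve_text_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return text-model settings from either a nested or a flat config dict."""
    nested = config.get("text_config")
    if isinstance(nested, dict):
        # Nested layout: keep top-level values as defaults, drop the
        # sub-config containers, and let text_config values take priority.
        merged = {
            key: value
            for key, value in config.items()
            if key not in ("text_config", "vision_config")
        }
        merged.update(nested)
        return merged
    # Flat layout: the text settings are already at the top level.
    return dict(config)


def get_rope_scaling(config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Look up rope_scaling wherever the text settings live; None if absent."""
    return resolve_text_config(config).get("rope_scaling")
```

With this shape, downstream code such as a model runner can ask for rope_scaling without knowing whether the checkpoint uses the nested or the flat layout.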

@CSWYF3634076 CSWYF3634076 changed the title from "[Model] Add Qwen3-VL Model Support" to "[Models] Add Qwen3-VL Model Support" on Dec 25, 2025
3 participants