[Feature]Support reorder ids to split prefill and decodes #5779
base: develop
Conversation
root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.

Thanks for your contribution!
Codecov Report
❌ Patch coverage is … . Additional details and impacted files:

@@           Coverage Diff            @@
##           develop    #5779   +/-  ##
=========================================
  Coverage         ?   66.48%
=========================================
  Files            ?      348
  Lines            ?    44749
  Branches         ?     6867
=========================================
  Hits             ?    29753
  Misses           ?    12806
  Partials         ?     2190

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
| self.model_inputs["input_ids"][idx : idx + 1, :input_length] = np.array([5] * input_length) | ||
| self.model_inputs["eos_token_id"][:] = np.array([2], dtype="int64").reshape(-1, 1) | ||
| self.seq_lens_this_time_buffer[idx : idx + 1] = input_length | ||
| self.model_inputs["seq_lens_this_time_buffer"][idx : idx + 1] = input_length |
Here `model_inputs` is already an object; is it reasonable that dict-style key-value access still coexists with it?
Originally `seq_lens_this_time_buffer` in this logic was a member variable of `MTPProposer`; now it has been merged back into `model_inputs`. Does this have any other impact?
The changes would touch too many places, so the key-based access interface is kept. `gpu_model_runner` uses `InputBatch`, and MTP's object is its own independent `ProposerInputBatch`. The difference between a member variable of `MTPProposer` and a member variable of `ProposerInputBatch` should just be one extra layer of wrapping. Also, `seq_lens_this_time_buffer` is strongly tied to the req id, so it can only take part in the reordering if it lives inside `InputBatch`.
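To illustrate this point, here is a minimal sketch (class and method names such as `InputBatchSketch` and `reorder` are hypothetical, not the PR's actual API): keeping per-request tensors like `seq_lens_this_time_buffer` inside the batch, behind dict-style key access, lets a single permutation reorder everything consistently with the request ids.

```python
import numpy as np


class InputBatchSketch:
    """Container for per-request tensors with dict-style key access."""

    def __init__(self, max_num_seqs: int):
        # Per-request tensors live inside the batch so a single permutation
        # reorders all of them consistently with the request ids.
        self._tensors = {
            "seq_lens_this_time_buffer": np.zeros([max_num_seqs], dtype="int32"),
            "input_ids": np.zeros([max_num_seqs, 8], dtype="int64"),
        }

    # Dict-style access, kept for backward compatibility with existing call sites.
    def __getitem__(self, key):
        return self._tensors[key]

    def __setitem__(self, key, value):
        self._tensors[key] = value

    def reorder(self, perm):
        # Apply one permutation to every per-request tensor (rows = requests).
        for key, value in self._tensors.items():
            self._tensors[key] = value[perm]


batch = InputBatchSketch(max_num_seqs=4)
batch["seq_lens_this_time_buffer"][:] = [3, 1, 5, 1]
batch.reorder(np.array([2, 0, 1, 3]))  # e.g. prefills first, then decodes
print(batch["seq_lens_this_time_buffer"])  # [5 3 1 1]
```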
  req_len = len(req_dicts)

  self.model_inputs["num_running_requests"] = num_running_requests
  self.model_inputs["running_requests_ids"] = range(num_running_requests)
Same as above.
- # self.model_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer[:num_running_requests]
- self.model_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer
+ # self.model_inputs["seq_lens_this_time"] = self.model_inputs["seq_lens_this_time_buffer"][:num_running_requests]
+ self.model_inputs.seq_lens_this_time = self.model_inputs["seq_lens_this_time_buffer"]
Same as above.
  self.proposer = NgramProposer(self.fd_config)
  elif self.speculative_method == "mtp":
- self.share_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer
+ self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"]
Same question as for `MTPProposer`.
This PR adds a lot of lines; unit-test coverage needs to be filled in, especially for input_batch.py.
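As a hedged sketch of the kind of test that could cover input_batch.py, reusing the illustrative `InputBatchSketch` class from the earlier sketch (not the real API):

```python
import numpy as np


def test_reorder_keeps_tensors_consistent():
    batch = InputBatchSketch(max_num_seqs=3)
    batch["seq_lens_this_time_buffer"][:] = [5, 1, 2]
    batch["input_ids"][:, 0] = [10, 20, 30]
    batch.reorder(np.array([0, 2, 1]))  # prefills (5, 2) first, decode (1) last
    assert list(batch["seq_lens_this_time_buffer"]) == [5, 2, 1]
    # Rows of every other per-request tensor moved with the same permutation.
    assert list(batch["input_ids"][:, 0]) == [10, 30, 20]
```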
  image_features_list.append(paddle.concat(merge_image_features, axis=0))
  for _, index in req_idx_img_index_map.items():
      if index != -1:
          self.share_inputs["image_features_list"][idx] = image_features_list[index]
Could you explain: was the earlier shape mismatch because this only looped once, or was it some other problem?
Right, you can think of it as only looping once. Also, video inputs used to not be bound to a req_id, which made reordering very difficult. Now a new list is bound to req_id, so every append into `image_features_list` is the image features of one specific req_id.
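A rough sketch of that binding idea (all names here are illustrative, not the PR's code): once features are keyed by req_id, reordering the batch is just walking the req_ids in their new order.

```python
import numpy as np

# req_id -> image feature array; each entry holds exactly one request's features.
image_features_by_req = {}


def on_request(req_id, merged_image_features):
    # "Every append into image_features_list is one req_id's image features."
    image_features_by_req[req_id] = merged_image_features


def gather_in_order(req_ids_in_new_order):
    # After a reorder, features travel with their req_id instead of relying
    # on a fragile positional index into a shared list.
    return [image_features_by_req[r] for r in req_ids_in_new_order if r in image_features_by_req]


on_request("req-0", np.zeros([4, 16]))
on_request("req-2", np.ones([2, 16]))
print(len(gather_in_order(["req-2", "req-0", "req-1"])))  # 2 (req-1 has no images)
```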
  )
  )

  if self.encoder_cache is not None:
Same as above.
  img_index = img_index + 1
  inputs = request.multimodal_inputs
  if self.encoder_cache is not None:
      if envs.FD_ENABLE_MAX_PREFILL:
It feels like the encoder_cache path here also needs `feature_position_list_batches` to record each request's position information?
Yes, it does. Unit tests have been added; the remaining coverage gap is that the `get_attention_meta` function is not hit.
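For what the per-request position bookkeeping might look like conceptually, a hedged sketch (the name `feature_position_list_batches` comes from the conversation above; the tuple layout is an assumption for illustration only):

```python
# (req_id, start offset, number of feature tokens) per request, so the
# feature slice can be located again after the batch has been reordered.
feature_position_list_batches = []


def record_feature_positions(req_id, start, length):
    feature_position_list_batches.append((req_id, start, length))


record_feature_positions("req-0", 0, 128)
record_feature_positions("req-1", 128, 64)
```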

Motivation
PR from #5194
To better integrate third-party Attention backends, the inputs need to be reordered so that prefill tokens and decode tokens are separated. This PR adds that reordering support, currently covering the basic scenario and the speculative-decoding scenario.
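As a rough illustration of the reordering idea (the arrays and variable names below are illustrative, not the PR's code): requests with seq_len > 1 are prefills and seq_len == 1 are decodes, and a permutation groups one kind before the other.

```python
import numpy as np

seq_lens_this_time = np.array([1, 7, 1, 3, 1])  # a mixed prefill/decode batch
prefill_idx = np.where(seq_lens_this_time > 1)[0]
decode_idx = np.where(seq_lens_this_time == 1)[0]
perm = np.concatenate([prefill_idx, decode_idx])
print(perm)                      # [1 3 0 2 4]
print(seq_lens_this_time[perm])  # [7 3 1 1 1] -- prefills first, then decodes
```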
Modifications
P/D reordering currently supports only the CUDA backend.
1. Add an `InputBatch` structure to manage `gpu_model_runner`'s share_inputs and a `ProposerInputBatch` structure to manage MTP's share_inputs, and add `reorder_split_prefill_and_decode` and `condense` functions to support reordering (see the sketch after this list).
2. Merge develop.
3. Add a req_id -> img_features mapping for each VL request to make reordering easier.
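A hedged sketch of what `reorder_split_prefill_and_decode` and `condense` might do conceptually; the real signatures in input_batch.py may differ.

```python
import numpy as np


def condense(seq_lens):
    # Keep only running slots (seq_len > 0) so requests are contiguous.
    return np.where(seq_lens > 0)[0]


def reorder_split_prefill_and_decode(seq_lens):
    # Permutation that puts prefill requests (seq_len > 1) before decodes.
    return np.concatenate([np.where(seq_lens > 1)[0], np.where(seq_lens == 1)[0]])


seq_lens = np.array([1, 4, 0, 2])   # slot 2 has finished
kept = condense(seq_lens)           # [0 1 3]
order = reorder_split_prefill_and_decode(seq_lens[kept])
print(kept[order])                  # [1 3 0]: prefills (4, 2) first, decode (1) last
```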
Usage or Command
Add the class variable `enable_ids_reorder` to your AttentionBackend and set it to True to enable P/D reordering.
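A minimal usage sketch; only the `enable_ids_reorder` class variable comes from the PR description, the class name is a stand-in for the real base class.

```python
class MyAttentionBackend:  # in practice: subclass the actual AttentionBackend
    # Opting in: the model runner will reorder ids so prefill tokens and
    # decode tokens arrive split apart before this backend is called.
    enable_ids_reorder = True
```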
Accuracy Tests
Checklist
- Add at least one tag in the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax].
- Format your code: run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure it has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.