[Optimization] Merge Text processor #7030
luukunn wants to merge 8 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
Pull request overview
This PR merges and unifies the text-side processor implementations: it lifts the request/response handling logic previously split between DataProcessor and Ernie4_5Processor into a shared base class, and adds a new TextProcessor that covers both the auto and ernie4_5 tokenizer paths with a single class, simplifying the dispatch logic in InputPreprocessor.create_processor().
Changes:
- Add BaseTextProcessor (fastdeploy/input/base_processor.py), centralising request/response handling and shared utility methods.
- Add TextProcessor and adjust InputPreprocessor.create_processor(): non-V1, text-only scenarios now uniformly route through TextProcessor (tokenizer_type=auto/ernie4_5).
- Turn Ernie4_5Processor into a deprecated wrapper class, and add tokenizer_type to the VL processor so it fits the unified incremental-decoding logic; update the related unit-test mocks accordingly.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/input/test_preprocess.py | Update routing unit tests: the mock target changes from DataProcessor to TextProcessor. |
| fastdeploy/input/text_processor.py | Introduce TextProcessor; DataProcessor now inherits the new shared base class and drops its local request/response handling implementation. |
| fastdeploy/input/preprocess.py | Text-only, non-V1 paths now uniformly use TextProcessor, choosing tokenizer_type by architecture. |
| fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py | Add tokenizer_type="ernie4_5" so the inherited incremental-decoding branch is selected correctly. |
| fastdeploy/input/ernie4_5_processor.py | Turn Ernie4_5Processor into a deprecated wrapper that forwards to TextProcessor(tokenizer_type='ernie4_5'). |
| fastdeploy/input/base_processor.py | Add the shared base class BaseTextProcessor, implementing common request/response handling and utility methods. |
Comments suppressed due to low confidence (1)
fastdeploy/input/text_processor.py:248
DataProcessor now inherits BaseTextProcessor, and BaseTextProcessor.process_response_dict_* assumes that ids2tokens() always returns a 3-tuple (delta_text, previous_token_ids, previous_texts). However, the current DataProcessor.ids2tokens() still returns a single string in the envs.FD_USE_HF_TOKENIZER=True branch, which breaks tuple unpacking during streaming / non-streaming response handling (e.g. delta_text, _, previous_texts = ...). Consider having DataProcessor reuse BaseTextProcessor.ids2tokens() directly (deleting the override), or changing the HF branch to return the same 3-tuple as the base class.
```python
class DataProcessor(BaseTextProcessor):
    def __init__(self, model_name_or_path, reasoning_parser_obj=None, tool_parser_obj=None):
        """
        Initializes the DecodeStatus object.
```
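A minimal, self-contained sketch of the contract mismatch described in this comment. The function bodies and names below are illustrative stand-ins, not the real FastDeploy signatures: the point is only that a bare-string return cannot satisfy a caller that unpacks three values.

```python
def ids2tokens_old(token_ids, task_id):
    # Old DataProcessor HF-tokenizer branch: returns a bare string
    return "decoded text"

def ids2tokens_new(token_ids, task_id):
    # Base-class contract: (delta_text, previous_token_ids, previous_texts)
    return "decoded text", [], ""

# The base class unpacks three values, so the old return shape fails:
try:
    delta_text, _, previous_texts = ids2tokens_old([1, 2], "task-0")
except ValueError as exc:
    error = str(exc)  # a "values to unpack" error, not a decoded result

# The 3-tuple shape unpacks cleanly:
delta_text, _, previous_texts = ids2tokens_new([1, 2], "task-0")
```

This is why the review suggests either deleting the override or aligning the HF branch's return shape with the base class.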
```
    * tool_parser result never updates ``outputs["text"]``; only ``tool_calls`` is
      set. This matches DataProcessor behaviour.
```

This design note does not match the implementation: the docstring says tool_parser results never update outputs["text"], but process_response_dict_streaming() assigns outputs["text"] = tool_call_delta_message.content when it parses tool_calls. Either update the documentation (distinguishing streaming from non-streaming behaviour) or adjust the implementation to match the documented contract.

Suggested change:

```
    * In non-streaming mode, tool_parser results never update ``outputs["text"]``;
      only ``tool_calls`` is set, matching ``DataProcessor`` behaviour. In
      streaming mode, ``process_response_dict_streaming()`` may temporarily set
      ``outputs["text"]`` to the current ``tool_call_delta_message.content`` while
      parsing tool calls.
```
```python
if token_id == think_start_id:
    started = True
    ended = False
    in_thinking = True
elif token_id == think_end_id and in_thinking:
    ended = True
    in_thinking = False
```

tokens_after_start is initialised to 0 but never incremented while iterating token_list, so think_prompt_tokens_after_start is always 0. This field is read by fastdeploy/model_executor/logits_processor/thinking_budget.py to update the reasoning-budget state, and a constant 0 distorts the budget logic. Consider counting non-marker tokens after the <think> marker, or removing the field so it does not mislead.

Suggested change:

```python
if token_id == think_start_id:
    # Enter thinking segment; do not count the start token itself
    started = True
    ended = False
    in_thinking = True
elif token_id == think_end_id and in_thinking:
    # Leave thinking segment; do not count the end token itself
    ended = True
    in_thinking = False
elif in_thinking:
    # Count tokens inside the thinking segment, excluding start/end markers
    tokens_after_start += 1
```
```python
tokenizer_type = "ernie4_5" if ErnieArchitectures.contains_ernie_arch(architecture) else "auto"
self.processor = TextProcessor(
    model_name_or_path=self.model_name_or_path,
    tokenizer_type=tokenizer_type,
    reasoning_parser_obj=reasoning_parser_obj,
    tool_parser_obj=tool_parser_obj,
```

This adds a branch that selects tokenizer_type (ernie4_5 vs auto) by architecture and routes through TextProcessor, but the current unit tests only cover the "non-Ernie, non-multimodal" path. Consider adding a case covering an Ernie architecture (e.g. an ERNIE4.5-related architecture string) that asserts tokenizer_type is set to ernie4_5 and verifies the TextProcessor constructor arguments.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop    #7030   +/-   ##
==========================================
  Coverage         ?   72.98%
==========================================
  Files            ?      400
  Lines            ?    56289
  Branches         ?     8884
==========================================
  Hits             ?    41081
  Misses           ?    12300
  Partials         ?     2908
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
```
    * ``__init__`` only initialises response-handling state (decode_status,
      model_status_dict, tool_parser_dict). Tokeniser setup is the responsibility
      of each subclass. Subclasses that do not call ``super().__init__()`` must
      initialise those three attributes themselves.
```

The "Key design decisions" section in the file header says "__init__ only initialises response-handling state… Tokeniser setup is the responsibility of each subclass", but the current BaseTextProcessor.__init__ loads GenerationConfig, calls _load_tokenizer, and initialises the eos/pad/tokenizer.pad_token_id state. The description contradicts the implementation and is likely to mislead future maintainers. Update the note to match the actual initialisation flow.

Suggested change:

```
    * ``__init__`` performs the full initialisation sequence: it loads the
      generation config (if available), calls ``_load_tokenizer`` to construct
      the tokenizer, configures EOS / pad token ids, and initialises the core
      response-handling state (``decode_status``, ``model_status_dict``,
      ``tool_parser_dict``).
    * Concrete subclasses are responsible for implementing ``_load_tokenizer``
      and ``text2ids``. Subclasses that override ``__init__`` must call
      ``super().__init__(...)`` or replicate this initialisation logic
      themselves.
```
```python
with patch("fastdeploy.input.utils.process_stop_token_ids", lambda *args, **kwargs: None):
    processed = processor.process_request_dict(request, max_model_len=16)
    self.assertEqual(
```

This also patches fastdeploy.input.utils.process_stop_token_ids, but TextDataProcessor.process_request_dict is actually implemented in fastdeploy.input.base_processor.BaseTextProcessor, which does `from fastdeploy.input.utils import process_stop_token_ids` and calls it by the local name. Patching utils therefore does not affect the reference inside BaseTextProcessor, and the test may still execute the real logic. Patch fastdeploy.input.base_processor.process_stop_token_ids instead (and adjust the other similar patches in this file).
```python
self.gen_patcher = patch(f"{TEXT_PROCESSOR_PATH}.GenerationConfig.from_pretrained", return_value=MagicMock())
self.tokenizer_patcher = patch(
```

The patch target in setUp has been changed to fastdeploy.input.text_processor.GenerationConfig.from_pretrained, but TextProcessor/Ernie4_5Processor now import and call GenerationConfig.from_pretrained in fastdeploy.input.base_processor. The current patch may not take effect, causing the unit test to unexpectedly read a real generation_config. Change the patch target to fastdeploy.input.base_processor.GenerationConfig.from_pretrained (or whichever module path is actually called).
```diff
@@ -18,16 +18,10 @@
 from collections import OrderedDict
```

The PR description is still mostly template placeholders (Motivation / Modifications / Usage or Command / Accuracy Tests are unfilled, and the checklist is unchecked), which makes it hard for reviewers to confirm the motivation for merging the text processors, the scope of behaviour changes, and how the change was verified. Please fill in: why the merge is needed, whether there are compatibility or behaviour differences, and how to regression-test (at least the unit tests/commands or key scenarios). Also, per the template, add a space after the tag in the title (e.g. [Optimization] Merge Text processor).
```python
if envs.FD_USE_HF_TOKENIZER:
    if task_id not in self.decode_status:
        # [all_token_ids, list_of_deltas, full_accumulated_string]
        self.decode_status[task_id] = [[], [], ""]
    status = self.decode_status[task_id]
    status[0].extend(token_id)
    decode_str = self.tokenizer.batch_decode(
        [status[0]],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
    if isinstance(decode_str, list) and len(decode_str):
        new_str = decode_str[0].replace(status[2], "", 1)
        status[1].append(new_str)
        status[2] = decode_str[0]
    else:
        new_str = ""
    # Return consistent three-tuple; previous_token_ids not available.
    return new_str, [], status[2]
```

In the HF tokenizer branch, BaseTextProcessor.ids2tokens returns the updated accumulated text (status[2]) as the third value. Per the docstring semantics, it should return previous_texts, i.e. the accumulated text before the update; otherwise the previous_texts + delta_text concatenation in process_response_dict_streaming duplicates the current delta, so reasoning/tool streaming parsing receives a wrong full_text. Save the old value before updating status[2] and return the old value.
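A minimal sketch of the suggested fix, assuming the same decode_status layout as the snippet above ([all_token_ids, deltas, accumulated_text]). The tokenizer and surrounding class are stubbed out as a plain decode_fn, so the names here are illustrative only.

```python
def hf_ids2tokens(decode_status, task_id, token_ids, decode_fn):
    """Return (delta_text, previous_token_ids, previous_texts)."""
    if task_id not in decode_status:
        # [all_token_ids, list_of_deltas, full_accumulated_string]
        decode_status[task_id] = [[], [], ""]
    status = decode_status[task_id]
    status[0].extend(token_ids)
    full_text = decode_fn(status[0])
    previous_texts = status[2]           # save BEFORE updating status[2]
    new_str = full_text.replace(previous_texts, "", 1)
    status[1].append(new_str)
    status[2] = full_text
    # previous_texts + new_str now reconstructs full_text exactly once,
    # so streaming callers no longer double-append the current delta.
    return new_str, [], previous_texts
```

With a toy decoder that maps token id i to the letter chr(ord("a") + i), two successive calls for ids [0, 1] then [2] return deltas "ab" then "c", and the third value of the second call is "ab" (the pre-update text), not "abc".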
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- [ ] PR title tag (choose from): [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- [ ] Run pre-commit before commit.
- [ ] For a fix targeting the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.