Support interleaved q_gate weight loader for Qwen3.5 #7057

Open
wangna11BD wants to merge 1 commit into PaddlePaddle:develop from wangna11BD:support_interleaved_qg

@wangna11BD

Motivation

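To support inference deployment of the Qwen3.5 model family, the QKVGateParallelLinear layer needs to load the family's special interleaved q_gate weight format.

In Qwen3.5 models, the q_proj weight uses a packed format in which the attention query and the gate are interleaved along the head dimension (each head's query and gate are stored next to each other). This differs from the usual separated format, and the existing code cannot parse it correctly.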
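
To make the packed layout concrete, here is a minimal pure-Python sketch of de-interleaving. It assumes each head contributes `head_dim` query rows followed by `head_dim` gate rows along the output dimension. The function name `split_interleaved_q_gate` and the toy labels are hypothetical and are not taken from this PR.

```python
def split_interleaved_q_gate(rows, num_heads, head_dim):
    """Split rows laid out as [q_h0, gate_h0, q_h1, gate_h1, ...].

    `rows` holds one entry per output channel, so it must contain
    num_heads * 2 * head_dim entries. Returns (q_rows, gate_rows).
    """
    assert len(rows) == num_heads * 2 * head_dim
    q_rows, gate_rows = [], []
    for h in range(num_heads):
        start = h * 2 * head_dim  # first row of head h in the packed layout
        q_rows.extend(rows[start:start + head_dim])
        gate_rows.extend(rows[start + head_dim:start + 2 * head_dim])
    return q_rows, gate_rows


# Toy example: 2 heads, head_dim = 2; each "row" is just a label.
packed = ["q0a", "q0b", "g0a", "g0b", "q1a", "q1b", "g1a", "g1b"]
q, gate = split_interleaved_q_gate(packed, num_heads=2, head_dim=2)
```

A loader that treats the first half of the packed tensor as query and the second half as gate would mix the two here, which is why the separated-format code path cannot be reused.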

Modifications

  1. fastdeploy/model_executor/layers/linear.py

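  • Extended the weight_loader method: QKVGateParallelLinear now accepts four additional loaded_shard_id values, "q", "k", "v", and "split_q_gate". Previously only "qkv" and "gate" were supported.
  • Added the split_q_gate_weight_loader method, which handles Qwen3.5's interleaved q_gate weight format:
    • Supports transposing PyTorch-format weights (weight_need_transpose=True) and automatically resets the flag so that the weight is not transposed again later.
    • Fully compatible with tensor-parallel (TP) sharding.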
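
The following sketch shows one way the split, transpose, and TP slicing could fit together. It is not the code from this PR. It assumes the checkpoint weight is a 2-D list in [out, in] layout with heads interleaved as [q_h0, gate_h0, q_h1, gate_h1, ...], and that heads are divided evenly across ranks. `load_split_q_gate_shard` and its parameters are hypothetical names.

```python
def transpose(rows):
    """Transpose a 2-D list of rows."""
    return [list(col) for col in zip(*rows)]


def load_split_q_gate_shard(weight, num_heads, head_dim, tp_size, tp_rank,
                            need_transpose=True):
    """Hypothetical helper: return this rank's (q, gate) slices.

    `weight` is assumed to be in [out, in] layout with
    out == num_heads * 2 * head_dim.
    """
    assert num_heads % tp_size == 0, "heads must divide evenly across ranks"
    assert len(weight) == num_heads * 2 * head_dim
    heads_per_rank = num_heads // tp_size
    q_rows, gate_rows = [], []
    # Only walk the heads owned by this rank.
    for h in range(tp_rank * heads_per_rank, (tp_rank + 1) * heads_per_rank):
        start = h * 2 * head_dim
        q_rows.extend(weight[start:start + head_dim])
        gate_rows.extend(weight[start + head_dim:start + 2 * head_dim])
    if need_transpose:
        # Convert [out, in] to [in, out] exactly once, in this function.
        q_rows, gate_rows = transpose(q_rows), transpose(gate_rows)
    return q_rows, gate_rows


# 2 heads, head_dim = 1, input dim = 2.
weight = [[1, 2], [3, 4], [5, 6], [7, 8]]
q_r1, gate_r1 = load_split_q_gate_shard(
    weight, num_heads=2, head_dim=1, tp_size=2, tp_rank=1)
q_r0, gate_r0 = load_split_q_gate_shard(
    weight, num_heads=2, head_dim=1, tp_size=2, tp_rank=0, need_transpose=False)
```

Slicing by head before transposing keeps each rank's query and gate rows paired. Doing the transpose in a single place mirrors the PR's stated goal of avoiding a repeated transpose downstream.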

  2. tests/layers/test_qkvg_parallel_linear.py

Added the test_weight_loader_success test.

Usage or Command

When loading Qwen3.5 model weights, specify loaded_shard_id="split_q_gate" in the model configuration to trigger parsing of the interleaved format.

Example weight loader call in the model:

layer.weight_loader(param, loaded_weight, loaded_shard_id="split_q_gate")

Run the unit tests:

python -m pytest tests/layers/test_qkvg_parallel_linear.py -v

Accuracy Tests

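This PR changes only the weight-loading logic and does not affect the numerical accuracy of the model's forward computation. Unit tests verify that the weights are split and written correctly, including numerical alignment checks for tp=1 and tp=2.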

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[Models]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.

@paddle-bot

paddle-bot bot commented Mar 27, 2026

Thanks for your contribution!