Support interleaved q_gate weight loader for Qwen3.5 #7057
Open
wangna11BD wants to merge 1 commit into PaddlePaddle:develop from
Motivation
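To support inference deployment of the Qwen3.5 model family, the QKVGateParallelLinear layer needs to be able to load the family's special interleaved q_gate weight format.
The q_proj weight of Qwen3.5 models uses a packed format in which the attention query and the gate are interleaved along the head dimension (each head's query and gate are stored next to each other). This differs from the usual separated format, and the existing code cannot parse it correctly.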
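To make the packed layout concrete, here is a minimal NumPy sketch of de-interleaving such a weight. It is an illustration under stated assumptions, not the code in this PR: the function name split_interleaved_q_gate is hypothetical, the weight is assumed to be laid out as [hidden, num_heads * 2 * head_dim], and each head is assumed to store its query block first and its gate block second.

```python
import numpy as np


def split_interleaved_q_gate(packed, num_heads, head_dim):
    """Split a packed [hidden, num_heads * 2 * head_dim] weight into (q, gate).

    Assumed layout along the output axis: q_h0, gate_h0, q_h1, gate_h1, ...
    """
    hidden = packed.shape[0]
    # View the output axis as (head, {query, gate}, head_dim).
    per_head = packed.reshape(hidden, num_heads, 2, head_dim)
    q = per_head[:, :, 0, :].reshape(hidden, num_heads * head_dim)
    gate = per_head[:, :, 1, :].reshape(hidden, num_heads * head_dim)
    return q, gate


# Build a packed weight from known separated q / gate matrices.
hidden, num_heads, head_dim = 3, 2, 4
rng = np.random.default_rng(0)
q_ref = rng.standard_normal((hidden, num_heads * head_dim))
gate_ref = rng.standard_normal((hidden, num_heads * head_dim))

blocks = []
for h in range(num_heads):
    cols = slice(h * head_dim, (h + 1) * head_dim)
    blocks += [q_ref[:, cols], gate_ref[:, cols]]  # query then gate, per head
packed = np.concatenate(blocks, axis=1)

q, gate = split_interleaved_q_gate(packed, num_heads, head_dim)
```

Splitting the packed tensor in half along the output axis would instead mix query and gate columns from different heads, which is why a loader written for the separated format misreads these weights.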
Modifications
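weight_loader extension: QKVGateParallelLinear now supports four additional loaded_shard_id values, "q", "k", "v", and "split_q_gate". Previously only "qkv" and "gate" were supported.
New split_q_gate_weight_loader method, dedicated to Qwen3.5's interleaved q_gate weight format:
Supports PyTorch-format transposition (weight_need_transpose=True) and automatically resets the flag so the weight is not transposed again later
Fully compatible with tensor-parallel (TP) sharding
New test_weight_loader_success test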
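The transposition and TP sharding steps listed above can be sketched as follows. This is a hedged NumPy illustration rather than the PR's implementation: shard_interleaved_q_gate and its parameters are hypothetical names, the checkpoint is assumed to store the weight PyTorch-style as [out, in], heads are assumed to be split evenly and contiguously across ranks, and how the result is written into the fused parameter is left out.

```python
import numpy as np


def shard_interleaved_q_gate(loaded_weight, num_heads, head_dim,
                             tp_size, tp_rank, need_transpose=False):
    """Return this rank's (q, gate) slices of an interleaved q_gate weight."""
    if need_transpose:
        # Assumed PyTorch layout [out, in] -> [in, out].
        loaded_weight = loaded_weight.T
    hidden = loaded_weight.shape[0]
    assert num_heads % tp_size == 0, "heads must divide evenly across ranks"
    heads_per_rank = num_heads // tp_size

    # View the output axis as (head, {query, gate}, head_dim), then keep
    # only the heads owned by this rank.
    per_head = loaded_weight.reshape(hidden, num_heads, 2, head_dim)
    start = tp_rank * heads_per_rank
    local = per_head[:, start:start + heads_per_rank]

    q = local[:, :, 0, :].reshape(hidden, heads_per_rank * head_dim)
    gate = local[:, :, 1, :].reshape(hidden, heads_per_rank * head_dim)
    return q, gate


hidden, num_heads, head_dim, tp_size = 2, 4, 3, 2
packed = np.arange(hidden * num_heads * 2 * head_dim,
                   dtype=np.float32).reshape(hidden, -1)
torch_style = packed.T  # pretend the checkpoint stores [out, in]

# tp=1 reference versus the concatenation of the tp=2 shards.
q_full, gate_full = shard_interleaved_q_gate(
    torch_style, num_heads, head_dim, 1, 0, need_transpose=True)
shards = [
    shard_interleaved_q_gate(torch_style, num_heads, head_dim,
                             tp_size, rank, need_transpose=True)
    for rank in range(tp_size)
]
q_joined = np.concatenate([s[0] for s in shards], axis=1)
gate_joined = np.concatenate([s[1] for s in shards], axis=1)
```

Sharding by whole heads before separating query and gate keeps each rank's query and gate aligned to the same heads. The sketch returns new arrays, so the flag reset described above is not modelled here.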
Usage or Command
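When loading Qwen3.5 model weights, specify loaded_shard_id="split_q_gate" in the model configuration to trigger parsing of the interleaved format:
Example model weight loader call:
layer.weight_loader(param, loaded_weight, loaded_shard_id="split_q_gate")
Run the unit tests:
python -m pytest tests/layers/test_qkvg_parallel_linear.py -v
Accuracy Tests
This PR changes only the weight loading logic and does not affect the numerical accuracy of the model's forward computation. Unit tests verify that the weights are split and written correctly, including numerical alignment checks for tp=1 and tp=2.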
Checklist
[Models]] pre-commit before commit.