
fix: fallback to all lora params to avoid empty adapter weights#212

Closed
xichengpro wants to merge 2 commits into modelscope:main from xichengpro:main

@xichengpro
Contributor

Added `fallback_state_dict` in `get_adapter_state_dict` so that an empty dictionary is not saved when the adapter suffix is not found in any parameter name. In that case, all LoRA parameters are returned instead.
Fixes #211
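
To illustrate the failure mode described above, here is a minimal sketch that uses plain floats in place of tensors. The helper `is_lora_key`, the parameter names, and the `.default` suffix are hypothetical stand-ins chosen for illustration; they are not taken from the repository, and the real `_is_lora_state_key` may behave differently.

```python
# Toy stand-ins: plain floats instead of tensors, and a hypothetical
# substring check in place of the real _is_lora_state_key helper.
def is_lora_key(name: str) -> bool:
    return "lora_" in name


def adapter_state_dict(params: dict, adapter_suffix: str, fallback: bool) -> dict:
    """Collect LoRA entries for one adapter, optionally falling back to all of them."""
    state_dict, fallback_state_dict = {}, {}
    for name, value in params.items():
        if not is_lora_key(name):
            continue
        # Entries that match the suffix go to state_dict; the rest are kept aside.
        target = state_dict if adapter_suffix in name else fallback_state_dict
        target[name] = value
    if fallback:
        # An empty dict is falsy, so this returns the fallback only when nothing matched.
        return state_dict or fallback_state_dict
    return state_dict


# Assumed scenario: the LoRA keys do not embed the adapter name at all.
params = {
    "layer.weight": 1.0,
    "layer.lora_A.weight": 0.1,
    "layer.lora_B.weight": 0.2,
}

without_fallback = adapter_state_dict(params, ".default", fallback=False)
with_fallback = adapter_state_dict(params, ".default", fallback=True)
print(without_fallback)
print(sorted(with_fallback))
```

Under these assumptions, the strict suffix filter yields an empty dictionary, while the fallback variant returns both LoRA entries and still excludes the base weight.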

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information that belongs to this PR.

Experiment results

Paste your experiment results here (if needed).

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the get_adapter_state_dict method to return a fallback state dict of all LoRA parameters if the requested adapter is not found. The reviewer pointed out that copying all non-matching parameters to the CPU is highly inefficient when the requested adapter is actually present, and suggested optimizing this process by checking for the adapter's existence first.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +213 to +221
+    fallback_state_dict = {}
     for name, param in unwrapped.named_parameters():
-        if not _is_lora_state_key(name) or adapter_suffix not in name:
+        if not _is_lora_state_key(name):
             continue
         local = torch_util.to_local_tensor(param)
-        state_dict[name] = local.cpu()
+        target_dict = state_dict if adapter_suffix in name else fallback_state_dict
+        target_dict[name] = local.cpu()
         del local
-    return state_dict
+    return state_dict or fallback_state_dict

high

Populating fallback_state_dict by copying all non-matching LoRA parameters to the CPU is highly inefficient. In standard scenarios where the requested adapter is found, this results in unnecessary GPU-to-CPU transfers and memory allocations for all other adapters' parameters, which are then immediately discarded.

To optimize this, we can first check if any parameter name contains the adapter_suffix. If it does, we only copy the matching parameters. Otherwise, we fall back to copying all LoRA parameters.

Suggested change
    has_adapter = any(
        _is_lora_state_key(name) and adapter_suffix in name
        for name, _ in unwrapped.named_parameters()
    )
    for name, param in unwrapped.named_parameters():
        if not _is_lora_state_key(name):
            continue
        if has_adapter and adapter_suffix not in name:
            continue
        local = torch_util.to_local_tensor(param)
        state_dict[name] = local.cpu()
        del local
    return state_dict
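
The efficiency argument in this review can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the project's code: `FakeParam` is a hypothetical stand-in whose `cpu()` only increments a counter, `is_lora_key` replaces the real `_is_lora_state_key`, and the adapter names are invented.

```python
class FakeParam:
    """Hypothetical stand-in for a tensor; counts simulated device-to-host copies."""
    copies = 0

    def __init__(self, value):
        self.value = value

    def cpu(self):
        FakeParam.copies += 1
        return self.value


def is_lora_key(name):
    # Assumed substring check; the real helper may differ.
    return "lora_" in name


def eager_fallback(named_params, adapter_suffix):
    """Approach in the PR: copy every LoRA parameter, then pick a dict."""
    state_dict, fallback_state_dict = {}, {}
    for name, param in named_params:
        if not is_lora_key(name):
            continue
        target = state_dict if adapter_suffix in name else fallback_state_dict
        target[name] = param.cpu()
    return state_dict or fallback_state_dict


def check_first(named_params, adapter_suffix):
    """Approach in the review: scan names first, then copy only what is needed."""
    has_adapter = any(is_lora_key(n) and adapter_suffix in n for n, _ in named_params)
    state_dict = {}
    for name, param in named_params:
        if not is_lora_key(name):
            continue
        if has_adapter and adapter_suffix not in name:
            continue
        state_dict[name] = param.cpu()
    return state_dict


def count_copies(fn, named_params, adapter_suffix):
    # Reset the shared counter so each call is measured on its own.
    FakeParam.copies = 0
    result = fn(named_params, adapter_suffix)
    return result, FakeParam.copies


named_params = [
    ("base.weight", FakeParam(1.0)),
    ("base.lora_A.adapter_a.weight", FakeParam(0.1)),
    ("base.lora_B.adapter_a.weight", FakeParam(0.2)),
    ("base.lora_A.adapter_b.weight", FakeParam(0.3)),
    ("base.lora_B.adapter_b.weight", FakeParam(0.4)),
]

eager_result, eager_copies = count_copies(eager_fallback, named_params, ".adapter_a.")
lazy_result, lazy_copies = count_copies(check_first, named_params, ".adapter_a.")
print(eager_copies, lazy_copies)
```

In this toy setup both functions return the same two entries for `adapter_a`, but the eager version also copies the two `adapter_b` parameters before discarding them. The check-first version pays for one extra pass over the parameter names, which involves no tensor transfers.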

@xichengpro xichengpro marked this pull request as draft June 2, 2026 15:00
@xichengpro xichengpro closed this Jun 2, 2026
Development

Successfully merging this pull request may close these issues.

Weights in the checkpoint saved during training are empty
