fix: fallback to all lora params to avoid empty adapter weights #212
xichengpro wants to merge 2 commits into
Added `fallback_state_dict` in `get_adapter_state_dict` to prevent saving an empty dictionary when the specific adapter suffix is not found in the parameter names.
Code Review
This pull request modifies the get_adapter_state_dict method to return a fallback state dict of all LoRA parameters if the requested adapter is not found. The reviewer pointed out that copying all non-matching parameters to the CPU is highly inefficient when the requested adapter is actually present, and suggested optimizing this process by checking for the adapter's existence first.
    + fallback_state_dict = {}
      for name, param in unwrapped.named_parameters():
    -     if not _is_lora_state_key(name) or adapter_suffix not in name:
    +     if not _is_lora_state_key(name):
              continue
          local = torch_util.to_local_tensor(param)
    -     state_dict[name] = local.cpu()
    +     target_dict = state_dict if adapter_suffix in name else fallback_state_dict
    +     target_dict[name] = local.cpu()
          del local
    - return state_dict
    + return state_dict or fallback_state_dict
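To make the behavior change concrete, here is a minimal, torch-free sketch of the selection logic in this diff. It is an illustration under assumptions, not the project's code: `is_lora_key` is a hypothetical stand-in for `_is_lora_state_key` (its real implementation is not shown in this PR), parameter values are plain strings, and the `.default.` suffix is made up for the example.

```python
def is_lora_key(name):
    # Hypothetical stand-in for _is_lora_state_key; the real check is not shown in this PR.
    return "lora_" in name


def select_adapter_params(named_params, adapter_suffix):
    """Prefer LoRA params matching the suffix; otherwise return all LoRA params."""
    state_dict = {}
    fallback_state_dict = {}
    for name, value in named_params:
        if not is_lora_key(name):
            continue
        target = state_dict if adapter_suffix in name else fallback_state_dict
        target[name] = value
    # An empty dict is falsy, so `or` picks the fallback only when nothing matched.
    return state_dict or fallback_state_dict


# Illustrative parameter names (assumed, not taken from the repository).
params = [
    ("layer.0.weight", "w"),
    ("layer.0.lora_A.default.weight", "a"),
    ("layer.0.lora_B.default.weight", "b"),
]
```

With the pre-patch condition, a suffix that matches nothing would yield an empty dict; here the same call returns every LoRA parameter instead. The result is still empty if the model has no LoRA parameters at all.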
Populating fallback_state_dict by copying all non-matching LoRA parameters to the CPU is highly inefficient. In standard scenarios where the requested adapter is found, this results in unnecessary GPU-to-CPU transfers and memory allocations for all other adapters' parameters, which are then immediately discarded.
To optimize this, we can first check if any parameter name contains the adapter_suffix. If it does, we only copy the matching parameters. Otherwise, we fall back to copying all LoRA parameters.
    has_adapter = any(
        _is_lora_state_key(name) and adapter_suffix in name
        for name, _ in unwrapped.named_parameters()
    )
    for name, param in unwrapped.named_parameters():
        if not _is_lora_state_key(name):
            continue
        if has_adapter and adapter_suffix not in name:
            continue
        local = torch_util.to_local_tensor(param)
        state_dict[name] = local.cpu()
        del local
    return state_dict
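The cost difference between the two approaches can be checked with a small, torch-free sketch that counts `.cpu()` calls. Everything here is an assumption for illustration: `FakeParam` stands in for a tensor, `is_lora_key` for `_is_lora_state_key`, the `torch_util.to_local_tensor` step is skipped, and the adapter names `a` and `b` are invented.

```python
class FakeParam:
    """Stand-in for a tensor; counts how many times .cpu() is called."""
    copies = 0

    def cpu(self):
        FakeParam.copies += 1
        return self


def is_lora_key(name):
    # Hypothetical stand-in for _is_lora_state_key.
    return "lora_" in name


def single_pass(named_params, adapter_suffix):
    """PR version: copy every LoRA param, then pick which dict to return."""
    state_dict, fallback = {}, {}
    for name, param in named_params:
        if not is_lora_key(name):
            continue
        target = state_dict if adapter_suffix in name else fallback
        target[name] = param.cpu()
    return state_dict or fallback


def two_pass(named_params, adapter_suffix):
    """Suggested version: check names first, then copy only what is needed."""
    has_adapter = any(
        is_lora_key(name) and adapter_suffix in name for name, _ in named_params
    )
    state_dict = {}
    for name, param in named_params:
        if not is_lora_key(name):
            continue
        if has_adapter and adapter_suffix not in name:
            continue
        state_dict[name] = param.cpu()
    return state_dict


def count_copies(fn, named_params, adapter_suffix):
    """Run fn and report (number of .cpu() calls, returned dict)."""
    FakeParam.copies = 0
    result = fn(named_params, adapter_suffix)
    return FakeParam.copies, result


# Two adapters ("a" and "b") with two LoRA params each, plus one base weight.
params = [
    ("l.weight", FakeParam()),
    ("l.lora_A.a.weight", FakeParam()),
    ("l.lora_B.a.weight", FakeParam()),
    ("l.lora_A.b.weight", FakeParam()),
    ("l.lora_B.b.weight", FakeParam()),
]
```

In this toy setup, requesting adapter `a` makes `single_pass` copy all four LoRA parameters while `two_pass` copies only the two that match; when the suffix matches nothing, both copy all four and return the same keys. The extra name-only pass touches no tensor data.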
fix #211