fix(lora): sidecar-patch fp8 weights to avoid float8 add crash #9246
Open
lstein wants to merge 2 commits into
A full-precision Diffusers transformer loaded with fp8_storage enabled (PR #9231) keeps its Linear/Conv weights in float8_e4m3fn between forward passes. Applying a LoRA via direct patching does an in-place add on the model weight, and CUDA has no add kernel for float8, so it crashes with "ufunc_add_CUDA not implemented for Float8_e4m3fn". GGUF/BnB models avoid this because they are flagged quantized and use the sidecar path.

Detect fp8 weights at the patching layer (analogous to the existing on-CPU check) and force sidecar patching for any module with float8 parameters. This takes precedence over force_direct_patching, since direct patching is not possible on fp8 weights, and fixes every model architecture at once (Flux.1/2, SD3, Qwen, SD1/SDXL UNet, etc.).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
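The routing rule described above can be modeled in a few lines. This is a simplified sketch, not InvokeAI's code: the Module stand-in, has_fp8_params, is_any_part_on_cpu and choose_patch_mode are hypothetical names, dtypes and devices are plain strings, and the order of the non-fp8 checks is an assumption. A real check would compare each p.dtype from module.parameters() against torch.float8_e4m3fn and torch.float8_e5m2.

```python
from dataclasses import dataclass, field

# dtype names treated as fp8 storage (mirroring torch.float8_e4m3fn / torch.float8_e5m2)
FP8_DTYPES = frozenset({"float8_e4m3fn", "float8_e5m2"})


@dataclass
class Module:
    """Hypothetical stand-in for a torch.nn.Module: one dtype/device entry per parameter."""
    param_dtypes: list[str]
    param_devices: list[str] = field(default_factory=list)


def has_fp8_params(module: Module) -> bool:
    # Analogue of the on-CPU guard: any fp8 parameter rules out direct patching.
    return any(d in FP8_DTYPES for d in module.param_dtypes)


def is_any_part_on_cpu(module: Module) -> bool:
    return any(dev == "cpu" for dev in module.param_devices)


def choose_patch_mode(module: Module, model_is_quantized: bool, force_direct_patching: bool) -> str:
    """Return "direct" or "sidecar" for one module (hypothetical helper)."""
    # The fp8 check runs first: an in-place add on float8 weights has no CUDA kernel,
    # so it overrides force_direct_patching.
    if has_fp8_params(module):
        return "sidecar"
    # Assumed order for the remaining checks.
    if force_direct_patching:
        return "direct"
    if model_is_quantized or is_any_part_on_cpu(module):
        return "sidecar"
    return "direct"
```

For example, choose_patch_mode(Module(["float8_e4m3fn"]), model_is_quantized=False, force_direct_patching=True) returns "sidecar", which is the routing the QA test below checks for.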
Summary
Applying a LoRA to a Flux.2 4B/9B Diffusers model that was loaded with FP8 storage enabled crashes with:

ufunc_add_CUDA not implemented for Float8_e4m3fn

(at layer_patcher.py: module_param.data.copy_(module_param.data + param_weight_converted)).

Root cause
PR #9231 introduced FP8 layerwise-casting storage, which keeps a full-precision Diffusers transformer's Linear/Conv weights in float8_e4m3fn between forward passes (forward hooks cast them up to the compute dtype). LoRA application has two paths:

- direct patching, which adds the LoRA delta into the model weight in place (module_param + lora_weight);
- sidecar patching, which applies the LoRA at forward time and leaves the stored weight untouched.

Direct patching is chosen for any model not flagged "quantized." GGUF/BnB models are flagged and therefore use the sidecar path, so they work. An FP8 Diffusers model, however, is not flagged quantized, so it takes the direct path, and CUDA has no add kernel for float8, hence the crash.

Fix
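The fix relies on the two patching paths producing the same output while only the direct path writes into the stored weight: because (W + delta) x = W x + delta x, the sidecar path never needs an in-place add on the fp8 weight. The toy model below illustrates this under stated assumptions: lists stand in for tensors, dtypes are plain strings, the RuntimeError only mimics the missing CUDA float8 add kernel, and sidecar patching is modeled as adding the LoRA contribution to the layer output (InvokeAI's sidecar layers may instead build a temporary patched weight in the compute dtype; by linearity the result is the same). All function names are hypothetical.

```python
# Toy model of the two LoRA patching paths (pure Python, no real fp8 arithmetic).
FP8_DTYPES = frozenset({"float8_e4m3fn", "float8_e5m2"})


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_delta(up: list[list[float]], down: list[list[float]], scale: float) -> list[list[float]]:
    # delta = scale * (up @ down): the usual low-rank LoRA update.
    # up is (out x rank), down is (rank x in).
    rank, n_in = len(down), len(down[0])
    return [[scale * sum(up[i][k] * down[k][j] for k in range(rank)) for j in range(n_in)]
            for i in range(len(up))]


def direct_patch(weight: list[list[float]], dtype: str, delta: list[list[float]]) -> None:
    # Direct path: add the delta into the stored weight in place.
    if dtype in FP8_DTYPES:
        # Stand-in for CUDA having no add kernel for float8.
        raise RuntimeError(f"add not implemented for {dtype}")
    for i, row in enumerate(delta):
        for j, d in enumerate(row):
            weight[i][j] += d


def sidecar_forward(weight: list[list[float]], delta: list[list[float]], x: list[float]) -> list[float]:
    # Sidecar path: the stored weight is only read; the LoRA contribution
    # is computed separately and added to the output.
    base = matvec(weight, x)
    extra = matvec(delta, x)
    return [b + e for b, e in zip(base, extra)]
```

With W = [[1.0, 2.0], [3.0, 4.0]] stored as fp8 and a rank-1 delta, sidecar_forward returns the same output that direct patching would give on a bf16 copy, while direct_patch on the fp8 weight raises and W is left unchanged.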
Detect fp8 weights at the patching layer itself, where the incompatibility actually lives. This is directly analogous to the existing _is_any_part_of_layer_on_cpu guard, which already forces sidecar patching. Any module with float8_e4m3fn/float8_e5m2 parameters is now sidecar-patched.

This was chosen over patching each denoise invocation because:

- it fixes every model architecture at once (Flux.1/2, SD3, Qwen, SD1/SDXL UNet, etc.);
- it takes precedence over force_direct_patching (used by the SD1/SDXL UNet path in lora.py), since direct patching is simply not possible on fp8 weights.

Related Issues / Discussions
Follow-up to #9231 (FP8 layerwise-casting storage), which introduced the fp8 weight storage that this path did not account for. See also #9241 (FP8 docs).
QA Instructions
- tests/backend/patches/test_layer_patcher.py::test_apply_smart_model_patches_fp8_weights_force_sidecar verifies that an fp8 module routes to sidecar patching even when force_direct_patching=True, and that the patch is cleared on exit. It runs on CPU (it verifies routing, not the CUDA-only arithmetic).
- pytest tests/backend/patches/test_layer_patcher.py: 19 passed.

Merge Plan
Straight merge.
Checklist
🤖 Generated with Claude Code