fix(lora): sidecar-patch fp8 weights to avoid float8 add crash#9246

Open: lstein wants to merge 2 commits into main from lstein/bugfix/flux2-fp8-lora

lstein (Collaborator) commented May 29, 2026

Summary

Applying a LoRA to a Flux.2 4B/9B Diffusers model that was loaded with FP8 storage enabled crashes with:

RuntimeError: "ufunc_add_CUDA" not implemented for 'Float8_e4m3fn'

The failing statement is in layer_patcher.py: module_param.data.copy_(module_param.data + param_weight_converted).

Root cause

PR #9231 introduced FP8 layerwise-casting storage, which keeps a full-precision Diffusers transformer's Linear/Conv weights in float8_e4m3fn between forward passes (cast up to the compute dtype by forward hooks). LoRA application has two paths:

  • Direct patching — does an in-place add on the model weight (module_param + lora_weight).
  • Sidecar patching — wraps the module and dequantizes to the compute dtype before any math.

Direct patching is chosen for any model not flagged as quantized. GGUF and BnB models carry that flag and use the sidecar path, so they work. An FP8 Diffusers model is not flagged as quantized, so it takes the direct path, and CUDA has no add kernel for float8, hence the crash.

Fix

Detect fp8 weights at the patching layer itself, where the incompatibility actually lives — directly analogous to the existing _is_any_part_of_layer_on_cpu guard that already forces sidecar patching. Any module with float8_e4m3fn / float8_e5m2 parameters is now sidecar-patched.

This was chosen over patching each denoise invocation because:

  • FP8 storage is a generic Main-model feature, so the same crash is reachable from Flux.1, SD3, Qwen, Z-Image, and the SD1/SDXL UNet path — fixing it once at the patcher covers them all.
  • The guard takes precedence over force_direct_patching (used by the SD1/SDXL UNet path in lora.py), since direct patching is simply not possible on fp8 weights.
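
The precedence described above can be sketched as a small decision function. Only the first rule (fp8 beats force_direct_patching) comes from this PR; `LayerState`, `use_sidecar`, and the ordering of the remaining checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerState:
    """Hypothetical summary of the facts the patcher looks at for one layer."""
    has_fp8_weights: bool
    on_cpu: bool
    model_is_quantized: bool


def use_sidecar(state: LayerState, force_direct_patching: bool = False) -> bool:
    """Return True when the layer should be sidecar-patched."""
    # From the PR: fp8 weights always go to the sidecar path, even when
    # force_direct_patching is set, because the in-place add cannot run on them.
    if state.has_fp8_weights:
        return True
    # Assumed ordering from here on: an explicit request for direct patching
    # wins over the remaining automatic checks.
    if force_direct_patching:
        return False
    return state.on_cpu or state.model_is_quantized
```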

Related Issues / Discussions

Follow-up to #9231 (FP8 layerwise-casting storage), which introduced the fp8 weight storage that this path did not account for. See also #9241 (FP8 docs).

QA Instructions

  • Load a Flux.2 4B or 9B Diffusers model with FP8 storage enabled, apply a Flux.2 LoRA, and run a denoise — previously crashed, now succeeds. (Confirmed working.)
  • Regression test added: tests/backend/patches/test_layer_patcher.py::test_apply_smart_model_patches_fp8_weights_force_sidecar verifies an fp8 module routes to sidecar patching even when force_direct_patching=True, and the patch is cleared on exit. Runs on CPU (verifies routing, not the CUDA-only arithmetic).
  • pytest tests/backend/patches/test_layer_patcher.py — 19 passed.

Merge Plan

Straight merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration — n/a
  • Documentation added / updated (if applicable) — n/a

🤖 Generated with Claude Code

A full-precision Diffusers transformer loaded with fp8_storage enabled
(PR #9231) keeps its Linear/Conv weights in float8_e4m3fn between forward
passes. Applying a LoRA via direct patching does an in-place add on the
model weight, and CUDA has no add kernel for float8, so it crashes with
"ufunc_add_CUDA not implemented for Float8_e4m3fn". GGUF/BnB models avoid
this because they are flagged quantized and use the sidecar path.

Detect fp8 weights at the patching layer (analogous to the existing
on-CPU check) and force sidecar patching for any module with float8
parameters. This takes precedence over force_direct_patching, since
direct patching is not possible on fp8 weights, and fixes every model
architecture at once (Flux.1/2, SD3, Qwen, SD1/SDXL UNet, etc.).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions bot added the python, backend, and python-tests labels May 29, 2026
lstein added the 6.13.5 Library Updates label May 29, 2026
lstein moved this to 6.13.5 LIBRARY UPDATES in Invoke - Community Roadmap May 29, 2026
Labels

6.13.5 Library Updates, backend, python, python-tests

Projects

Status: 6.13.5 LIBRARY UPDATES

2 participants