fix(lora): sidecar-patch fp8 weights to avoid float8 add crash by lstein · Pull Request #9246 · invoke-ai/InvokeAI

lstein · 2026-05-29T02:15:27Z

Summary

Applying a LoRA to a Flux.2 4B/9B Diffusers model that was loaded with FP8 storage enabled crashes with:

RuntimeError: "ufunc_add_CUDA" not implemented for 'Float8_e4m3fn'

The failing statement is in layer_patcher.py: module_param.data.copy_(module_param.data + param_weight_converted).

Root cause

PR #9231 introduced FP8 layerwise-casting storage, which keeps a full-precision Diffusers transformer's Linear/Conv weights in float8_e4m3fn between forward passes (cast up to the compute dtype by forward hooks). LoRA application has two paths:

Direct patching — does an in-place add on the model weight (module_param + lora_weight).
Sidecar patching — wraps the module and dequantizes to the compute dtype before any math.

Direct patching is chosen for any model not flagged "quantized." GGUF/BnB models are flagged and use the sidecar path, so they work. An FP8 Diffusers model is not flagged quantized, so it takes the direct path, where the in-place add runs on float8 tensors. CUDA has no add kernel for float8, hence the crash.

Fix

Detect fp8 weights at the patching layer itself, where the incompatibility actually lives — directly analogous to the existing _is_any_part_of_layer_on_cpu guard that already forces sidecar patching. Any module with float8_e4m3fn / float8_e5m2 parameters is now sidecar-patched.

This was chosen over patching each denoise invocation because:

FP8 storage is a generic Main-model feature, so the same crash is reachable from Flux.1, SD3, Qwen, Z-Image, and the SD1/SDXL UNet path — fixing it once at the patcher covers them all.
The guard takes precedence over force_direct_patching (used by the SD1/SDXL UNet path in lora.py), since direct patching is simply not possible on fp8 weights.
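The precedence described above can be sketched as a small decision function. Only the first branch (fp8 forces sidecar, even when force_direct_patching is set) comes from this PR; the function name, the other parameters, and the ordering of the remaining branches are assumptions for illustration and may differ from layer_patcher.py.

```python
def choose_patch_path(
    *,
    has_fp8_params: bool,
    force_direct_patching: bool = False,
    force_sidecar_patching: bool = False,  # assumed flag for quantized models
    on_cpu: bool = False,                  # assumed result of the on-CPU check
) -> str:
    """Hypothetical routing sketch; returns "sidecar" or "direct"."""
    # From the PR: fp8 weights cannot be patched in place, so this wins outright.
    if has_fp8_params:
        return "sidecar"
    # Assumed ordering for the remaining cases.
    if force_direct_patching:
        return "direct"
    if force_sidecar_patching or on_cpu:
        return "sidecar"
    return "direct"


fp8_with_force_direct = choose_patch_path(has_fp8_params=True, force_direct_patching=True)
plain_with_force_direct = choose_patch_path(has_fp8_params=False, force_direct_patching=True)
```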

Related Issues / Discussions

Follow-up to #9231 (FP8 layerwise-casting storage), which introduced the fp8 weight storage that this path did not account for. See also #9241 (FP8 docs).

QA Instructions

Load a Flux.2 4B or 9B Diffusers model with FP8 storage enabled, apply a Flux.2 LoRA, and run a denoise — previously crashed, now succeeds. (Confirmed working.)
Regression test added: tests/backend/patches/test_layer_patcher.py::test_apply_smart_model_patches_fp8_weights_force_sidecar verifies an fp8 module routes to sidecar patching even when force_direct_patching=True, and the patch is cleared on exit. Runs on CPU (verifies routing, not the CUDA-only arithmetic).
pytest tests/backend/patches/test_layer_patcher.py — 19 passed.

Merge Plan

Straight merge.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration — n/a
Documentation added / updated (if applicable) — n/a

🤖 Generated with Claude Code

A full-precision Diffusers transformer loaded with fp8_storage enabled (PR #9231) keeps its Linear/Conv weights in float8_e4m3fn between forward passes. Applying a LoRA via direct patching does an in-place add on the model weight, and CUDA has no add kernel for float8, so it crashes with "ufunc_add_CUDA not implemented for Float8_e4m3fn". GGUF/BnB models avoid this because they are flagged quantized and use the sidecar path.

Detect fp8 weights at the patching layer (analogous to the existing on-CPU check) and force sidecar patching for any module with float8 parameters. This takes precedence over force_direct_patching, since direct patching is not possible on fp8 weights, and fixes every model architecture at once (Flux.1/2, SD3, Qwen, SD1/SDXL UNet, etc.).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lstein requested review from JPPhoto, Pfannkuchensack, blessedcoolant and dunkeroni as code owners May 29, 2026 02:15

github-actions Bot added the python, backend, and python-tests labels May 29, 2026

lstein assigned Pfannkuchensack May 29, 2026

lstein added the 6.13.5 Library Updates label May 29, 2026

lstein added this to Invoke - Community Roadmap May 29, 2026

lstein moved this to 6.13.5 LIBRARY UPDATES in Invoke - Community Roadmap May 29, 2026

Merge branch 'main' into lstein/bugfix/flux2-fp8-lora

f64e02f

lstein wants to merge 2 commits into main from lstein/bugfix/flux2-fp8-lora
