perf(cuda): fuse Laguna MoE expert paths by davide221 · Pull Request #23 · Luce-Org/llama.cpp-dflash-ggml

davide221 · 2026-06-26T17:58:11Z

Draft companion PR for the Laguna hub performance branch.

Summary:

Adds a fused ggml Laguna MoE combine op.
Extends CUDA MMVQ/MUL_MAT_ID for batched/tokenwise MoE expert paths and fusion.
Keeps the temporary FA tracing/vector-kernel experiments out of this branch.
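For readers unfamiliar with the MoE step being fused, here is a minimal CPU reference sketch of the general idea: each token selects a few experts by id (the role MUL_MAT_ID plays), runs a matrix-vector product per selected expert, and combines the results with router weights. The struct `MoeRef`, the function `moe_fused_ref`, and the data layout are hypothetical and chosen for illustration only; the PR's actual op name, signature, and CUDA kernels are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for one MoE step (not the PR's CUDA kernel).
// experts[e] is a row-major [n_out x n_in] weight matrix.
struct MoeRef {
    std::size_t n_in;
    std::size_t n_out;
    std::vector<std::vector<float>> experts;
};

// For one token: out = sum_k weights[k] * (experts[ids[k]] * x).
// The per-expert matvec and the weighted combine run in a single pass,
// so no [n_used x n_out] intermediate buffer is materialized.
std::vector<float> moe_fused_ref(const MoeRef & m,
                                 const std::vector<float> & x,
                                 const std::vector<int> & ids,
                                 const std::vector<float> & weights) {
    assert(x.size() == m.n_in);
    assert(ids.size() == weights.size());
    std::vector<float> out(m.n_out, 0.0f);
    for (std::size_t k = 0; k < ids.size(); ++k) {
        const std::vector<float> & w = m.experts[static_cast<std::size_t>(ids[k])];
        assert(w.size() == m.n_in * m.n_out);
        for (std::size_t r = 0; r < m.n_out; ++r) {
            float acc = 0.0f;
            for (std::size_t c = 0; c < m.n_in; ++c) {
                acc += w[r * m.n_in + c] * x[c];
            }
            out[r] += weights[k] * acc;  // combine step folded into the matvec loop
        }
    }
    return out;
}
```

A reference like this is also a convenient oracle when comparing a fused kernel against an unfused path, although floating-point accumulation order can differ between the two.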

Validation:

Built bench_laguna_generate and dflash_server through the hub CUDA build.
Hub sanity run after this submodule commit: 128 prefill / 512 decode = 178.3 tok/s on RTX 3090 with f16 KV.

This PR is a draft/WIP: it preserves the branch while we continue the byte-identical checks and the next decode-speed pass.


perf(cuda): fuse laguna moe expert paths

54af126

davide221 mentioned this pull request Jun 26, 2026

perf(laguna): align MoE router and verify paths Luce-Org/lucebox-hub#455

Draft

github-actions Bot added ggml CUDA labels Jun 26, 2026

davide221 and others added 3 commits June 28, 2026 00:27

fix(cpu): handle fused moe op in dispatcher

4b3128e

Keep the CUDA-only fused MoE op explicit in the CPU dispatcher so non-CUDA builds do not fail on missing switch coverage. Co-authored-by: Codex <codex@openai.com>
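The pattern this commit message describes can be sketched as follows, under the assumption that the CPU dispatcher is a `switch` over an op enum with no `default:` label, so that compilers with `-Wswitch` (and `-Werror`) reject any unhandled enumerator. The enum `Op` and the function `cpu_dispatch` are hypothetical stand-ins, not the real ggml names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical op enum standing in for the real op list.
enum class Op { Add, MulMat, FusedMoeCombine /* CUDA-only in this sketch */ };

// Returns a short description of how the CPU backend handles `op`.
// There is deliberately no `default:` label: adding a new enumerator without
// a matching case then triggers a -Wswitch diagnostic at compile time.
std::string cpu_dispatch(Op op) {
    switch (op) {
        case Op::Add:    return "cpu add";
        case Op::MulMat: return "cpu mul_mat";
        case Op::FusedMoeCombine:
            // Explicit case: the op is known, but has no CPU implementation.
            throw std::runtime_error("fused MoE combine is CUDA-only");
    }
    throw std::logic_error("unknown op");  // only reached for out-of-range values
}
```

Listing the CUDA-only op explicitly keeps the switch exhaustive and turns an accidental CPU invocation into a clear runtime error instead of a build failure on non-CUDA targets.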

fix(ci): update ggml portability guards

7f14e81

Bump the RPC protocol patch guard after the op-count change, include TQ3_0 in the unsupported CPU clamp cases, and avoid unreachable breaks after noreturn CPU-only guards. Co-authored-by: Codex <codex@openai.com>
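Two of the ideas in this commit message can be illustrated with a small sketch: a compile-time guard that forces a protocol version bump when the op count changes, and a `[[noreturn]]` guard that needs no trailing `break`. All names here (`Op`, `OP_COUNT`, `PROTO_PATCH_VERSION`, `cpu_unsupported`, `cpu_cost`) are hypothetical; the actual guard and values in the repository are not shown on this page.

```cpp
#include <cstdlib>

// Hypothetical op list; OP_COUNT must stay last.
enum Op { OP_ADD, OP_MUL_MAT, OP_FUSED_MOE, OP_COUNT };

// Hypothetical protocol patch version. Whoever changes OP_COUNT must also
// bump this value and update the expected count in the static_assert.
constexpr int PROTO_PATCH_VERSION = 1;
static_assert(OP_COUNT == 3,
              "OP_COUNT changed: bump PROTO_PATCH_VERSION and update this guard");

// Guard for ops that have no CPU implementation in this sketch.
[[noreturn]] inline void cpu_unsupported() { std::abort(); }

// Toy cost function that shows the switch shape.
int cpu_cost(Op op) {
    switch (op) {
        case OP_ADD:     return 1;
        case OP_MUL_MAT: return 2;
        case OP_FUSED_MOE:
            cpu_unsupported();  // never returns, so a `break;` here would be unreachable
        case OP_COUNT:
            cpu_unsupported();
    }
    cpu_unsupported();
}
```

Because the guard is marked `[[noreturn]]`, compilers can tell that control never falls into the next case, which is why a `break` after it is flagged as unreachable code by some warning configurations.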

Merge remote-tracking branch 'origin/luce-dflash' into codex-laguna-moe-kv-cuda

8b11267

davide221 wants to merge 4 commits into luce-dflash from codex-laguna-moe-kv-cuda
