Kernel panic (IOGPUMemory.cpp:550) on M4 Max with large context prefill (~173K tokens) #3186

@kotono-amaha

Apple Feedback ID: FB22091885

This issue has been filed with Apple and is cross-referenced here for the MLX community. A fix may come from either side.

Kernel panic in IOGPUMemory.cpp:550, triggered by a large Metal GPU memory allocation during MLX inference on M4 Max.

PANIC STRING:
"completeMemory() prepare count underflow" @IOGPUMemory.cpp:550

SYSTEM:

  • Hardware: Apple M4 Max (36GB unified memory)
  • macOS: 26.3 (25D125)
  • Kernel: Darwin 25.3.0 xnu-12377.81.4~5/RELEASE_ARM64_T6041

REPRODUCIBLE: Yes — confirmed twice with identical call stacks.

REPRODUCTION STEPS:

  1. Install MLX, mlx-lm, and mlx-vlm via pip on Python 3.14 (ARM64)
  2. Load a large quantized LLM (Qwen3.5-27B Q5_K_M) via mlx_vlm.load()
  3. Construct a prompt consisting of 147 concatenated model outputs totalling approximately 173,000 tokens
  4. Call mlx_vlm.generate() with this prompt; the prefill phase begins processing the full context
  5. Kernel panics during prefill, consistently at IOGPUMemory.cpp:550

ROOT COMPONENT:
com.apple.iokit.IOGPUFamily (129.3.2)

NOTES:

  • Panic does not occur with smaller prompts (under ~10,000 tokens)
  • Memory capacity is not the issue: the system has 36GB and the model occupies ~26GB, leaving roughly 10GB of headroom
  • The issue appears to be a corruption of GPU memory accounting state (the prepare-count underflow in the panic string),
    triggered by a single contiguous Metal allocation for a very large attention computation rather than by an out-of-memory condition
  • Two panic logs attached with identical backtraces confirming deterministic reproducibility
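The suspected trigger scales with context length. As a rough illustration, assuming a standard (non-fused) attention layout, the score tensor for one layer holds num_heads × query_len × key_len elements, so it grows quadratically when the whole prompt is prefilled at once but only linearly when the prompt is fed in fixed-size chunks. The sketch below computes both sizes; the head count, the 2-byte element size, and the chunk size are hypothetical values, and MLX's fused attention kernels may never allocate this tensor at all.

```python
# Illustrative only: fused attention kernels may never materialize this
# tensor; HEADS and CHUNK are hypothetical, not taken from the model config.

GIB = 1024 ** 3


def score_tensor_bytes(query_len: int, key_len: int, num_heads: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes for one layer's attention scores, shape (num_heads, query_len, key_len)."""
    return num_heads * query_len * key_len * bytes_per_elem


CONTEXT = 173_000   # approximate prompt length from this report
HEADS = 24          # hypothetical query-head count
CHUNK = 2_048       # hypothetical prefill chunk size

# Whole prompt in one pass: grows quadratically with CONTEXT.
single_pass = score_tensor_bytes(CONTEXT, CONTEXT, HEADS)
# One chunk attending to the full context: grows linearly with CONTEXT.
per_chunk = score_tensor_bytes(CHUNK, CONTEXT, HEADS)

print(f"single pass: {single_pass / GIB:,.1f} GiB")
print(f"per chunk:   {per_chunk / GIB:,.1f} GiB")
```

Whatever the real kernel does internally, the ratio between the two cases is CONTEXT / CHUNK, which is why bounding the prefill step size bounds the largest single allocation.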

Suggested mitigation for MLX:
Add a prefill token-count guard in mlx-lm (and in mlx-vlm, whose generate() path this reproduction uses) before the Metal allocator is called.
If the prompt exceeds a safe threshold (empirically somewhere below 173K tokens on an M4 Max with 36GB), either raise a clear exception advising
the user to chunk the prompt, or automatically split the prefill into safe-sized segments. This would avoid the IOGPUFamily kernel panic without
requiring a macOS fix.
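The suggested guard could look roughly like the following. This is a minimal sketch, not mlx-lm or mlx-vlm code: `plan_prefill`, `PrefillTooLargeError`, and the 8,192-token threshold are hypothetical names and values, and the function only decides how to segment token IDs; it does not call the model or the Metal allocator.

```python
from typing import List, Sequence


class PrefillTooLargeError(ValueError):
    """Hypothetical exception for prompts above the single-pass prefill limit."""


def plan_prefill(tokens: Sequence[int], max_tokens: int, chunk_size: int,
                 auto_chunk: bool = True) -> List[Sequence[int]]:
    """Return the token segments to prefill, or raise if chunking is disabled.

    max_tokens: largest prompt prefilled in one pass (hypothetical threshold;
    the report only says the safe limit lies somewhere below ~173K tokens).
    chunk_size: segment length used when the prompt exceeds max_tokens.
    """
    if not 0 < chunk_size <= max_tokens:
        raise ValueError("chunk_size must be between 1 and max_tokens")
    if len(tokens) <= max_tokens:
        return [tokens]  # small prompt: single pass, unchanged behavior
    if not auto_chunk:
        raise PrefillTooLargeError(
            f"prompt has {len(tokens)} tokens, above the prefill limit of "
            f"{max_tokens}; split the prompt or enable chunked prefill"
        )
    # Fixed-size segments; the last one may be shorter.
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


if __name__ == "__main__":
    prompt_ids = list(range(173_000))  # stand-in for a ~173K-token prompt
    segments = plan_prefill(prompt_ids, max_tokens=8_192, chunk_size=2_048)
    print(len(segments), "segments")
```

Splitting only bounds per-step allocations if each segment is run through the model with a KV cache that persists across segments, so later segments still attend to earlier ones. The threshold and chunk size above are placeholders, not tested safe values.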
