
backends/coreml: LFM2.5 1.2B PTE compiles to mlmodelc but ANE init fails (ANECCompile FAILED) #19635

@msluszniak

Description


Problem

After applying the workaround in #19634, an LFM2.5 1.2B CoreML PTE loads via executorch.runtime.Runtime.load_program(...) and all metadata methods (get_eos_ids, get_max_seq_len, use_kv_cache, …) succeed. prog.load_method("forward") then fails:

[ETCoreMLModelManager.mm:495] Successfully got compiled model ...
[ETCoreMLModelAnalyzer.mm:68] [Core ML] Failed to create model profiler.
    Failed to build the model execution plan using a model architecture file '.../model.mil'
[coreml_backend_delegate.mm:324] CoreMLBackend: Failed to init the model.
[method.cpp:114] Init failed for backend CoreMLBackend: 0x23
E5RT encountered an STL exception. msg = MILCompilerForANE error:
    failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED.

MIL → mlmodelc compilation succeeds; the failure is in building the ANE-specific execution plan. It reproduces both via executorch.runtime on macOS and on an iPhone 17 Pro running iOS 26.4.2 (where it surfaces in react-native-executorch as code: 35 "Failed to load LLM runner"), so it appears to be a CoreML/ANE-side issue rather than an ExecuTorch runtime one. With compute_units: cpu_only or cpu_and_gpu the model loads successfully, but XNNPACK already covers the CPU case at higher throughput. For this model, the value of CoreML is the ANE.
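To compare compute units side by side, one option is to write one export config per compute_units value, starting from the lfm2_coreml_4w.yaml shown under Reproduce. The sketch below is a minimal helper under that assumption: it copies the same keys and export flags and varies only compute_units. write_variants and the per-unit .yaml/.pte file names are hypothetical, not part of ExecuTorch.

```python
from pathlib import Path
from string import Template
from typing import Dict, List

# Same settings as lfm2_coreml_4w.yaml in this issue; only compute_units varies.
BASE_CONFIG = Template("""\
base:
  metadata: '{"get_bos_id": 1, "get_eos_ids":[7]}'
model:
  use_kv_cache: True
  enable_dynamic_shape: False
  dtype_override: fp32
quantization:
  qmode: 4w
  group_size: 32
backend:
  coreml:
    enabled: True
    ios: 18
    enable_state: True
    preserve_sdpa: True
    compute_units: $compute_units
""")

# Reported in this issue: cpu_only and cpu_and_gpu load, cpu_and_ne fails.
COMPUTE_UNITS = ("cpu_only", "cpu_and_gpu", "cpu_and_ne")


def write_variants(out_dir: Path) -> Dict[str, List[str]]:
    """Write one YAML per compute_units value; return the export argv for each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands: Dict[str, List[str]] = {}
    for unit in COMPUTE_UNITS:
        config = out_dir / f"lfm2_coreml_4w_{unit}.yaml"
        config.write_text(BASE_CONFIG.substitute(compute_units=unit))
        commands[unit] = [
            "python", "-m", "extension.llm.export.export_llm",
            "--config", str(config),
            "+base.model_class=lfm2_5_1_2b",
            "+base.params=examples/models/lfm2/config/lfm2_5_1_2b_config.json",
            "+export.max_seq_length=2048",
            "+export.max_context_length=2048",
            f"+export.output_name=lfm2_coreml_4w_{unit}.pte",
        ]
    return commands
```

Each argv list can be passed to subprocess.run from the repository root (after applying the #19634 workaround); keeping the commands as lists rather than shell strings sidesteps shell quoting.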

Reproduces with two different quantization modes, ruling out a quantizer-specific cause:

  1. unquantised fp16 (no quantization: block)
  2. weight-only 4-bit via the documented torchao quantize_ path (qmode: 4w, see docs/source/backends/coreml/coreml-quantization.md)

Since the quantization mode makes no difference and only the ANE path fails, the failure most likely lies in lowering LFM2's short-conv conv_state mutation (self.conv_state.copy_(new_state) in examples/models/lfm2/short_conv.py, which decomposes to slice_copy + index_put) to an ANE-compatible MIL representation. The same model graph works on cpu_and_gpu.
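For readers unfamiliar with the pattern, the sketch below shows what a rolling conv_state update computes, reduced to one channel and plain Python lists: a causal short convolution with kernel size K caches the previous K - 1 inputs, and each decode step shifts that cache by one position. The state layout (oldest input first), the kernel ordering, and both function names are assumptions for illustration, not read from short_conv.py. It shows the semantics of the buffer update only, not which MIL ops Core ML emits for it.

```python
from typing import List, Sequence, Tuple


def short_conv_step(
    conv_state: Sequence[float], x: float, kernel: Sequence[float]
) -> Tuple[float, List[float]]:
    """One decode step of a causal short conv for a single channel (hypothetical).

    conv_state holds the previous len(kernel) - 1 inputs, oldest first.
    Returns (output, new_state) and leaves conv_state untouched.
    """
    if len(conv_state) != len(kernel) - 1:
        raise ValueError("conv_state must hold len(kernel) - 1 past inputs")
    window = list(conv_state) + [x]  # the last K inputs, oldest first
    y = sum(w * v for w, v in zip(kernel, window))
    return y, window[1:]  # drop the oldest input to form the next state


def short_conv_step_inplace(
    conv_state: List[float], x: float, kernel: Sequence[float]
) -> float:
    """Same computation, but overwrites the state buffer in place,
    analogous to self.conv_state.copy_(new_state)."""
    y, new_state = short_conv_step(conv_state, x, kernel)
    conv_state[:] = new_state  # whole-buffer overwrite, like copy_
    return y
```

The functional variant returns the new state instead of overwriting a buffer. Whether a graph written in that style would lower differently for the ANE is exactly what Ask 1 raises, so it is shown for clarity rather than as a proposed fix.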

Reproduce

# Apply workaround from #19634 first.

cat > examples/models/lfm2/config/lfm2_coreml_4w.yaml <<'EOF'
base:
  metadata: '{"get_bos_id": 1, "get_eos_ids":[7]}'
model:
  use_kv_cache: True
  enable_dynamic_shape: False
  dtype_override: fp32
quantization:
  qmode: 4w
  group_size: 32
backend:
  coreml:
    enabled: True
    ios: 18
    enable_state: True
    preserve_sdpa: True
    compute_units: cpu_and_ne
EOF

python -m extension.llm.export.export_llm \
  --config examples/models/lfm2/config/lfm2_coreml_4w.yaml \
  +base.model_class=lfm2_5_1_2b \
  +base.params=examples/models/lfm2/config/lfm2_5_1_2b_config.json \
  +export.max_seq_length=2048 \
  +export.max_context_length=2048 \
  +export.output_name=lfm2_coreml_4w.pte

python -c "
from executorch.runtime import Runtime
prog = Runtime.get().load_program('lfm2_coreml_4w.pte')
prog.load_method('get_eos_ids')   # OK
prog.load_method('forward')       # fails with backend init 0x23 + ANECCompile FAILED
"
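A slightly more verbose version of the check above loads each method in turn and records which ones fail, so the metadata methods and forward can be compared in one run. probe_methods is a hypothetical helper: it takes the load function as a parameter so it can be exercised without ExecuTorch installed, and it assumes Program.load_method raises on a backend init failure (a None return is also treated as a failure, as a precaution).

```python
from typing import Callable, Dict, Iterable, Optional

# Method names taken from this issue.
LFM2_METHODS = ("get_eos_ids", "get_max_seq_len", "use_kv_cache", "forward")


def probe_methods(
    load_method: Callable[[str], Optional[object]],
    names: Iterable[str] = LFM2_METHODS,
) -> Dict[str, str]:
    """Load each method by name and map it to 'ok' or 'FAILED: <reason>'."""
    results: Dict[str, str] = {}
    for name in names:
        try:
            method = load_method(name)
        except Exception as exc:  # e.g. CoreMLBackend init failure (0x23)
            results[name] = f"FAILED: {exc}"
            continue
        results[name] = "ok" if method is not None else "FAILED: returned None"
    return results


# Usage against the exported program (requires ExecuTorch):
#   from executorch.runtime import Runtime
#   prog = Runtime.get().load_program("lfm2_coreml_4w.pte")
#   for name, status in probe_methods(prog.load_method).items():
#       print(f"{name}: {status}")
```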

Asks

  1. Is the short-conv .copy_(...) mutation pattern expected to lower to ANE-compatible MIL? If not, what's the recommended rewrite of examples/models/lfm2/short_conv.py to produce an ANE-friendly graph?
  2. Is there a documented way to identify, before compilation, which ops in a model will block ANE compilation?
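Pending an answer to Ask 2, one rough pre-compilation check is to scan the exported graph for the ops suspected here. The sketch below counts slice_copy, index_put, and copy targets by name; the suspect list is this issue's hypothesis, not a documented ANE denylist, and op_packet/count_suspect_ops are hypothetical helpers. It takes target names as strings so it runs without torch; the commented usage assumes an ExportedProgram for the model is available before Core ML lowering.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothesis from this issue, NOT a documented ANE denylist: ops that
# self.conv_state.copy_(new_state) is reported to decompose into, plus copy.
SUSPECT_OPS = ("aten.slice_copy", "aten.index_put", "aten.copy")


def op_packet(target_name: str) -> str:
    """Map e.g. 'aten.index_put_.default' to 'aten.index_put'
    (drops the overload name and the in-place '_' suffix)."""
    namespace, _, rest = target_name.partition(".")
    op = rest.split(".", 1)[0].rstrip("_")
    return f"{namespace}.{op}"


def count_suspect_ops(
    target_names: Iterable[str], suspects: Iterable[str] = SUSPECT_OPS
) -> Dict[str, int]:
    """Count call_function targets whose op packet is in `suspects`."""
    wanted = set(suspects)
    return dict(Counter(p for p in map(op_packet, target_names) if p in wanted))


# With torch installed and an ExportedProgram `ep` for the model (assumption):
#   names = [str(n.target) for n in ep.graph.nodes if n.op == "call_function"]
#   print(count_suspect_ops(names))
```

A zero count does not prove ANE compatibility, and a nonzero count does not prove these ops cause the failure; the scan only narrows where to look.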

cc @kimishpatel @YifanShenSZ @cymbalrush @metascroy

Label: module: coreml (Issues related to Apple's Core ML delegation and code under backends/apple/coreml/)

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions