common, server : preserve HF file for cached models#25152

Open
mrexodia wants to merge 1 commit into ggml-org:master from mrexodia:preserve-hf-file

@mrexodia

Overview

Today I downloaded a model from https://huggingface.co/mudler/Step-3.7-Flash-APEX-GGUF, which contains the following files:

Step-3.7-Flash-APEX-Balanced.gguf
Step-3.7-Flash-APEX-Compact.gguf
Step-3.7-Flash-APEX-I-Balanced.gguf
Step-3.7-Flash-APEX-I-Compact.gguf <-- I downloaded this one to my huggingface cache
Step-3.7-Flash-APEX-I-Mini.gguf
Step-3.7-Flash-APEX-I-Quality.gguf
Step-3.7-Flash-APEX-Quality.gguf

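Running llama serve discovers the model under the identifier mudler/Step-3.7-Flash-APEX-GGUF:COMPACT. The issue is that the tag :COMPACT is ambiguous: it fits both Step-3.7-Flash-APEX-Compact.gguf and the cached Step-3.7-Flash-APEX-I-Compact.gguf. As a result, llama will (silently) start downloading Step-3.7-Flash-APEX-Compact.gguf from Hugging Face when you start using the model, instead of using the file that is already in the cache.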
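To make the ambiguity concrete, here is a minimal sketch that matches a tag against the file list above. The helper `files_matching_tag` and its case-insensitive substring rule are assumptions made for illustration only; this is not llama.cpp's actual resolution logic, which may differ in detail.

```python
# Files in the repository, as listed above.
FILES = [
    "Step-3.7-Flash-APEX-Balanced.gguf",
    "Step-3.7-Flash-APEX-Compact.gguf",
    "Step-3.7-Flash-APEX-I-Balanced.gguf",
    "Step-3.7-Flash-APEX-I-Compact.gguf",
    "Step-3.7-Flash-APEX-I-Mini.gguf",
    "Step-3.7-Flash-APEX-I-Quality.gguf",
    "Step-3.7-Flash-APEX-Quality.gguf",
]


def files_matching_tag(files, tag):
    """Hypothetical matcher: return every file whose name contains the tag.

    The comparison is case-insensitive. This is an illustrative rule,
    not the one llama.cpp implements.
    """
    needle = tag.lower()
    return [name for name in files if needle in name.lower()]


matches = files_matching_tag(FILES, "COMPACT")
print(matches)  # two candidates, so the tag alone cannot identify the cached file
```

Under this rule the tag COMPACT selects two files, while I-COMPACT selects exactly one, which is why an explicit filename removes the guesswork.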

The fix is to pass the GGUF filename using LLAMA_ARG_HF_FILE so we avoid the ambiguity.
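As a rough illustration of the idea (not the code in this PR), the sketch below builds an environment for a child process in which the exact GGUF filename is pinned. `LLAMA_ARG_HF_FILE` comes from the description above; pairing it with `LLAMA_ARG_HF_REPO` and the helper name `env_with_exact_file` are assumptions made for this example.

```python
import os


def env_with_exact_file(repo, filename, base_env=None):
    """Return a copy of the environment with the HF repo and exact file set.

    Assumption: the child process reads LLAMA_ARG_HF_REPO alongside
    LLAMA_ARG_HF_FILE. Only LLAMA_ARG_HF_FILE is named in the PR text.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LLAMA_ARG_HF_REPO"] = repo          # repository without the ambiguous tag
    env["LLAMA_ARG_HF_FILE"] = filename      # exact cached file, no tag matching needed
    return env


child_env = env_with_exact_file(
    "mudler/Step-3.7-Flash-APEX-GGUF",
    "Step-3.7-Flash-APEX-I-Compact.gguf",
    base_env={"PATH": "/usr/bin"},
)
print(child_env["LLAMA_ARG_HF_FILE"])
```

Copying the base environment rather than mutating `os.environ` keeps the setting scoped to the one model instance being launched.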

Additional information

I think there is a larger discussion to be had about how the tag is determined, because a better tag here would arguably be I-COMPACT. Unsloth's UD-xxx quants have a similar issue: https://huggingface.co/unsloth/Step-3.7-Flash-GGUF. That is not something I would feel comfortable submitting a PR for, though.
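For the sake of that discussion, one possible way to get a tag like I-COMPACT is to strip the prefix shared by all files in the repository and keep whatever remains. The sketch below is purely hypothetical: `derive_tags` is not an existing llama.cpp function, and this rule has not been checked against other repositories such as the Unsloth one.

```python
import os

# Files in the repository, as listed in the overview.
FILES = [
    "Step-3.7-Flash-APEX-Balanced.gguf",
    "Step-3.7-Flash-APEX-Compact.gguf",
    "Step-3.7-Flash-APEX-I-Balanced.gguf",
    "Step-3.7-Flash-APEX-I-Compact.gguf",
    "Step-3.7-Flash-APEX-I-Mini.gguf",
    "Step-3.7-Flash-APEX-I-Quality.gguf",
    "Step-3.7-Flash-APEX-Quality.gguf",
]


def derive_tags(files):
    """Hypothetical rule: tag = filename minus extension and shared prefix."""
    stems = [name.removesuffix(".gguf") for name in files]
    # Character-wise common prefix across all stems.
    prefix = os.path.commonprefix(stems)
    return {name: stem[len(prefix):].upper() for name, stem in zip(files, stems)}


tags = derive_tags(FILES)
for name, tag in tags.items():
    print(f"{tag:12} {name}")
```

With this file list every tag is distinct. The rule has obvious gaps (a repository with a single file would yield an empty tag), so it is only meant to show that a less ambiguous naming is feasible.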

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, pi:gpt-5.5 was used to triage and fix the issue. I reviewed the code thoroughly by hand and fully understand the changes.

Duncan

@mrexodia mrexodia requested review from a team as code owners June 29, 2026 23:44