Goal: download a real Hugging Face Qwen3.5 27B Q4_K_M GGUF, make it run end to end on native Psionic, and publish honest benchmark numbers in this issue.
Bounded target for this issue:
- pick one real HF artifact for the 27B Q4_K_M row
- make the current loader/runtime admit it instead of refusing or crashing
- keep support honest about backend boundaries
- run a real benchmark receipt on the host(s) we can reach now
- post TTFT, total latency, and end-to-end tok/s as comments
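A minimal sketch of how the three posted numbers relate. It assumes "end-to-end tok/s" means generated tokens divided by total wall-clock latency, with prefill included. A decode-only rate would instead divide by the time after the first token. `measure` and `summarize` are hypothetical helpers, not Psionic APIs. `measure` also assumes the token stream is lazy, so the request starts on the first iteration.

```python
import time
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BenchmarkReceipt:
    ttft_s: float            # request start -> first generated token
    total_latency_s: float   # request start -> last generated token
    generated_tokens: int
    e2e_tokens_per_s: float  # generated_tokens / total_latency_s (assumed definition)


def summarize(start: float, first_token: float, end: float, generated_tokens: int) -> BenchmarkReceipt:
    """Turn three timestamps plus a token count into the receipt numbers."""
    if not (start <= first_token <= end):
        raise ValueError("timestamps must satisfy start <= first_token <= end")
    total = end - start
    return BenchmarkReceipt(
        ttft_s=first_token - start,
        total_latency_s=total,
        generated_tokens=generated_tokens,
        e2e_tokens_per_s=generated_tokens / total if total > 0 else 0.0,
    )


def measure(token_stream: Iterable[str]) -> BenchmarkReceipt:
    """Time a lazily produced token stream (hypothetical stand-in for a Psionic generation call)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        if first is None:
            first = time.perf_counter()  # timestamp of the first token
        count += 1
    end = time.perf_counter()
    # An empty stream has no first token; treat TTFT as the full duration.
    return summarize(start, first if first is not None else end, end, count)
```

For example, `summarize(0.0, 0.5, 2.5, 100)` gives a TTFT of 0.5 s, a total latency of 2.5 s, and 40 tok/s. Each posted receipt should state which tok/s definition it uses.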
Likely references to inspect while implementing:
- the current qwen35 loader and runtime in psionic-models and psionic-serve
- MLX Qwen support if a matching 27B row exists
- llama.cpp GGUF/Qwen handling for metadata and layer-shape assumptions
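Before changing the loader, it helps to dump the downloaded file's GGUF metadata. That shows which architecture string and shape keys the loader would see. The sketch below reads only the GGUF header and metadata key/value section: little-endian, `GGUF` magic, uint32 version, then uint64 tensor and KV counts (version 2 and later). It stops before the tensor-info section. `read_gguf_metadata` is a hypothetical helper, not part of Psionic or llama.cpp. Arrays such as the tokenizer vocabulary are fully decoded, so this is slow on large vocabularies but fine for inspection.

```python
import struct
from typing import Any, BinaryIO, Dict, Tuple

# GGUF metadata value type ids for fixed-size scalars -> little-endian struct format
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_str(f: BinaryIO) -> str:
    # GGUF strings are a uint64 byte length followed by UTF-8 bytes (no terminator)
    n = _read(f, "<Q")
    raw = f.read(n)
    if len(raw) != n:
        raise ValueError("truncated GGUF string")
    return raw.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")  # arrays carry an element type, then a uint64 count
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Tuple[int, int, Dict[str, Any]]:
    """Return (version, tensor_count, metadata dict) from an open GGUF file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 (32-bit counts) is not handled by this sketch")
    tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    meta: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_str(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return version, tensor_count, meta
```

Usage: `with open(path, "rb") as f: version, n_tensors, meta = read_gguf_metadata(f)`, then inspect `meta["general.architecture"]` and the architecture-prefixed keys (for example `<arch>.block_count`). Compare them with what the qwen35 loader expects.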
This issue is done when one explicit 27B Q4_K_M Hugging Face row runs on Psionic with posted receipts.