Admit and benchmark Qwen3.5 27B Q4_K_M on native Psionic #898

Goal: download a real Hugging Face Qwen3.5 27B Q4_K_M GGUF, make it run end to end on native Psionic, and publish honest benchmark numbers in this issue.

Bounded target for this issue:
- pick one real HF artifact for the 27B Q4_K_M row
- make the current loader/runtime admit it instead of refusing or crashing
- keep support claims honest about backend boundaries: say which backends actually run the model and which do not
- run a real benchmark on the host(s) we can reach now and capture a receipt for each run
- post TTFT, total latency, and end-to-end tok/s as comments
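
To keep the posted numbers comparable across hosts, it may help to fix how the three metrics are derived from raw timestamps. The sketch below is one possible definition, not Psionic's actual receipt format: the `Receipt` type, the `summarize` function, and the choice to divide generated tokens by total wall-clock latency (prefill included) for end-to-end tok/s are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical benchmark receipt; field names are illustrative only."""
    ttft_s: float                # time to first token, in seconds
    total_latency_s: float       # request start to last token, in seconds
    end_to_end_tok_per_s: float  # generated tokens / total latency


def summarize(request_start_s: float,
              first_token_s: float,
              last_token_s: float,
              generated_tokens: int) -> Receipt:
    """Derive TTFT, total latency, and end-to-end tok/s from timestamps.

    All timestamps must come from the same monotonic clock, in seconds.
    Assumption: end-to-end tok/s includes prefill time in the denominator.
    """
    if generated_tokens <= 0:
        raise ValueError("generated_tokens must be positive")
    if not (request_start_s <= first_token_s <= last_token_s):
        raise ValueError("timestamps must be ordered start <= first <= last")

    total = last_token_s - request_start_s
    if total <= 0:
        raise ValueError("total latency must be positive")

    return Receipt(
        ttft_s=first_token_s - request_start_s,
        total_latency_s=total,
        end_to_end_tok_per_s=generated_tokens / total,
    )
```

Stating the denominator explicitly matters because a decode-only tok/s figure (which excludes prefill) will always look better than the end-to-end figure for the same run.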

Likely references to inspect while implementing:
- current `qwen35` loader and runtime in `psionic-models` and `psionic-serve`
- MLX Qwen support if a matching 27B row exists
- `llama.cpp` GGUF/Qwen handling for metadata and layer-shape assumptions

This issue is done when one explicit 27B Q4_K_M Hugging Face row runs on Psionic with posted receipts.
