Admit and benchmark Qwen3.5 27B Q4_K_M on native Psionic #898

@AtlantisPleb

Description

Goal: download a real Hugging Face Qwen3.5 27B Q4_K_M GGUF, make it run end to end on native Psionic, and publish honest benchmark numbers in this issue.
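A quick size estimate helps decide which reachable hosts can hold the download at all. The sketch below makes three assumptions. It uses ggml's K-quant super-block layouts, with 256 weights per block. It takes the nominal 27B parameter count from the model name. It ignores metadata and any tensors stored outside the K-quant types. The helper names are illustrative and do not come from Psionic.

```rust
/// Weights per K-quant super-block in ggml (QK_K).
const QK_K: u64 = 256;

/// Q4_K super-block: f16 `d` + f16 `dmin` + 12 packed scale bytes
/// + 128 bytes holding 256 four-bit quants.
const Q4_K_BLOCK_BYTES: u64 = 2 + 2 + 12 + QK_K / 2;

/// Q6_K super-block: 128 bytes of low 4 bits + 64 bytes of high 2 bits
/// + 16 int8 sub-block scales + f16 `d`.
const Q6_K_BLOCK_BYTES: u64 = QK_K / 2 + QK_K / 4 + QK_K / 16 + 2;

/// Effective bits per weight for a super-block of `block_bytes` bytes.
fn bits_per_weight(block_bytes: u64) -> f64 {
    (block_bytes * 8) as f64 / QK_K as f64
}

/// Bytes needed to store `n_weights` weights in one K-quant type,
/// rounded up to whole super-blocks.
fn quantized_bytes(n_weights: u64, block_bytes: u64) -> u64 {
    (n_weights + QK_K - 1) / QK_K * block_bytes
}

fn main() {
    // Nominal count taken from the "27B" in the model name (an assumption);
    // the real tensor shapes inside the GGUF decide the exact size.
    let nominal_params: u64 = 27_000_000_000;
    let all_q4 = quantized_bytes(nominal_params, Q4_K_BLOCK_BYTES);
    let all_q6 = quantized_bytes(nominal_params, Q6_K_BLOCK_BYTES);
    println!(
        "Q4_K {:.4} bpw, Q6_K {:.4} bpw",
        bits_per_weight(Q4_K_BLOCK_BYTES),
        bits_per_weight(Q6_K_BLOCK_BYTES)
    );
    println!(
        "all-Q4_K: {:.2} GB, all-Q6_K: {:.2} GB",
        all_q4 as f64 / 1e9,
        all_q6 as f64 / 1e9
    );
}
```

With these layouts, Q4_K works out to 4.5 bits per weight and Q6_K to 6.5625. That places the nominal weights between roughly 15.2 GB and 22.1 GB. In llama.cpp's scheme, a Q4_K_M file keeps most tensors in Q4_K and promotes only some to Q6_K, so it should land near the low end. The file size listed on the chosen Hugging Face row is still the number to trust.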

Bounded target for this issue:

  • pick one real HF artifact for the 27B Q4_K_M row
  • make the current loader/runtime admit it instead of refusing or crashing
  • keep support claims honest about backend boundaries: name the backends that actually run it and do not imply the rest
  • produce a real benchmark receipt on the host(s) we can reach now
  • post TTFT, total latency, and end-to-end tok/s as comments
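The list above names TTFT, total latency, and end-to-end tok/s but does not pin down how they are computed. The sketch below assumes one reasonable reading. TTFT runs from request start to the first generated token, and total latency runs to the last one. End-to-end tok/s divides generated tokens by total latency, so prefill time counts against it. `RequestTiming` and `time_stream` are hypothetical names, not Psionic's receipt schema, and the token stream in `main` is a stand-in for a real Psionic generation.

```rust
use std::time::{Duration, Instant};

/// Timings for one generation request (illustrative fields, not Psionic's schema).
#[derive(Debug, Clone, Copy)]
struct RequestTiming {
    /// Request start to first generated token (TTFT).
    ttft: Duration,
    /// Request start to last generated token (total latency).
    total: Duration,
    /// Generated tokens only; prompt tokens are not counted.
    generated_tokens: u64,
}

impl RequestTiming {
    /// End-to-end tok/s: generated tokens over total latency, so prompt
    /// processing time is part of the denominator.
    fn end_to_end_tok_s(&self) -> f64 {
        let secs = self.total.as_secs_f64();
        if secs > 0.0 { self.generated_tokens as f64 / secs } else { 0.0 }
    }

    /// Decode-only tok/s: tokens after the first, over the time after TTFT.
    /// Returns None when there is no decode interval to measure.
    fn decode_tok_s(&self) -> Option<f64> {
        let decode = self.total.checked_sub(self.ttft)?;
        if self.generated_tokens < 2 || decode.as_secs_f64() == 0.0 {
            return None;
        }
        Some((self.generated_tokens - 1) as f64 / decode.as_secs_f64())
    }
}

/// Drains any token stream and records TTFT and total latency against `start`.
/// `Instant` is monotonic, so the elapsed values never go backwards.
fn time_stream<I: Iterator>(start: Instant, tokens: I) -> RequestTiming {
    let mut ttft = None;
    let mut generated_tokens = 0u64;
    for _token in tokens {
        generated_tokens += 1;
        if ttft.is_none() {
            ttft = Some(start.elapsed());
        }
    }
    let total = start.elapsed();
    RequestTiming { ttft: ttft.unwrap_or(total), total, generated_tokens }
}

fn main() {
    // Stand-in iterator; a real run would iterate Psionic's generated tokens.
    let start = Instant::now();
    let timing = time_stream(start, (0..64).map(|i| i * 2));
    println!(
        "TTFT {:?}, total {:?}, {} tokens, {:.1} tok/s end-to-end, decode {:?} tok/s",
        timing.ttft,
        timing.total,
        timing.generated_tokens,
        timing.end_to_end_tok_s(),
        timing.decode_tok_s()
    );
}
```

Posting decode-only tok/s next to the end-to-end figure keeps a slow prefill from being read as slow decoding. Whether to include it in the receipts is a judgment call.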

Likely references to inspect while implementing:

  • current qwen35 loader and runtime in psionic-models and psionic-serve
  • MLX Qwen support if a matching 27B row exists
  • llama.cpp GGUF/Qwen handling for metadata and layer-shape assumptions
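Checking metadata and layer-shape assumptions starts with what the loader sees first: the GGUF header and its key/value metadata. The sketch below is a standalone reader written against the public GGUF layout (v2+, little-endian, u64 counts), not Psionic's loader. It reads a bounded prefix of the file and reports `general.architecture` and `<arch>.block_count`, which are standard GGUF key names. It keeps only string and u32 values and skips everything else. The program name `gguf-peek` is hypothetical.

```rust
use std::collections::HashMap;
use std::convert::TryInto;
use std::fs::File;
use std::io::Read;

/// Metadata values this sketch keeps; every other GGUF type is skipped.
#[derive(Debug, PartialEq)]
enum Meta {
    U32(u32),
    Str(String),
}

/// Little-endian reader over an in-memory prefix of the file.
struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
        let buf = self.buf;
        let end = self.pos.checked_add(n).filter(|&e| e <= buf.len());
        let end = end.ok_or_else(|| format!("truncated at byte {}", self.pos))?;
        self.pos = end;
        Ok(&buf[end - n..end])
    }
    fn read_u32(&mut self) -> Result<u32, String> {
        Ok(u32::from_le_bytes(self.take(4)?.try_into().unwrap()))
    }
    fn read_u64(&mut self) -> Result<u64, String> {
        Ok(u64::from_le_bytes(self.take(8)?.try_into().unwrap()))
    }
    fn read_string(&mut self) -> Result<String, String> {
        let len = self.read_u64()? as usize;
        String::from_utf8(self.take(len)?.to_vec()).map_err(|e| e.to_string())
    }
    /// Skips one value of GGUF type id `ty` (ids as in the GGUF spec).
    fn skip(&mut self, ty: u32) -> Result<(), String> {
        let width = match ty {
            0 | 1 | 7 => 1,    // u8, i8, bool
            2 | 3 => 2,        // u16, i16
            4 | 5 | 6 => 4,    // u32, i32, f32
            10 | 11 | 12 => 8, // u64, i64, f64
            8 => return self.read_string().map(|_| ()),
            9 => {
                // Array: element type id, element count, then the elements.
                let (inner, count) = (self.read_u32()?, self.read_u64()?);
                for _ in 0..count {
                    self.skip(inner)?;
                }
                return Ok(());
            }
            other => return Err(format!("unknown GGUF value type {}", other)),
        };
        self.take(width).map(|_| ())
    }
}

/// Parses the GGUF header and metadata; returns (version, tensor count, metadata).
fn parse_header(buf: &[u8]) -> Result<(u32, u64, HashMap<String, Meta>), String> {
    let mut c = Cursor { buf, pos: 0 };
    if c.take(4)? != &b"GGUF"[..] {
        return Err("missing GGUF magic".to_string());
    }
    let version = c.read_u32()?;
    if version < 2 {
        return Err(format!("GGUF v{} uses 32-bit counts; not handled", version));
    }
    let (tensor_count, kv_count) = (c.read_u64()?, c.read_u64()?);
    let mut meta = HashMap::new();
    for _ in 0..kv_count {
        let key = c.read_string()?;
        let value = match c.read_u32()? {
            4 => Meta::U32(c.read_u32()?),
            8 => Meta::Str(c.read_string()?),
            other => {
                c.skip(other)?;
                continue;
            }
        };
        meta.insert(key, value);
    }
    Ok((version, tensor_count, meta))
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Usage: gguf-peek <model.gguf>. Metadata precedes tensor data; 64 MiB cap is arbitrary.
    if let Some(path) = std::env::args().nth(1) {
        let mut buf = Vec::new();
        File::open(&path)?.take(64 << 20).read_to_end(&mut buf)?;
        let (version, tensor_count, meta) = parse_header(&buf)?;
        println!("GGUF v{}, {} tensors", version, tensor_count);
        if let Some(Meta::Str(arch)) = meta.get("general.architecture") {
            let blocks = meta.get(&format!("{}.block_count", arch));
            println!("architecture {}, block_count {:?}", arch, blocks);
        }
    }
    Ok(())
}
```

If the tokenizer vocabulary pushes the metadata past the 64 MiB prefix, the reader reports a truncation error rather than guessing. In that case, raise the cap.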

This issue is done when one explicit 27B Q4_K_M Hugging Face row runs on Psionic with posted receipts.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests