Utilities to evaluate MLX quantizations.
See the detailed results for more information.
# clone the repo
git clone git@github.com:deepsweet/mlx-eval.git
cd mlx-eval/
# install dependencies
uv sync
# prepare an original reference MLX model
# for example, text-only using mlx-lm or multimodal using mlx-vlm:
uv tool install mlx-vlm --with torchvision
mlx_vlm.convert \
--hf-path Qwen/Qwen3.6-35B-A3B \
--mlx-path /path/to/Qwen3.6-35B-A3B-MLX
# prepare a quantized target MLX model
# for example:
mlx_vlm.convert \
--hf-path Qwen/Qwen3.6-35B-A3B \
--mlx-path /path/to/Qwen3.6-35B-A3B-MLX-Q4 \
--quantize \
--q-bits 4
# compute and store the reference model data into outputs/
# mlx_eval.reference <reference_model_path> <window_count> <max_tokens>
uv run mlx_eval.reference /path/to/Qwen3.6-35B-A3B-MLX 16 8192
# and compare the target quantized model against it
# mlx_eval.compare <target_model_path> <window_count>
uv run mlx_eval.compare /path/to/Qwen3.6-35B-A3B-MLX-Q4 16

uv run results/<model_name>.py

uv sync --group dev
uv run ruff check .
uv run pytest .

License: MIT.
The evaluation prompt is derived from Aes Sedai's
combined_all_micro.txt.
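
The usage steps above (quantize a target model, then compare it against the stored reference) can be scripted for several bit widths at once. The following is a minimal sketch, not part of the repository: `run`, `compare_all`, `REFERENCE_DIR`, `WINDOW_COUNT` and `DRY_RUN` are hypothetical names, the paths are the same placeholders used above, and it assumes the reference data was already computed with the same window count. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: quantize and compare several bit widths against one stored reference.
# All names below are illustrative assumptions, not part of mlx-eval itself.
set -eu

REFERENCE_DIR="${REFERENCE_DIR:-/path/to/Qwen3.6-35B-A3B-MLX}"
WINDOW_COUNT="${WINDOW_COUNT:-16}"   # assumed to match the reference run
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# For each bit width given as an argument: convert, then compare.
compare_all() {
  for bits in "$@"; do
    target="${REFERENCE_DIR}-Q${bits}"
    run mlx_vlm.convert \
      --hf-path Qwen/Qwen3.6-35B-A3B \
      --mlx-path "$target" \
      --quantize \
      --q-bits "$bits"
    run uv run mlx_eval.compare "$target" "$WINDOW_COUNT"
  done
}
```

Calling `compare_all 4 8` after these definitions prints the convert and compare commands for a 4-bit and an 8-bit target; setting `DRY_RUN=0` executes them instead.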