[AMD] Add RyzenAI passes for latest flows; Restore legacy VitisAI passes for eager flow for backward compatibility by poganesh · Pull Request #2481 · microsoft/Olive

poganesh · 2026-05-29T22:13:42Z

Describe your changes

Adds the RyzenGenerateModelLLM Olive pass for the new RyzenAI full-fusion / token-fusion flow on AMD NPU/hybrid devices, and restores two legacy passes to support the older VitisAI eager flow in parallel:

VitisGenerateModelLLM (from commit a24d73a) - legacy eager model generation
QuarkQuantizationVitisAI (from commit 1615bda, originally QuarkQuantization) - legacy Quark 0.9 quantization

The legacy quantization class was renamed to avoid a name collision with the existing QuarkQuantization pass (Quark 0.11+) used by the new fusion flow. Legacy module path: olive.passes.quark_vitisai.

Pass mapping

Flow	Quark	Generation pass	Quantization pass
New (RyzenAI fusion)	0.11	`RyzenGenerateModelLLM`	`QuarkQuantization`
Legacy (VitisAI eager)	0.9	`VitisGenerateModelLLM`	`QuarkQuantizationVitisAI`
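The mapping above can be written down as a small lookup, shown here as a minimal sketch. The flow keys (`ryzenai_fusion`, `vitisai_eager`), the dictionary layout, and the helper `passes_for_flow` are illustrative assumptions and are not part of Olive; only the pass names and Quark versions come from the table.

```python
# Hypothetical lookup mirroring the pass-mapping table.
# The flow keys and this helper are illustrative; they are not Olive APIs.
PASS_MAPPING = {
    "ryzenai_fusion": {
        "quark": "0.11",
        "generation": "RyzenGenerateModelLLM",
        "quantization": "QuarkQuantization",
    },
    "vitisai_eager": {
        "quark": "0.9",
        "generation": "VitisGenerateModelLLM",
        "quantization": "QuarkQuantizationVitisAI",
    },
}


def passes_for_flow(flow: str) -> tuple[str, str]:
    """Return (quantization pass, generation pass) for a flow key."""
    try:
        entry = PASS_MAPPING[flow]
    except KeyError:
        known = ", ".join(sorted(PASS_MAPPING))
        raise ValueError(f"Unknown flow {flow!r}; expected one of: {known}") from None
    return entry["quantization"], entry["generation"]
```

Keeping the two flows in separate entries makes the rename visible: the legacy flow resolves to `QuarkQuantizationVitisAI`, so it cannot silently pick up the Quark 0.11 `QuarkQuantization` pass.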

Companion olive-recipes PR

microsoft/olive-recipes#440: uses these passes per model in the RyzenAI/ and VitisAI/ subfolders.


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a Quark + VitisAI quantization workflow (Torch + ONNX) into Olive, including model/data prep utilities, quantization configuration helpers, and a DBRX MoE expert-module replacement to support quantization/export.

Changes:

Add QuarkQuantizationVitisAI pass supporting Quark-ONNX and Quark-Torch flows and register it in olive_config.json.
Introduce Torch LLM PTQ utilities (model/data/config preparation, quantize runner) and ONNX config/runner utilities.
Add DBRX expert module replacement for MoE quantization, and update Vitis LLM model generation pass API and config.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.


File	Description
olive/passes/quark_vitisai/torch/language_modeling/module_replacement/dbrx_expert.py	Adds DBRX MoE expert module replacement used during quantization.
olive/passes/quark_vitisai/torch/language_modeling/module_replacement/__init__.py	Package init for module replacement utilities.
olive/passes/quark_vitisai/torch/language_modeling/llm_utils/model_preparation.py	Adds tokenizer/model loading utilities and MoE preparation hook for DBRX.
olive/passes/quark_vitisai/torch/language_modeling/llm_utils/data_preparation.py	Adds calibration/training dataset utilities for PTQ.
olive/passes/quark_vitisai/torch/language_modeling/llm_utils/__init__.py	Package init for LLM utilities.
olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/quantize_quark.py	Adds the Torch-side Quark quantization driver.
olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/customized_configuration.py	Adds supported quant schemes and spec/config factories.
olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/configuration_preparation.py	Builds Quark Torch quant/export configuration from CLI args/model-type.
olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/__init__.py	Package init for PTQ module.
olive/passes/quark_vitisai/torch/language_modeling/__init__.py	Package init for torch language modeling module.
olive/passes/quark_vitisai/torch/__init__.py	Package init for quark_vitisai torch integration.
olive/passes/quark_vitisai/quark_quantization_vitisai.py	Implements Olive pass that routes to Quark-ONNX or Quark-Torch quantization.
olive/passes/quark_vitisai/onnx/quantize_quark.py	Adds ONNX-side Quark quantization runner wrapper.
olive/passes/quark_vitisai/onnx/configuration_preparation.py	Adds helpers to build ONNX QConfig (global/algo/extra options).
olive/passes/quark_vitisai/onnx/__init__.py	Package init for ONNX integration.
olive/passes/quark_vitisai/__init__.py	Package init for quark_vitisai pass package.
olive/passes/onnx/vitis_ai/vitis_generate_model_llm.py	Changes Vitis LLM model generation pass parameters/API and hardcodes output ONNX name.
olive/passes/onnx/ryzen_ai/ryzen_generate_model_llm.py	Adds Ryzen LLM model generation pass (existing full-feature config).
olive/passes/onnx/ryzen_ai/__init__.py	Package init for ryzen_ai passes.
olive/olive_config.json	Registers new Quark/VitisAI passes in Olive’s pass registry.

+        q_layers_name = MODEL_NAME_Q_LAYERS_MAP[model_type]
+        layer_quant_config[q_layers_name] = QuantizationConfig(
+            input_tensors=global_quant_config.input_tensors,
+            weight=global_quant_config.weight,
+            output_tensors=attn_qspec,
+        )


+def get_device_max_memory() -> dict[Union[int, str], Union[int, str]]:
+    for i in range(torch.cuda.device_count()):
+        _ = torch.tensor([0], device=i)
+        cuda_avail_memory = {i: torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())}
+        cpu_avail_memory = psutil.virtual_memory().available
+        max_memory = {}
+        for cuda_num, cuda_memory in cuda_avail_memory.items():
+            cuda_memory_gb = cuda_memory / (10**9)
+            logger.info("GPU%s cuda_avail_memory: %.1fGB", cuda_num, cuda_memory_gb)
+            if cuda_num == 0:
+                # The ratio is an experience value that you can manually adjust yourself.
+                gpu0_ratio = 0.5 if cuda_memory_gb > 30 else 0.3
+                max_memory[cuda_num] = f"{cuda_memory_gb * gpu0_ratio:.1f}GB"
+            else:
+                other_ratio = 0.875 if cuda_memory_gb > 30 else 0.7
+                max_memory[cuda_num] = f"{cuda_memory_gb * other_ratio:.1f}GB"
+        logger.info("cpu_avail_memory: %.1fGB", cpu_avail_memory / (10**9))
+        cpu_ratio = 0.875
+        max_memory["cpu"] = f"{cpu_avail_memory / (10**9) * cpu_ratio:.1f}GB"
+        logger.info("final_use_model_kwargs: %s", max_memory)
+        # max_memory =  {0: '0.1GB', 'cpu': '100GB'}
+
+    return max_memory
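
As quoted, the whole body of `get_device_max_memory` sits inside the `for` loop, so with zero visible CUDA devices `max_memory` would never be assigned before `return`. The sketch below shows one way to avoid that, assuming the budget logic can be split from the device queries. The helper `build_max_memory` and its parameters are hypothetical; the ratios and the 30 GB threshold are copied from the snippet.

```python
from typing import Union


def build_max_memory(
    cuda_avail_bytes: dict[int, int], cpu_avail_bytes: int
) -> dict[Union[int, str], str]:
    """Build a per-device memory budget from already-measured free memory.

    Hypothetical helper: the caller is assumed to have collected the free
    bytes per CUDA device and for the CPU before calling this function.
    """
    # Initialised before the loop, so the result exists even with no GPUs.
    max_memory: dict[Union[int, str], str] = {}
    for cuda_num, cuda_memory in cuda_avail_bytes.items():
        cuda_memory_gb = cuda_memory / 10**9
        if cuda_num == 0:
            # GPU 0 gets a smaller share, as in the quoted snippet.
            ratio = 0.5 if cuda_memory_gb > 30 else 0.3
        else:
            ratio = 0.875 if cuda_memory_gb > 30 else 0.7
        max_memory[cuda_num] = f"{cuda_memory_gb * ratio:.1f}GB"
    cpu_ratio = 0.875
    max_memory["cpu"] = f"{cpu_avail_bytes / 10**9 * cpu_ratio:.1f}GB"
    return max_memory
```

Separating the arithmetic from the `torch.cuda` and `psutil` calls also lets the budget rules be checked on a machine without a GPU.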


+    if tokenizer.pad_token != "<unk>":
+        tokenizer.pad_token = tokenizer.eos_token
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
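
As quoted, the first branch replaces every pad token other than `"<unk>"` (including `None`) with `eos_token`, which leaves the second check with nothing to do. A minimal sketch of a narrower fallback follows, assuming the intent is only to guarantee that some pad token exists. `ensure_pad_token` is a hypothetical helper, and the `SimpleNamespace` object stands in for a real tokenizer.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer) -> None:
    """Fall back to eos_token only when the tokenizer has no pad token."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token


# Stand-in objects exposing only the two attributes the helper touches.
no_pad = SimpleNamespace(pad_token=None, eos_token="</s>")
has_pad = SimpleNamespace(pad_token="<pad>", eos_token="</s>")
ensure_pad_token(no_pad)
ensure_pad_token(has_pad)
```

With this version a tokenizer that already defines its own pad token keeps it, which is a behavior change relative to the quoted code and would need to be checked against the legacy Quark 0.9 flow.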


+def get_pileval(
+    tokenizer: PreTrainedTokenizer, nsamples: int, seqlen: int, device: str | None, seed: int = 0
+) -> DataLoader[torch.Tensor]:


+    def my_collate_fn(blocks: list[dict[str, list[list[str]]]]) -> dict[str, torch.Tensor]:
+        data_batch = {}
+        data_batch["input_ids"] = torch.Tensor([block["input_ids"] for block in blocks])
+        if device:
+            data_batch["input_ids"] = data_batch["input_ids"].to(device)
+        return data_batch


+        experts_module.mlp.w1 = None
+        experts_module.mlp.v1 = None
+        experts_module.mlp.w2 = None


        return ONNXModelHandler(
            model_path=output_dir,
-            onnx_file_name=output_model_name,
+            onnx_file_name="model.onnx",
        )
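
The diff above swaps a configurable file name for the fixed string `"model.onnx"`. If the configurable name needs to survive, one option is a small resolver that keeps `"model.onnx"` as the default. This is a sketch only: `resolve_onnx_file_name` and `DEFAULT_ONNX_FILE_NAME` are hypothetical names and are not part of the pass.

```python
from pathlib import Path
from typing import Optional

DEFAULT_ONNX_FILE_NAME = "model.onnx"


def resolve_onnx_file_name(output_model_name: Optional[str] = None) -> str:
    """Return the ONNX file name to record in the model handler.

    Hypothetical helper: falls back to the fixed default when no name is
    given, otherwise keeps the caller's base name with an .onnx suffix.
    """
    if not output_model_name:
        return DEFAULT_ONNX_FILE_NAME
    name = Path(output_model_name).name
    return name if name.endswith(".onnx") else f"{name}.onnx"
```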


+        new_tmp_dir = tempfile.TemporaryDirectory(prefix="olive_tmp")  # pylint: disable=R1732
+        tmp_model_path = str(Path(new_tmp_dir.name) / Path(output_model_path).name)


+        onnx_model = onnx.load(tmp_model_path)
+        # the model is loaded into memory, so it's safe to delete previously exported files
+        new_tmp_dir.cleanup()
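
The two snippets above create a `TemporaryDirectory` by hand and clean it up after loading, which is why the pylint R1732 suppression is needed. A context manager gives the same lifetime without the suppression and also cleans up if loading raises. The sketch below uses plain file reads and writes as stand-ins for the export and `onnx.load` steps; `load_via_temp_dir` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def load_via_temp_dir(payload: bytes, file_name: str = "model.onnx") -> tuple[bytes, Path]:
    """Write a file into a temporary directory, read it back, then clean up.

    The write stands in for the export step and the read stands in for
    onnx.load; both are placeholders for illustration.
    """
    with tempfile.TemporaryDirectory(prefix="olive_tmp") as tmp_dir:
        tmp_model_path = Path(tmp_dir) / file_name
        tmp_model_path.write_bytes(payload)
        loaded = tmp_model_path.read_bytes()
    # The directory is removed on leaving the with-block, even on errors.
    return loaded, tmp_model_path


loaded, old_path = load_via_temp_dir(b"abc")
```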


poganesh · 2026-05-29T23:37:06Z

The Copilot comments are on code that already existed and has only been moved into separate folders:

VitisGenerateModelLLM (from commit a24d73a) - legacy eager model generation
QuarkQuantizationVitisAI (from commit 1615bda, originally QuarkQuantization) - legacy Quark 0.9 quantization

We therefore recommend addressing these comments in a separate PR based on the older Quark version, if required.

poganesh and others added 3 commits May 23, 2026 03:32

moving full fusion flow pass to new ryzen_ai folder, add old pass in …

5f0c5cd

…vitisai

adding legacy vitisai + quark 0.9 files

acfef81

Fix pass name

3af2a3a

Copilot AI review requested due to automatic review settings May 29, 2026 22:13

poganesh mentioned this pull request May 29, 2026

[AMD] Add RyzenAI token-fusion recipes; Restore legacy VitisAI eager recipes microsoft/olive-recipes#440

Open

Copilot AI reviewed May 29, 2026


Merge branch 'main' into ryzen-folder

2b0dc45

poganesh wants to merge 4 commits into microsoft:main from poganesh:ryzen-folder


3 participants
