
[AMD] Add RyzenAI passes for latest flows; Restore legacy VitisAI passes for eager flow for backward compatibility#2481

Open
poganesh wants to merge 4 commits into microsoft:main from poganesh:ryzen-folder

Conversation

poganesh (Contributor) commented May 29, 2026

Describe your changes

Adds the RyzenGenerateModelLLM Olive pass for the new RyzenAI full-fusion / token-fusion flow on AMD NPU/hybrid devices, and restores two legacy passes to support the older VitisAI eager flow in parallel:

  • VitisGenerateModelLLM (from commit a24d73a) - legacy eager model generation
  • QuarkQuantizationVitisAI (from commit 1615bda, originally QuarkQuantization) - legacy Quark 0.9 quantization

The legacy quantization class was renamed to avoid a name collision with the existing QuarkQuantization pass (Quark 0.11+) used by the new fusion flow. Legacy module path: olive.passes.quark_vitisai.

Pass mapping

| Flow | Quark version | Generation pass | Quantization pass |
| --- | --- | --- | --- |
| New (RyzenAI fusion) | 0.11 | RyzenGenerateModelLLM | QuarkQuantization |
| Legacy (VitisAI eager) | 0.9 | VitisGenerateModelLLM | QuarkQuantizationVitisAI |
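To make the mapping concrete, here is a minimal sketch that encodes the table as a lookup from flow to the pass pair it uses. The helper `passes_for_flow` and the flow keys `"ryzenai"` / `"vitisai"` are hypothetical illustrations, not part of this PR. Only the version strings and pass type names come from the table.

```python
from typing import NamedTuple


class FlowPasses(NamedTuple):
    """Pass pairing for one flow (values taken from the pass mapping table)."""

    quark_version: str
    generation_pass: str
    quantization_pass: str


# Hypothetical flow keys; the pass type names are the ones listed in the table.
FLOW_PASSES = {
    "ryzenai": FlowPasses("0.11", "RyzenGenerateModelLLM", "QuarkQuantization"),
    "vitisai": FlowPasses("0.9", "VitisGenerateModelLLM", "QuarkQuantizationVitisAI"),
}


def passes_for_flow(flow: str) -> FlowPasses:
    """Return the pass pairing for a flow, failing loudly on unknown names."""
    try:
        return FLOW_PASSES[flow.lower()]
    except KeyError:
        known = ", ".join(sorted(FLOW_PASSES))
        raise ValueError(f"unknown flow {flow!r}; expected one of: {known}") from None


print(passes_for_flow("VitisAI").quantization_pass)  # QuarkQuantizationVitisAI
```

Keeping the generation and quantization passes in one record makes it harder to pair, say, a Quark 0.11 quantized model with the legacy eager generator by accident.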

Companion olive-recipes PR

microsoft/olive-recipes#PR-440: uses these passes per model in the RyzenAI/ and VitisAI/ subfolders.


Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a Quark + VitisAI quantization workflow (Torch + ONNX) into Olive, including model/data prep utilities, quantization configuration helpers, and a DBRX MoE expert-module replacement to support quantization/export.

Changes:

  • Add QuarkQuantizationVitisAI pass supporting Quark-ONNX and Quark-Torch flows and register it in olive_config.json.
  • Introduce Torch LLM PTQ utilities (model/data/config preparation, quantize runner) and ONNX config/runner utilities.
  • Add DBRX expert module replacement for MoE quantization, and update Vitis LLM model generation pass API and config.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| olive/passes/quark_vitisai/torch/language_modeling/module_replacement/dbrx_expert.py | Adds DBRX MoE expert module replacement used during quantization. |
| olive/passes/quark_vitisai/torch/language_modeling/module_replacement/__init__.py | Package init for module replacement utilities. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_utils/model_preparation.py | Adds tokenizer/model loading utilities and MoE preparation hook for DBRX. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_utils/data_preparation.py | Adds calibration/training dataset utilities for PTQ. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_utils/__init__.py | Package init for LLM utilities. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/quantize_quark.py | Adds the Torch-side Quark quantization driver. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/customized_configuration.py | Adds supported quant schemes and spec/config factories. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/configuration_preparation.py | Builds Quark Torch quant/export configuration from CLI args/model-type. |
| olive/passes/quark_vitisai/torch/language_modeling/llm_ptq/__init__.py | Package init for PTQ module. |
| olive/passes/quark_vitisai/torch/language_modeling/__init__.py | Package init for torch language modeling module. |
| olive/passes/quark_vitisai/torch/__init__.py | Package init for quark_vitisai torch integration. |
| olive/passes/quark_vitisai/quark_quantization_vitisai.py | Implements Olive pass that routes to Quark-ONNX or Quark-Torch quantization. |
| olive/passes/quark_vitisai/onnx/quantize_quark.py | Adds ONNX-side Quark quantization runner wrapper. |
| olive/passes/quark_vitisai/onnx/configuration_preparation.py | Adds helpers to build ONNX QConfig (global/algo/extra options). |
| olive/passes/quark_vitisai/onnx/__init__.py | Package init for ONNX integration. |
| olive/passes/quark_vitisai/__init__.py | Package init for quark_vitisai pass package. |
| olive/passes/onnx/vitis_ai/vitis_generate_model_llm.py | Changes Vitis LLM model generation pass parameters/API and hardcodes output ONNX name. |
| olive/passes/onnx/ryzen_ai/ryzen_generate_model_llm.py | Adds Ryzen LLM model generation pass (existing full-feature config). |
| olive/passes/onnx/ryzen_ai/__init__.py | Package init for ryzen_ai passes. |
| olive/olive_config.json | Registers new Quark/VitisAI passes in Olive’s pass registry. |

Comment on lines +164 to +169
```python
q_layers_name = MODEL_NAME_Q_LAYERS_MAP[model_type]
layer_quant_config[q_layers_name] = QuantizationConfig(
    input_tensors=global_quant_config.input_tensors,
    weight=global_quant_config.weight,
    output_tensors=attn_qspec,
)
```
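`MODEL_NAME_Q_LAYERS_MAP[model_type]` raises a bare `KeyError` for any model type missing from the map. Below is a minimal defensive variant, assuming the intent is to skip the per-layer override for unmapped types. `QuantizationConfig` here is a simplified stand-in dataclass, not Quark's class. The map entry and the helper name `add_q_layer_override` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QuantizationConfig:
    """Simplified stand-in for Quark's QuantizationConfig (illustration only)."""

    input_tensors: Any = None
    weight: Any = None
    output_tensors: Any = None


# Hypothetical entry; the real MODEL_NAME_Q_LAYERS_MAP lives in the PR's configuration code.
MODEL_NAME_Q_LAYERS_MAP = {"llama": "*q_proj"}


def add_q_layer_override(
    layer_quant_config: dict[str, QuantizationConfig],
    model_type: str,
    global_quant_config: QuantizationConfig,
    attn_qspec: Any,
) -> Optional[str]:
    """Add the attention-output override if model_type is mapped; return the pattern used."""
    q_layers_name = MODEL_NAME_Q_LAYERS_MAP.get(model_type)
    if q_layers_name is None:
        # Unmapped model type: keep the global config instead of failing with KeyError.
        return None
    layer_quant_config[q_layers_name] = QuantizationConfig(
        input_tensors=global_quant_config.input_tensors,
        weight=global_quant_config.weight,
        output_tensors=attn_qspec,
    )
    return q_layers_name
```

Raising a `ValueError` that lists the supported model types is an equally reasonable choice if silently skipping the override would hide a misconfiguration.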
Comment on lines +251 to +273
```python
def get_device_max_memory() -> dict[Union[int, str], Union[int, str]]:
    for i in range(torch.cuda.device_count()):
        _ = torch.tensor([0], device=i)
    cuda_avail_memory = {i: torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())}
    cpu_avail_memory = psutil.virtual_memory().available
    max_memory = {}
    for cuda_num, cuda_memory in cuda_avail_memory.items():
        cuda_memory_gb = cuda_memory / (10**9)
        logger.info("GPU%s cuda_avail_memory: %.1fGB", cuda_num, cuda_memory_gb)
        if cuda_num == 0:
            # The ratio is an experience value that you can manually adjust yourself.
            gpu0_ratio = 0.5 if cuda_memory_gb > 30 else 0.3
            max_memory[cuda_num] = f"{cuda_memory_gb * gpu0_ratio:.1f}GB"
        else:
            other_ratio = 0.875 if cuda_memory_gb > 30 else 0.7
            max_memory[cuda_num] = f"{cuda_memory_gb * other_ratio:.1f}GB"
    logger.info("cpu_avail_memory: %.1fGB", cpu_avail_memory / (10**9))
    cpu_ratio = 0.875
    max_memory["cpu"] = f"{cpu_avail_memory / (10**9) * cpu_ratio:.1f}GB"
    logger.info("final_use_model_kwargs: %s", max_memory)
    # max_memory = {0: '0.1GB', 'cpu': '100GB'}

    return max_memory
```
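The budget depends only on the free-memory readings, so the ratio heuristic can be split out and checked without a GPU. The following sketch assumes the readings are passed in as bytes, as the original obtains them from `torch.cuda.mem_get_info(i)[0]` and `psutil.virtual_memory().available`. `plan_max_memory` is a hypothetical name. The thresholds, ratios, and `"<n>GB"` string format are copied from the snippet.

```python
from typing import Union


def plan_max_memory(
    cuda_avail_bytes: dict[int, int], cpu_avail_bytes: int
) -> dict[Union[int, str], str]:
    """Apply get_device_max_memory's ratio heuristic to pre-measured free memory (bytes)."""
    max_memory: dict[Union[int, str], str] = {}
    for cuda_num, free_bytes in cuda_avail_bytes.items():
        free_gb = free_bytes / 10**9  # decimal gigabytes, as in the original
        if cuda_num == 0:
            # GPU 0 gets a smaller share; the ratios are the original's empirical values.
            ratio = 0.5 if free_gb > 30 else 0.3
        else:
            ratio = 0.875 if free_gb > 30 else 0.7
        max_memory[cuda_num] = f"{free_gb * ratio:.1f}GB"
    max_memory["cpu"] = f"{cpu_avail_bytes / 10**9 * 0.875:.1f}GB"
    return max_memory


print(plan_max_memory({0: 40 * 10**9, 1: 20 * 10**9}, 100 * 10**9))
# {0: '20.0GB', 1: '14.0GB', 'cpu': '87.5GB'}
```

Keeping the CUDA warm-up and `mem_get_info` calls in a thin wrapper around a pure function like this lets the heuristic run in CPU-only tests.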
Comment on lines +124 to +127
```python
if tokenizer.pad_token != "<unk>":
    tokenizer.pad_token = tokenizer.eos_token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
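As written, the first `if` replaces every pad token other than `"<unk>"`, including a tokenizer's own dedicated pad token. After that branch, the second `if` can never change anything. If the intent is the common Hugging Face fallback (pad with EOS only when no pad token is defined), a sketch could look like the following. `ensure_pad_token` is a hypothetical name, and `SimpleNamespace` objects stand in for real tokenizers. Whether `"<unk>"` was meant to be special-cased is not clear from the snippet; this version leaves it untouched as well.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer) -> None:
    """Fall back to EOS only when the tokenizer has no pad token (conservative reading)."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token


# Stand-ins for Hugging Face tokenizers (only the two attributes used here).
no_pad = SimpleNamespace(pad_token=None, eos_token="</s>")
has_pad = SimpleNamespace(pad_token="<pad>", eos_token="</s>")
ensure_pad_token(no_pad)
ensure_pad_token(has_pad)
print(no_pad.pad_token, has_pad.pad_token)  # </s> <pad>
```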
Comment on lines +20 to +22
```python
def get_pileval(
    tokenizer: PreTrainedTokenizer, nsamples: int, seqlen: int, device: str | None, seed: int = 0
) -> DataLoader[torch.Tensor]:
```
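The signature suggests a typical pile-val calibration recipe: tokenize samples, concatenate the token stream, and cut it into `nsamples` blocks of `seqlen` tokens. The body is not shown in the review, so the following is only a sketch of that chunking step on plain token-id lists, without a tokenizer or `DataLoader`. `chunk_calibration_tokens` is hypothetical. Separately, `str | None` in a signature needs Python 3.10+ unless the module starts with `from __future__ import annotations`.

```python
import random


def chunk_calibration_tokens(
    samples: list[list[int]], nsamples: int, seqlen: int, seed: int = 0
) -> list[list[int]]:
    """Concatenate tokenized samples and split them into up to nsamples blocks of seqlen tokens."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    order = list(range(len(samples)))
    rng.shuffle(order)
    stream: list[int] = []
    for idx in order:
        stream.extend(samples[idx])
        if len(stream) >= nsamples * seqlen:
            break  # enough tokens collected
    n_blocks = min(nsamples, len(stream) // seqlen)
    return [stream[i * seqlen : (i + 1) * seqlen] for i in range(n_blocks)]
```

Fixed-length blocks let a collate function stack them into a rectangular batch without padding. The leftover tail of the stream is simply dropped.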
Comment on lines +146 to +151
```python
def my_collate_fn(blocks: list[dict[str, list[list[str]]]]) -> dict[str, torch.Tensor]:
    data_batch = {}
    data_batch["input_ids"] = torch.Tensor([block["input_ids"] for block in blocks])
    if device:
        data_batch["input_ids"] = data_batch["input_ids"].to(device)
    return data_batch
```
Comment on lines +43 to +45
```python
experts_module.mlp.w1 = None
experts_module.mlp.v1 = None
experts_module.mlp.w2 = None
```
Comment on lines 51 to 54
```diff
     return ONNXModelHandler(
         model_path=output_dir,
-        onnx_file_name=output_model_name,
+        onnx_file_name="model.onnx",
     )
```
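This change replaces the caller-supplied `output_model_name` with a fixed `"model.onnx"`. If downstream tooling expects that fixed name, this is fine. If callers may still pass a name, one option is to fall back to the fixed default only when none is given. The following is a sketch; `resolve_onnx_file_name` and `DEFAULT_ONNX_FILE_NAME` are hypothetical, not Olive API.

```python
from pathlib import Path
from typing import Optional

DEFAULT_ONNX_FILE_NAME = "model.onnx"  # the name hardcoded in this PR


def resolve_onnx_file_name(output_model_name: Optional[str] = None) -> str:
    """Return a bare .onnx file name, defaulting to the PR's hardcoded name."""
    if not output_model_name:
        return DEFAULT_ONNX_FILE_NAME
    name = Path(output_model_name).name  # drop any directory part
    return name if name.endswith(".onnx") else f"{name}.onnx"
```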
Comment on lines +129 to +130
```python
new_tmp_dir = tempfile.TemporaryDirectory(prefix="olive_tmp")  # pylint: disable=R1732
tmp_model_path = str(Path(new_tmp_dir.name) / Path(output_model_path).name)
```
Comment on lines +156 to +158
```python
onnx_model = onnx.load(tmp_model_path)
# the model is loaded into memory, so it's safe to delete previously exported files
new_tmp_dir.cleanup()
```
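The `# pylint: disable=R1732` (consider-using-with) suppression hides a real risk. If the export or `onnx.load` raises before `new_tmp_dir.cleanup()`, the directory is removed only when the `TemporaryDirectory` object is finalized. Below is a sketch of the context-manager form, with a plain file write/read standing in for the ONNX export and `onnx.load`; `export_then_load` is hypothetical. Because `onnx.load` loads external data by default, the real model would also be fully in memory before the directory is removed, matching the original comment.

```python
import tempfile
from pathlib import Path


def export_then_load(output_model_path: str, payload: bytes) -> bytes:
    """Write an artifact into a temp dir, load it back, and always clean the dir up."""
    with tempfile.TemporaryDirectory(prefix="olive_tmp") as tmp_dir:
        tmp_model_path = Path(tmp_dir) / Path(output_model_path).name
        tmp_model_path.write_bytes(payload)  # stand-in for the ONNX export step
        loaded = tmp_model_path.read_bytes()  # stand-in for onnx.load(tmp_model_path)
    # The directory has been deleted here, even if the body had raised.
    return loaded
```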
poganesh (Contributor, Author) commented

The Copilot comments apply to code that already existed and has only been moved into separate folders:

  • VitisGenerateModelLLM (from commit a24d73a) - legacy eager model generation
  • QuarkQuantizationVitisAI (from commit 1615bda, originally QuarkQuantization) - legacy Quark 0.9 quantization

We therefore recommend addressing these comments, if required, in a separate PR based on the older Quark version.
