I followed the quantization documentation for the XNN backend.
Using that document, I successfully quantized a Matmul ExecuTorch model.
Then I moved on to an LLM model with the XNN backend.
For this, I am using the following export script:
quantization_script.py
This script successfully generated a quantized model.
As the next step, I ran inference on both the FP32 model and the int8 quantized model; the output logs are as follows.
Inference code: inference.py
For FP32:

For int8 quantized model:
I can't understand why I am getting this error with the int8 quantized model. Could you please suggest a solution?
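For context on what the int8 path computes numerically, here is a minimal, framework-free sketch of symmetric per-tensor int8 quantization (the general scheme symmetric quantization configs use); this is illustrative arithmetic only, not ExecuTorch's implementation:

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Illustrative arithmetic only -- not ExecuTorch's implementation.

def quantize_symmetric(values, num_bits=8):
    """Map floats to signed ints using a single scale (zero_point = 0)."""
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid scale == 0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_symmetric(weights)
approx = dequantize(q, scale)
# Each reconstructed value is within one quantization step of the original.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
```

A mismatch between FP32 and int8 outputs on the order of one quantization step per tensor is expected; a hard runtime error, as in my logs above, is not.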
cc @GregoryComer @digantdesai @cbilgin