I followed the quantization documentation for the XNN backend.
Using that document, I successfully quantized a Matmul ExecuTorch model.
Then I moved on to an LLM model with the XNN backend.
For this, I am using the following export script:
quantization_script.py
This script successfully generated a quantized model.
As the next step, I ran inference on both the FP32 model and the int8 quantized model; the output logs are as follows.
Inference code: inference.py
For FP32:

For int8 quantized model:
I can't understand why I am getting this error with the int8 quantized model. Could you please suggest a solution?
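For context on what the int8 path computes numerically, here is a minimal, framework-free sketch of symmetric per-tensor int8 quantization (the general scheme symmetric quantization configs use); this is illustrative arithmetic only, not ExecuTorch's implementation:

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Illustrative arithmetic only -- not ExecuTorch's implementation.

def quantize_symmetric(values, num_bits=8):
    """Map floats to signed ints using a single scale (zero_point = 0)."""
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid scale == 0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_symmetric(weights)
approx = dequantize(q, scale)
# Each reconstructed value is within one quantization step of the original.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
```

A mismatch between FP32 and int8 outputs on the order of one quantization step per tensor is expected; a hard runtime error, as in my logs above, is not.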
cc @GregoryComer @digantdesai @cbilgin