DimaBir · Oct 2, 2023
diff --git a/Dockerfile b/Dockerfile
@@ -6,11 +6,11 @@ RUN apt-get update && apt-get install -y \
     python3-pip \
     git
 
-# Install Python packages
-RUN pip3 install torch torchvision torch-tensorrt pandas Pillow numpy packaging onnx
-ø
 # Set the working directory
 WORKDIR /workspace
 
 # Copy local project files to /workspace in the image
 COPY . /workspace
+
+# Install Python packages
+RUN pip3 install --no-cache-dir -r /workspace/requirements.txt
diff --git a/README.md b/README.md
@@ -7,9 +7,9 @@
 5. [Inference Benchmark Results](#inference-benchmark-results)
    - [Example of Results](#example-of-results)
    - [Explanation of Results](#explanation-of-results)
-6. [Author](#author)
-7. [References](#references)
-8. [Notes](#notes)
+6. [ONNX Exporter](#onnx-exporter)
+7. [Author](#author)
+8. [References](#references)
 
 ## Overview
 This project demonstrates how to perform inference with a PyTorch model and optimize it using NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
@@ -30,19 +30,20 @@ docker build -t awesome-tesnorrt .
 docker run --gpus all --rm -it awesome-tesnorrt
 
 # 3. Run the Script inside the Container
-python src/main.py --image_path /path-to-image/image.jpg --topk 2
+python src/main.py
 ```
 
 ### Arguments
-- `--image_path`: Specifies the path to the image you want to predict.
+- `--image_path`: (Optional) Specifies the path to the image you want to predict.
 - `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
+- `--onnx`: (Optional) If set, exports the ResNet50 model to ONNX and runs the benchmark only for that model.
 
 ## Example Command
 ```sh
-python src/main.py --image_path ./inference/cat3.jpg --topk 3 --show_image
+python src/main.py --image_path ./inference/cat3.jpg --topk 3 --onnx
 ```
 
-This command will run predictions on the image at the specified path, show the top 3 predictions, and display the image. If you do not want to display the image, omit the `--show_image` flag. For the default 5 top predictions, omit the `--topk` argument or set it to 5.
+This command will run predictions on the image at the specified path and show the top 3 predictions using both PyTorch and ONNX Runtime models. For the default 5 top predictions, omit the `--topk` argument or set it to 5.
 
 ## Inference Benchmark Results
 
@@ -58,6 +59,7 @@ My prediction: %33 tabby
 My prediction: %26 Egyptian cat
 Running Benchmark for CPU
 Average batch time: 942.47 ms
+Average ONNX inference time: 15.59 ms
 Running Benchmark for CUDA
 Average batch time: 41.02 ms
 Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32
@@ -70,16 +72,16 @@ Average batch time: 7.25 ms
 - First k lines show the topk predictions. For example, `My prediction: %33 tabby` displays the highest confidence prediction made by the model for the input image, confidence level (`%33`), and the predicted class (`tabby`).
 - The following lines provide information about the average batch time for running the model in different configurations:
   - `Running Benchmark for CPU` and `Average batch time: 942.47 ms` indicate the average batch time when running the model on the CPU.
+  - `Average ONNX inference time: 15.59 ms` indicates the average batch time when running the ONNX model on the CPU.
   - `Running Benchmark for CUDA` and `Average batch time: 41.02 ms` indicate the average batch time when running the model on CUDA.
   - `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32` and `Average batch time: 19.20 ms` show the average batch time when running the model with TensorRT using `float32` precision.
   - `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16` and `Average batch time: 7.25 ms` indicate the average batch time when running the model with TensorRT using `float16` precision.
 
+## ONNX Exporter
+The ONNX Exporter utility is integrated into this project to allow the conversion of the PyTorch model to ONNX format, enabling inference and benchmarking using ONNX Runtime. The ONNX model can provide hardware-agnostic optimizations and is widely supported across various platforms and devices.
+
 ## Author
 [DimaBir](https://github.com/DimaBir)
 
 ## References
 - [ResNetTensorRT Project](https://github.com/DimaBir/ResNetTensorRT/tree/main)
-
-## Notes
-- The project uses a Docker container built on top of the NVIDIA TensorRT image to ensure that all dependencies, including CUDA and TensorRT, are correctly installed and configured.
-- Please ensure you have the NVIDIA Container Toolkit installed to run the container with GPU support.
diff --git a/inference/cat3.jpg b/inference/cat3.jpg
diff --git a/inference/fan.jpg b/inference/fan.jpg
diff --git a/inference/image-2.jpg b/inference/image-2.jpg
diff --git a/inference/vase.jpg b/inference/vase.jpg
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,9 @@
+torch
+torchvision
+torch-tensorrt
+pandas
+Pillow
+numpy
+packaging
+onnx
+onnxruntime
diff --git a/src/benchmark.py b/src/benchmark.py
@@ -1,16 +1,35 @@
 import time
 from typing import Tuple
 
+from abc import ABC, abstractmethod
 import numpy as np
 import torch
 import torch.backends.cudnn as cudnn
 import logging
+import onnxruntime as ort
 
 # Configure logging
 logging.basicConfig(filename="model.log", level=logging.INFO)
 
 
-class Benchmark:
+class Benchmark(ABC):
+    """
+    Abstract class representing a benchmark.
+    """
+
+    def __init__(self, nruns: int = 100, nwarmup: int = 50):
+        self.nruns = nruns
+        self.nwarmup = nwarmup
+
+    @abstractmethod
+    def run(self) -> None:
+        """
+        Abstract method to run the benchmark.
+        """
+        pass
+
+
+class PyTorchBenchmark(Benchmark):
     def __init__(
         self,
         model: torch.nn.Module,
@@ -74,3 +93,43 @@ def run(self) -> None:
         print(f"Input shape: {input_data.size()}")
         print(f"Output features size: {features.size()}")
         logging.info(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
+
+
+class ONNXBenchmark(Benchmark):
+    """
+    A class used to benchmark the performance of an ONNX model.
+    """
+
+    def __init__(
+        self,
+        ort_session: ort.InferenceSession,
+        input_shape: tuple,
+        nruns: int = 100,
+        nwarmup: int = 50,
+    ):
+        # The base class stores nruns and nwarmup.
+        super().__init__(nruns=nruns, nwarmup=nwarmup)
+        self.ort_session = ort_session
+        # Full input shape; run() replaces the batch dimension with 1.
+        self.input_shape = input_shape
+
+    def run(self) -> None:
+        print("Warming up ...")
+        # Adjusting the batch size in the input shape to match the expected input size of the model.
+        input_shape = (1,) + self.input_shape[1:]
+        input_data = np.random.randn(*input_shape).astype(np.float32)
+
+        for _ in range(self.nwarmup):  # Warm-up runs
+            _ = self.ort_session.run(None, {"input": input_data})
+
+        print("Starting benchmark ...")
+        timings = []
+
+        for _ in range(self.nruns):
+            start_time = time.perf_counter()
+            _ = self.ort_session.run(None, {"input": input_data})
+            end_time = time.perf_counter()
+            timings.append(end_time - start_time)
+
+        avg_time = np.mean(timings) * 1000
+        logging.info(f"Average ONNX inference time: {avg_time:.2f} ms")
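
Review note: the warm-up and timing loops in `ONNXBenchmark.run` follow the same pattern as the PyTorch benchmark, so the abstract `Benchmark` class might be a natural place to hold them. The following is only a minimal sketch of that idea, not the PR's implementation. `TimedBenchmark`, `step`, and `CountingBenchmark` are hypothetical names that do not appear in the diff, and the sketch uses only the standard library so it can run without `torch` or `onnxruntime`.

```python
import time
from abc import ABC, abstractmethod
from statistics import mean


class TimedBenchmark(ABC):
    """Hypothetical base class: owns warm-up and timing; subclasses supply step()."""

    def __init__(self, nruns: int = 100, nwarmup: int = 50):
        self.nruns = nruns
        self.nwarmup = nwarmup

    @abstractmethod
    def step(self) -> None:
        """Run a single inference call (backend-specific)."""

    def run(self) -> float:
        """Return the average time of one step() call in milliseconds."""
        # Warm-up calls are executed but not timed.
        for _ in range(self.nwarmup):
            self.step()

        timings = []
        for _ in range(self.nruns):
            # perf_counter is a monotonic, high-resolution clock.
            start = time.perf_counter()
            self.step()
            timings.append(time.perf_counter() - start)

        return mean(timings) * 1000


class CountingBenchmark(TimedBenchmark):
    """Stand-in backend for illustration: step() only counts its calls."""

    def __init__(self, nruns: int = 100, nwarmup: int = 50):
        super().__init__(nruns=nruns, nwarmup=nwarmup)
        self.calls = 0

    def step(self) -> None:
        self.calls += 1


if __name__ == "__main__":
    bench = CountingBenchmark(nruns=5, nwarmup=2)
    avg_ms = bench.run()
    print(f"step() was called {bench.calls} times, average {avg_ms:.4f} ms")
```

Under this assumption, an ONNX subclass would only need a `step()` that calls `self.ort_session.run(...)`, and the base class would do the timing once for every backend.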