Merged
29 commits
65b3db3
Refactor main function and modularize codebase
DimaBir Oct 6, 2023
a14095c
Fixed imports
DimaBir Oct 6, 2023
86540f9
Fixed imports and refactored
DimaBir Oct 6, 2023
4f5b75d
Fixed imports and refactored
DimaBir Oct 6, 2023
bc67224
Fixed imports and refactored
DimaBir Oct 6, 2023
0f59440
Fixed imports and refactored
DimaBir Oct 6, 2023
a70efcd
Fixed imports and refactored
DimaBir Oct 6, 2023
79bb367
Fixed imports and refactored
DimaBir Oct 6, 2023
687791d
Fixed imports and refactored
DimaBir Oct 6, 2023
fb27d3a
Fixed imports and refactored
DimaBir Oct 6, 2023
7dfa3da
Fixed imports and refactored
DimaBir Oct 6, 2023
f13f672
Fixed imports and refactored
DimaBir Oct 6, 2023
2d01ce5
Fixed imports and refactored
DimaBir Oct 6, 2023
555bf6d
Fixed imports and refactored
DimaBir Oct 6, 2023
6bf5cb2
Fixed imports and refactored
DimaBir Oct 6, 2023
525278a
Fixed imports and refactored
DimaBir Oct 6, 2023
b523c53
Fixed imports and refactored
DimaBir Oct 6, 2023
81fa6e4
Fixed imports and refactored
DimaBir Oct 6, 2023
85486a5
Fixed imports and refactored
DimaBir Oct 6, 2023
b734a87
Fixed imports and refactored
DimaBir Oct 6, 2023
9d76254
Fixed imports and refactored
DimaBir Oct 6, 2023
071ed43
Fixed imports and refactored
DimaBir Oct 6, 2023
4fea89e
Fixed imports and refactored
DimaBir Oct 6, 2023
190c13b
Fixed imports and refactored
DimaBir Oct 6, 2023
5991cdd
Fixed imports and refactored
DimaBir Oct 6, 2023
bf4e322
Fixed imports and refactored
DimaBir Oct 6, 2023
318f005
Fixed imports and refactored
DimaBir Oct 6, 2023
9da8063
Fixed imports and refactored
DimaBir Oct 6, 2023
e0b7018
Fixed imports and refactored
DimaBir Oct 6, 2023
86 changes: 21 additions & 65 deletions README.md
@@ -6,20 +6,21 @@
2. [Requirements](#requirements)
- [Steps to Run](#steps-to-run)
- [Example Command](#example-command)
5. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
3. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
- [Results explanation](#results-explanation)
- [Example Input](#example-input)
6. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
- [Example prediction results](#example-prediction-results)
4. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
- [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
- [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
- [ONNX](#onnx)
- [OpenVINO](#openvino)
7. [Used methodologies](#used-methodologies) ![New](https://img.shields.io/badge/-New-96E5FE)
5. [Benchmarking and Visualization](#benchmarking-and-visualization) ![New](https://img.shields.io/badge/-New-96E5FE)
- [TensorRT Optimization](#tensorrt-optimization)
- [ONNX Exporter](#onnx-exporter)
- [OV Exporter](#ov-exporter)
10. [Author](#author)
11. [References](#references)
6. [Author](#author)
7. [References](#references)


<img src="./inference/plot.png" width="100%">
@@ -44,20 +45,20 @@ docker build -t awesome-tensorrt
docker run --gpus all --rm -it awesome-tensorrt

# 3. Run the Script inside the Container
python src/main.py
python main.py [--mode all]
```

### Arguments
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.
- `--mode`: (Optional) Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`. If not provided, it defaults to `all`.

### Example Command
```sh
python src/main.py --topk 3 --mode=all
python main.py --topk 3 --mode=ov
```

This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end, the results plot is saved to `./inference/plot.png`.
This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run the OpenVINO model. Note: the results plot is created only for `--mode=all`; it is then saved to `./inference/plot.png`.

## RESULTS
### Inference Benchmark Results
@@ -76,6 +77,15 @@ Here is an example of the input image to run predictions and benchmarks on:

<img src="./inference/cat3.jpg" width="20%">

### Example prediction results
```
#1: 15% Egyptian cat
#2: 14% tiger cat
#3: 9% tabby
#4: 2% doormat
#5: 2% lynx
```

## Benchmark Implementation Details
Here you can see the flow for each model and benchmark.

@@ -116,62 +126,8 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.

## Used methodologies
### TensorRT Optimization
TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models in production environments. This project supports TensorRT optimizations in FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.

#### Features
- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.

#### Usage
When running the main script, use the `--mode all` argument to enable TensorRT optimizations in the project.
This initializes all models, including the PyTorch model, which is compiled to TensorRT modules in `FP16` and `FP32` precision modes. Inference is then run on the specified image using the TensorRT-optimized models.
Example:
```sh
python src/main.py --mode all
```
#### Requirements
Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).
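
For illustration, a minimal sketch of FP16 compilation with `torch_tensorrt`; the weights tag, input shape, and compile settings are assumptions rather than the project's exact code:

```python
import torch
import torch_tensorrt
from torchvision.models import resnet50

# Assumed setup: a pretrained ResNet50 on the GPU in eval mode.
model = resnet50(weights="IMAGENET1K_V2").eval().cuda()

# Compile a TensorRT-optimized module in half precision (use torch.float32 for FP32).
trt_fp16 = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},
)

with torch.no_grad():
    output = trt_fp16(torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.half))
```
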

### ONNX Exporter
The ONNX Model Exporter (`ONNXExporter`) utility is incorporated in this project to convert the native PyTorch model into the ONNX format.
Using the ONNX format, inference and benchmarking can be performed with the ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.

#### Features
- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model and definitions of built-in operators and standard data types.
- **Interoperability**: Models in ONNX format can be used across various frameworks, tools, runtimes, and compilers.
- **Optimizations**: The ONNX Runtime provides performance optimizations for both cloud and edge devices.

#### Usage
To leverage the `ONNXExporter` and conduct inference using the ONNX Runtime, utilize the `--mode onnx` argument when executing the main script.
This will initiate the conversion process and then run inference on the specified image using the ONNX model.
Example:
```sh
python src/main.py --mode onnx
```

#### Requirements
Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, install the ONNX Runtime.
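
As a rough sketch of the export-then-infer path (not necessarily the exact `ONNXExporter` internals; the file path and the `input`/`output` tensor names are assumptions):

```python
import torch
import onnxruntime as ort
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the PyTorch model to the ONNX format.
torch.onnx.export(
    model, dummy, "./inference/model.onnx",
    input_names=["input"], output_names=["output"],
)

# Run inference with ONNX Runtime on the CPU.
session = ort.InferenceSession(
    "./inference/model.onnx", providers=["CPUExecutionProvider"]
)
outputs = session.run(None, {"input": dummy.numpy()})
```
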

### OV Exporter
OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate the conversion of the ONNX model to the OpenVINO format.
This enables inference and benchmarking using OpenVINO, a framework optimized for Intel hardware, providing substantial speed improvements, especially on CPUs.

#### Features
- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
- **Versatility**: OpenVINO can target various Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
- **Ease of Use**: The `OVExporter` seamlessly transitions from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.

#### Usage
To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
Example:
```sh
python src/main.py --mode ov
```
## Benchmarking and Visualization
The results of the benchmarks for all modes are saved and visualized in a bar chart, showcasing the average inference times across different backends. The visualization aids in comparing the performance gains achieved with different optimizations.

#### Requirements
Ensure you have installed the OpenVINO Toolkit and the necessary dependencies to use OpenVINO's model optimizer and inference engine.
Empty file added benchmark/__init__.py
Empty file.
20 changes: 20 additions & 0 deletions benchmark/benchmark_models.py
@@ -0,0 +1,20 @@
import src.benchmark_class
from benchmark.benchmark_utils import run_benchmark
from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
import openvino as ov
import torch
import onnxruntime as ort


def benchmark_onnx_model(ort_session: ort.InferenceSession):
    run_benchmark(None, None, None, ort_session, onnx=True)


def benchmark_ov_model(ov_model: ov.CompiledModel) -> src.benchmark_class.OVBenchmark:
    ov_benchmark = OVBenchmark(ov_model, input_shape=(1, 3, 224, 224))
    ov_benchmark.run()
    return ov_benchmark


def benchmark_cuda_model(cuda_model: torch.nn.Module, device: str, dtype: torch.dtype):
    run_benchmark(cuda_model, device, dtype)
125 changes: 125 additions & 0 deletions benchmark/benchmark_utils.py
@@ -0,0 +1,125 @@
import logging

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, Any
import torch
import onnxruntime as ort

from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark


def run_benchmark(
    model: torch.nn.Module,
    device: str,
    dtype: torch.dtype,
    ort_session: ort.InferenceSession = None,
    onnx: bool = False,
) -> None:
    """
    Run and log the benchmark for the given model, device, and dtype.

    :param onnx: If True, benchmark the ONNX model via the given ONNX Runtime session.
    :param ort_session: The ONNX Runtime inference session (required when onnx=True).
    :param model: The model to be benchmarked.
    :param device: The device to run the benchmark on ("cpu" or "cuda").
    :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
    """
    if onnx:
        logging.info("Running Benchmark for ONNX")
        benchmark = ONNXBenchmark(ort_session, input_shape=(32, 3, 224, 224))
    else:
        logging.info(f"Running Benchmark for {device.upper()} and precision {dtype}")
        benchmark = PyTorchBenchmark(model, device=device, dtype=dtype)
    benchmark.run()


def run_all_benchmarks(
    models: Dict[str, Any], img_batch: np.ndarray
) -> Dict[str, float]:
    """
    Run benchmarks for all models and return a dictionary of average inference times.

    :param models: Dictionary of models. Key is model type ("onnx", "ov", "PyTorch_cpu", "PyTorch_cuda", "trt_fp32", "trt_fp16"), value is the model.
    :param img_batch: The batch of images to run the benchmark on.
    :return: Dictionary of average inference times. Key is model type, value is average inference time.
    """
    results = {}

    # ONNX benchmark
    logging.info("Running benchmark inference for ONNX model")
    onnx_benchmark = ONNXBenchmark(models["onnx"], img_batch.shape)
    avg_time_onnx = onnx_benchmark.run()
    results["ONNX"] = avg_time_onnx

    # OpenVINO benchmark
    logging.info("Running benchmark inference for OpenVINO model")
    ov_benchmark = OVBenchmark(models["ov"], img_batch.shape)
    avg_time_ov = ov_benchmark.run()
    results["OpenVINO"] = avg_time_ov

    # PyTorch + TRT benchmark
    configs = [
        ("cpu", torch.float32, False),
        ("cuda", torch.float32, False),
        ("cuda", torch.float32, True),
        ("cuda", torch.float16, True),
    ]
    for device, precision, is_trt in configs:
        model_to_use = models[f"PyTorch_{device}"].to(device)

        if not is_trt:
            pytorch_benchmark = PyTorchBenchmark(
                model_to_use, device=device, dtype=precision
            )
            logging.info(f"Running benchmark inference for PyTorch_{device} model")
            avg_time_pytorch = pytorch_benchmark.run()
            results[f"PyTorch_{device}"] = avg_time_pytorch

        else:
            # TensorRT benchmarks
            if precision == torch.float32 or precision == torch.float16:
                mode = "fp32" if precision == torch.float32 else "fp16"
                logging.info(f"Running benchmark inference for TRT_{mode} model")
                trt_benchmark = PyTorchBenchmark(
                    models[f"trt_{mode}"], device=device, dtype=precision
                )
                avg_time_trt = trt_benchmark.run()
                results[f"TRT_{mode}"] = avg_time_trt

    return results


def plot_benchmark_results(results: Dict[str, float]):
    """
    Plot the benchmark results using Seaborn.

    :param results: Dictionary of average inference times. Key is model type, value is average inference time.
    """
    # Convert dictionary to two lists for plotting
    models = list(results.keys())
    times = list(results.values())

    # Create a DataFrame for plotting
    data = pd.DataFrame({"Model": models, "Time": times})

    # Sort the DataFrame by Time
    data = data.sort_values("Time", ascending=True)

    # Plot
    plt.figure(figsize=(10, 6))
    ax = sns.barplot(x=data["Time"], y=data["Model"], hue=data["Model"], palette="rocket", legend=False)

    # Adding the actual values on the bars
    for index, value in enumerate(data["Time"]):
        ax.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")

    plt.xlabel("Average Inference Time (ms)")
    plt.ylabel("Model Type")
    plt.title("ResNet50 - Inference Benchmark Results")

    # Save the plot to a file
    plt.savefig("./inference/plot.png", bbox_inches="tight")
    plt.show()
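
A short usage sketch of `plot_benchmark_results`; the timings below are made-up placeholders, not measured results — in practice the dictionary comes from `run_all_benchmarks(models, img_batch)`:

```python
from benchmark.benchmark_utils import plot_benchmark_results

# Placeholder timings in milliseconds, keyed the same way run_all_benchmarks keys its results.
results = {
    "PyTorch_cpu": 120.0,
    "PyTorch_cuda": 15.0,
    "ONNX": 40.0,
    "OpenVINO": 35.0,
    "TRT_fp32": 8.0,
    "TRT_fp16": 5.0,
}
plot_benchmark_results(results)  # writes ./inference/plot.png and shows the chart
```
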
Empty file added common/__init__.py
Empty file.
65 changes: 65 additions & 0 deletions common/utils.py
@@ -0,0 +1,65 @@
import argparse
import openvino as ov
import torch
from src.model import ModelLoader
from src.onnx_exporter import ONNXExporter
from src.ov_exporter import OVExporter
import onnxruntime as ort


def export_onnx_model(
    onnx_path: str, model_loader: ModelLoader, device: torch.device
) -> None:
    onnx_exporter = ONNXExporter(model_loader.model, device, onnx_path)
    onnx_exporter.export_model()


def init_onnx_model(
    onnx_path: str, model_loader: ModelLoader, device: torch.device
) -> ort.InferenceSession:
    export_onnx_model(onnx_path=onnx_path, model_loader=model_loader, device=device)
    return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])


def init_ov_model(onnx_path: str) -> ov.CompiledModel:
    ov_exporter = OVExporter(onnx_path)
    return ov_exporter.export_model()


def init_cuda_model(
    model_loader: ModelLoader, device: torch.device, dtype: torch.dtype
) -> torch.nn.Module:
    cuda_model = model_loader.model.to(device)
    if device == "cuda":
        cuda_model = torch.jit.trace(
            cuda_model, [torch.randn((1, 3, 224, 224)).to(device)]
        )
    return cuda_model


def parse_arguments():
    # Initialize ArgumentParser with description
    parser = argparse.ArgumentParser(description="PyTorch Inference")
    parser.add_argument(
        "--image_path",
        type=str,
        default="./inference/cat3.jpg",
        help="Path to the image to predict",
    )
    parser.add_argument(
        "--topk", type=int, default=5, help="Number of top predictions to show"
    )
    parser.add_argument(
        "--onnx_path",
        type=str,
        default="./inference/model.onnx",
        help="Path where model in ONNX format will be exported",
    )
    parser.add_argument(
        "--mode",
        choices=["onnx", "ov", "cuda", "all"],
        default="all",
        help="Mode for exporting and running the model. Choices are: onnx, ov, cuda or all.",
    )

    return parser.parse_args()
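
A hypothetical sketch of how these helpers could be wired together in a `main.py`-style entry point; the actual `main.py` is not shown in this diff, so the `ModelLoader()` constructor and the overall flow below are assumptions:

```python
import torch

from common.utils import parse_arguments, init_onnx_model, init_ov_model, init_cuda_model
from src.model import ModelLoader  # imported the same way common/utils.py imports it


def main() -> None:
    args = parse_arguments()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model_loader = ModelLoader()  # assumed constructor; the real signature may differ

    if args.mode in ("onnx", "all"):
        ort_session = init_onnx_model(args.onnx_path, model_loader, device)
    if args.mode in ("ov", "all"):
        ov_model = init_ov_model(args.onnx_path)
    if args.mode in ("cuda", "all"):
        cuda_model = init_cuda_model(model_loader, device, torch.float32)
    # ...prediction and benchmarking would then run against whichever models were initialized.


if __name__ == "__main__":
    main()
```
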