diff --git a/README.md b/README.md
index bb10f69..7feb195 100644
--- a/README.md
+++ b/README.md
@@ -6,20 +6,21 @@
2. [Requirements](#requirements)
- [Steps to Run](#steps-to-run)
- [Example Command](#example-command)
-5. [RESULTS](#results) 
+3. [RESULTS](#results) 
- [Results explanation](#results-explanation)
- [Example Input](#example-input)
-6. [Benchmark Implementation Details](#benchmark-implementation-details) 
+ - [Example prediction results](#example-prediction-results)
+4. [Benchmark Implementation Details](#benchmark-implementation-details) 
- [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
- [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
- [ONNX](#onnx)
- [OpenVINO](#openvino)
-7. [Used methodologies](#used-methodologies) 
+5. [Benchmarking and Visualization](#benchmarking-and-visualization) 
- [TensorRT Optimization](#tensorrt-optimization)
- [ONNX Exporter](#onnx-exporter)
- [OV Exporter](#ov-exporter)
-10. [Author](#author)
-11. [References](#references)
+6. [Author](#author)
+7. [References](#references)
@@ -44,20 +45,20 @@ docker build -t awesome-tensorrt
docker run --gpus all --rm -it awesome-tensorrt
# 3. Run the Script inside the Container
-python src/main.py
+python main.py [--mode all]
```
### Arguments
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
-- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.
+- `--mode`: (Optional) Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `cuda`, `all`. Defaults to `all` if not provided.
+- `--onnx_path`: (Optional) Path where the model is exported in ONNX format. Defaults to `./inference/model.onnx`.
### Example Command
```sh
-python src/main.py --topk 3 --mode=all
+python main.py --topk 3 --mode=ov
```
-This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end results plot will be saved to `./inference/plot.png`
+This command runs predictions on the default image (`./inference/cat3.jpg`), shows the top 3 predictions, and runs only the OpenVINO model. Note: the results plot is generated only with `--mode=all`, in which case it is saved to `./inference/plot.png`.
## RESULTS
### Inference Benchmark Results
@@ -76,6 +77,15 @@ Here is an example of the input image to run predictions and benchmarks on:
+### Example prediction results
+```
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+```
+
## Benchmark Implementation Details
Here you can see the flow for each model and benchmark.
@@ -116,62 +126,8 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.
-## Used methodologies
-### TensorRT Optimization
-TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models on production environments. This project supports TensorRT optimizations in FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.
-
-#### Features
-- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
-- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
-- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
-- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.
-
-#### Usage
-When running the main script, use the'- mode all' argument to employ TensorRT optimizations in the project.
-This will initiate all models, including PyTorch models, that will be compiled to the TRT model with `FP16` and `FP32` precision modes. Then, in one of the steps, we will run inference on the specified image using the TensorRT-optimized model.
-Example:
-```sh
-python src/main.py --mode all
-```
-#### Requirements
-Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).
-
-### ONNX Exporter
-ONNX Model Exporter (`ONNXExporter`) utility is incorporated within this project to enable converting the native PyTorch model into the ONNX format.
-Using the ONNX format, inference and benchmarking can be performed with the ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.
-
-#### Features
-- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model and definitions of built-in operators and standard data types.
-- **Interoperability**: Models in ONNX format can be used across various frameworks, tools, runtimes, and compilers.
-- **Optimizations**: The ONNX Runtime provides performance optimizations for both cloud and edge devices.
-
-#### Usage
-To leverage the `ONNXExporter` and conduct inference using the ONNX Runtime, utilize the `--mode onnx` argument when executing the main script.
-This will initiate the conversion process and then run inference on the specified image using the ONNX model.
-Example:
-```sh
-python src/main.py --mode onnx
-```
-
-#### Requirements
-Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, install the ONNX Runtime.
-
-### OV Exporter
-OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate the conversion of the ONNX model to the OpenVINO format.
-This enables inference and benchmarking using OpenVINO, a framework optimized for Intel hardware, providing substantial speed improvements, especially on CPUs.
-
-#### Features
-- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
-- **Versatility**: OpenVINO can target various Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
-- **Ease of Use**: The `OVExporter` seamlessly transitions from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.
-
-#### Usage
-To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
-This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
-Example:
-```sh
-python src/main.py --mode ov
-```
+## Benchmarking and Visualization
+When run with `--mode=all`, the benchmark results for every backend are collected and visualized as a bar chart of average inference times, making it easy to compare the speed-ups achieved by each optimization.
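+
+For example, a full benchmark run with the defaults shown above produces the chart:
+```sh
+python main.py --mode all
+# benchmark timings are logged to model.log; the bar chart is saved to ./inference/plot.png
+```
+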
#### Requirements
Ensure you have installed the OpenVINO Toolkit and the necessary dependencies to use OpenVINO's model optimizer and inference engine.
diff --git a/benchmark/__init__.py b/benchmark/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/benchmark/benchmark_models.py b/benchmark/benchmark_models.py
new file mode 100644
index 0000000..e772360
--- /dev/null
+++ b/benchmark/benchmark_models.py
@@ -0,0 +1,20 @@
+import src.benchmark_class
+from benchmark.benchmark_utils import run_benchmark
+from src.benchmark_class import OVBenchmark
+import openvino as ov
+import torch
+import onnxruntime as ort
+
+
+def benchmark_onnx_model(ort_session: ort.InferenceSession):
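+ """Benchmark the ONNX Runtime session via the shared run_benchmark helper (results are logged)."""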
+ run_benchmark(None, None, None, ort_session, onnx=True)
+
+
+def benchmark_ov_model(ov_model: ov.CompiledModel) -> src.benchmark_class.OVBenchmark:
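+ """Benchmark the compiled OpenVINO model and return the OVBenchmark instance used to run it."""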
+ ov_benchmark = OVBenchmark(ov_model, input_shape=(1, 3, 224, 224))
+ ov_benchmark.run()
+ return ov_benchmark
+
+
+def benchmark_cuda_model(cuda_model: torch.nn.Module, device: str, dtype: torch.dtype):
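+ """Benchmark a PyTorch (or TensorRT-compiled) model on the given device and dtype."""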
+ run_benchmark(cuda_model, device, dtype)
diff --git a/benchmark/benchmark_utils.py b/benchmark/benchmark_utils.py
new file mode 100644
index 0000000..38973be
--- /dev/null
+++ b/benchmark/benchmark_utils.py
@@ -0,0 +1,125 @@
+import logging
+
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+from typing import Dict, Any
+import torch
+import onnxruntime as ort
+
+from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
+
+
+def run_benchmark(
+ model: torch.nn.Module,
+ device: str,
+ dtype: torch.dtype,
+ ort_session: ort.InferenceSession = None,
+ onnx: bool = False,
+) -> None:
+ """
+ Run and log the benchmark for the given model, device, and dtype.
+
+ :param onnx: If True, benchmark the ONNX Runtime session instead of the PyTorch model.
+ :param ort_session: The ONNX Runtime InferenceSession to benchmark when onnx is True.
+ :param model: The model to be benchmarked.
+ :param device: The device to run the benchmark on ("cpu" or "cuda").
+ :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
+ """
+ if onnx:
+ logging.info(f"Running Benchmark for ONNX")
+ benchmark = ONNXBenchmark(ort_session, input_shape=(32, 3, 224, 224))
+ else:
+ logging.info(f"Running Benchmark for {device.upper()} and precision {dtype}")
+ benchmark = PyTorchBenchmark(model, device=device, dtype=dtype)
+ benchmark.run()
+
+
+def run_all_benchmarks(
+ models: Dict[str, Any], img_batch: np.ndarray
+) -> Dict[str, float]:
+ """
+ Run benchmarks for all models and return a dictionary of average inference times.
+
+ :param models: Dictionary of models. Key is model type ("onnx", "ov", "PyTorch_cpu", "PyTorch_cuda", "trt_fp32", "trt_fp16"), value is the model.
+ :param img_batch: The batch of images to run the benchmark on.
+ :return: Dictionary of average inference times. Key is model type, value is average inference time.
+ """
+ results = {}
+
+ # ONNX benchmark
+ logging.info(f"Running benchmark inference for ONNX model")
+ onnx_benchmark = ONNXBenchmark(models["onnx"], img_batch.shape)
+ avg_time_onnx = onnx_benchmark.run()
+ results["ONNX"] = avg_time_onnx
+
+ # OpenVINO benchmark
+ logging.info(f"Running benchmark inference for OpenVINO model")
+ ov_benchmark = OVBenchmark(models["ov"], img_batch.shape)
+ avg_time_ov = ov_benchmark.run()
+ results["OpenVINO"] = avg_time_ov
+
+ # PyTorch + TRT benchmark
+ configs = [
+ ("cpu", torch.float32, False),
+ ("cuda", torch.float32, False),
+ ("cuda", torch.float32, True),
+ ("cuda", torch.float16, True),
+ ]
+ for device, precision, is_trt in configs:
+ model_to_use = models[f"PyTorch_{device}"].to(device)
+
+ if not is_trt:
+ pytorch_benchmark = PyTorchBenchmark(
+ model_to_use, device=device, dtype=precision
+ )
+ logging.info(f"Running benchmark inference for PyTorch_{device} model")
+ avg_time_pytorch = pytorch_benchmark.run()
+ results[f"PyTorch_{device}"] = avg_time_pytorch
+
+ else:
+ # TensorRT benchmarks
+ if precision == torch.float32 or precision == torch.float16:
+ mode = "fp32" if precision == torch.float32 else "fp16"
+ logging.info(f"Running benchmark inference for TRT_{mode} model")
+ trt_benchmark = PyTorchBenchmark(
+ models[f"trt_{mode}"], device=device, dtype=precision
+ )
+ avg_time_trt = trt_benchmark.run()
+ results[f"TRT_{mode}"] = avg_time_trt
+
+ return results
+
+
+def plot_benchmark_results(results: Dict[str, float]):
+ """
+ Plot the benchmark results using Seaborn.
+
+ :param results: Dictionary of average inference times. Key is model type, value is average inference time.
+ """
+ # Convert dictionary to two lists for plotting
+ models = list(results.keys())
+ times = list(results.values())
+
+ # Create a DataFrame for plotting
+ data = pd.DataFrame({"Model": models, "Time": times})
+
+ # Sort the DataFrame by Time
+ data = data.sort_values("Time", ascending=True)
+
+ # Plot
+ plt.figure(figsize=(10, 6))
+ ax = sns.barplot(x=data["Time"], y=data["Model"], hue=data["Model"], palette="rocket", legend=False)
+
+ # Adding the actual values on the bars
+ for index, value in enumerate(data["Time"]):
+ ax.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")
+
+ plt.xlabel("Average Inference Time (ms)")
+ plt.ylabel("Model Type")
+ plt.title("ResNet50 - Inference Benchmark Results")
+
+ # Save the plot to a file
+ plt.savefig("./inference/plot.png", bbox_inches="tight")
+ plt.show()
diff --git a/common/__init__.py b/common/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/common/utils.py b/common/utils.py
new file mode 100644
index 0000000..8cadf76
--- /dev/null
+++ b/common/utils.py
@@ -0,0 +1,65 @@
+import argparse
+import openvino as ov
+import torch
+from src.model import ModelLoader
+from src.onnx_exporter import ONNXExporter
+from src.ov_exporter import OVExporter
+import onnxruntime as ort
+
+
+def export_onnx_model(
+ onnx_path: str, model_loader: ModelLoader, device: torch.device
+) -> None:
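+ """Export the loaded PyTorch model to ONNX format at onnx_path."""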
+ onnx_exporter = ONNXExporter(model_loader.model, device, onnx_path)
+ onnx_exporter.export_model()
+
+
+def init_onnx_model(
+ onnx_path: str, model_loader: ModelLoader, device: torch.device
+) -> ort.InferenceSession:
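+ """Export the model to ONNX and return a CPU-provider ONNX Runtime InferenceSession for it."""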
+ export_onnx_model(onnx_path=onnx_path, model_loader=model_loader, device=device)
+ return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
+
+
+def init_ov_model(onnx_path: str) -> ov.CompiledModel:
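+ """Convert the ONNX model at onnx_path into a compiled OpenVINO model via OVExporter."""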
+ ov_exporter = OVExporter(onnx_path)
+ return ov_exporter.export_model()
+
+
+def init_cuda_model(
+ model_loader: ModelLoader, device: torch.device, dtype: torch.dtype
+) -> torch.nn.Module:
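+ """Move the model to the given device; on CUDA, return a torch.jit trace built with a (1, 3, 224, 224) dummy input."""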
+ cuda_model = model_loader.model.to(device)
+ if device == "cuda":
+ cuda_model = torch.jit.trace(
+ cuda_model, [torch.randn((1, 3, 224, 224)).to(device)]
+ )
+ return cuda_model
+
+
+def parse_arguments():
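+ """Parse command-line arguments: image path, top-k, ONNX export path, and run mode."""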
+ # Initialize ArgumentParser with description
+ parser = argparse.ArgumentParser(description="PyTorch Inference")
+ parser.add_argument(
+ "--image_path",
+ type=str,
+ default="./inference/cat3.jpg",
+ help="Path to the image to predict",
+ )
+ parser.add_argument(
+ "--topk", type=int, default=5, help="Number of top predictions to show"
+ )
+ parser.add_argument(
+ "--onnx_path",
+ type=str,
+ default="./inference/model.onnx",
+ help="Path where model in ONNX format will be exported",
+ )
+ parser.add_argument(
+ "--mode",
+ choices=["onnx", "ov", "cuda", "all"],
+ default="all",
+ help="Mode for exporting and running the model. Choices are: onnx, ov, cuda or all.",
+ )
+
+ return parser.parse_args()
diff --git a/main.py b/main.py
new file mode 100644
index 0000000..7e34521
--- /dev/null
+++ b/main.py
@@ -0,0 +1,103 @@
+import logging
+import os.path
+
+import torch
+import torch_tensorrt
+
+from benchmark.benchmark_models import benchmark_onnx_model, benchmark_ov_model
+from benchmark.benchmark_utils import run_all_benchmarks, plot_benchmark_results
+from common.utils import (
+ parse_arguments,
+ init_onnx_model,
+ init_ov_model,
+ init_cuda_model, export_onnx_model,
+)
+from src.image_processor import ImageProcessor
+from prediction.prediction_models import *
+from src.model import ModelLoader
+
+# Configure logging
+logging.basicConfig(filename="model.log", level=logging.INFO)
+
+
+def main():
+ """
+ Main function to run inference, benchmarks, and predictions on the model
+ using provided image and optional parameters.
+ """
+ args = parse_arguments()
+
+ # Model and Image Initialization
+ models = {}
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model_loader = ModelLoader(device=device)
+ img_processor = ImageProcessor(img_path=args.image_path, device=device)
+ img_batch = img_processor.process_image()
+
+ # ONNX
+ if args.mode in ["onnx", "all"]:
+ ort_session = init_onnx_model(args.onnx_path, model_loader, device)
+ if args.mode != "all":
+ benchmark_onnx_model(ort_session)
+ predict_onnx_model(ort_session, img_batch, args.topk, model_loader.categories)
+
+ # OpenVINO
+ if args.mode in ["ov", "all"]:
+ # Check if ONNX model wasn't exported previously
+ if not os.path.isfile(args.onnx_path):
+ export_onnx_model(onnx_path=args.onnx_path, model_loader=model_loader, device=device)
+
+ ov_model = init_ov_model(args.onnx_path)
+ if args.mode != "all":
+ ov_benchmark = benchmark_ov_model(ov_model)
+ predict_ov_model(ov_benchmark.compiled_model, img_batch, args.topk, model_loader.categories)
+
+ # CUDA
+ if args.mode in ["cuda", "all"]:
+ # CUDA configurations
+ cuda_configs = [
+ {"device": "cpu", "precision": torch.float32, "is_trt": False},
+ {"device": "cuda", "precision": torch.float32, "is_trt": False},
+ {"device": "cuda", "precision": torch.float32, "is_trt": True},
+ {"device": "cuda", "precision": torch.float16, "is_trt": True},
+ ]
+
+ for config in cuda_configs:
+ device = config["device"]
+ precision = config["precision"]
+ is_trt = config["is_trt"]
+
+ model = init_cuda_model(model_loader, device, precision)
+
+ # If the configuration is not for TensorRT, store the model under a PyTorch key
+ if not is_trt:
+ models[f"PyTorch_{device}"] = model
+ model = model.to(device)
+ img_batch = img_batch.to(device)
+ else:
+ print("Compiling TensorRT model")
+ model = torch_tensorrt.compile(
+ model,
+ inputs=[torch_tensorrt.Input((32, 3, 224, 224), dtype=precision)],
+ enabled_precisions={precision},
+ truncate_long_and_double=True,
+ )
+ # If it is for TensorRT, determine the mode (FP32 or FP16) and store under a TensorRT key
+ mode = "fp32" if precision == torch.float32 else "fp16"
+ models[f"trt_{mode}"] = model
+
+ if args.mode != "all":
+ predict_cuda_model(
+ model, img_batch, args.topk, model_loader.categories, precision
+ )
+
+ # Aggregate Benchmark (if mode is "all")
+ if args.mode == "all":
+ models["onnx"] = ort_session
+ models["ov"] = ov_model
+
+ results = run_all_benchmarks(models, img_batch)
+ plot_benchmark_results(results)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/prediction/__init__.py b/prediction/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/prediction/prediction_models.py b/prediction/prediction_models.py
new file mode 100644
index 0000000..aaaf230
--- /dev/null
+++ b/prediction/prediction_models.py
@@ -0,0 +1,32 @@
+import onnxruntime as ort
+import openvino as ov
+import numpy as np
+import torch
+from typing import List
+from prediction.prediction_utils import make_prediction
+
+
+# Prediction Functions
+def predict_onnx_model(
+ ort_session: ort.InferenceSession,
+ img_batch: np.ndarray,
+ topk: int,
+ categories: List[str],
+):
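+ """Log the top-k predictions from the ONNX Runtime session (the image batch is converted to NumPy)."""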
+ make_prediction(ort_session, img_batch.cpu().numpy(), topk, categories)
+
+
+def predict_ov_model(
+ ov_model: ov.CompiledModel, img_batch: np.ndarray, topk: int, categories: List[str]
+):
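+ """Log the top-k predictions from the compiled OpenVINO model."""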
+ make_prediction(ov_model, img_batch.cpu().numpy(), topk, categories)
+
+
+def predict_cuda_model(
+ cuda_model: torch.nn.Module,
+ img_batch: torch.Tensor,
+ topk: int,
+ categories: List[str],
+ precision: torch.dtype,
+):
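+ """Log the top-k predictions from the PyTorch/TensorRT model at the given precision."""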
+ make_prediction(cuda_model, img_batch, topk, categories, precision)
diff --git a/prediction/prediction_utils.py b/prediction/prediction_utils.py
new file mode 100644
index 0000000..a9ea429
--- /dev/null
+++ b/prediction/prediction_utils.py
@@ -0,0 +1,82 @@
+import logging
+from typing import List, Tuple, Union, Dict, Any
+import openvino as ov
+import torch
+import onnxruntime as ort
+import numpy as np
+
+
+def make_prediction(
+ model: Union[torch.nn.Module, ort.InferenceSession, ov.CompiledModel],
+ img_batch: Union[torch.Tensor, np.ndarray],
+ topk: int,
+ categories: List[str],
+ precision: torch.dtype = None,
+) -> None:
+ """
+ Make and print predictions for the given model, img_batch, topk, and categories.
+
+ :param model: The model (or ONNX Runtime InferenceSession) to make predictions with.
+ :param img_batch: The batch of images to make predictions on.
+ :param topk: The number of top predictions to show.
+ :param categories: The list of categories to label the predictions.
+ :param precision: The data type to be used for the predictions (typically torch.float32 or torch.float16) for PyTorch models.
+ """
+ is_onnx_model = isinstance(model, ort.InferenceSession)
+ is_ov_model = isinstance(model, ov.CompiledModel)
+
+ if is_onnx_model:
+ logging.info(f"Running prediction for ONNX model")
+ # Get the input name for the ONNX model.
+ input_name = model.get_inputs()[0].name
+
+ # Run the model with the properly named input.
+ ort_inputs = {input_name: img_batch}
+ ort_outs = model.run(None, ort_inputs)
+
+ # Assuming the model returns a list with one array of class probabilities.
+ if len(ort_outs) > 0:
+ prob = ort_outs[0]
+
+ # Checking if prob has more than one dimension and selecting the right one.
+ if prob.ndim > 1:
+ prob = prob[0]
+
+ # Apply Softmax to get probabilities
+ prob = np.exp(prob) / np.sum(np.exp(prob))
+ elif is_ov_model:
+ logging.info(f"Running prediction for OV model")
+ # For OV, the input name is usually the first input
+ input_name = next(iter(model.inputs))
+ outputs = model(inputs={input_name: img_batch})
+
+ # Assuming the model returns a dictionary with one key for class probabilities
+ prob_key = next(iter(outputs))
+ prob = outputs[prob_key]
+
+ # Apply Softmax to get probabilities
+ prob = np.exp(prob[0]) / np.sum(np.exp(prob[0]))
+
+ else: # PyTorch Model
+ logging.info(f"Running prediction for PyTorch model")
+ if isinstance(img_batch, np.ndarray):
+ img_batch = torch.tensor(img_batch)
+ else:
+ img_batch = img_batch.clone().to(precision)
+ model.eval()
+ with torch.no_grad():
+ outputs = model(img_batch.to(precision))
+ prob = torch.nn.functional.softmax(outputs[0], dim=0)
+ prob = prob.cpu().numpy()
+
+ top_indices = prob.argsort()[-topk:][::-1]
+ top_probs = prob[top_indices]
+
+ for i in range(topk):
+ probability = top_probs[i]
+ if is_onnx_model:
+ # Accessing the DataFrame by row number using .iloc[]
+ class_label = categories.iloc[top_indices[i]].item()
+ else:
+ class_label = categories[0][int(top_indices[i])]
+ logging.info(f"#{i + 1}: {int(probability * 100)}% {class_label}")
diff --git a/src/benchmark.py b/src/benchmark_class.py
similarity index 90%
rename from src/benchmark.py
rename to src/benchmark_class.py
index cf24764..47e8069 100644
--- a/src/benchmark.py
+++ b/src/benchmark_class.py
@@ -90,9 +90,6 @@ def run(self):
f"Iteration {i}/{self.nruns}, ave batch time {np.mean(timings) * 1000:.2f} ms"
)
- # Print and log results
- print(f"Input shape: {input_data.size()}")
- print(f"Output features size: {features.size()}")
logging.info(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
return np.mean(timings) * 1000
@@ -115,7 +112,6 @@ def __init__(
self.nwarmup = nwarmup
self.nruns = nruns
-
def run(self):
print("Warming up ...")
# Adjusting the batch size in the input shape to match the expected input size of the model.
@@ -128,12 +124,17 @@ def run(self):
print("Starting benchmark ...")
timings = []
- for _ in range(self.nruns):
+ for i in range(1, self.nruns+1):
start_time = time.time()
_ = self.ort_session.run(None, {"input": input_data})
end_time = time.time()
timings.append(end_time - start_time)
+ if i % 10 == 0:
+ print(
+ f"Iteration {i}/{self.nruns}, ave batch time {np.mean(timings) * 1000:.2f} ms"
+ )
+
avg_time = np.mean(timings) * 1000
logging.info(f"Average ONNX inference time: {avg_time:.2f} ms")
return avg_time
@@ -155,8 +156,8 @@ def __init__(
self.core = ov.Core()
self.compiled_model = None
self.input_shape = input_shape
- self.warmup_runs = 50
- self.num_runs = 100
+ self.nwarmup = 50
+ self.nruns = 100
self.dummy_input = np.random.randn(*input_shape).astype(np.float32)
def warmup(self):
@@ -184,16 +185,21 @@ def run(self):
"""
# Warm-up runs
logging.info("Warming up ...")
- for _ in range(self.warmup_runs):
+ for _ in range(self.nwarmup):
self.warmup()
# Benchmarking
total_time = 0
- for _ in range(self.num_runs):
+ for i in range(1, self.nruns+1):
start_time = time.time()
_ = self.inference(self.dummy_input)
total_time += time.time() - start_time
- avg_time = total_time / self.num_runs
+ if i % 10 == 0:
+ print(
+ f"Iteration {i}/{self.nruns}, ave batch time {total_time / self.nruns * 1000:.2f} ms"
+ )
+
+ avg_time = total_time / self.nruns
logging.info(f"Average inference time: {avg_time * 1000:.2f} ms")
- return avg_time * 1000
\ No newline at end of file
+ return avg_time * 1000
diff --git a/src/main.py b/src/main.py
deleted file mode 100644
index aed87bf..0000000
--- a/src/main.py
+++ /dev/null
@@ -1,346 +0,0 @@
-import argparse
-import os
-import logging
-import pandas as pd
-import openvino as ov
-import torch
-import torch_tensorrt
-from typing import List, Tuple, Union, Dict, Any
-import onnxruntime as ort
-import numpy as np
-import seaborn as sns
-import matplotlib.pyplot as plt
-
-from model import ModelLoader
-from image_processor import ImageProcessor
-from benchmark import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
-from onnx_exporter import ONNXExporter
-from ov_exporter import OVExporter
-
-# Configure logging
-logging.basicConfig(filename="model.log", level=logging.INFO)
-
-
-def run_benchmark(
- model: torch.nn.Module,
- device: str,
- dtype: torch.dtype,
- ort_session: ort.InferenceSession = None,
- onnx: bool = False,
-) -> None:
- """
- Run and log the benchmark for the given model, device, and dtype.
-
- :param onnx:
- :param ort_session:
- :param model: The model to be benchmarked.
- :param device: The device to run the benchmark on ("cpu" or "cuda").
- :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
- """
- if onnx:
- logging.info(f"Running Benchmark for ONNX")
- benchmark = ONNXBenchmark(ort_session, input_shape=(32, 3, 224, 224))
- else:
- logging.info(f"Running Benchmark for {device.upper()}")
- benchmark = PyTorchBenchmark(model, device=device, dtype=dtype)
- benchmark.run()
-
-
-def make_prediction(
- model: Union[torch.nn.Module, ort.InferenceSession, ov.CompiledModel],
- img_batch: Union[torch.Tensor, np.ndarray],
- topk: int,
- categories: List[str],
- precision: torch.dtype = None,
-) -> None:
- """
- Make and print predictions for the given model, img_batch, topk, and categories.
-
- :param model: The model (or ONNX Runtime InferenceSession) to make predictions with.
- :param img_batch: The batch of images to make predictions on.
- :param topk: The number of top predictions to show.
- :param categories: The list of categories to label the predictions.
- :param precision: The data type to be used for the predictions (typically torch.float32 or torch.float16) for PyTorch models.
- """
- is_onnx_model = isinstance(model, ort.InferenceSession)
- is_ov_model = isinstance(model, ov.CompiledModel)
-
- if is_onnx_model:
- # Get the input name for the ONNX model.
- input_name = model.get_inputs()[0].name
-
- # Run the model with the properly named input.
- ort_inputs = {input_name: img_batch}
- ort_outs = model.run(None, ort_inputs)
-
- # Assuming the model returns a list with one array of class probabilities.
- if len(ort_outs) > 0:
- prob = ort_outs[0]
-
- # Checking if prob has more than one dimension and selecting the right one.
- if prob.ndim > 1:
- prob = prob[0]
-
- # Apply Softmax to get probabilities
- prob = np.exp(prob) / np.sum(np.exp(prob))
- elif is_ov_model:
- # For OV, the input name is usually the first input
- input_name = next(iter(model.inputs))
- outputs = model(inputs={input_name: img_batch})
-
- # Assuming the model returns a dictionary with one key for class probabilities
- prob_key = next(iter(outputs))
- prob = outputs[prob_key]
-
- # Apply Softmax to get probabilities
- prob = np.exp(prob[0]) / np.sum(np.exp(prob[0]))
-
- else: # PyTorch Model
- if isinstance(img_batch, np.ndarray):
- img_batch = torch.tensor(img_batch)
- else:
- img_batch = img_batch.clone().to(precision)
- model.eval()
- with torch.no_grad():
- outputs = model(img_batch.to(precision))
- prob = torch.nn.functional.softmax(outputs[0], dim=0)
- prob = prob.cpu().numpy()
-
- top_indices = prob.argsort()[-topk:][::-1]
- top_probs = prob[top_indices]
-
- for i in range(topk):
- probability = top_probs[i]
- if is_onnx_model:
- # Accessing the DataFrame by row number using .iloc[]
- class_label = categories.iloc[top_indices[i]].item()
- else:
- class_label = categories[0][int(top_indices[i])]
- logging.info(f"#{i + 1}: {int(probability * 100)}% {class_label}")
-
-
-def run_all_benchmarks(
- models: Dict[str, Any], img_batch: np.ndarray
-) -> Dict[str, float]:
- """
- Run benchmarks for all models and return a dictionary of average inference times.
-
- :param models: Dictionary of models. Key is model type ("onnx", "ov", "pytorch", "trt_fp32", "trt_fp16"), value is the model.
- :param img_batch: The batch of images to run the benchmark on.
- :return: Dictionary of average inference times. Key is model type, value is average inference time.
- """
- results = {}
-
- # ONNX benchmark
- onnx_benchmark = ONNXBenchmark(models["onnx"], img_batch.shape)
- avg_time_onnx = onnx_benchmark.run()
- results["ONNX"] = avg_time_onnx
-
- # OpenVINO benchmark
- ov_benchmark = OVBenchmark(models["ov"], img_batch.shape)
- avg_time_ov = ov_benchmark.run()
- results["OpenVINO"] = avg_time_ov
-
- # PyTorch + TRT benchmark
- configs = [
- ("cpu", torch.float32, False),
- ("cuda", torch.float32, False),
- ("cuda", torch.float32, True),
- ("cuda", torch.float16, True),
- ]
- for device, precision, is_trt in configs:
- model_to_use = models["pytorch"].to(device)
-
- if not is_trt:
- pytorch_benchmark = PyTorchBenchmark(
- model_to_use, device=device, dtype=precision
- )
- avg_time_pytorch = pytorch_benchmark.run()
- results[f"PyTorch_{device}"] = avg_time_pytorch
-
- else:
- # TensorRT benchmarks
- if precision == torch.float32 or precision == torch.float16:
- mode = "fp32" if precision == torch.float32 else "fp16"
- trt_benchmark = PyTorchBenchmark(
- models[f"trt_{mode}"], device=device, dtype=precision
- )
- avg_time_trt = trt_benchmark.run()
- results[f"TRT_{mode}"] = avg_time_trt
-
- return results
-
-
-def plot_benchmark_results(results: Dict[str, float]):
- """
- Plot the benchmark results using Seaborn.
-
- :param results: Dictionary of average inference times. Key is model type, value is average inference time.
- """
- # Convert dictionary to two lists for plotting
- models = list(results.keys())
- times = list(results.values())
-
- # Create a DataFrame for plotting
- data = pd.DataFrame({"Model": models, "Time": times})
-
- # Sort the DataFrame by Time
- data = data.sort_values("Time", ascending=True)
-
- # Plot
- plt.figure(figsize=(10, 6))
- ax = sns.barplot(x=data["Time"], y=data["Model"], palette="rocket")
-
- # Adding the actual values on the bars
- for index, value in enumerate(data["Time"]):
- ax.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")
-
- plt.xlabel("Average Inference Time (ms)")
- plt.ylabel("Model Type")
- plt.title("ResNet50 - Inference Benchmark Results")
-
- # Save the plot to a file
- plt.savefig("./inference/plot.png", bbox_inches="tight")
- plt.show()
-
-
-def main() -> None:
- """
- Main function to run inference, benchmarks, and predictions on the model
- using provided image and optional parameters.
- """
- # Initialize ArgumentParser with description
- parser = argparse.ArgumentParser(description="PyTorch Inference")
- parser.add_argument(
- "--image_path",
- type=str,
- default="./inference/cat3.jpg",
- help="Path to the image to predict",
- )
- parser.add_argument(
- "--topk", type=int, default=5, help="Number of top predictions to show"
- )
- parser.add_argument(
- "--onnx_path",
- type=str,
- default="./inference/model.onnx",
- help="Path where model in ONNX format will be exported",
- )
- parser.add_argument(
- "--mode",
- choices=["onnx", "ov", "cuda", "all"],
- required=True,
- help="Mode for exporting and running the model. Choices are: onnx, ov, cuda or all.",
- )
-
- args = parser.parse_args()
-
- models = {}
-
- # Setup device
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
- # Initialize model and image processor
- model_loader = ModelLoader(device=device)
- img_processor = ImageProcessor(img_path=args.image_path, device=device)
- img_batch = img_processor.process_image()
-
- if args.mode == "onnx" or args.mode == "all":
- onnx_path = args.onnx_path
-
- # Export the model to ONNX format using ONNXExporter
- onnx_exporter = ONNXExporter(model_loader.model, device, onnx_path)
- onnx_exporter.export_model()
-
- # Create ONNX Runtime session
- ort_session = ort.InferenceSession(
- onnx_path, providers=["CPUExecutionProvider"]
- )
-
- models["onnx"] = ort_session
-
- # Run benchmark
- # run_benchmark(None, None, None, ort_session, onnx=True)
-
- # Make prediction
- print(f"Making prediction with {ort.get_device()} for ONNX model")
- make_prediction(
- ort_session,
- img_batch.cpu().numpy(),
- topk=args.topk,
- categories=model_loader.categories,
- )
- if args.mode == "ov" or args.mode == "all":
- # Export the ONNX model to OpenVINO
- ov_exporter = OVExporter(args.onnx_path)
- ov_model = ov_exporter.export_model()
-
- models["ov"] = ov_model
-
- # Benchmark the OpenVINO model
- ov_benchmark = OVBenchmark(ov_model, input_shape=(1, 3, 224, 224))
- ov_benchmark.run()
-
- # Run inference using the OpenVINO model
- img_batch_ov = (
- img_processor.process_image().cpu().numpy()
- ) # Assuming batch size of 1
- print(f"Making prediction with OpenVINO model")
- make_prediction(
- ov_benchmark.compiled_model,
- img_batch_ov,
- topk=args.topk,
- categories=model_loader.categories,
- )
- if args.mode == "cuda" or args.mode == "all":
- # Define configurations for which to run benchmarks and make predictions
- configs = [
- ("cpu", torch.float32),
- ("cuda", torch.float32),
- ("cuda", torch.float16),
- ]
-
- for device, precision in configs:
- model_to_use = model_loader.model.to(device)
- models["pytorch"] = model_loader.model
-
- if device == "cuda":
- print(f"Tracing {device} model")
- model_to_use = torch.jit.trace(
- model_to_use, [torch.randn((1, 3, 224, 224)).to(device)]
- )
-
- if precision == torch.float32 or precision == torch.float16:
- print("Compiling TensorRT model")
- model_to_use = torch_tensorrt.compile(
- model_to_use,
- inputs=[torch_tensorrt.Input((32, 3, 224, 224), dtype=precision)],
- enabled_precisions={precision},
- truncate_long_and_double=True,
- )
- if precision == torch.float32:
- models["trt_fp32"] = model_to_use
- else:
- models["trt_fp16"] = model_to_use
-
- """print(f"Making prediction with {device} model in {precision} precision")
- make_prediction(
- model_to_use,
- img_batch.to(device),
- args.topk,
- model_loader.categories,
- precision,
- )
-
- print(f"Running Benchmark for {device} model in {precision} precision")
- run_benchmark(model_to_use, device, precision) """
- if args.mode == "all":
- # Run all benchmarks
- results = run_all_benchmarks(models, img_batch)
-
- # Plot results
- plot_benchmark_results(results)
-
-
-if __name__ == "__main__":
- main()