Merged
29 commits
65b3db3
Refactor main function and modularize codebase
DimaBir Oct 6, 2023
a14095c
Fixed imports
DimaBir Oct 6, 2023
86540f9
Fixed imports and refactored
DimaBir Oct 6, 2023
4f5b75d
Fixed imports and refactored
DimaBir Oct 6, 2023
bc67224
Fixed imports and refactored
DimaBir Oct 6, 2023
0f59440
Fixed imports and refactored
DimaBir Oct 6, 2023
a70efcd
Fixed imports and refactored
DimaBir Oct 6, 2023
79bb367
Fixed imports and refactored
DimaBir Oct 6, 2023
687791d
Fixed imports and refactored
DimaBir Oct 6, 2023
fb27d3a
Fixed imports and refactored
DimaBir Oct 6, 2023
7dfa3da
Fixed imports and refactored
DimaBir Oct 6, 2023
f13f672
Fixed imports and refactored
DimaBir Oct 6, 2023
2d01ce5
Fixed imports and refactored
DimaBir Oct 6, 2023
555bf6d
Fixed imports and refactored
DimaBir Oct 6, 2023
6bf5cb2
Fixed imports and refactored
DimaBir Oct 6, 2023
525278a
Fixed imports and refactored
DimaBir Oct 6, 2023
b523c53
Fixed imports and refactored
DimaBir Oct 6, 2023
81fa6e4
Fixed imports and refactored
DimaBir Oct 6, 2023
85486a5
Fixed imports and refactored
DimaBir Oct 6, 2023
b734a87
Fixed imports and refactored
DimaBir Oct 6, 2023
9d76254
Fixed imports and refactored
DimaBir Oct 6, 2023
071ed43
Fixed imports and refactored
DimaBir Oct 6, 2023
4fea89e
Fixed imports and refactored
DimaBir Oct 6, 2023
190c13b
Fixed imports and refactored
DimaBir Oct 6, 2023
5991cdd
Fixed imports and refactored
DimaBir Oct 6, 2023
bf4e322
Fixed imports and refactored
DimaBir Oct 6, 2023
318f005
Fixed imports and refactored
DimaBir Oct 6, 2023
9da8063
Fixed imports and refactored
DimaBir Oct 6, 2023
e0b7018
Fixed imports and refactored
DimaBir Oct 6, 2023
86 changes: 21 additions & 65 deletions README.md
@@ -6,20 +6,21 @@
2. [Requirements](#requirements)
- [Steps to Run](#steps-to-run)
- [Example Command](#example-command)
5. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
3. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
- [Results explanation](#results-explanation)
- [Example Input](#example-input)
6. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
- [Example prediction results](#example-prediction-results)
4. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
- [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
- [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
- [ONNX](#onnx)
- [OpenVINO](#openvino)
7. [Used methodologies](#used-methodologies) ![New](https://img.shields.io/badge/-New-96E5FE)
5. [Benchmarking and Visualization](#benchmarking-and-visualization) ![New](https://img.shields.io/badge/-New-96E5FE)
- [TensorRT Optimization](#tensorrt-optimization)
- [ONNX Exporter](#onnx-exporter)
- [OV Exporter](#ov-exporter)
10. [Author](#author)
11. [References](#references)
6. [Author](#author)
7. [References](#references)


<img src="./inference/plot.png" width="100%">
@@ -44,20 +45,20 @@ docker build -t awesome-tensorrt
docker run --gpus all --rm -it awesome-tensorrt

# 3. Run the Script inside the Container
python src/main.py
python main.py [--mode all]
```

### Arguments
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.
- `--mode`: (Optional) Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`. If not provided, it defaults to `all`.

### Example Command
```sh
python src/main.py --topk 3 --mode=all
python main.py --topk 3 --mode=ov
```

This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end, the results plot is saved to `./inference/plot.png`.
This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run the OpenVINO model. Note: the results plot is created only for `--mode=all`; it is then saved to `./inference/plot.png`.

## RESULTS
### Inference Benchmark Results
@@ -76,6 +77,15 @@ Here is an example of the input image to run predictions and benchmarks on:

<img src="./inference/cat3.jpg" width="20%">

### Example prediction results
```
#1: 15% Egyptian cat
#2: 14% tiger cat
#3: 9% tabby
#4: 2% doormat
#5: 2% lynx
```

## Benchmark Implementation Details
Here you can see the flow for each model and benchmark.

@@ -116,62 +126,8 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.

## Used methodologies
### TensorRT Optimization
TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models in production environments. This project supports TensorRT optimizations in FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.

#### Features
- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.

#### Usage
When running the main script, use the `--mode all` argument to enable TensorRT optimizations in the project.
This initializes all models, including the PyTorch model, which is compiled to TensorRT modules in `FP16` and `FP32` precision modes. Inference is then run on the specified image using the TensorRT-optimized models.
Example:
```sh
python src/main.py --mode all
```
#### Requirements
Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).
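
For illustration, a minimal sketch of FP16 compilation with `torch_tensorrt`; the weights tag, input shape, and compile settings are assumptions rather than the project's exact code:

```python
import torch
import torch_tensorrt
from torchvision.models import resnet50

# Assumed setup: a pretrained ResNet50 on the GPU in eval mode.
model = resnet50(weights="IMAGENET1K_V2").eval().cuda()

# Compile a TensorRT-optimized module in half precision (use torch.float32 for FP32).
trt_fp16 = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},
)

with torch.no_grad():
    output = trt_fp16(torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.half))
```
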

### ONNX Exporter
The ONNX Model Exporter (`ONNXExporter`) utility is incorporated in this project to convert the native PyTorch model into the ONNX format.
Using the ONNX format, inference and benchmarking can be performed with the ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.

#### Features
- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model and definitions of built-in operators and standard data types.
- **Interoperability**: Models in ONNX format can be used across various frameworks, tools, runtimes, and compilers.
- **Optimizations**: The ONNX Runtime provides performance optimizations for both cloud and edge devices.

#### Usage
To leverage the `ONNXExporter` and conduct inference using the ONNX Runtime, utilize the `--mode onnx` argument when executing the main script.
This will initiate the conversion process and then run inference on the specified image using the ONNX model.
Example:
```sh
python src/main.py --mode onnx
```

#### Requirements
Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, install the ONNX Runtime.
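
As a rough sketch of the export-then-infer path (not necessarily the exact `ONNXExporter` internals; the file path and the `input`/`output` tensor names are assumptions):

```python
import torch
import onnxruntime as ort
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the PyTorch model to the ONNX format.
torch.onnx.export(
    model, dummy, "./inference/model.onnx",
    input_names=["input"], output_names=["output"],
)

# Run inference with ONNX Runtime on the CPU.
session = ort.InferenceSession(
    "./inference/model.onnx", providers=["CPUExecutionProvider"]
)
outputs = session.run(None, {"input": dummy.numpy()})
```
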

### OV Exporter
OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate the conversion of the ONNX model to the OpenVINO format.
This enables inference and benchmarking using OpenVINO, a framework optimized for Intel hardware, providing substantial speed improvements, especially on CPUs.

#### Features
- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
- **Versatility**: OpenVINO can target various Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
- **Ease of Use**: The `OVExporter` seamlessly transitions from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.

#### Usage
To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
Example:
```sh
python src/main.py --mode ov
```
## Benchmarking and Visualization
The results of the benchmarks for all modes are saved and visualized in a bar chart, showcasing the average inference times across different backends. The visualization aids in comparing the performance gains achieved with different optimizations.

#### Requirements
Ensure you have installed the OpenVINO Toolkit and the necessary dependencies to use OpenVINO's model optimizer and inference engine.
Empty file added benchmark/__init__.py
Empty file.
20 changes: 20 additions & 0 deletions benchmark/benchmark_models.py
@@ -0,0 +1,20 @@
import src.benchmark_class
from benchmark.benchmark_utils import run_benchmark
from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
import openvino as ov
import torch
import onnxruntime as ort


def benchmark_onnx_model(ort_session: ort.InferenceSession):
    run_benchmark(None, None, None, ort_session, onnx=True)


def benchmark_ov_model(ov_model: ov.CompiledModel) -> src.benchmark_class.OVBenchmark:
    ov_benchmark = OVBenchmark(ov_model, input_shape=(1, 3, 224, 224))
    ov_benchmark.run()
    return ov_benchmark


def benchmark_cuda_model(cuda_model: torch.nn.Module, device: str, dtype: torch.dtype):
    run_benchmark(cuda_model, device, dtype)
125 changes: 125 additions & 0 deletions benchmark/benchmark_utils.py
@@ -0,0 +1,125 @@
import logging

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, Any
import torch
import onnxruntime as ort

from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark


def run_benchmark(
    model: torch.nn.Module,
    device: str,
    dtype: torch.dtype,
    ort_session: ort.InferenceSession = None,
    onnx: bool = False,
) -> None:
    """
    Run and log the benchmark for the given model, device, and dtype.

    :param onnx: If True, benchmark the ONNX model via the given ONNX Runtime session.
    :param ort_session: The ONNX Runtime inference session (required when onnx=True).
    :param model: The model to be benchmarked.
    :param device: The device to run the benchmark on ("cpu" or "cuda").
    :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
    """
    if onnx:
        logging.info("Running Benchmark for ONNX")
        benchmark = ONNXBenchmark(ort_session, input_shape=(32, 3, 224, 224))
    else:
        logging.info(f"Running Benchmark for {device.upper()} and precision {dtype}")
        benchmark = PyTorchBenchmark(model, device=device, dtype=dtype)
    benchmark.run()


def run_all_benchmarks(
    models: Dict[str, Any], img_batch: np.ndarray
) -> Dict[str, float]:
    """
    Run benchmarks for all models and return a dictionary of average inference times.

    :param models: Dictionary of models. Key is model type ("onnx", "ov", "PyTorch_cpu", "PyTorch_cuda", "trt_fp32", "trt_fp16"), value is the model.
    :param img_batch: The batch of images to run the benchmark on.
    :return: Dictionary of average inference times. Key is model type, value is average inference time.
    """
    results = {}

    # ONNX benchmark
    logging.info("Running benchmark inference for ONNX model")
    onnx_benchmark = ONNXBenchmark(models["onnx"], img_batch.shape)
    avg_time_onnx = onnx_benchmark.run()
    results["ONNX"] = avg_time_onnx

    # OpenVINO benchmark
    logging.info("Running benchmark inference for OpenVINO model")
    ov_benchmark = OVBenchmark(models["ov"], img_batch.shape)
    avg_time_ov = ov_benchmark.run()
    results["OpenVINO"] = avg_time_ov

    # PyTorch + TRT benchmark
    configs = [
        ("cpu", torch.float32, False),
        ("cuda", torch.float32, False),
        ("cuda", torch.float32, True),
        ("cuda", torch.float16, True),
    ]
    for device, precision, is_trt in configs:
        model_to_use = models[f"PyTorch_{device}"].to(device)

        if not is_trt:
            pytorch_benchmark = PyTorchBenchmark(
                model_to_use, device=device, dtype=precision
            )
            logging.info(f"Running benchmark inference for PyTorch_{device} model")
            avg_time_pytorch = pytorch_benchmark.run()
            results[f"PyTorch_{device}"] = avg_time_pytorch

        else:
            # TensorRT benchmarks
            if precision == torch.float32 or precision == torch.float16:
                mode = "fp32" if precision == torch.float32 else "fp16"
                logging.info(f"Running benchmark inference for TRT_{mode} model")
                trt_benchmark = PyTorchBenchmark(
                    models[f"trt_{mode}"], device=device, dtype=precision
                )
                avg_time_trt = trt_benchmark.run()
                results[f"TRT_{mode}"] = avg_time_trt

    return results


def plot_benchmark_results(results: Dict[str, float]):
    """
    Plot the benchmark results using Seaborn.

    :param results: Dictionary of average inference times. Key is model type, value is average inference time.
    """
    # Convert dictionary to two lists for plotting
    models = list(results.keys())
    times = list(results.values())

    # Create a DataFrame for plotting
    data = pd.DataFrame({"Model": models, "Time": times})

    # Sort the DataFrame by Time
    data = data.sort_values("Time", ascending=True)

    # Plot
    plt.figure(figsize=(10, 6))
    ax = sns.barplot(x=data["Time"], y=data["Model"], hue=data["Model"], palette="rocket", legend=False)

    # Adding the actual values on the bars
    for index, value in enumerate(data["Time"]):
        ax.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")

    plt.xlabel("Average Inference Time (ms)")
    plt.ylabel("Model Type")
    plt.title("ResNet50 - Inference Benchmark Results")

    # Save the plot to a file
    plt.savefig("./inference/plot.png", bbox_inches="tight")
    plt.show()
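
A short usage sketch of `plot_benchmark_results`; the timings below are made-up placeholders, not measured results — in practice the dictionary comes from `run_all_benchmarks(models, img_batch)`:

```python
from benchmark.benchmark_utils import plot_benchmark_results

# Placeholder timings in milliseconds, keyed the same way run_all_benchmarks keys its results.
results = {
    "PyTorch_cpu": 120.0,
    "PyTorch_cuda": 15.0,
    "ONNX": 40.0,
    "OpenVINO": 35.0,
    "TRT_fp32": 8.0,
    "TRT_fp16": 5.0,
}
plot_benchmark_results(results)  # writes ./inference/plot.png and shows the chart
```
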
Empty file added common/__init__.py
Empty file.
65 changes: 65 additions & 0 deletions common/utils.py
@@ -0,0 +1,65 @@
import argparse
import openvino as ov
import torch
from src.model import ModelLoader
from src.onnx_exporter import ONNXExporter
from src.ov_exporter import OVExporter
import onnxruntime as ort


def export_onnx_model(
    onnx_path: str, model_loader: ModelLoader, device: torch.device
) -> None:
    onnx_exporter = ONNXExporter(model_loader.model, device, onnx_path)
    onnx_exporter.export_model()


def init_onnx_model(
    onnx_path: str, model_loader: ModelLoader, device: torch.device
) -> ort.InferenceSession:
    export_onnx_model(onnx_path=onnx_path, model_loader=model_loader, device=device)
    return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])


def init_ov_model(onnx_path: str) -> ov.CompiledModel:
    ov_exporter = OVExporter(onnx_path)
    return ov_exporter.export_model()


def init_cuda_model(
    model_loader: ModelLoader, device: torch.device, dtype: torch.dtype
) -> torch.nn.Module:
    cuda_model = model_loader.model.to(device)
    if device == "cuda":
        cuda_model = torch.jit.trace(
            cuda_model, [torch.randn((1, 3, 224, 224)).to(device)]
        )
    return cuda_model


def parse_arguments():
    # Initialize ArgumentParser with description
    parser = argparse.ArgumentParser(description="PyTorch Inference")
    parser.add_argument(
        "--image_path",
        type=str,
        default="./inference/cat3.jpg",
        help="Path to the image to predict",
    )
    parser.add_argument(
        "--topk", type=int, default=5, help="Number of top predictions to show"
    )
    parser.add_argument(
        "--onnx_path",
        type=str,
        default="./inference/model.onnx",
        help="Path where model in ONNX format will be exported",
    )
    parser.add_argument(
        "--mode",
        choices=["onnx", "ov", "cuda", "all"],
        default="all",
        help="Mode for exporting and running the model. Choices are: onnx, ov, cuda or all.",
    )

    return parser.parse_args()
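
A hypothetical sketch of how these helpers could be wired together in a `main.py`-style entry point; the actual `main.py` is not shown in this diff, so the `ModelLoader()` constructor and the overall flow below are assumptions:

```python
import torch

from common.utils import parse_arguments, init_onnx_model, init_ov_model, init_cuda_model
from src.model import ModelLoader  # imported the same way common/utils.py imports it


def main() -> None:
    args = parse_arguments()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model_loader = ModelLoader()  # assumed constructor; the real signature may differ

    if args.mode in ("onnx", "all"):
        ort_session = init_onnx_model(args.onnx_path, model_loader, device)
    if args.mode in ("ov", "all"):
        ov_model = init_ov_model(args.onnx_path)
    if args.mode in ("cuda", "all"):
        cuda_model = init_cuda_model(model_loader, device, torch.float32)
    # ...prediction and benchmarking would then run against whichever models were initialized.


if __name__ == "__main__":
    main()
```
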