Merged
55 commits
65b3db3  Refactor main function and modularize codebase (DimaBir, Oct 6, 2023)
a14095c  Fixed imports (DimaBir, Oct 6, 2023)
86540f9  Fixed imports and refactored (DimaBir, Oct 6, 2023)
4f5b75d  Fixed imports and refactored (DimaBir, Oct 6, 2023)
bc67224  Fixed imports and refactored (DimaBir, Oct 6, 2023)
0f59440  Fixed imports and refactored (DimaBir, Oct 6, 2023)
a70efcd  Fixed imports and refactored (DimaBir, Oct 6, 2023)
79bb367  Fixed imports and refactored (DimaBir, Oct 6, 2023)
687791d  Fixed imports and refactored (DimaBir, Oct 6, 2023)
fb27d3a  Fixed imports and refactored (DimaBir, Oct 6, 2023)
7dfa3da  Fixed imports and refactored (DimaBir, Oct 6, 2023)
f13f672  Fixed imports and refactored (DimaBir, Oct 6, 2023)
2d01ce5  Fixed imports and refactored (DimaBir, Oct 6, 2023)
555bf6d  Fixed imports and refactored (DimaBir, Oct 6, 2023)
6bf5cb2  Fixed imports and refactored (DimaBir, Oct 6, 2023)
525278a  Fixed imports and refactored (DimaBir, Oct 6, 2023)
b523c53  Fixed imports and refactored (DimaBir, Oct 6, 2023)
81fa6e4  Fixed imports and refactored (DimaBir, Oct 6, 2023)
85486a5  Fixed imports and refactored (DimaBir, Oct 6, 2023)
b734a87  Fixed imports and refactored (DimaBir, Oct 6, 2023)
9d76254  Fixed imports and refactored (DimaBir, Oct 6, 2023)
071ed43  Fixed imports and refactored (DimaBir, Oct 6, 2023)
4fea89e  Fixed imports and refactored (DimaBir, Oct 6, 2023)
190c13b  Fixed imports and refactored (DimaBir, Oct 6, 2023)
5991cdd  Fixed imports and refactored (DimaBir, Oct 6, 2023)
bf4e322  Fixed imports and refactored (DimaBir, Oct 6, 2023)
318f005  Fixed imports and refactored (DimaBir, Oct 6, 2023)
9da8063  Fixed imports and refactored (DimaBir, Oct 6, 2023)
e0b7018  Fixed imports and refactored (DimaBir, Oct 6, 2023)
5c0f21a  Fixed logging message to include device type for PyTorch prediction l… (DimaBir, Oct 6, 2023)
410cbed  Fixed logging message to include device type for PyTorch prediction l… (DimaBir, Oct 6, 2023)
56dee2c  Fixed logging message to include device type for PyTorch prediction l… (DimaBir, Oct 6, 2023)
5c80c97  Fixed logging message to include device type for PyTorch prediction l… (DimaBir, Oct 6, 2023)
292d954  Merge remote-tracking branch 'origin/dev2' into dev2 (DimaBir, Oct 6, 2023)
e06500e  Fixed logging message to include device type for PyTorch prediction l… (DimaBir, Oct 6, 2023)
f3bb8cb  IMG Batch size fix in TRT models (DimaBir, Oct 6, 2023)
4bf4485  Fixed warnings (DimaBir, Oct 6, 2023)
6ae5e84  Fixed warnings (DimaBir, Oct 6, 2023)
ac3dc53  Fixed warnings (DimaBir, Oct 6, 2023)
e929881  Delete benchmark_class.py (DimaBir, Oct 7, 2023)
93b6a45  Refactor benchmark (DimaBir, Oct 7, 2023)
834c8bb  Refactor benchmark (DimaBir, Oct 7, 2023)
29368fc  Fixed average calculation (DimaBir, Oct 9, 2023)
fac97ba  Merge branch 'master' into dev2 (DimaBir, Oct 9, 2023)
a42229a  Updated README.md and results (DimaBir, Oct 9, 2023)
43fa5f4  Merge branch 'master' into dev2 (DimaBir, Oct 11, 2023)
63133ad  Added CPU only version in the docker, updated README accordingly (DimaBir, Oct 11, 2023)
64773df  Updated Dockerfile to use only original requirements.txt (DimaBir, Oct 11, 2023)
f9d7dc9  Updated Dockerfile to use only original requirements.txt (DimaBir, Oct 11, 2023)
2f81c9a  Fix cuda related parts of code (DimaBir, Oct 11, 2023)
b6ce20f  Fix cuda related parts of code (DimaBir, Oct 11, 2023)
84a3178  Updated README.md (DimaBir, Oct 11, 2023)
a72ba31  Fixed Dockerfile condition (DimaBir, Oct 11, 2023)
bb99ebf  Revert Dockerfile to mention GPU image explicitly (DimaBir, Oct 11, 2023)
6349fa8  Update README.md (DimaBir, Oct 11, 2023)
27 changes: 15 additions & 12 deletions Dockerfile
@@ -1,24 +1,27 @@
-# Use an official TensorRT base image
-FROM nvcr.io/nvidia/tensorrt:23.08-py3
+# Argument for base image. Default is a neutral Python image.
+ARG BASE_IMAGE=python:3.8-slim

-# Install system packages
-RUN apt-get update && apt-get install -y \
-    python3-pip \
-    git \
-    libjpeg-dev \
-    libpng-dev
+# Use the base image specified by the BASE_IMAGE argument
+FROM $BASE_IMAGE

+# Argument to determine environment: cpu or gpu (default is cpu)
+ARG ENVIRONMENT=cpu
+
+# Install required system packages conditionally
+RUN apt-get update && apt-get install -y python3-pip git && \
+    if [ "$ENVIRONMENT" = "gpu" ] ; then apt-get install -y libjpeg-dev libpng-dev ; fi
+
-# Copy the requirements.txt file into the container
+# Copy the requirements file based on the environment into the container
COPY requirements.txt /workspace/requirements.txt

# Install Python packages
RUN pip3 install --no-cache-dir -r /workspace/requirements.txt

-# Install torch-tensorrt from the special location
-RUN pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
+# Only install torch-tensorrt for GPU environment
+RUN if [ "$ENVIRONMENT" = "gpu" ] ; then pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases ; fi

# Set the working directory
WORKDIR /workspace

# Copy local project files to /workspace in the image
-COPY . /workspace
+COPY . /workspace
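For scripted builds, the same build-arg switch can be driven from Python rather than the shell. A minimal sketch using the `docker` SDK (docker-py); the SDK dependency (`pip install docker`) and a running Docker daemon are assumptions, and the tags mirror the README commands:

```python
import docker

client = docker.from_env()

# GPU variant: override both build arguments, mirroring
#   docker build --build-arg ENVIRONMENT=gpu --build-arg BASE_IMAGE=... -t my_image_gpu .
gpu_image, gpu_logs = client.images.build(
    path=".",  # build context: the repo root containing this Dockerfile
    tag="my_image_gpu",
    buildargs={
        "ENVIRONMENT": "gpu",
        "BASE_IMAGE": "nvcr.io/nvidia/tensorrt:23.08-py3",
    },
)

# CPU variant: no overrides needed, the Dockerfile defaults apply.
cpu_image, cpu_logs = client.images.build(path=".", tag="my_image_cpu")

print("Built:", gpu_image.tags, cpu_image.tags)
```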
100 changes: 75 additions & 25 deletions README.md
@@ -6,44 +6,73 @@
2. [Requirements](#requirements)
   - [Steps to Run](#steps-to-run)
   - [Example Command](#example-command)
-3. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
+3. [GPU-CUDA Results](#gpu-cuda-results) ![Static Badge](https://img.shields.io/badge/update-orange)
   - [Results explanation](#results-explanation)
   - [Example Input](#example-input)
   - [Example prediction results](#example-prediction-results)
+   - [PC Setup](#pc-setup)
4. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
   - [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
   - [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
   - [ONNX](#onnx)
   - [OpenVINO](#openvino)
-5. [Benchmarking and Visualization](#benchmarking-and-visualization) ![New](https://img.shields.io/badge/-New-842E5B)
+5. [Extra](#extra) ![New](https://img.shields.io/badge/-New-842E5B)
+   - [Remote Linux Server - CPU only - Inference](#remote-linux-server-cpu-only-inference)
+   - [Prediction results](#prediction-results)
6. [Author](#author)
-7. [PC Setup](#pc-setup)
-8. [References](#references)
+7. [References](#references)


<img src="./inference/plot_latest.png" width="100%">

## Overview
-This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX, OpenVINO, and NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torch-vision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: PyTorch CPU, ONNX CPU, OpenVINO CPU, PyTorch CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
+This project showcases inference with a PyTorch ResNet-50 model and its optimization using ONNX, OpenVINO, and NVIDIA TensorRT. The script infers a user-specified image and displays top-K predictions. Benchmarking covers configurations like PyTorch CPU, ONNX CPU, OpenVINO CPU, PyTorch CUDA, TensorRT-FP32, and TensorRT-FP16.

+The project is Dockerized for easy deployment:
+1. **CPU-only Deployment** - Suitable for non-GPU systems (supports `PyTorch CPU`, `ONNX CPU` and `OpenVINO CPU` models only).
+2. **GPU Deployment** - Optimized for NVIDIA GPUs (supports all models: `PyTorch CPU`, `ONNX CPU`, `OpenVINO CPU`, `PyTorch CUDA`, `TensorRT-FP32`, and `TensorRT-FP16`).
+
+For Docker instructions, refer to the [Steps to Run](#steps-to-run) section.


## Requirements
- This repo cloned
- Docker
-- NVIDIA GPU (for CUDA and TensorRT benchmarks and optimizations)
-- Python 3.x
-- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support)
-- ![New](https://img.shields.io/badge/-New-842E5B)[OpenVINO Toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html) (for running OpenVINO model)
-
-### Steps to Run
-
+- NVIDIA drivers installed on the host machine.
+- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support). Pre-installed within the GPU docker image.

+## Steps to Run
+### Building the Docker Image
+
+Depending on the target environment (CPU or GPU), choose a different base image.
+
+1. **CPU Deployment**:
+   For systems without a GPU or CUDA support, simply use the default base image.
+   ```bash
+   docker build -t my_image_cpu .
+   ```
+
+2. **GPU Deployment**:
+   If your system has GPU support and the NVIDIA Docker runtime installed, you can use the TensorRT base image to leverage GPU acceleration.
+   ```bash
+   docker build --build-arg ENVIRONMENT=gpu --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt:23.08-py3 -t my_image_gpu .
+   ```
+
+### Running the Docker Container
+1. **CPU Version**:
+   ```bash
+   docker run -it --rm my_image_cpu
+   ```
+
+2. **GPU Version**:
+   ```bash
+   docker run --gpus all -it --rm my_image_gpu
+   ```
+
+### Run the Script inside the Container
```sh
-# 1. Build the Docker Image
-docker build -t awesome-tensorrt
-
-# 2. Run the Docker Container
-docker run --gpus all --rm -it awesome-tensorrt
-
-# 3. Run the Script inside the Container
python main.py [--mode all]
```

@@ -59,7 +88,7 @@ python main.py --topk 3 --mode=all --image_path="./inference/train.jpg"

This command runs predictions on the chosen image (`./inference/train.jpg`), shows the top 3 predictions, and runs all available models. Note: the plot is created only for `--mode=all`; results are plotted and saved to `./inference/plot.png`.
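The flags above imply a small CLI surface. A hedged sketch of how such an interface could be declared with `argparse`; the option names come from the example command, while the defaults and `choices` are assumptions, not the repo's actual `main.py`:

```python
import argparse

parser = argparse.ArgumentParser(description="ResNet-50 inference and benchmarks")
parser.add_argument("--image_path", default="./inference/train.jpg",
                    help="Path to the image to classify")  # default is an assumption
parser.add_argument("--topk", type=int, default=5,
                    help="How many top predicted classes to print")
parser.add_argument("--mode", default="all",
                    choices=["all", "onnx", "ov", "cpu", "cuda"],  # choices are assumptions
                    help="Which backend(s) to run")
args = parser.parse_args()
print(args)
```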

-## RESULTS
+## GPU-CUDA Results
### Inference Benchmark Results
<img src="./inference/plot_latest.png" width="70%">

@@ -85,6 +114,11 @@ Here is an example of the input image to run predictions and benchmarks on:
#5: 2% lynx
```

+### PC Setup
+- CPU: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
+- RAM: 32 GB
+- GPU: GeForce RTX 3070

## Benchmark Implementation Details
Here you can see the flow for each model and benchmark.

@@ -125,16 +159,32 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.
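Steps 2-5 condense to very little code with OpenVINO's Python API. A minimal sketch, assuming a converted model at an illustrative path and the 2023-era `openvino.runtime` API:

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("models/resnet50.xml")           # path is an assumption
compiled = core.compile_model(model, device_name="CPU")  # compile for CPU inference

x = np.random.rand(1, 3, 224, 224).astype(np.float32)    # dummy ImageNet-shaped input
result = compiled([x])[compiled.output(0)]               # run inference, grab first output
print(result.shape)
```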

-## Benchmarking and Visualization
-The results of the benchmarks for all modes are saved and visualized in a bar chart, showcasing the average inference times across different backends. The visualization aids in comparing the performance gains achieved with different optimizations.
+## Extra
+### Remote Linux Server - CPU only - Inference
+<img src="./inference/plot_linux_server.png" width="70%">
+
+### Prediction results
+`model.log` file content:
+```
+Running prediction for OV model
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+
+
+Running prediction for ONNX model
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+```


## Author
[DimaBir](https://github.com/DimaBir)

-## PC Setup
-- CPU: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
-- RAM: 32 GB
-- GPU: GeForce RTX 3070

## References
- **PyTorch**: [Official Documentation](https://pytorch.org/docs/stable/index.html)
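As a companion to the Overview's description of the PyTorch path, here is a minimal, self-contained sketch of the load-preprocess-predict flow using standard torchvision APIs; it illustrates the idea and is not the repo's actual `main.py`:

```python
import torch
from PIL import Image
from torchvision.models import ResNet50_Weights, resnet50

# Pre-trained ResNet-50 plus the preprocessing pipeline its weights expect
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize, center-crop, normalize for ImageNet

image = Image.open("./inference/train.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add batch dimension -> (1, 3, 224, 224)

with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)[0]

# Top-K predictions, analogous to the README's --topk option
topk = torch.topk(probs, k=5)
for prob, idx in zip(topk.values, topk.indices):
    print(f"{prob.item():5.1%} {weights.meta['categories'][idx.item()]}")
```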
7 changes: 5 additions & 2 deletions benchmark/benchmark_models.py
@@ -72,7 +72,9 @@ def run(self):
        with torch.no_grad():
            for _ in range(self.nwarmup):
                features = self.model(input_data)
-        torch.cuda.synchronize()
+
+        if self.device == "cuda":
+            torch.cuda.synchronize()

        # Start timing
        print("Start timing ...")
@@ -81,7 +83,8 @@
        for i in range(1, self.nruns + 1):
            start_time = time.time()
            features = self.model(input_data)
-            torch.cuda.synchronize()
+            if self.device == "cuda":
+                torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
4 changes: 3 additions & 1 deletion benchmark/benchmark_utils.py
@@ -6,7 +6,6 @@
import seaborn as sns
from typing import Dict, Any
import torch
-import onnxruntime as ort

from benchmark.benchmark_models import PyTorchBenchmark, ONNXBenchmark, OVBenchmark

@@ -43,6 +42,9 @@ def run_all_benchmarks(
("cuda", torch.float16, True),
]
for device, precision, is_trt in configs:
if not torch.cuda.is_available() and device == "cuda":
continue

model_to_use = models[f"PyTorch_{device}"].to(device)

if not is_trt:
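This module also produces the bar chart shown at the top of the README. A hedged sketch of how average inference times might be plotted with seaborn; the latency numbers below are invented placeholders, not measured results:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical average latencies in milliseconds, one per backend.
results = {
    "PyTorch CPU": 140.0,
    "ONNX CPU": 60.0,
    "OpenVINO CPU": 55.0,
    "PyTorch CUDA": 12.0,
    "TensorRT FP32": 6.0,
    "TensorRT FP16": 3.5,
}

ax = sns.barplot(x=list(results.keys()), y=list(results.values()))
ax.set_xlabel("Backend")
ax.set_ylabel("Average inference time (ms)")
ax.set_title("ResNet-50 inference benchmark")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.savefig("inference/plot.png")
```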
Binary file added inference/plot_linux_server.png
14 changes: 13 additions & 1 deletion main.py
@@ -1,6 +1,14 @@
import logging
import os.path
-import torch_tensorrt
+import torch
+
+CUDA_AVAILABLE = False
+if torch.cuda.is_available():
+    try:
+        import torch_tensorrt
+        CUDA_AVAILABLE = True
+    except ImportError:
+        print("torch-tensorrt is not installed. Running on CPU mode only.")

from benchmark.benchmark_models import benchmark_onnx_model, benchmark_ov_model
from benchmark.benchmark_utils import run_all_benchmarks, plot_benchmark_results
@@ -79,6 +87,10 @@ def main():
        precision = config["precision"]
        is_trt = config["is_trt"]

+        # Skip CUDA configurations when CUDA is unavailable
+        if device.lower() == "cuda" and not CUDA_AVAILABLE:
+            continue
+
        model = init_cuda_model(model_loader, device, precision)

        # If the configuration is not for TensorRT, store the model under a PyTorch key
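The guarded `torch_tensorrt` import this PR adds is an instance of the general optional-dependency pattern. One generic way to express it, with a hypothetical `optional_import` helper that is not part of this repo:

```python
import importlib
from types import ModuleType
from typing import Optional

def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None instead of raising ImportError."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Usage: gate TensorRT code paths on whether the module actually imported.
trt = optional_import("torch_tensorrt")
TRT_AVAILABLE = trt is not None
```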
2 changes: 0 additions & 2 deletions prediction/prediction_utils.py
@@ -4,8 +4,6 @@
import torch
import onnxruntime as ort
import numpy as np
-import torch_tensorrt
-


def make_prediction(
Expand Down