High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms
Overview | Examples | Documentation | Roadmap
TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides convenient Python scripts to convert HuggingFace checkpoints to ONNX; engine building and end-to-end inference run entirely on the edge platform.
For supported platforms, models, and precisions, see the Overview. Get started with TensorRT Edge-LLM in under 15 minutes; for complete installation and usage instructions, see the Quick Start Guide.
- Overview - What is TensorRT Edge-LLM and key features
- Supported Models - Complete model compatibility matrix
- Installation - Set up Python export pipeline and C++ runtime
- Quick Start Guide - Run your first inference in ~15 minutes
- Examples - End-to-end LLM, VLM, EAGLE, and LoRA workflows
- Input Format Guide - Request format and specifications
- Chat Template Format - Chat template configuration
- Python Export Pipeline - Model export and quantization
- Engine Builder - Building TensorRT engines
- C++ Runtime Overview - Runtime system architecture
- Customization Guide - Customizing TensorRT Edge-LLM for your needs
- TensorRT Plugins - Custom plugin development
- Tests - Comprehensive test suite for contributors
🚗 Automotive
- In-vehicle AI assistants
- Voice-controlled interfaces
- Scene understanding
- Driver assistance systems
🤖 Robotics
- Natural language interaction
- Task planning and reasoning
- Visual question answering
- Human-robot collaboration
🏭 Industrial IoT
- Equipment monitoring with NLP
- Automated inspection
- Predictive maintenance
- Voice-controlled machinery
📱 Edge Devices
- On-device chatbots
- Offline language processing
- Privacy-preserving AI
- Low-latency inference
Coming soon
Stay tuned for technical deep-dives, optimization guides, and deployment best practices.
- [01/05] 🚀 Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1 ✨ ➡️ link
- [01/05] 🚀 Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM ✨ ➡️ link
Follow our GitHub repository for the latest updates, releases, and announcements.
- Documentation: Full Documentation
- Examples: Code Examples
- Roadmap: Developer Roadmap
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Forums: NVIDIA Developer Forums
We welcome contributions! Please see our Contributing Guidelines for details.