
TensorRT Edge-LLM

High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms

Overview   |   Examples   |   Documentation   |   Roadmap


Overview

TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as the NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides convenient Python scripts for converting HuggingFace checkpoints to ONNX; engine building and end-to-end inference run entirely on the edge platform.


Getting Started

For supported platforms, models, and precisions, see the Overview. You can get started with TensorRT Edge-LLM in under 15 minutes; for complete installation and usage instructions, see the Quick Start Guide.


Documentation

Introduction

User Guide

Developer Guide

Software Design

Advanced Topics


Use Cases

🚗 Automotive

  • In-vehicle AI assistants
  • Voice-controlled interfaces
  • Scene understanding
  • Driver assistance systems

🤖 Robotics

  • Natural language interaction
  • Task planning and reasoning
  • Visual question answering
  • Human-robot collaboration

🏭 Industrial IoT

  • Equipment monitoring with NLP
  • Automated inspection
  • Predictive maintenance
  • Voice-controlled machinery

📱 Edge Devices

  • On-device chatbots
  • Offline language processing
  • Privacy-preserving AI
  • Low-latency inference

Tech Blogs

Coming soon

Stay tuned for technical deep-dives, optimization guides, and deployment best practices.


Latest News

  • [01/05] 🚀 Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1 ✨ ➡️ link
  • [01/05] 🚀 Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM ✨ ➡️ link

Follow our GitHub repository for the latest updates, releases, and announcements.


Support


License

Apache License 2.0


Contributing

We welcome contributions! Please see our Contributing Guidelines for details.
