High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms
Overview | Examples | Documentation | Roadmap
TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides convenient Python scripts to convert HuggingFace checkpoints to ONNX; engine building and end-to-end inference run entirely on the edge platform.
For supported platforms, models, and precisions, see the Overview. Get started with TensorRT Edge-LLM in under 15 minutes; for complete installation and usage instructions, see the Quick Start Guide.
- Overview - What is TensorRT Edge-LLM and key features
- Supported Models - Complete model compatibility matrix
- Installation - Set up Python export pipeline and C++ runtime
- Quick Start Guide - Run your first inference in ~15 minutes
- Examples - End-to-end LLM, VLM, EAGLE, and LoRA workflows
- Input Format Guide - Request format and specifications
- Chat Template Format - Chat template configuration
- Python Export Pipeline - Model export and quantization
- Engine Builder - Building TensorRT engines
- C++ Runtime Overview - Runtime system architecture
- Customization Guide - Customizing TensorRT Edge-LLM for your needs
- TensorRT Plugins - Custom plugin development
- Tests - Comprehensive test suite for contributors
🚗 Automotive
- In-vehicle AI assistants
- Voice-controlled interfaces
- Scene understanding
- Driver assistance systems
🤖 Robotics
- Natural language interaction
- Task planning and reasoning
- Visual question answering
- Human-robot collaboration
🏭 Industrial IoT
- Equipment monitoring with NLP
- Automated inspection
- Predictive maintenance
- Voice-controlled machinery
📱 Edge Devices
- On-device chatbots
- Offline language processing
- Privacy-preserving AI
- Low-latency inference
Coming soon
Stay tuned for technical deep-dives, optimization guides, and deployment best practices.
- [01/05] 🚀 Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1 ✨ ➡️ link
- [01/05] 🚀 Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM ✨ ➡️ link
Follow our GitHub repository for the latest updates, releases, and announcements.
- Documentation: Full Documentation
- Examples: Code Examples
- Roadmap: Developer Roadmap
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Forums: NVIDIA Developer Forums
We welcome contributions! Please see our Contributing Guidelines for details.