# nvidia-dynamo

Here are 3 public repositories matching this topic...

A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes pre-configured components for: 🚀 AI Gateway (LiteLLM), 🤖 LLM Serving (vLLM, SGLang, Ollama), 📊 Vector Databases, 🔍 Embedding Models (TEI), 📈 Observability (Langfuse, Phoenix), and more. Fast-track your GenAI deployment with Kubernetes.

  • Updated May 26, 2026
  • JavaScript

This project benchmarks and optimizes BERT inference across three backends: PyTorch eager mode, TorchDynamo (Inductor backend), and NVIDIA Triton Inference Server. It uses GLUE SST-2 samples for evaluation and compares performance through profiling, kernel timing, and latency analysis.

  • Updated May 10, 2025
  • Jupyter Notebook
