
In this talk, we introduce a new PyTorch-based architecture for TensorRT-LLM that significantly enhances user experience and developer velocity, making it easier to build custom models, integrate new kernels, and extend runtime functionality, all while delivering state-of-the-art performance on NVIDIA GPUs.
Concrete examples will illustrate how the flexibility of this PyTorch-based architecture enables new customizations to be added quickly without sacrificing state-of-the-art performance.