Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM

NVIDIA Developer

About this video

Learn from our experts how we use MTP (Multi-Token Prediction) as a speculative decoding method to achieve better inference performance in TensorRT-LLM. You'll learn how the MTP method works in LLM inference, how it is implemented in TensorRT-LLM, and which optimizations further boost its performance.
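In speculative decoding, cheap draft tokens (here, proposed by the MTP module) are checked by the main model in a single forward pass, and only the prefix that matches the main model's own predictions is kept. The sketch below illustrates that verification step under greedy (argmax) sampling. The function name `accept_draft_tokens` and its list-based interface are hypothetical and are not TensorRT-LLM's API; they only show the general accept/reject logic.

```python
from typing import List


def accept_draft_tokens(draft: List[int], target: List[int]) -> List[int]:
    """Greedy verification for one speculative-decoding step (hypothetical helper).

    draft:  k token ids proposed by the draft (e.g. MTP) module
    target: the main model's argmax token at each of the k + 1 positions,
            computed in one forward pass over the prompt plus the draft
    Returns the tokens emitted this step (between 1 and k + 1 of them).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more entry than draft")

    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            break  # first mismatch: this and all later draft tokens are discarded
        accepted.append(d)

    # The main model's own prediction at the first rejected position (or right
    # after the last draft token) is always kept, so every step emits >= 1 token.
    accepted.append(target[len(accepted)])
    return accepted


if __name__ == "__main__":
    # Two of three draft tokens match; the main model's token replaces the third.
    print(accept_draft_tokens([5, 9, 2], [5, 9, 7, 4]))
```

Because the emitted tokens are exactly what the main model would have produced token by token under greedy decoding, output quality is unchanged; the speedup comes from emitting several tokens per forward pass whenever the draft is accepted.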
