
vLLM Office Hours - FP8 Quantization Deep Dive - July 9, 2024 

Neural Magic
832 views

In this session, we brought on vLLM committers from Anyscale for an in-depth dive into FP8 quantization. They discussed why FP8 is important, how to get started with FP8 in vLLM, and shared quality and performance results for FP8 quantization.
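As a rough illustration of the "getting started" flow discussed in the session, the sketch below loads a model with vLLM's dynamic FP8 weight quantization and an FP8 KV cache. The model name and sampling settings are placeholders, and exact behavior depends on your vLLM version and GPU support for FP8.

```python
# Hedged sketch: online (dynamic) FP8 quantization in vLLM.
# The checkpoint and generation settings below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder checkpoint
    quantization="fp8",    # quantize weights to FP8 at load time
    kv_cache_dtype="fp8",  # optional: store the KV cache in FP8 as well
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```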
We also covered the latest updates in vLLM v0.5.1, including pipeline parallelism and model support for Gemma 2, Jamba, and DeepSeek-V2.
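For the pipeline parallelism update, a minimal sketch is shown below, assuming at least two GPUs and a placeholder model name. In the v0.5.1 timeframe the feature was exposed through the OpenAI-compatible server's --pipeline-parallel-size flag; newer releases may surface it differently.

```python
# Hedged sketch: launching vLLM's OpenAI-compatible server with two pipeline stages.
# Assumes >= 2 GPUs; the model name is a placeholder.
import subprocess

subprocess.run([
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder checkpoint
    "--pipeline-parallel-size", "2",  # split model layers into 2 pipeline stages
])
```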
For more details, check out the session slides here: docs.google.co...
Join our bi-weekly vLLM office hours to stay current with vLLM, ask questions, meet the community, and give feedback: neuralmagic.co...

Published: Sep 21, 2024
