
Xupeng Miao (Purdue University) - Faster Inference of LLMs Seminar ⚡️ 

Nadav Timor

About the seminar: faster-llms.ve...
Title: Towards Fast and Affordable Serving Systems for Large Language Models
Abstract: In the rapidly evolving field of generative artificial intelligence, efficient deployment of large language models (LLMs) is a critical challenge. In this talk, I will introduce three approaches to improving the efficiency and cost-effectiveness of LLM inference and serving systems. First, I will present SpecInfer, the first tree-based speculative inference system, which reduces LLM serving latency by 1.5-3.5x compared to existing solutions by leveraging a novel token tree speculation and verification mechanism. Next, I will describe SpotServe, the first LLM serving system for spot instances, which handles preemptions with dynamic reparallelization, keeps tail latency relatively low, and reduces monetary cost by 54%. Finally, I will present Mirage, a superoptimizer that automatically discovers highly optimized GPU implementations for LLMs and beyond, which can even outperform expert-designed implementations such as FlashAttention.
Recorded on Aug 28, 2024.
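The token tree speculation and verification mechanism mentioned in the abstract can be illustrated with a toy sketch. This is not SpecInfer's actual implementation: the "draft" and "target" models below are hypothetical deterministic stand-ins, and the tree shape (depth, branching factor) is an arbitrary choice for illustration. The idea is that a cheap draft model proposes a tree of candidate continuations, and the expensive target model verifies a root-to-leaf path, accepting the longest prefix that matches its own predictions.

```python
# Toy sketch of tree-based speculative decoding (illustrative only,
# not SpecInfer's code). Token vocabulary is the integers 0..4.

def target_next(prefix):
    # Hypothetical "target" model: deterministically picks the next token.
    return sum(prefix) % 5

def draft_tree(prefix, depth, branch):
    # Hypothetical "draft" model: proposes `branch` candidate tokens per
    # node, recursively, building a token tree of the given depth.
    if depth == 0:
        return {}
    guesses = [(sum(prefix) + d) % 5 for d in range(branch)]
    return {g: draft_tree(prefix + [g], depth - 1, branch) for g in guesses}

def verify(prefix, tree):
    # Walk the tree, accepting a child only when it matches the target
    # model's next token; return the verified continuation. In a real
    # system all tree nodes are scored by the target model in one batched
    # forward pass, which is where the latency savings come from.
    accepted, node = [], tree
    while node:
        t = target_next(prefix + accepted)
        if t not in node:
            break
        accepted.append(t)
        node = node[t]
    return accepted

prefix = [1, 2]
tree = draft_tree(prefix, depth=3, branch=2)
print(verify(prefix, tree))  # prints [3, 1, 2]: all speculated tokens accepted
```

Because this toy draft model always includes the target's choice among its guesses, the whole speculated path is accepted; with a real draft model, acceptance stops at the first mismatch and the target model's own token is emitted there instead.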

Published: Oct 7, 2024
