
vLLM Office Hours - Multimodal Models in vLLM with Roblox - August 8, 2024 

Neural Magic
396 views

Published: Sep 21, 2024

Comments: 2
@hari000-f6y, 15 days ago
I have a question! I'm serving a quantized multimodal model (InternVL2) with vLLM on an L4. A single request takes ~5-6 s to complete, but when multiple requests arrive at the same time they take much longer, ~30 s, to finish. How can I handle this so that concurrent requests also complete in ~5 s? I have a limited understanding of batched requesting.
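One point relevant to the question above: vLLM's server performs continuous batching, so issuing requests concurrently from the client (rather than serially, one after another) usually keeps total wall time close to a single request's latency instead of N times it. The sketch below illustrates that effect only; the stub function and its sleep stand in for a real HTTP call to a vLLM endpoint, and all names and timings are illustrative assumptions, not the setup from the video.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_generate(prompt: str) -> str:
    # Stand-in for a POST to a vLLM OpenAI-compatible endpoint
    # (e.g. http://localhost:8000/v1/chat/completions — hypothetical).
    # The sleep simulates ~0.5 s of server-side generation time.
    time.sleep(0.5)
    return f"response to {prompt!r}"

prompts = [f"describe image {i}" for i in range(8)]

start = time.perf_counter()
# Submit all 8 requests at once so they overlap in flight,
# the same way vLLM's scheduler batches concurrent requests.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_generate, prompts))
elapsed = time.perf_counter() - start

# Wall time stays near one request's latency (~0.5 s here),
# not 8x it as a serial loop would give.
print(f"{len(results)} responses in {elapsed:.1f}s")
```

On a real deployment the equivalent client-side change is to fire requests from multiple threads or async tasks; if concurrency still degrades latency badly, server-side limits such as available KV-cache memory or max batch size on the GPU are the usual suspects.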
@shumshvenhiszali, 1 month ago
They say the code is open source, but where is it?