
Fast Inference of Mixture-of-Experts Language Models with Offloading 

AI Papers Academy
11K subscribers
1.4K views

Published: 22 Oct 2024

Comments: 6
@winterclimber7520 9 months ago
Very exciting work! The resulting speed the paper proposes won't break any land speed records (2-3 tokens per second), but in my experience one of the most productive and practical applications of LLMs is prompting them with multiple-choice questions, which only require a single output token. This paper (and the provided code!) bringing GPT-3.5 levels of inference to local runs on consumer hardware is a huge breakthrough, and I'm excited to give it a try!
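(A minimal sketch of the single-token multiple-choice pattern this comment describes, using the generic Hugging Face transformers API rather than the paper's own offloading code; the checkpoint name and prompt below are illustrative placeholders.)

```python
# Single-token multiple-choice prompting: because the answer is one letter,
# one decoding step suffices, so even 2-3 tokens/s throughput feels instant.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal LM would do for this pattern.
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = (
    "Which planet is the largest?\n"
    "A) Mars\nB) Jupiter\nC) Venus\nD) Mercury\n"
    "Answer with a single letter: "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate exactly one token and decode it as the model's answer.
output = model.generate(**inputs, max_new_tokens=1)
print(tokenizer.decode(output[0][-1]))
```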
@jacksonmatysik8007 9 months ago
I have been looking for a channel like this for ages, as I hate reading.
@fernandos-bs6544 7 months ago
I just found your channel. It is amazing. Congratulations. Your numbers will grow soon, I am sure. Great quality and great content.
@aipapersacademy 7 months ago
Thank you 🙏
@PaulSchwarzer-ou9sw 9 months ago
Thanks! ❤
@ameynaik2743 6 months ago
I believe this is applicable only to a single request? If the active experts change, you will most likely have many experts active across various requests. Is my understanding correct? Thank you.