
Multi-Head vs Grouped Query Attention: Are Claude, Llama-3, and Gemma choosing speed over quality?

Chris Hay

Frontier model providers such as Anthropic (Claude 3.5 Sonnet), Google (Gemini / Gemma 2B), and Meta (Llama-3) are trending towards grouped query attention over traditional multi-head attention as the attention mechanism in their transformer models. Interestingly, OpenAI with GPT-4o doesn't seem to be making this trade-off.
Although this choice speeds up AI inference, it does impact content quality for outputs such as summarization. In this video Chris shows that you get more coherent output from models such as Llama-2 or Claude 3 Opus than from newer models such as Llama-3, Gemini, or Gemma. In the end, in certain scenarios such as summarization or generative content, GPT-4o still beats Sonnet.
repo
github.com/chrishayuk/mha_gqa...
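For context on the architectural trade-off discussed in the video, here is a minimal PyTorch sketch (written for this description, not taken from the linked repo) of the difference: in multi-head attention (MHA) every query head gets its own key/value head, while in grouped query attention (GQA) several query heads share one key/value head, which shrinks the KV cache and speeds up inference. The class and parameter names are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn

class Attention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        # n_kv_heads == n_heads gives plain MHA; n_kv_heads < n_heads gives GQA.
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # In GQA each group of query heads re-uses the same K/V head,
        # so fewer K/V projections need to be computed and cached.
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

x = torch.randn(1, 16, 64)
mha = Attention(d_model=64, n_heads=8, n_kv_heads=8)   # classic multi-head attention
gqa = Attention(d_model=64, n_heads=8, n_kv_heads=2)   # 4 query heads share each KV head
print(mha(x).shape, gqa(x).shape)                      # both: torch.Size([1, 16, 64])

In this sketch the GQA variant computes and caches 4x fewer key/value tensors, which is the speed and memory win the video describes; the open question explored in the video is how much output quality that costs.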

Science

Published: 26 Jul 2024

Comments: 16
@makepeace88 · 24 days ago
I just attended a detailed anatomy-of-an-LLM session... and it's just wow! Nobody else is explaining these details. Thanks very much, Chris ❤
@chrishayuk · 24 days ago
Glad it was useful. I skipped a lot of details, as I wanted to keep the focus on MHA vs GQA. I'll probably do some other videos on the other details.
@everyhandletaken · 25 days ago
Interesting! Claude 3.5 Sonnet is definitely great for code, much better than GPT-4o, and has really helped me solve things that are well beyond my brain capacity in the last few days.
@chrishayuk · 25 days ago
Totally agree, much better for code than GPT-4o.
@danielhenderson7050 · 25 days ago
This was very interesting
@chrishayuk · 25 days ago
Glad you enjoyed, definitely a fun rabbit hole
@trsd8640 · 25 days ago
Great video! I don't understand it fully and had to watch it again, but I'm getting an idea of what is happening! Thank you!
@chrishayuk · 25 days ago
It was quite a tough one to record: I was trying to avoid explaining the entire transformer architecture and attention fully (I'll do that in another video), while doing just enough to show how this architectural change affects model output. It was a weird balance, and apologies that I never explained it enough.
@Leo-ph7ow · 25 days ago
Excellent content! Thanks!
@chrishayuk · 25 days ago
Glad you liked it!
@seanknowles9985 · 25 days ago
Intel agencies are having their fill first. It's obviously being slowed down so three-letter agencies can get ahead of this.
@chrishayuk · 25 days ago
lol, I'm sure three-letter agencies are having their say, but I suspect it's not on MHA vs GQA. Would love to hear that conversation if they were.
@user-rs4sg2tz6k · 9 days ago
I believe 4o's judges only 90%
@chrishayuk · 9 days ago
Interesting, where did you get that info from?