
How did the Attention Mechanism start an AI frenzy? | LM3 

vcubingx
86K subscribers
10K views

The attention mechanism is well known for its use in Transformers. But where does it come from? Its origins lie in fixing a strange problem with RNNs.
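For readers who want the gist in code, here is a minimal NumPy sketch of the core idea (a simplified dot-product variant, not the exact formulation in the video; names and shapes are made up for illustration): a decoder state scores every encoder hidden state and takes a weighted average, instead of squeezing the whole source sentence into one fixed-size vector.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(decoder_state, encoder_states):
    """Blend all encoder hidden states instead of relying on one
    fixed-size context vector (the RNN bottleneck).

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one hidden state per source token
    """
    scores = encoder_states @ decoder_state   # (T,) similarity per source token
    weights = softmax(scores)                 # (T,) attention weights, sum to 1
    context = weights @ encoder_states        # (d,) weighted average of encoder states
    return context, weights

# toy example: 5 source tokens, hidden size 8
enc = np.random.randn(5, 8)
dec = np.random.randn(8)
ctx, w = attend(dec, enc)
print(w.round(2), ctx.shape)
```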
Support me on Patreon! / vcubingx
Language Modeling Playlist: • Language Modeling
3blue1brown series on Transformers: • But what is a GPT? Vi...
The source code for the animations can be found here:
github.com/vivek3141/dl-visua...
The animations in this video were made using 3blue1brown's library, manim:
github.com/3b1b/manim
Sources (includes the entire series): docs.google.com/document/d/1e...
Chapters
0:00 Introduction
0:22 Machine Translation
2:01 Attention Mechanism
8:04 Outro
Music (In Order):
Helynt - Route 10
Helynt - Bo-Omb Battlefield
Helynt - Underwater
Philanthrope, mommy - embrace chll.to/7e941f72
Helynt - Twinleaf Town
Follow me!
Website: vcubingx.com
Twitter: / vcubingx
Github: github.com/vivek3141
Instagram: / vcubingx
Patreon: / vcubingx

Published: Jul 1, 2024

Comments: 22
@vcubingx · 2 months ago
With that, these are the three videos I had planned out. Do check out the previous ones if you missed them! What kind of videos would you guys like to see next?
@VisibilityO2 · 2 months ago
Hey, I think vcubingx should explain sparse attention: it lets models handle large inputs more efficiently by only attending to a subset of elements. For long sequences this gives a computational advantage (it requires less computation than full softmax attention). I'd recommend reading this: 'research.google/blog/rethinking-attention-with-performers/?m=1'
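A rough sketch of one simple form of the idea mentioned above, local-window sparse attention, where each position only scores its neighbours (note the linked Performer post actually uses kernel approximations rather than masking; this only illustrates the "attend to a subset" flavour, with made-up shapes):

```python
import numpy as np

def local_attention(Q, K, V, window=2):
    """Sparse (local-window) attention: each position attends only to
    neighbours within `window` steps, so scoring costs O(T * window)
    instead of O(T^2) for full softmax attention.
    Shapes: Q, K, V are (T, d).
    """
    T, d = Q.shape
    out = np.zeros_like(V)
    for i in range(T):
        lo, hi = max(0, i - window), min(T, i + window + 1)
        scores = K[lo:hi] @ Q[i] / np.sqrt(d)   # scores only for the local window
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V[lo:hi]                   # weighted sum over the window only
    return out

x = np.random.randn(10, 4)
print(local_attention(x, x, x).shape)   # (10, 4)
```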
@FabioDBB · 11 days ago
Truly amazing explanation, thx!
@scottmcevoy9252 · 2 months ago
This is one of the best explanations of attention I have seen so far. Understanding the bottleneck motivation really makes this clear right around 3:15.
@blackveganarchist · 2 months ago
you’re doing god’s work brother, thank you for the series
@lolatomroflsinnlos · 2 months ago
Thanks for this series :)
@shukurullomeliboyev2004 · 1 month ago
Best explanation I have found so far, thank you
@calix-tang · 2 months ago
What a great video mfv I paid attention the whole time
@antoineberkani9747 · 2 months ago
I really like how easy you make it to understand the why of things. I think you've accomplished your goal of making it seem like I could come up with this! Please cover multi headed self attention next! :) I am worried that this simple approach skips important pieces of the puzzle though. Transformers do have a lot of moving parts it seems. But it seems like you're only getting started!
@kevindave277 · 1 month ago
Thank you, Vivek. Absolutely love your content. Please also keep adding Math content, though. Maybe create a playlist about different functions, limits etc? Whatever suits you.
@nikkatalnikov · 2 months ago
great explanation
@j.domenig418 · 2 months ago
Thanks!
@balasubramaniana9541 · 2 months ago
awesome
@post_toska · 2 months ago
nice vid
@Fussfackel · 2 months ago
Great material and presentation, thanks a lot for your work! I'd like to see a deep dive into how embeddings work. We can get embeddings from decoder-only models like GPTs, Llamas, etc., and they use some form of embeddings for their internal representations, right? But there are also encoder-only models like BERT and others (OpenAI's text-embedding models) which are actually used instead. What is their difference, and why does one work better than the other? Is it just because of compute differences, or are there some inherent differences?
@Rouxles · 2 months ago
tyfs
@maurogdilalla · 2 months ago
What about the meaning of the Q, K and V matrices?
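In case it helps, a rough sketch (not from the video, toy sizes made up for illustration): Q, K and V are three learned linear projections of the same token vectors; queries are compared against keys to get attention weights, which then mix the values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# toy sizes: T tokens, model width d, head width dk
T, d, dk = 6, 16, 8
X = np.random.randn(T, d)                                    # token representations
W_q, W_k, W_v = (np.random.randn(d, dk) for _ in range(3))   # learned projections

Q = X @ W_q   # queries: what each token is looking for
K = X @ W_k   # keys:    what each token offers to be matched against
V = X @ W_v   # values:  what each token passes along if attended to

weights = softmax(Q @ K.T / np.sqrt(dk))  # (T, T) how much token i attends to token j
output = weights @ V                      # (T, dk) values mixed per token
print(output.shape)
```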
@varunmohanraj5031 · 2 months ago
❤❤❤
@aidanthompson5053 · 2 months ago
If you hate others, your really just hating yourself, because we are all one with god source
@OBGynKenobi · 2 months ago
Weird, 3b1b has the same series going on now.
@korigamik · 2 months ago
He works for 3b1b
@rosschristopherross · 2 months ago
Thanks!