
Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention 

Hedu AI by Batool Haider
Subscribe 11K
170K views

Visual Guide to Transformer Neural Networks (Series) - Step by Step Intuitive Explanation
Episode 0 - [OPTIONAL] The Neuroscience of "Attention"
• The Neuroscience of “A...
Episode 1 - Position Embeddings
• Visual Guide to Transf...
Episode 2 - Multi-Head & Self-Attention
• Visual Guide to Transf...
Episode 3 - Decoder’s Masked Attention
• Visual Guide to Transf...
This video series explains the math, as well as the intuition behind the Transformer Neural Networks that were first introduced by the “Attention is All You Need” paper.
--------------------------------------------------------------
References and Other Great Resources
--------------------------------------------------------------
Attention is All You Need
arxiv.org/abs/1706.03762
Jay Alammar - The Illustrated Transformer
jalammar.github.io/illustrated...
The A.I Hacker - Illustrated Guide to Transformers Neural Networks: A step by step explanation
Amirhossein Kazemnejad Blog Post - Transformer Architecture: The Positional Encoding
kazemnejad.com/blog/transform...
Yannic Kilcher YouTube Video - Attention is All You Need
www.youtube.com/watch?v=iDulh...

Published: Aug 8, 2024

Comments: 623
@HeduAI 3 years ago
*CORRECTIONS* A big shoutout to the following awesome viewers for these 2 corrections:
1. @Henry Wang and @Holger Urbanek - At (10:28), "dk" is actually the hidden dimension of the Key matrix and not the sequence length. In the original paper (Attention is All You Need), it is taken to be 512.
2. @JU PING NG - The result of the concatenation at (14:58) is supposed to be 7 x 9 instead of 21 x 3 (that is, the z matrices are concatenated horizontally, not vertically). With this we can apply nn.Linear(9, 5) to get the final 7 x 5 shape.
Here are the timestamps associated with the concepts covered in this video:
0:00 - Recaps of Part 0 and 1
0:56 - Difference between Simple and Self-Attention
3:11 - Multi-Head Attention Layer - Query, Key and Value matrices
11:44 - Intuition for Multi-Head Attention Layer with Examples
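A minimal PyTorch sketch of the corrected shape arithmetic (a sketch only, assuming the video's toy sizes: 3 heads, sequence length 7, per-head dimension 3; the variable names are illustrative):

import torch
import torch.nn as nn

# Three attention heads, each producing a z matrix of shape 7 x 3
# (sequence length 7, per-head dimension 3).
z1, z2, z3 = torch.randn(7, 3), torch.randn(7, 3), torch.randn(7, 3)

# Concatenate horizontally (along the feature dimension), not vertically:
z = torch.cat([z1, z2, z3], dim=-1)   # shape: 7 x 9 (not 21 x 3)

# Final linear projection down to the output dimension of 5:
w_o = nn.Linear(9, 5)
out = w_o(z)                          # shape: 7 x 5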
@amortalbeing 2 years ago
Where's the first video?
@HeduAI 2 years ago
@@amortalbeing Episode 0 can be found here - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-48gBPL7aHJY.html
@amortalbeing 2 years ago
@@HeduAI thanks a lot, really appreciate it :)
@omkiranmalepati1645 2 years ago
Awesome... So dk value is 3?
@jasonwheeler2986 1 year ago
@@omkiranmalepati1645 d_k = embedding dimensions // number of heads
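A quick worked example of that formula with the original paper's sizes (the 512 and 8 are from "Attention is All You Need"; the video's toy example uses much smaller numbers):

# d_model = 512 embedding dimensions, h = 8 attention heads
d_model, num_heads = 512, 8
d_k = d_model // num_heads    # 64: each head's Query/Key (and Value) width
scale = d_k ** 0.5            # attention scores are divided by sqrt(d_k) (= 8.0 here)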
@thegigasurgeon 1 year ago
Need to say this out loud: I saw Yannic Kilcher's video, read tonnes of material on the internet, went through at least 7 playlists, and this is the first time I really understood the inner mechanism of Q, K and V vectors in transformers. You did a great job here
@HeduAI 1 year ago
This made my day :,)
@afsalmuhammed4239 1 year ago
True
@exciton007 11 months ago
Very intuitive explanation!
@EducationPersonal 9 months ago
Totally agree with this comment
@VitorMach 8 months ago
Yes, no other video actually explains what the actual input for these are
@nitroknocker14 3 years ago
All 3 parts have been the best presentation I've ever seen of Transformers. Your step-by-step visualizations have filled in so many gaps left by other videos and blog posts. Thank you very much for creating this series.
@HeduAI 3 years ago
This comment made my day :,) Thanks!
@bryanbaek75 2 years ago
Me, too!
@lessw2020 2 years ago
Definitely agree. These videos really crystallize a lot of knowledge, thanks for making this series!
@Charmente2014 2 years ago
ش
@devstuff2576 2 years ago
@@HeduAI absolutely awesome. You are the best.
@nurjafri 3 years ago
Damn. This is exactly what a developer coming from another background needs. Simple analogies for a rapid understanding. Thanks a ton. Keep uploadinggggggggggg plss
@Xeneon341 3 years ago
Agreed, very well done. You do a very good job of explaining difficult concepts to a non-industry developer (fyi I'm an accountant) without assuming a lot of prior knowledge. I look forward to your next video on masked decoders!!!
@HeduAI 3 years ago
@@Xeneon341 Oh nice! Glad you enjoyed these videos! :)
@ML-ok9nf 9 months ago
Absolutely underrated, hands down one of the best explanations I've found on the internet
@HuyLe-nn5ft 1 year ago
The important detail that sets you apart from the other videos and websites is that you not only presented the model's architecture with numerous formulas but also demonstrated them in vectors and matrices, successfully walking us through each complicated and trivial concept. You really did a good job!
@ghostvillage1 1 year ago
Hands down the best series I've found on the web about transformers. Thank you
@fernandonoronha5035 2 years ago
I don't have words to describe how much these videos saved me, thank you!
@malekkamoua5968 2 years ago
I've been stuck for so long trying to get Transformer Neural Networks and this is by far the best explanation! The examples are so fun, making it easier to comprehend. Thank you so much for your effort!
@HeduAI 11 months ago
Cheers!
@rohanvaidya3238 3 years ago
Best explanation ever on Transformers!!!
@frankietank8019 11 months ago
Hands down the best video on transformers I have seen! Thank you for taking the time to make this video.
@MCMelonslice 1 year ago
This is the best resource for an intuitive understanding of transformers. I will without a doubt point everyone towards your video series. Thank you so much!
@geetanshkalra8340 2 years ago
This is by far the best video to understand Attention Networks. Awesome work!!
@oliverhu1025 1 year ago
Probably the best explanation of transformers I've found online. Read the paper, watched Yannic's video, some paper-reading videos and a few others, and the intuition was still missing. This connects the dots, keep up the great work!
@chenlim2165 1 year ago
Bravo! After watching dozens of other explainer videos, I can finally grasp the reason for multi-headed attention. Excellent video. Please make more!
@jasonpeloquin9950 1 year ago
Hands down the best explanation of the use of Query, Key and Value matrices. Great video with an easy example to understand.
@skramturbo8499 2 years ago
I really like the fact that you ask questions within the video. In fact those are the same questions one has when first reading about transformers. Keep up the awesome work!
@artukikemty 1 year ago
Thanks for posting; by far this is the most didactic Transformer presentation I've ever seen. AMAZING!
@davidlazaro3143 1 year ago
This video is GOLD, it should be everywhere! Thank you so much for doing such an amazing job 😍😍
@chaitanyachhibba255 3 years ago
Were you the one who wrote transformers in the first place? Because no one explained it like you did. This is undoubtedly the best info I have seen. I hope you keep posting more videos. Thanks a lot.
@HeduAI 3 years ago
This comment made my day! :) Thank you.
@Srednicki123 1 year ago
I'll just repeat what everybody else said: these videos are the best! Thank you for the effort
@sowmendas812 1 year ago
This is literally the best explanation for self-attention I have seen anywhere! Really loved the videos!
@jamesshady5483 1 year ago
This explanation is incredible and better than 99% of what I found on the Internet. Thank you!
@kennethm.4998 2 years ago
You have a gift for explanations... Best I've seen anywhere online. Superb.
@alankarmisra 9 months ago
3 days, 16 different videos, and your video "just made sense". You just earned a subscriber and a life-long well-wisher.
@abdot604 1 year ago
Brilliant explanation, your channel deserves way more ATTENTION.
@sebastiangarciaacosta5468 3 years ago
The best explanation I've ever seen of such a powerful architecture. I'm glad to have found this joy after searching for positional encoding details while implementing a Transformer from scratch today. Valar Morghulis!
@HeduAI 3 years ago
Valar Dohaeris my friend ;)
@RafidAslam 4 months ago
Thank you so much! This is by far the clearest explanation that I've ever seen on this topic
@an_experienced_guy 10 months ago
This is by far the clearest and most insightful explanation of transformers. I've tried to understand it through multiple blogs, videos, and Stack Exchange answers; this is the first time every component became clear to me, and how they all work in conjunction. Thanks a lot for this series. Amazing explanations.
@freaknextdoor9040 3 years ago
Hands down, this series is the best one explaining the essence of transformers I have found online!! Thanks a lot, you are awesome!!!!
@HeduAI 3 years ago
Cheers! 🙌
@raunakdey3004 1 year ago
Really love coming back to your videos to get a recap on multi-layered attention and transformers! Sometimes I need to make my own specialized attention layers for the dataset in question, and sometimes, I dunno, it just helps to listen to you talk about transformers and attention! Really intuitive, and it helps me break out of some weird loop of algorithm design I might have gotten myself stuck in. So thank you so, so much :D
@forresthu6204 2 years ago
Self-attention is a villain that has struck me for a long time. Your presentation has helped me to better understand this genius idea.
@shivam6565 1 year ago
Finally I understood the concept of query, key and value. Thank you.
@VADemon 1 year ago
Excellent examples and explanation. Don't shy away from using more examples of things that you love; this love shows and will translate to better work overall. Cheers!
@Clammer999 2 months ago
I've gone through dozens of videos on transformers, and multi-head attention is one of those complex mechanisms that requires not only a step-by-step explanation but also a step-by-step animation, which many videos tend to skip over. This video really nails it. Thanks so much!
@darkcrafteur165 1 year ago
I never post, but right now I need to thank you: I really don't believe there exists a better way to understand self-attention than watching your video. Thank you!
@bendarodes61 2 years ago
I've watched many video series about transformers; this is by far the best.
@sujithkumar5415 1 year ago
This is quite literally the best attention mechanism video out there guys
@MGMG-li6lt 3 years ago
Finally! You delivered me from long nights of searching for good explanations about transformers! It was awesome! I can't wait to see part 3 and beyond!
@HeduAI 3 years ago
Thanks for this great feedback!
@HeduAI 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gJ9kaJsE78k.html
@adithyakaravadi8170 2 years ago
You are so good, thank you for breaking down a seemingly scary topic for all of us. The original paper requires a lot of background to understand clearly, and not everyone has it. I personally felt lost. Such videos help a lot!
@DeepakSadulla 2 years ago
The YouTube example helps a lot with understanding QKVs. Really good explanation... Thanks!!
@bhavyaghai1924 1 year ago
Educational + entertaining. Nice examples and figures. Loved it!
@binhle9475 1 year ago
Your attention to detail and information structuring are just exceptional. The Avatar and GoT references on top were hilarious and made things perfect. You literally made a story out of complex deep learning concepts. This is just brilliant. You have such a beautiful mind (if you get the reference :D). Please consider making more videos like this; such a gift is truly precious. May the force be always with you. 🤘
@vanhell966 2 months ago
Amazing work. Really appreciate you making complex topics into simple language with a touch of anime and series. Amazing.
@wayneqwele8847 6 months ago
Thank you for taking the time to explain, from a linear algebra perspective, what actually happens. Many teachers on YouTube are comfortable just leaving it at math symbols and labels. Showing what actually happens to the matrix values has sharpened my intuition of what goes on under the hood. Thank you.🙏
@marcosmartinez9241 3 years ago
This is the best series of videos; here I could finally find a good explanation of the Transformer network. Thanks a lot!!
@HeduAI 3 years ago
Cheers! 🙌
@ariasardari8588 2 years ago
Your ability to convey concepts is quite impressive! Probably the best tutorial video I've ever seen. From now on, every time I open YouTube, I'll first check if you have a new video. It was fantastic! I greatly appreciate it.
@HeduAI 2 years ago
Thanks a lot Aria! Really means a lot :)
@rohtashbeniwal9202 1 year ago
This channel needs more love (the way she explains is out of the box). I can say this because I have 4 years of experience in data science; she did a lot of hard work to get so much clarity in the concepts (love from India)
@HeduAI 1 year ago
Thank you Rohtash! You made my day! :) धन्यवाद (thank you!)
@nicholasabad8361 2 years ago
By far the best explanation of Multi-Head Attention I've ever seen on YouTube! Thanks!
@HeduAI 2 years ago
Glad to hear this :)
@maryamkhademi 2 years ago
Thank you for putting so much effort into the visualization and awesome narration of this series. These are by far the best videos to explain transformers. You should do more of these videos. You certainly have a gift!
@HeduAI 2 years ago
Thank you for watching! Yep! Back on it :) Would love to hear which topic/model/algorithm you most want to see on this channel. Will try to cover it in the upcoming videos.
@hubertkanyamahanga2782 11 months ago
I am just speechless, this is unbelievable! Bravo!
@ClaudiaAcquistapace 7 months ago
Totally in love with your explanations. You are the light at the end of my personal tunnel of trying to understand transformers in preparation for the lecture I have to give on this topic. I will mention your videos all the way through my lecture. Thanks so much for explaining it so clearly.
@McBobX 2 years ago
That is what I've been looking for, for 3 days now! Thanks a lot!
@Scaryder92 2 years ago
Amazing video; showing how the attention matrix is created and what values it assumes is really awesome. Thanks!
@devchoudhary8892 2 years ago
Best, best, best explanation on transformers; you are adding so much value to the world.
@laalbujhakkar 4 months ago
Amazing explanation! Best on YouTube! Totally underrated! I feel fortunate to have found it. Thank you! :) 💐👏👏
@adityaghosh8601 2 years ago
Blown away by your explanation. You are a great teacher.
@ja100o 1 year ago
I'm currently reading a book about transformers and was scratching my head over the reason for the multi-headed attention architecture. Thank you so much for the clearest explanation yet, which finally gave me this satisfying 💡-moment
@tahahuraibb5833 1 year ago
This has to be the best video ever! You should make more of these :)
@Abhi-qf7np 2 years ago
You are the best 😄😄, this is THE best explanation I have ever seen on YouTube for the Transformer model. Thank you so much for this video.
@adarshkone9384 1 year ago
I have been trying to understand this topic for a long time; glad I found this video now
@wolfie6175 2 years ago
This is an absolute gem of a video.
@persianform 1 year ago
The best explanation of attention models on Earth!
@wireghost897 1 year ago
Finally a video on transformers that actually makes sense. Not a single lecture video from any of the reputed universities managed to cover the topic with such brilliant clarity.
@cw9249 1 year ago
You are amazing. I've watched other videos and read materials, but nothing compares to your videos
@Ariel-px7hz 1 year ago
Such a fantastic and detailed yet digestible explanation. As others have said in the comments, other explanations leave so many gaps. Thank you for this gem!
@cracksomeface 1 year ago
I'm a grad student currently applying NLP - this is literally the best explanation of self-attention I have ever seen. Thank you so much for a great vid!
@SOFTWAREMASTER 11 months ago
Most underrated video about transformers. Going to recommend this to everyone. Thank you
@endgameisthejoke 11 days ago
Absolutely beautiful and accurate guide to Multi-Head Attention
@user-ne2nr2yi1h 7 months ago
The best video I've ever seen for explaining transformers.
@cihankatar7310 1 year ago
This is the best explanation of the transformer architecture, with a lot of basic analogies! Thanks a lot!
@rishiraj8225 2 months ago
Coming back after a year, just to revise the basic concepts. It is still the best video on YT. Thanks Hedu AI
@madhusharath 3 years ago
Wow! Truly wow! Your ability to explain complex stuff in layman's terms + the references to well-known series/anime show how in-depth your understanding actually is!
@HeduAI 3 years ago
Your comment made my day :)
@MikeAirforce111 4 months ago
My goodness, you have talent as a teacher!! :-) This builds a very good intuition about what is going on. Very impressed. Subscribed!
@giridharnr6742 1 year ago
It's one of the best explanations of Transformers. Just mind-blowing.
@danielarul2382 1 year ago
One of the best explanations of Attention, in my opinion.
@pythondev2631 1 year ago
The best video on multi-head attention by far!
@hesona9759 1 year ago
The best video I've ever watched, thank you so much
@mariosconstantinou8271 1 year ago
These videos are amazing, thank you so much! Best explanation so far!!
@robertco7 1 year ago
This is very clear and well thought out, thanks!
@jeremyhofmann7034 2 years ago
I've watched dozens of these and read as many articles, and none have been able to explain in detail what self-attention is doing as well as this one. Finally I get it! Great work.
@HeduAI 2 years ago
I feel so glad upon reading your comment! :) Mission served.
@mrkshsbwiwow3734 2 months ago
This is the best explanation of transformers on YouTube.
@yassine20909 1 year ago
This is great work, thank you. Keep uploading. 👏
@shubheshswain5480 3 years ago
I went through many videos on Coursera, YouTube, and some online blogs, but none explained the Query, Key, and Values so clearly. You made my day.
@HeduAI 3 years ago
Glad to hear this Shubhesh :)
@simonren4890 2 years ago
This is the best. It is simple and tight. Please do more papers in the future.
@jonathanlarkin1112 3 years ago
Excellent series. Looking forward to Part 3!
@HeduAI 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gJ9kaJsE78k.html
@andybrice2711 3 months ago
This really is an excellent explanation. I had some sense that self-attention layers acted like a table of relationships between tokens, but only now do I have a real sense of how the Query, Key, and Value mechanism actually works.
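That "table of relationships" is literally the attention-weight matrix softmax(QK^T / sqrt(d_k)). A minimal sketch with toy sizes and random values, just to show the shapes (not the video's exact numbers):

import torch
import torch.nn.functional as F

seq_len, d_k = 4, 8                  # toy sizes for illustration
q = torch.randn(seq_len, d_k)        # one Query vector per token
k = torch.randn(seq_len, d_k)        # one Key vector per token
v = torch.randn(seq_len, d_k)        # one Value vector per token

scores = q @ k.T / d_k ** 0.5        # seq_len x seq_len table: token-to-token relevance
weights = F.softmax(scores, dim=-1)  # each row sums to 1
out = weights @ v                    # each token's output: a weighted mix of Values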
@bharanij6130 1 year ago
Hello! This is an incredible explanation of Self-Attention! Thank you!
@franzanders7762 2 years ago
I can't believe how good this is.
@minruihu 1 year ago
It is impressive; you explain such complicated topics in a vivid and easy way!!!
@hewas321 1 year ago
No way. This video is insane!! The most accurate and excellent explanation of the self-attention mechanism. Subscribed to your channel!
@suttonmattp 1 year ago
Honestly, I understood this far better from this 15-minute video than from the 90-minute university lecture I went to on the subject. Really excellent explanation.
@melihekinci7758 1 year ago
This is the best explanation I've ever seen!
@henrylouis5143 2 years ago
Brilliant presentation; it's none other than the best I've seen. Great appreciation for your work!!! Crystal-clear organization.
@HeduAI 2 years ago
Thanks Henry! Glad you liked it :)
@newbie8051 2 months ago
Ah, this makes everything simple and makes sense. Thanks for the easy-to-follow explanation!
@haowenjohnwei7547 1 year ago
The best video I've ever watched! Thank you very much!
@jirasakburanathawornsom1911 2 years ago
Hands down the best transformer explanation. Thank you very much!
@madhu1987ful 1 year ago
Wow. Just wow!! This video needs to be in the topmost position when searching for content on transformers and their explanation
@HeduAI 1 year ago
So glad to see this feedback! :)
@andrerocha155 9 months ago
Congrats! Fantastic explanation. You really used your Transformer skills to "pay attention" to the most important issues in this matter and explained them as simply as needed to be understood even by a 10-year-old child!
@sheerazahmad3131 1 year ago
Just wow... One of the best explanations out there. Thank you so much :)