Anyone here from MSA's playlist? Edit: I left this comment almost 2 years ago and I still get likes from it. Crazy how MSA still hasn't found out this was in their playlist.
Bro, this might be the best video available on the entire internet for explaining transformers. I have studied, worked on, and implemented transformers, but never have I been able to grasp them as simply and intuitively as you made it. You should really make more videos about anything you do or like: explain more algorithms and papers, implement them from scratch, and so on. Big thanks, man.
This goes in my favorites list to recommend to others. You have the gift of teaching at a level rarely seen, distilling key concepts and patiently explaining every step, even from multiple angles. This teaches not only the subject, but how to think in the domain of the subject. Please use this gift as much as you can :). Respect!
I am still waiting for part II. I haven't yet found an explanation better than this. The way you built the intuition on query, key, and value, which are the heart and soul of the self-attention mechanism, is impeccable.
I've been trying to wrap my head around this stuff, and between this video and ChatGPT itself explaining things and answering my questions, I think I'm starting to get it. I don't think I'll ever be able to calculate the derivatives of a loss function for gradient descent myself, though.
@tvcomputer1321 Usually, you don't have to worry about calculating derivatives yourself (not saying anyone shouldn't learn them), because tools such as PyTorch and TensorFlow have autograd, which does all of that for you.
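In case it helps, here's a minimal sketch of what autograd does for you, in PyTorch. The tiny linear model and the numbers are made up purely to illustrate; they're not from the video.

```python
import torch

# A tiny linear model y = w * x + b with one made-up data point.
x = torch.tensor(2.0)
y_true = torch.tensor(7.0)
w = torch.tensor(1.5, requires_grad=True)  # autograd tracks these parameters
b = torch.tensor(0.5, requires_grad=True)

y_pred = w * x + b             # forward pass: y_pred = 3.5
loss = (y_pred - y_true) ** 2  # squared-error loss: 12.25

loss.backward()  # autograd computes d(loss)/dw and d(loss)/db for you
print(w.grad)    # tensor(-14.) == 2 * (y_pred - y_true) * x
print(b.grad)    # tensor(-7.)  == 2 * (y_pred - y_true)

# A gradient-descent step would then just be: w_new = w - lr * w.grad
```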
In a landscape full of 5-to-15-minute videos where some weird dude stutters and stammers through technical terms, failing both to hide his own lack of understanding of a machine learning topic and to "summarize" a vastly complex subject in a ridiculously short amount of time, you managed to explain the topic amazingly cleanly. That's how it should be done. Keep it up! Great work!
The BEST source of information I've come across on the internet about the intuition behind the Q, K, and V stuff. PLEASE do part 2! You are an amazing teacher!
Can you please do a part 2? I usually don't comment on YouTube videos, but the way you explained the intuition in the first part was the best I've seen. Thank you so much; you gave me a lot of intuition!!
This was a fantastic video. I really hope you do the whole series on the "Attention is all you need" paper. It would be fantastic to cover the other parts of the architecture, as you said.
I've seen many videos on transformers that parrot the steps in the original paper with no explanation of why those steps matter. In contrast, this video actually gives an excellent intuitive explanation! Hands down, the best video explaining self-attention that I have seen...by a long shot.
This video is a masterpiece. I really loved it; it explains everything in a very effective and simplified way. The complexity hidden behind the architecture is peeled back layer by layer. Hats off!
Having just read the "Attention Is All You Need" paper with the intention of tackling a work problem with BERT and some specialized classification layers, I found your explanation here totally illuminates the self-attention mechanism component. Thanks a million times.
What an explanation! I read through other articles but couldn't figure out why they were doing what they were doing. But you nailed it, with an explanation for everything from the dot product to the weights, and most importantly the meaning of the query and key values. Thanks a ton!
I came here after Andrej Karpathy's video on building GPT from scratch. I have looked at many other videos, but this one explains the self-attention mechanism best. Amazing work.
THANKS! Out of so many videos on the attention mechanism, this is by far the best and most intuitive one; it explains very well how the score is calculated. THANKS!
By far the best and most intuitive introduction to the concept of Self-Attention I've ever found anywhere! Really looking forward to watching more of your amazing videos.
Dude, you need to make more videos. You have a gift. If you do a full series on some key deep learning concepts and things take off, you could have a very lucrative channel that does a lot of social good.
I have watched many videos explaining deep learning concepts. This one is without a doubt one of the best. Keep up the great work! You have just earned another subscriber.
Beautiful! Congrats to Ark; this video is wonderful. I've read many papers and seen different videos, but this one is a cut above the rest in explaining each component AND the intuition about WHY we use it, which is the part often skipped by other videos that just cover structure and formulas while missing the big-picture simplicity of each component's purpose. Please keep up this good work!
Wow, this is one of the most intuitive explanations of transformers I have found. Please make the second part as well; eagerly awaiting it. Thanks a lot for this. :)
I'd just like to add my voice to the chorus of praise for your teaching ability. Thank you for offering this intuition for the scaled dot product attention architecture. It is very helpful. I hope that you'll have the time and inclination to continue providing intuition for other aspects of LLM architectures. All the best to you and yours!
Thanks, bro, it's the BEST explanation of attention I've seen so far (and I have to say I've seen many others). Looking forward to the other parts, even though it's been almost 4 years since this Part 1!
I wish I could give you five thumbs up for this video. The diagrams, along with the commentary, provided the representations needed to comprehend the different aspects and steps behind multi-headed attention without delving too deep into the weeds. This is the best video I've ever watched in terms of explaining a complex technical topic; it's like a gold standard for technical education videos in my book. Thank you.
Finally, a video that talks about the learning part of transformers, plugging a big hole left by all the other videos. Great, I am finally able to understand this. Thank you.
Probably the best explanation of attention I have found here. Thank you so much. Implementing and coding these will still be a task, but at least I now have enough knowledge to know exactly what is happening under the hood of transformers.
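For anyone in the same spot on the implementation side, here's a minimal sketch of scaled dot-product self-attention in PyTorch. The function and variable names are my own illustration, not from the video or the paper, and real implementations add batching, multiple heads, and masking on top of this.

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Minimal scaled dot-product self-attention for one sequence.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers to be matched against
    v = x @ w_v  # values: the content that actually gets mixed
    scores = q @ k.T / math.sqrt(k.shape[-1])  # scaled dot products
    weights = F.softmax(scores, dim=-1)        # attention weights per token
    return weights @ v                         # weighted sum of values

# Toy usage with random embeddings for a 4-token sequence.
d_model, d_k = 8, 8
x = torch.randn(4, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([4, 8])
```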
This was absolutely one of the better explanations I've come across, and I feel like I've watched a hundred different videos explaining attention. Thanks so much for putting in the time to make it. I look forward to the next one if you can get around to it!
This is amazing. A very nicely explained self-attention mechanism. It seems that you are gifted with amazing teaching qualities. Thanks for sharing the information.
One of the best explanations of the transformer architecture I have ever seen. I hope you make some videos about the different variations of the architecture that have popped up since the original paper, and about some of the details you left out of this video (e.g. masking).
Best explanation on the self attention mechanism on the internet. Please explain the other concepts in the paper if possible. Thanks for the intuition!
Really, really good explanation, with all the details needed to understand everything. I really appreciated going into visual detail with the weights given by the softmax. Nice work; can't wait to see the other videos related to this one.
This is a very good explanation of the attention mechanism. The high-level intuition presented is superb. Your use of just enough detail allows someone to grasp the key ideas without getting lost in unnecessary complications. Thank you for this.
This was so detailed and clear! So many videos just gloss over the various blocks, but this made it much less of a mystery. We need a part 2!! The reason I'm personally intrigued by this is that I work with FPGAs, designing computer architecture, and I'm looking into how we can better build processors to handle these computations. Understanding all this at the lowest level is key.
This is probably the best explanation of the general architecture of the self-attention model I've seen. I hope you can get around to a video on positional embedding and masking to complete the transformer network!
Amazing! I have no background in deep learning, and yet I could understand every step of your explanation. Now I am going to build a transformer from scratch just for the fun of it, and because you motivated me beyond words. Keep making more of these videos. Could you please make videos on vision transformers that do zero-shot classification, detection, and segmentation? For example, DINOv2 by Meta, which was launched recently.