**CORRECTIONS**
1) Shoutout to @SHUBHAM PACHORI and @gabewebyt: [11:20] "Sine or cosine function is used as a function of the dimension of the positional embedding -- so dimension 0 of the positional embedding vector uses sine; dimension 1 uses cosine, and so on."
2) The issue of variable lengths mentioned at [7:26] can also be resolved by setting a maximum sequence length (padding shorter sentences and trimming longer sentences to the maximum sequence length).

*Timestamps*
Here are the timestamps associated with the concepts covered in this video:
0:00 - Recap of Part 0
1:14 - Input Processing
2:19 - Word Embeddings
4:44 - Position Embeddings
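A minimal sketch of the padding/trimming idea in correction 2 (MAX_LEN and PAD_ID are made-up values for illustration, not taken from the video):

```python
MAX_LEN = 8   # hypothetical maximum sequence length
PAD_ID = 0    # hypothetical id of the padding token

def pad_or_trim(token_ids):
    """Trim longer sequences to MAX_LEN and pad shorter ones with PAD_ID."""
    token_ids = token_ids[:MAX_LEN]                            # trim if too long
    return token_ids + [PAD_ID] * (MAX_LEN - len(token_ids))   # pad if too short

print(pad_or_trim([5, 17, 42]))         # [5, 17, 42, 0, 0, 0, 0, 0]
print(pad_or_trim(list(range(1, 12))))  # [1, 2, 3, 4, 5, 6, 7, 8]
```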
At 2:56 the screen says 512 but you say 521. I guess the text on the screen is correct but can you confirm? Also the reference links in the video description do not work anymore.
@dev stuff While this is certainly late, when she says "against", she means that the input index value is kept separate from the embedding array. This can be represented as a two-dimensional array of embeddings [][512], where the first dimension ranges over all possible tokens, or as an object that contains both an index and its own set of embeddings. Consider a TokenSet object with the following shape: int Value; double[] Embeddings; // 512 entries. In this example, the Value is the index value of the token / word, and the Embeddings are the array associated with that token. The Embeddings change as the network trains (and are initially random), but the index values never change and are used to translate the decisions of the network back into English words.
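A rough Python/PyTorch sketch of the same idea (the vocabulary size of 10,000 is made up): the token's integer index is fixed, while the 512-dimensional vector stored at that index starts out random and is updated by backpropagation.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512
embedding = nn.Embedding(vocab_size, d_model)  # 10,000 x 512 table, randomly initialized and trainable

token_index = torch.tensor([42])   # the index itself never changes
vector = embedding(token_index)    # the 512-dim vector currently stored at index 42
print(vector.shape)                # torch.Size([1, 512])
```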
THANK YOU! I watched almost 10 different videos about Transformer networks and you're the only person who cared to explain that the vector representations are initialized randomly and learned through backpropagation. The visuals are very helpful too. I'll watch the whole video series!!
After watching the whole series (the 3 episodes), I can very confidently say that this is the clearest, most succinct, and most useful explanation of transformers on YT that I've come across. Thank you!!
OMG... this video fills the gap left by most Transformer explanations! Most of them just emphasize the Multi-Head Attention part! Thank you truly!
To those who hesitate to watch this: it is the best explanation of transformers I have ever seen, and I have seen a lot of them. A good companion to this video is the one by Yann LeCun on associative memories and transformers from NYU.
Best explanation so far: simple, with coherent animations, and without compromising the value. Don't know why, but whenever I hear a woman's or feminine voice explaining any concept, it just goes so smoothly into my brain... no friction. And the main thing is the efficient teaching. 😅... anyways, thank you!
Your Transformer model tutorials are the best I have seen for explaining exactly how the process actually works. Thank you for taking the time to put this together with clear explanations. You know a person has a true understanding of a subject if they are able to clearly explain it to others.
Hi, at 11:14 I guess they did not use sine and cosine for different positions of the embedding; they used cosine and sine for the odd and even embedding dimensions at the same pos, where "i" varies from 0 to dim/2. BTW, thanks for the awesome tutorial.
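For reference, the formulas from "Attention Is All You Need" match this reading, with i indexing pairs of dimensions from 0 up to d_model/2 - 1:

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))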
First person to concretely explain why they use a periodic function, which, I would have thought, gives the same position embedding whenever you come back to the same point on the curve. Thank you!
Watched a vast number of videos and read papers on positional encoders, how the Q, K, V matrices work in the attention mechanism, and why we need multi-head attention. You are far ahead of everybody else in transferring knowledge to others with utmost clarity and ease. Thanks Batool Haider.
The best and most understandable video about the transformer architecture... this video is concise with a great explanation; you can perfectly get the intuition... thanks
This video is the most insightful and easy-to-understand video on transformers I have watched so far. Thanks for putting the effort into this.
The best explanation of transformers hands down, and I'm saying that after watching all of the transformer videos on YouTube. Just one thing: I can't find Part 0 of this series.
Dear Batool, thanks for this amazing and by far the best explanation of the whole topic. I'm currently writing my master's thesis and you're an enormous help with it!
Very smooth video, especially the part about how the frequencies and indices create the positional embedding and why other methods were rejected. God bless teachers like you.
All the content I have seen from this channel is just incredible. The questions you ask are perfect for me: sometimes the question you ask and then answer is exactly what I was asking myself, and sometimes it isn't what I was thinking, but the answer to it turns out to be just what I needed! Thank you very much!!
Same thing here. This is really a great way of teaching. This video is by far the best. I was so frustrated at not finding answers to my questions, and then I came across this video. Awesome!! 👏👏👏 Thanks a lot!!
This is the best channel I have found while trying to learn deep learning on my own; most search results from Google or YouTube are just super shallow articles or videos with zero depth. This channel is probably better than going to most colleges. This is the first video on YouTube that I feel bad watching because I did not pay. You should set up some system for donations.
You are providing information in a very accurate as well as very understandable manner. Can you please share this presentation file? It would be very helpful.
Now I understand the complex computations in AI coding. However, to complete my understanding, please prepare another video (1) showing the corresponding code in parallel -- in Python, C++, or Java; and (2) covering the parts that were omitted: How is the inputting done? Is it by running a program or by responding to a prompt? What is fine-tuning exactly? How is it done? What is a dataset? Why is a foundation model very expensive? By the way, the corrections in the 3 videos were very useful.
Really awesome explanation of the positional embedding of word vectors. I was pretty confused when reading the PyTorch code; you gave a really nice intuition for the phenomenon.
@@HeduAI I implemented it as well 🙂. The code uses PyTorch's simple broadcasting; if we think of it in vector terms, for each vector in an embedding we are essentially applying sin and cos to mark how much they vary in the feature space (purely from an NLP perspective).
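A minimal sketch of building the sinusoidal positional encodings with PyTorch broadcasting, roughly in the spirit of the comment above (the d_model and max_len values are arbitrary): a column of positions broadcasts against a row of per-dimension frequencies.

```python
import torch

d_model, max_len = 512, 100
pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # positions, shape (max_len, 1)
i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimension indices, shape (d_model/2,)
freq = 1.0 / (10000 ** (i / d_model))                           # one frequency per sin/cos pair

pe = torch.zeros(max_len, d_model)
pe[:, 0::2] = torch.sin(pos * freq)   # even dimensions use sine (broadcast to (max_len, d_model/2))
pe[:, 1::2] = torch.cos(pos * freq)   # odd dimensions use cosine
print(pe.shape)                       # torch.Size([100, 512])
```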
If you are here to learn about transformers, I suggest staying and learning only from the content on this channel. It would suffice to know literally everything about transformers.