This playlist is like a time machine. I’ve watched you grow your hair from black to white, and I’ve seen the content quality continuously improve video by video. Great work!
Sir one small suggestion, aap apni videos pe speech to speech translation laga ke english mai convert kar lo and upload it on Udemy/youtube. it will help a lot of people jinko hindi nhi aati and will help your hard work get more and more attraction.🙂🙂. We are really very lucky that we are getting such rich content for free.. God bless you.
Your DL playlist is like a thrilling TV series - can't wait for the next episode! Any chance we could get a season finale soon? Keep up the awesome work!
@campusx Hi Nitish, thanks a lot for the elaborated explanation. But I have a query, Is it really that the values '0' representing the padding tokens really the reason (or the only reason) which is stopping from using Batch Normalization. because it can be internally handled to not consider '0' which calulating the mean and stadard deviation while calulating z across features. on the other hand I think, this technique (Batch Norm) is clubbing the embeddings of different sentences while calulating Z which seems little odd to me. and that is the reason for not using this technique. please correct me if I am wrong here
I am glad that I found this Channel! can't thank you enough, Nitish Sir! One more request: If you could create one-shot revision videos for machine learning, deep learning, and natural language processing (NLP).🤌
Respected sir, I request you to please complete the playlist. I am really thankful to you for your amazing videos in this playlist. I have recommended this playlist to a lot of my friends and they loved it too. Thanks for providing such content for free🙏🙏
Question: Sir agr kia ho ky jo padding wala vector hai isme B¹ ki value 0 ky bjae khuch aur ajae ky wo update hoti rehti hai to is se padding vector 0 nhi rhe ga to kia ye model me affect nhi kre ga ?
Thanks for this video sir. Can you also make a video on Rotary Positional Embeddings (RoPE) that is used in Llama as well as other LLMs for enhanced attention.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Initially the gama value is kept 1 and beta is kept zero, hence initially the value will be zero. But during training process may be value will be other than zero
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost.
Sir what if Beta value is updated during learning process? Then it will get added along with the padded zeros making it a non zero value in the further iterations
Just ignoring padded rows while performing batch normalization should also work, I feel like it that padded zeros are not the only reason we layer normalization instead of batch normalization.
Nitish, please relook at your covariate shift funds... yes, you are partially correct but how you explained covariate shift is actually incorrect. (example - Imagine training a model to predict if someone will buy a house based on features like income and credit score. If the model is trained on data from a specific city with a certain average income level, it might not perform well when used in a different city with a much higher average income. The distribution of "income" (covariate) has shifted, and the model's understanding of its relationship to house buying needs to be adjusted.)
sir mera doubt that ki mai agar transformer architecture mai batchnorm use karoon kunki jo values matrix mai hai un sabka apna learning rate and bias factor hai to jo bias hai uskai karan to zero chala hi jayega fir layer norm kyun. kyunki ham ((x-u)/var)*lambda+bias krtai hi hain to bias to apne aap usko zero nhi hone dega. Please help sir
Mujhe bhi same confusion thi chatgpt se pta kiya usne kha ky hum ek error value add krty jo very close to zero hota hai to isliye hum zero likh dety hai after normalization
sir, why dont you translate it to english and upload , making a new channel like campusX english, i am sure it will attract more audience and reach. I am sure you would have thought of this already
Sir In batch normalization , in your example we have three mean and three variance along with same number of beta and gamma i.e. 3. But in layer normalization , we have eight mean and eight variance along with 3 beta and 3 gamma. That means number of beta and gamma are same in both batch and layer normalization. Is it correct? Pl elaborate on it .
Absolute banger video again. Appreciate the efforts you're taking for transformers. Cannot wait for when you explain the entire transformer architecture.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.
Sir, is transformer architecture completed as I want to cover it ASAP, I have covered the topics till attention mechanism. I want to cover the topic in one go. Sir please tell please. And, sir I request to upload all video asap. I want to learn a lot. And thanks for the amazing course at 0 cost. God bless you.