Layer Normalization in Transformers | Layer Norm Vs Batch Norm 

CampusX · 251K subscribers
17K views · Published 27 Oct 2024

Comments: 123
@abhisheksaurav · 4 months ago
This playlist is like a time machine. I've watched your hair go from black to white, and I've seen the content quality continuously improve video by video. Great work!
@animatrix1631 · 4 months ago
I feel the same, but I guess he's not that old.
@zerotohero1002 · 3 months ago
Courage comes at a price ❤
@sachink9102 · 4 months ago
Thank you, NitishJi. Eagerly waiting to attend your Transformers sessions. Please complete this series.
@RamandeepSingh_04 · 4 months ago
Another student added to the waiting list, requesting the next video. Thank you, sir.
@ayushrathore2570 · 4 months ago
This whole playlist is the best thing I have discovered on YouTube! Thank you so much, sir.
@yashshekhar538 · 4 months ago
Respected sir, your playlist is the best. Kindly increase the frequency of videos.
@muhammadsheraz177 · 4 months ago
Please finish this playlist as early as possible.
@sahil5124 · 4 months ago
This is a really important topic. Thank you so much. Please cover everything about the Transformer architecture.
@akeshagarwal794 · 4 months ago
Congratulations on building a 200k family; you deserve even more reach 🎉❤ We love you, sir ❤
@PrathameshKhade-j2e · 4 months ago
Sir, try to complete this playlist as early as possible. You are the best teacher, and we want to learn deep learning concepts from you.
@ShivamSuri-lz5it · 2 months ago
Excellent deep learning playlist, highly recommended!!
@ai_pie1000 · 4 months ago
Congratulations, brother, on the 200k-subscriber family... 👏👏👏
@nvnurav1892 · 3 months ago
Sir, one small suggestion: apply speech-to-speech translation to your videos to convert them to English, and upload them on Udemy/YouTube. It will help a lot of people who don't know Hindi, and it will get your hard work more attention. 🙂🙂 We are really very lucky to be getting such rich content for free. God bless you.
@just__arif · 1 month ago
Top-quality content!
@sharangkulkarni1759 · 2 months ago
Fantastic! Loved it. The way you roped the padding zeroes into the explanation was great.
@vinaykumar-xh5pi · 3 months ago
Please release the next video; very curious to finish the series... Loved your content, as always.
@SBhupendraAdhikari · 2 months ago
Thanks a lot, sir. Really enjoying learning about Transformers.
@udaysharma138 · 1 month ago
Thanks a lot, Nitish sir. Best explanation.
@AidenDsouza-ii8rb · 3 months ago
Your DL playlist is like a thrilling TV series: can't wait for the next episode! Any chance we could get a season finale soon? Keep up the awesome work!
@praneethkrishna6782 · 1 month ago
@campusx Hi Nitish, thanks a lot for the detailed explanation. But I have a query: are the '0' values representing the padding tokens really the reason (or the only reason) that stops us from using Batch Normalization? It could be handled internally so that '0's are not considered when calculating the mean and standard deviation across features. On the other hand, I think Batch Norm clubs together the embeddings of different sentences while calculating z, which seems a little odd to me, and that may be the real reason this technique isn't used. Please correct me if I am wrong.
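The trade-off raised in this comment can be made concrete with a small NumPy sketch (the batch, shapes, and values are hypothetical, not taken from the video): batch norm computes each statistic down a feature column across every token of every sentence, so padded zero rows enter the statistics and embeddings from different sentences get mixed together; layer norm computes statistics within each token's own embedding.

```python
import numpy as np

# Hypothetical batch: 2 sentences x 4 token positions x 3 features.
# Sentence 2 has only 2 real tokens; the last two positions are zero padding.
rng = np.random.default_rng(42)
x = rng.normal(size=(2, 4, 3))
x[1, 2:, :] = 0.0  # padding rows
eps = 1e-5

# Batch norm: one mean/variance per feature, computed across ALL tokens
# of ALL sentences -> padding rows and other sentences enter every statistic.
flat = x.reshape(-1, 3)
bn_out = (x - flat.mean(axis=0)) / np.sqrt(flat.var(axis=0) + eps)

# Layer norm: one mean/variance per token, across its own features only
# -> no padding contamination and no cross-sentence mixing.
ln_mean = x.mean(axis=-1, keepdims=True)
ln_var = x.var(axis=-1, keepdims=True)
ln_out = (x - ln_mean) / np.sqrt(ln_var + eps)

print(ln_out[1, 2])  # a padded token stays exactly zero under layer norm
print(bn_out[1, 2])  # under batch norm it becomes a non-zero "fake" token
```

Masking could remove the padding from the batch statistics, but the cross-sentence mixing the comment points out would remain; layer norm sidesteps both.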
@GanitSikho-xo2yx · 4 months ago
Well, I am waiting for your next video. It's a gem of learning!
@ryannflin1285 · 1 month ago
Bro, I literally can't understand how I'm understanding this so easily. How can someone teach this well? Love you, sir (from IITJ).
@saurabhbadole821 · 4 months ago
I am glad I found this channel! Can't thank you enough, Nitish sir! One more request: could you create one-shot revision videos for machine learning, deep learning, and natural language processing (NLP)? 🤌
@Fazalenglish · 2 months ago
I really like the way you explain things ❤
@shibrajdeb5177 · 4 months ago
Sir, please upload videos regularly. These videos help me a lot.
@taseer12 · 4 months ago
Sir, I can't describe your efforts. Love from Pakistan.
@amitmehraa · 3 months ago
Please complete this playlist and add Transformers tutorials as soon as possible.
@AmitBiswas-hd3js · 4 months ago
Please cover the entire Transformer architecture as soon as possible.
@ESHAANMISHRA-pr7dh · 4 months ago
Respected sir, I request you to please complete the playlist. I am really thankful for your amazing videos in this playlist. I have recommended it to a lot of my friends, and they loved it too. Thanks for providing such content for free 🙏🙏
@mayyutyagi · 4 months ago
Amazing series, full of knowledge...
@shreeyagupta5720 · 4 months ago
Congratulations on 200k, sir 👏🎉🍺
@ghousepasha4172 · 4 months ago
Please, sir, upload videos regularly; we wait a long time for your videos.
@rb4754 · 4 months ago
Congratulations on 200k subscribers!!!
@rajnishadhikari9280 · 4 months ago
Thanks for this amazing series.
@znyd. · 4 months ago
Congrats on the 200k subs. Love from Bangladesh ❤.
@krisharora2959 · 3 months ago
The next video is awaited more than anything.
@bmp-zz9pu · 3 months ago
Sir, please finish this playlist!!!!!!!!!
@1111Shahad · 4 months ago
Thank you, Nitish. Waiting for your next upload.
@Shisuiii69 · 2 months ago
Question: Sir, what if the padding vector's β¹ value becomes something other than 0? Since it keeps getting updated during training, the padding vector will no longer be 0. Won't that affect the model?
@slaypop-b5n · 13 days ago
Bro, did you find the answer? I had the same doubt.
@Ishant875 · 5 days ago
Same doubt.
@AkashSingh-oz7qx · 1 month ago
Please also cover generative and diffusion models.
@gurvgupta5515 · 4 months ago
Thanks for this video, sir. Could you also make a video on Rotary Positional Embeddings (RoPE), which are used in Llama as well as other LLMs for enhanced attention?
@anonymousman3014 · 4 months ago
Sir, is the Transformer architecture portion complete? I want to cover it ASAP; I have covered the topics up to the attention mechanism and want to finish it in one go. Please let us know, and please upload all the videos ASAP. I want to learn a lot. Thanks for the amazing course at zero cost. God bless you.
@SulemanZeb. · 4 months ago
Please start an MLOps playlist; we are desperately waiting for it.
@physicskiduniya8054 · 4 months ago
Bhaiya! Awaiting your upcoming videos. Please try to complete this playlist ASAP, bhaiya.
@WIN_1306 · 4 months ago
At 46:10, why is it zero? Since beta is added, won't that prevent it from becoming zero?
@dilippokhrel4009 · 22 days ago
Initially gamma is set to 1 and beta to 0, hence the value is initially zero. But during training the value may become non-zero.
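This reply can be checked numerically; a minimal NumPy sketch assuming the standard initialization it describes (γ = 1, β = 0):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer norm over a single token's embedding vector."""
    mu = x.mean()
    var = x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

pad = np.zeros(4)  # an all-zero padding embedding

# At initialization (gamma = 1, beta = 0) the output stays exactly zero.
out_init = layer_norm(pad, gamma=np.ones(4), beta=np.zeros(4))
print(out_init)  # [0. 0. 0. 0.]

# After training, beta may drift away from zero; the normalized part of a
# zero vector is still zero, so the output becomes exactly beta.
beta_trained = np.array([0.3, -0.1, 0.2, 0.05])
out_trained = layer_norm(pad, gamma=np.ones(4), beta=beta_trained)
print(out_trained)  # equals beta_trained
```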
@advaitdanade7538 · 4 months ago
Sir, please finish this playlist soon; placement season is near 😢
@muhammadsheraz177 · 4 months ago
Sir, could you kindly tell us when this playlist will be complete?
@WIN_1306 · 4 months ago
I am the 300th person to like this video. Sir, please upload the next videos; we are eagerly waiting.
@princekhunt1 · 3 months ago
Sir, please complete this series.
@ishika7585 · 4 months ago
Kindly make a video on regex as well.
@WIN_1306 · 4 months ago
What is regex?
@Xanchs4593 · 3 months ago
Can you please explain what the "add" is in the Add & Norm layer?
@arpitpathak7276 · 4 months ago
Thank you, sir. I was waiting for this video ❤
@darkpheonix6592 · 4 months ago
Please upload the remaining videos quickly.
@himansuranjanmallick16 · 2 months ago
Thank you, sir................
@technicalhouse9820 · 4 months ago
Sir, love you so much from Pakistan.
@vinayakbhat9530 · 1 month ago
Excellent.
@khatiwadaAnish · 3 months ago
Awesome 👍👍
@arunkrishna1036 · 3 months ago
Sir, what if the beta value is updated during the learning process? Then it gets added to the padded zeros, making them non-zero in further iterations.
@Shisuiii69 · 2 months ago
Same confusion; did you find the answer?
@barryallen5243 · 4 months ago
Just ignoring padded rows while performing batch normalization should also work. I feel that padded zeros are not the only reason we use layer normalization instead of batch normalization.
@WIN_1306 · 4 months ago
How would you ignore padding columns in batch normalization?
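One hypothetical answer (a sketch of mine, not anything from the video): carry a boolean mask of real tokens and compute the batch statistics only over the masked-in positions. Note that this fixes only the padding contamination; the statistics still mix tokens from different sentences.

```python
import numpy as np

def masked_batch_norm(x, mask, eps=1e-5):
    """Per-feature batch norm that ignores padded positions.

    x:    (batch, seq_len, features) activations
    mask: (batch, seq_len) boolean, True for real tokens
    """
    tokens = x[mask]                # (n_real_tokens, features)
    mean = tokens.mean(axis=0)      # statistics over real tokens only
    var = tokens.var(axis=0)
    out = (x - mean) / np.sqrt(var + eps)
    # Keep padded positions at zero instead of "fake" normalized values.
    return np.where(mask[..., None], out, 0.0)

x = np.random.randn(2, 4, 3)
mask = np.array([[True, True, True, True],
                 [True, True, False, False]])
x[~mask] = 0.0                      # zero padding
y = masked_batch_norm(x, mask)
```

Even with the mask, each feature's mean and variance are pooled across both sentences, which is the objection raised in the comment above.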
@shubharuidas2624 · 4 months ago
Please also continue with the vision transformer.
@Amanullah-wy3ur · 4 months ago
This is helpful 🖤
@hassan_sid · 4 months ago
It would be great if you made a video on RoPE.
@gauravbhasin2625 · 4 months ago
Nitish, please take another look at your covariate shift fundamentals... Yes, you are partially correct, but the way you explained covariate shift is actually incorrect. (Example: imagine training a model to predict whether someone will buy a house based on features like income and credit score. If the model is trained on data from a specific city with a certain average income level, it might not perform well in a different city with a much higher average income. The distribution of "income" (a covariate) has shifted, and the model's understanding of its relationship to house buying needs to be adjusted.)
@WIN_1306 · 4 months ago
I guess the explanation sir gave and your explanation are the same, just with different examples of covariate shift.
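The income example in this thread is easy to reproduce numerically (all numbers hypothetical): the input distribution P(x) shifts between the training city and the deployment city, while the relationship the model is supposed to learn stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training city: incomes centered around 50 (thousand).
income_train = rng.normal(loc=50, scale=10, size=10_000)
# Deployment city: incomes centered around 90 -> P(income) has shifted.
income_deploy = rng.normal(loc=90, scale=10, size=10_000)

print(income_train.mean())   # ~50
print(income_deploy.mean())  # ~90
# Same conditional y|x, different distribution of x: covariate shift.
```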
@intcvn · 4 months ago
Please complete it quickly, sir; waiting eagerly.
@not_amanullah · 4 months ago
Thanks ❤
@sagarbhagwani7193 · 4 months ago
Thanks, sir. Please complete this playlist ASAP.
@29_chothaniharsh62 · 4 months ago
Sir, can you please continue the 100 ML interview questions playlist?
@peace-it4rg · 4 months ago
Sir, my doubt is: if I use batch norm in the transformer architecture, every value in the matrix has its own learnable scale and bias, so the bias should keep values from going to zero. We compute ((x − μ)/σ)·γ + β, so β won't let it become exactly zero. Then why layer norm? Please help, sir.
@RamandeepSingh_04 · 4 months ago
Still, it will be a very small number; it will affect the result and not represent the true picture of the feature in batch normalization.
@WIN_1306 · 4 months ago
@RamandeepSingh_04 Compared to the tokens without padding it will be small, but sir wrote zero, and it won't be exactly zero.
@manojprasad6781 · 4 months ago
Waiting for the next video 💌
@SaurabhKumar-t4s · 2 months ago
At 46:04, if sigma4 is 0, then how do we divide by this value?
@Shisuiii69 · 2 months ago
I had the same confusion. I asked ChatGPT, and it said a small error value very close to zero is added, which is why we write zero after normalization.
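What the reply is describing is the small ε added inside the square root of the normalization denominator; a minimal sketch for an all-zero padding vector, whose variance is exactly 0:

```python
import numpy as np

eps = 1e-5
x = np.zeros(4)          # padding embedding: mean 0, variance 0
# Without eps this would be 0/0; with eps the division is well-defined.
y = (x - x.mean()) / np.sqrt(x.var() + eps)
print(y)  # [0. 0. 0. 0.]
```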
@turugasairam2886 · 1 month ago
Sir, why don't you translate the videos into English and upload them, making a new channel like "CampusX English"? I am sure it would attract a larger audience and more reach. I am sure you have already thought of this.
@SANJAYTYAGI-bk6tx · 4 months ago
Sir, in batch normalization, in your example we have three means and three variances, along with the same number of betas and gammas, i.e. 3. But in layer normalization, we have eight means and eight variances, yet still 3 betas and 3 gammas. That means the number of betas and gammas is the same in both batch and layer normalization. Is that correct? Please elaborate.
@campusx-official · 4 months ago
Yes
@WIN_1306 · 4 months ago
Mean and variance are used for normalization; beta and gamma are used for scaling and shifting.
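The counts discussed in this exchange can be verified with a small NumPy example using the 8-token x 3-feature setup described in the comment (shapes assumed from that description): normalization statistics are per feature for batch norm and per token for layer norm, while the learnable γ and β have one entry per feature in both cases.

```python
import numpy as np

x = np.random.randn(8, 3)   # 8 tokens (rows) x 3 features (columns)

# Batch norm: one statistic per feature column -> 3 means, 3 variances.
bn_mean = x.mean(axis=0)    # shape (3,)
# Layer norm: one statistic per token row -> 8 means, 8 variances.
ln_mean = x.mean(axis=1)    # shape (8,)

# Learnable scale/shift are per feature in both normalizations.
gamma = np.ones(3)
beta = np.zeros(3)

print(bn_mean.shape, ln_mean.shape, gamma.shape)  # (3,) (8,) (3,)
```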
@rose9466 · 4 months ago
Can you give an estimate of when this playlist will be completed?
@virajkaralay8844 · 4 months ago
Another absolute banger of a video. I appreciate the effort you're putting into Transformers. Can't wait for your explanation of the entire Transformer architecture.
@virajkaralay8844 · 4 months ago
Also, congratulations on 200k subscribers. May you reach many more milestones.
@dharmendra_397 · 4 months ago
Very nice video.
@teksinghayer5469 · 4 months ago
When will you code a Transformer from scratch in PyTorch?
@vikassengupta8427 · 4 months ago
Sir, next video ❤❤
@oden4013 · 3 months ago
Sir, please upload the next video; it's been almost a month.
@zerotohero1002 · 3 months ago
It's been a month, sir. Please upload; eagerly waiting 🥺🥺🥺
@MrSat001 · 4 months ago
Great 👍
@harshgupta-w5y · 4 months ago
Please upload the next video quickly, sir.
@adarshsagar9817 · 4 months ago
Sir, please complete the NLP playlist.
@WIN_1306 · 4 months ago
Which one? How many videos does it have?
@titaniumgopal · 4 months ago
Sir, please update the PDF.
@space_ace7710 · 4 months ago
Yeah!!
@not_amanullah · 3 months ago
🖤🤗
@aksholic2797 · 4 months ago
200k 🎉
@ghousepasha4172 · 4 months ago
Sir, please complete the playlist; I will pay 5000 for it.
@faizack · 3 months ago
😂😂😂🎉
@DarkShadow00972 · 4 months ago
Bring some coding examples, bro.
@bmp-zz9pu · 4 months ago
A video after two weeks in this playlist... don't torture us like this; please work a bit faster, sir.
@ashutoshpatidar3288 · 4 months ago
Please be a little faster!
@WIN_1306 · 3 months ago
Sir, can you tell us roughly how many topics are left, and which ones?
@not_amanullah · 4 months ago
This is helpful 🖤
@Amanullah-wy3ur · 4 months ago
Thanks ❤