
The Fundamental Problem with Neural Networks - Vanishing Gradients 

ritvikmath
166K subscribers
13K views

Why vanishing gradients are the biggest issue with neural networks.
My Patreon : www.patreon.co...
Intro to Neural Networks : • Intro to Neural Networ...
Backpropagation : • Backpropagation : Data...
The Sigmoid : • The Sigmoid : Data Sci...

Published: 3 Oct 2024

Comments: 30
@HoHoHaHaTV · a year ago
My respect for Ritvik grows exponentially each time I see his explanations. He can beat any prof when it comes to explaining these things. I just feel so lucky to have come across this channel
@erickmacias5153 · 2 years ago
I just found this channel like 3 days ago and it has been very useful and interesting! Thank you very much
@ritvikmath · 2 years ago
Of course!
@shubhampandilwar8448 · 2 years ago
Very well explained. I always had trouble understanding this topic, but this video helped me comprehend it intuitively.
@ritvikmath · 2 years ago
Thanks!
@siddhantrai7529 · 2 years ago
Very well addressed, thank you for the video.
@ritvikmath · 2 years ago
Thanks!
@geoffreyanderson4719 · 2 years ago
Yes, great topic. Absolutely, some of the top ways to fight off vanishing gradients are ReLU (and other advanced activation functions) and residual nets (skip connections). It's also quite possible to add your own custom residual connections to any deep network; it's not necessary to use only the ResNet blocks that the framework provides. TensorFlow's functional API makes it pretty straightforward to add skip layers, plus the aggregation layer needed to combine the main path with the skip path, to layer types other than convolutional or computer-vision-specific ones. So while ResNet was originally designed for computer vision, it's not married to that at all. Additional solid help for fighting off vanishing gradients: batch normalization, which basically conditions the signal going into the next hidden layer, and smarter initialization of the weights in all your layers, like He initialization when using ReLU (and other initializations suited to other activation functions).
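A minimal sketch of the kind of hand-rolled skip connection described above, using the Keras functional API together with batch normalization and He initialization. The layer widths, the toy architecture, and all names are illustrative assumptions, not anything shown in the video:

```python
# Hedged sketch: a custom residual (skip) block on Dense layers,
# combining He initialization and batch normalization as the comment suggests.
import tensorflow as tf

inputs = tf.keras.Input(shape=(32,))  # toy input width (assumption)
x = tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal")(inputs)

# Main path: two hidden layers with batch norm in between.
h = tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal")(x)
h = tf.keras.layers.BatchNormalization()(h)
h = tf.keras.layers.Dense(64, kernel_initializer="he_normal")(h)

# Skip path: add the block's input back in, then apply the nonlinearity.
x = tf.keras.layers.Activation("relu")(tf.keras.layers.Add()([x, h]))

outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```

The Add layer is the "aggregation" step the comment mentions: it merges the main path with the skip path, which gives gradients a short route back to earlier layers.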
@CptJoeCR · 2 years ago
Love it as always! May I suggest a future video topic: Bayesian Change Point Detection. BCP has so many components that you already have covered (sampling techniques, MCMC, Bayesian statistics) that I think it would make for a great video! (and I'm still slightly confused how it all comes together in the end! lol)
@tomdierickx5014 · 2 years ago
Another gem! Great insights! 🔬
@ChocolateMilkCultLeader · 2 years ago
You can also add some bias to your networks in one of the intermediate layers
@arontapai5586 · 2 years ago
Very informative!!! Are you planning to make videos on RNNs (LSTM...) and other types of network models?
@ritvikmath · 2 years ago
Yup they'll likely be coming out within the next month!
@pushkarparanjpe · a year ago
Some questions inspired by your video:
- The earliest layers see the most severe vanishing gradients. Do later layers undergo vanishing gradients sequentially?
- If the earliest layers' weights get stuck, learning can still happen through weight updates at the later layers, right?
- Can we use vanishing gradients for neural architecture depth search? Start with many layers, train, identify the early layers that got stuck, then discard them and keep a shallower network. It sounds like there is something wrong with this; would it work?
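On the question of which layers actually get stuck, one way to check empirically is to log per-layer gradient norms: layers whose gradients stay near zero are the ones barely learning. A minimal sketch under assumed toy data and an assumed deep sigmoid model (sigmoid chosen to make the effect visible); none of this comes from the video:

```python
# Hedged sketch: inspect per-layer gradient norms on one toy batch to see
# which (typically the earliest) layers receive vanishing gradients.
import numpy as np
import tensorflow as tf

# Assumed toy model: 10 narrow sigmoid layers followed by a linear output.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(32, activation="sigmoid") for _ in range(10)]
    + [tf.keras.layers.Dense(1)]
)

x_batch = np.random.randn(64, 16).astype("float32")  # fake inputs
y_batch = np.random.randn(64, 1).astype("float32")   # fake targets
loss_fn = tf.keras.losses.MeanSquaredError()

with tf.GradientTape() as tape:
    loss = loss_fn(y_batch, model(x_batch))
grads = tape.gradient(loss, model.trainable_variables)

for var, g in zip(model.trainable_variables, grads):
    print(f"{var.name:<20} grad norm = {float(tf.norm(g)):.2e}")
```

On a setup like this, the earliest layers' gradient norms would be expected to come out orders of magnitude smaller than the later ones, which is the pattern the first question points at.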
@shreypatel9379 · 2 years ago
One of the best channels I've found on YouTube (along the lines of 3Blue1Brown and other such channels). Keep up the good work
@listakurniawati8946 · 2 years ago
Omg thank you so much!!! You saved my thesis ❤❤
@marvinbcn2 · 2 years ago
Excellent video. You perfectly convey the intuition. Only one doubt left: I can't see why ReLU is a good solution, given that its gradient is 0 for negative values. How do we compute backpropagation then?
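For what it's worth, one hedged way to see the contrast (my own sketch, not the video's argument): each sigmoid layer contributes a local derivative factor of at most 0.25 to the chain rule, so long chains of sigmoids shrink gradients geometrically, whereas ReLU contributes a factor of exactly 1 whenever the pre-activation is positive. Units with a negative pre-activation do get a zero gradient on that example (the "dying ReLU" concern), which leaky variants address; but as long as enough units are active, the gradient passes through undamped.

```python
# Hedged sketch: compare the local derivative magnitudes of sigmoid vs. ReLU.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

def d_relu(z):
    return (z > 0).astype(float)    # 1 for positive inputs, 0 otherwise

z = np.linspace(-4, 4, 9)
print("z        :", z)
print("sigmoid' :", np.round(d_sigmoid(z), 3))  # never exceeds 0.25
print("relu'    :", d_relu(z))                  # exactly 1 for z > 0

# Multiplying many per-layer factors <= 0.25 shrinks gradients geometrically,
# while factors of 1 (active ReLU units) leave them intact.
print("20 sigmoid layers, best case:", 0.25 ** 20)
```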
@posthocprior · 2 years ago
Excellent explanation.
@christophersolomon633 · 2 years ago
Outstanding Video. Really well explained.
@pushkarparanjpe · a year ago
Thanks once again !
@MachineLearningStreetTalk · 2 years ago
Awesome video! Nice channel
@seeking9145 · 2 years ago
Super nice explanation!!!
@ChocolateMilkCultLeader · 2 years ago
Never heard anyone call it the most important problem. Interesting viewpoint.
@geoffreyanderson4719 · 2 years ago
Great depth is where you get the most exponentiation effect, thus the worst vanishing or explosion. But great depth is where the bulk of the power of deep neural nets comes from.
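A small numeric illustration of that exponentiation-with-depth point (my own sketch; treating each layer's contribution to the chain rule as a single multiplicative factor is a simplifying assumption): per-layer factors a bit below 1 vanish as depth grows and factors a bit above 1 explode, which is exactly why depth cuts both ways.

```python
# Hedged sketch: repeated multiplication across layers vanishes or explodes.
for factor, label in [(0.25, "sigmoid-like (<= 0.25)"),
                      (0.9,  "slightly shrinking"),
                      (1.1,  "slightly amplifying")]:
    for depth in (5, 20, 50):
        print(f"{label:<24} depth {depth:>2}: gradient scale ~ {factor ** depth:.3e}")
```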
@codematrix · a year ago
Another way to help with the vanishing gradient problem is to adjust your learning rate.
@ritvikmath · a year ago
Great tip!
@n.m.c.5851 · a year ago
thanks
@ritvikmath · a year ago
Welcome
@aliasgher5272 · 2 years ago
Hats off, Sir!
@xiaoweidu4667 · a year ago
as always