
Optimization for Deep Learning (Momentum, RMSprop, AdaGrad, Adam) 

DeepBean
42K views

Published: 27 Aug 2024

Comments: 38
@HojjatMonzavi 4 days ago
As a junior AI developer, this is the best tutorial on Adam and the other optimizers I've ever seen. Simply explained, but not so simplified that it becomes a useless overview. Thanks!
@rhugvedchaudhari4584 9 months ago
The best explanation I've seen so far!
@ukasz9625 5 days ago
Confirmed.
@sokrozayeng7691 6 days ago
Great explanation! Thank you.
@zhang_han 10 months ago
The most mind-blowing thing in this video was what Cauchy did in 1847.
@saqibsarwarkhan5549 3 months ago
That's a great video with clear explanations in such a short time. Thanks a lot.
@EFCK555 20 days ago
Good work, man. It's the best explanation I've ever seen. Thank you so much for your work.
@AkhilKrishnaatg 5 months ago
Beautifully explained. Thank you!
@dongthinh2001 8 months ago
Clearly explained indeed! Great video!
@MrWater2 2 months ago
Wonderful explanation!!
@Justin-zw1hx 1 year ago
Keep doing the awesome work; you deserve more subs.
@tempetedecafe7416 8 months ago
Very good explanation! 15:03 Arguably, I would say it's not the responsibility of the optimization algorithm to ensure good generalization. It would be fairer to judge optimizers only on how well they fit the training data and to leave generalization out of their benchmark. In your example, I think it falls to the model architecture design to get rid of that sharp minimum (through dropout, fewer parameters, etc.), rather than to Adam to avoid falling into it.
@idiosinkrazijske.rutine 1 year ago
Very nice explanation!
@markr9640 7 months ago
Fantastic video and graphics. Please find time to make more. Subscribed 👍
@luiskraker807 7 months ago
Many thanks, clear explanation!
@physis6356 4 months ago
Great video, thanks!
@leohuang-sz2rf 4 months ago
I love your explanation.
@benwinstanleymusic 5 months ago
Great video, thank you!
@rasha8541 8 months ago
Really well explained.
@TheTimtimtimtam 1 year ago
Thank you, this is really well put together and presented!
@makgaiduk 9 months ago
Well explained!
@wishIKnewHowToLove 1 year ago
Thank you so much :)
@MikeSieko17 5 months ago
Why didn't you explain the (1-\beta_1) term?
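(For context: in Adam, the (1 - beta_1) factor makes the first-moment estimate an exponential moving average of past gradients rather than a running sum that would grow without bound; the same applies to (1 - beta_2) for the second moment. A minimal NumPy sketch of one Adam step, with illustrative variable names not taken from the video:

import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages: the (1 - beta) factors weight the new
    # gradient so the estimate stays an average of recent gradients.
    m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g**2     # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1**t)             # bias correction for zero initialization
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

The bias-correction lines compensate for m and v starting at zero, which would otherwise shrink the early updates.)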
@donmiguel4848 5 months ago
Nesterov is silly. You have the gradient g(w(t)) because the weight w contributes to the neuron's activation in the forward pass and hence to the loss. You don't have the gradient g(w(t) + pV(t)), because no inference was run at that fictive weight position, so you have no information about what the loss contribution at that position would have been. It's PURE NONSENSE. But it only costs a few extra calculations without doing much damage, so no one really seems to complain about it.
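(For reference: in Nesterov momentum the look-ahead gradient g(w(t) + mu*v(t)) is obtained by actually evaluating the loss gradient at the shifted parameters, i.e. by running the forward/backward pass at that point; many libraries instead use an algebraically equivalent rewrite that avoids the explicit shift. A minimal NumPy sketch on a toy quadratic, assuming illustrative names and a hand-coded gradient:

import numpy as np

# Toy loss: f(w) = 0.5 * w^T A w, so grad(w) = A @ w
A = np.diag([1.0, 10.0])
def grad(w):
    return A @ w

w = np.array([1.0, 1.0])   # parameters
v = np.zeros_like(w)       # velocity
lr, mu = 0.01, 0.9         # learning rate, momentum coefficient

for _ in range(200):
    lookahead = w + mu * v     # the "fictive" position w(t) + mu*v(t)
    g = grad(lookahead)        # gradient evaluated at the look-ahead point
    v = mu * v - lr * g        # velocity update
    w = w + v                  # parameter update

print(w)  # moves toward the minimum at the origin
)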
@wishIKnewHowToLove 1 year ago
Really? I didn't know SGD generalized better than Adam.
@deepbean 1 year ago
Thank you for your comments, Sebastian! This result doesn't seem completely clear-cut, so it may be open to refutation in some cases. For instance, one Medium article concludes that "fine-tuned Adam is always better than SGD, while there exists a performance gap between Adam and SGD when using default hyperparameters", which suggests the problem is one of hyperparameter optimization, which can be more difficult with Adam. Let me know what you think! medium.com/geekculture/a-2021-guide-to-improving-cnns-optimizers-adam-vs-sgd-495848ac6008
@wishIKnewHowToLove 1 year ago
@@deepbean It's SebastiEn, with an E. Learn how to read carefully :)
@deepbean 1 year ago
🤣
@deepbean 1 year ago
@@wishIKnewHowToLove My bad.
@dgnu 1 year ago
@@wishIKnewHowToLove Bruh, c'mon, the man is being nice enough to you just by replying, jesus.
@Stopinvadingmyhardware 1 year ago
Nom nom nom, learn to program.