As a junior AI developer, I found this to be the best tutorial on Adam and other optimizers I've ever seen. Simply explained, but not so simple as to be a useless overview. Thanks!
Very good explanation! 15:03 Arguably, it's not the responsibility of the optimization algorithm to ensure good generalization. I feel it would be fairer to judge optimizers only on how well they fit the training data, and to leave generalization out of their benchmark. In your example, I think getting rid of that sharp minimum is the responsibility of the model architecture design (dropout, fewer parameters, etc.), rather than Adam's responsibility not to fall into it.
Nesterov is silly. You have the gradient g(w(t)) because the weight w is used in the forward pass to compute the neuron's activation and thus contributes to the loss. You don't have the gradient g(w(t)+pV(t)), because no forward pass was run at that fictive weight position, so you have no information about what the loss contribution would have been there. It's PURE NONSENSE. But it only costs a few extra calculations without doing much damage, so no one really seems to complain about it.
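For context, the update being debated evaluates the gradient at the lookahead point w(t) + pV(t) instead of at w(t). Below is a minimal sketch of the two variants on a toy 1-D quadratic loss f(w) = w^2, where the gradient can be evaluated analytically at any point; the names (grad, nesterov_step, classical_step, lr, momentum) are illustrative and not taken from the video.

```python
# Minimal sketch: classical vs. Nesterov momentum on f(w) = w**2.
# Illustrative only; the gradient is analytic, so it can be evaluated
# at the lookahead point without an extra "real" forward pass.

def grad(w):
    # d/dw of f(w) = w**2
    return 2.0 * w

def nesterov_step(w, v, lr=0.1, momentum=0.9):
    # Gradient is taken at the lookahead position w + momentum * v,
    # which is the point the comment above objects to.
    lookahead = w + momentum * v
    v = momentum * v - lr * grad(lookahead)
    return w + v, v

def classical_step(w, v, lr=0.1, momentum=0.9):
    # Gradient is taken at the current position w.
    v = momentum * v - lr * grad(w)
    return w + v, v

w_n, v_n = 5.0, 0.0
w_c, v_c = 5.0, 0.0
for _ in range(50):
    w_n, v_n = nesterov_step(w_n, v_n)
    w_c, v_c = classical_step(w_c, v_c)
print(f"Nesterov:  w = {w_n:.4f}")
print(f"Classical: w = {w_c:.4f}")
```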
Thank you for your comments, Sebastian! This result doesn't seem completely clear-cut, so it may be open to refutation in some cases. For instance, one Medium article concludes that "fine-tuned Adam is always better than SGD, while there exists a performance gap between Adam and SGD when using default hyperparameters", which suggests the problem is one of hyperparameter optimization, which can be more difficult with Adam. Let me know what you think! medium.com/geekculture/a-2021-guide-to-improving-cnns-optimizers-adam-vs-sgd-495848ac6008