
Machine Learning Lecture 36 "Neural Networks / Deep Learning Continued" - Cornell CS4780 SP17

Kilian Weinberger
14K views

Published: Oct 8, 2024

Comments: 40
@ahmedmustahid4936 · 4 years ago
These are the best lectures on ML I have come across so far
@jachawkvr · 4 years ago
The lecture was so much fun. I used to think that deep learning is somehow very different from the other ML algorithms, but I realize now that this is not the case. The shocking thing, though, was how accurate the neural net's predictions were at the end, simply based on the photo.
@deltasun · 4 years ago
Beautiful and intuitive explanation of why SGD works for NNs even though it is such a stupid optimization algorithm. Really illuminating. I've been struggling with this for months, thank you! This course is by far the most insightful course on ML I've ever seen.
@gaconc1 · 3 years ago
Great intellectual insight delivered passionately! Many thanks, Prof Kilian!
@ForcesOfOdin · 8 months ago
The intuition here was so satisfying. The way it all comes together at the end, when he points out that the sigmoidal functions people used to use (because they emulate neuronal activation functions) have these flat parts which slow down the gradient. Not only is the slowed learning bad, but it also dampens the ability of noisy SGD to escape the thin, deep wells that represent ideal parameters only for a SPECIFIC data set. I.e. the thin deep wells = overfitting; the noise of SGD (together with a big learning rate alpha) escapes them, but a slowed gradient from the sigmoidal flat parts acts like an effective reduction in learning rate, which leads to getting trapped in the wells even with SGD, which causes overfitting. Just awesome.
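A quick numerical sketch of the saturation effect described above (my own toy illustration, not the lecture's code): the sigmoid's derivative never exceeds 0.25 and collapses toward zero once its input is a few units from zero, while the ReLU derivative stays at 1 on its active side, so updates (and SGD's helpful noise) keep their size.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # maximum value is 0.25, at z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0      # stays 1 on the active side

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:4.1f}   sigmoid' = {sigmoid_grad(z):.6f}   relu' = {relu_grad(z):.1f}")
# The sigmoid gradient collapses toward 0 as |z| grows, so in the flat
# regions both the learning signal and the helpful SGD noise are damped;
# the ReLU gradient does not shrink, which is the point made above.
```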
@in100seconds5 · 4 years ago
I am really grateful; really useful content and very practical.
@lastfirst4073 · 1 year ago
I love the fact that he searched for himself on Rate My Professors at 50:53. I rate you a 10/10. Thank you, professor.
@michaelmellinger2324 · 2 years ago
2:00 Begin
2:25 Neural networks are just a simple extension of linear classifiers
5:00 Chain rule
15:30 Gradient descent
16:00 No longer working with convex functions because of the transition function, so where we start matters; don't initialize with the all-zero vector, initialization is a big deal
20:25 SGD is really important
28:45 We end up in one of the large holes (not necessarily deep) that we can't escape from; throwing away the training data and taking test data gives us a different function, and wider minima are less likely to change a lot. SGD can only find these!
33:40 Two tricks: mini-batches, and an initially large learning rate that is later lowered by a factor of 10 (a sketch follows below)
37:40 If you wanted to do bagging with neural networks: ensemble several networks, no need to resample
38:50 Why they're called neural networks
43:00 Why ReLU, which is non-differentiable, is good at not getting trapped in local minima
44:50 Demo
46:00 ReLU is better at complex problems but not at smaller problems like the demo
47:00 playground.tensorflow.org demo
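As a rough sketch of the two tricks at 33:40 (mini-batches plus an initially large learning rate that is later dropped by a factor of 10), here is a minimal NumPy loop; the toy linear-regression data, batch size, and drop schedule are illustrative assumptions, not the lecture's demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)     # toy regression data (an assumption)

w = np.zeros(d)            # fine for a linear model; a deep net needs random init (16:00)
lr = 0.1                   # trick 2: start with a large learning rate ...
batch_size = 32            # trick 1: estimate the gradient on small mini-batches

for epoch in range(30):
    if epoch in (10, 20):  # ... and later lower it by a factor of 10
        lr /= 10.0
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        residual = X[idx] @ w - y[idx]
        grad = 2.0 * X[idx].T @ residual / len(idx)   # mini-batch gradient of squared loss
        w -= lr * grad

print("distance to true weights:", np.linalg.norm(w - w_true))
```

The same loop structure carries over to a deep network; only the gradient computation changes (backpropagation through the layers), and the zero initialization would have to be replaced by small random weights, as noted at 16:00.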
@kodjigarpp · 3 years ago
Thank you so much for your teaching; this is the best content I have found in four years in this field. I am close to applying for a PhD in your laboratory, haha!
@sandeepreddy6295 · 4 years ago
Great lecture! You made learning fun. The gap in difficulty between understanding these concepts through Bishop's book and through these lectures is huge, even though there is no doubt that the book may be one of the best.
@sudhanshuvashisht8960 · 4 years ago
Hey, I'm planning to start Bishop's book after finishing this. Is that the right thing to do, or are you saying the book is just a more complicated version of Prof. Kilian's lectures?
@sandeepreddy6295 · 4 years ago
@sudhanshuvashisht8960 It's defined in a complicated way; someone like the professor is the right person to say what to do or what not to do.
@RedPillDS · 3 years ago
Not the professor we want but the one we NEED !!! Just Awesome ...
@vatsan16 · 4 years ago
Am I the only one who is sad that I have only one more lecture to go? :( (Of course, I will probably come back to some of the classes but there is nothing like discovering it for the first time)
@husamalsayed8036 · 3 years ago
Thanks for the lecture. As you said, if you have enough data the loss function of the training set is close to that of the test set, yet SGD still tends to go to the wider local minima. In that case, isn't it better to use another classifier that puts you in a much narrower local minimum, since, as you said, the function would be close to the function of the training data?
@kilianweinberger698 · 3 years ago
The danger is that the loss surface changes as you switch to different (test) data. So a narrow minimum in the training set may actually not be very deep for the test data. Wider minima are often considered more stable.
@gregmakov2680 · 2 years ago
The layers in a NN could be considered as filters: one layer = one filter, so a NN behaves like cascaded filters.
@med0897 · 4 years ago
First, I would like to thank you for these amazing lectures on ML! I just have a question about SGD. Can we say that SGD can escape local minima because the landscape of the single-sample loss function (or mini-batch loss function) is different from the loss function over the whole dataset?
@kilianweinberger698 · 4 years ago
To some degree, but it is important that you change the mini-batch for every gradient update. Probably a better way to think about it is that it is not that easy to get stuck in local minima / saddle points. You need precise gradient information to hit one exactly (a little like hitting the moon with a rocket: only if you aim very carefully will you be successful). If you estimate your gradient with a mini-batch, your gradients will be way too noisy to hit the local minimum, and you will shoot past it, eventually converging near a global minimum from which it is very hard to escape. Hope this helps.
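To make that picture concrete, here is a toy 1-D experiment (my own construction, not from the lecture): a loss with a wide basin at w = 0 and a narrow, deep dip at w = 3. Exact gradient descent started inside the dip stays there; adding noise to every gradient, mimicking mini-batch estimates, typically kicks the iterate out of the dip and toward the wide basin.

```python
import numpy as np

# Toy 1-D loss: 0.5*w**2 (wide basin at w = 0) minus a narrow Gaussian dip at w = 3.
def grad(w):
    wide = w                                              # d/dw of 0.5 * w**2
    narrow = 5.0 * (w - 3.0) / 0.09 * np.exp(-(w - 3.0) ** 2 / 0.18)
    return wide + narrow                                  # gradient of the full loss

def run(noise_std, steps=4000, lr=0.02, w0=2.95, seed=0):
    rng = np.random.default_rng(seed)
    w, escaped = w0, False
    for _ in range(steps):
        g = grad(w) + noise_std * rng.normal()            # noisy "mini-batch" gradient
        w -= lr * g
        escaped = escaped or abs(w - 3.0) > 1.0           # did we ever leave the dip?
    return w, escaped

print("exact gradient:", run(noise_std=0.0))   # stays trapped near w = 3
print("noisy gradient:", run(noise_std=20.0))  # typically wanders out toward w = 0
```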
@med0897 · 4 years ago
@kilianweinberger698 Thank you for the reply!
@sayantanmitra7567 · 5 years ago
Sir, could we get access to all the projects you assign to students? That would really help us.
@in100seconds5 · 4 years ago
Sayantan Mitra, not on this one, but on other videos there is a link for the homeworks.
@tostupidforname · 4 years ago
@in100seconds5 Oh really? I somehow missed that. That's amazing!
@in100seconds5 · 4 years ago
Yeah, links to the homeworks, but not the projects. Apparently dear Kilian does not share them publicly because the solutions might become available to future Cornell students.
@tostupidforname · 4 years ago
@in100seconds5 That's unfortunate but understandable.
@pratoshraj3679 · 4 years ago
How is the cost function of neural networks non-convex, and is this the case all the time?
@kilianweinberger698 · 4 years ago
Yes, unless you have no hidden layers, in which case you obtain something like logistic regression.
@kilianweinberger698 · 4 years ago
I have to add ... Or in case you have no non-linear transition functions, in which case you would also get something like logistic regression :-)
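A small numerical check of that remark, under the assumption of identity transition functions: stacking two linear layers is exactly one linear map, so the model collapses to a linear/logistic-regression-style predictor and the usual convex loss applies.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # one input vector
W1 = rng.normal(size=(3, 4))  # "hidden" layer weights
W2 = rng.normal(size=(1, 3))  # output layer weights

# With identity (linear) transition functions the two layers ...
deep = W2 @ (W1 @ x)
# ... are exactly one linear map, so the model is no more expressive than
# logistic/linear regression, and the loss is convex in the product W2 @ W1.
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))   # True
```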
@prateekpatel6082 · 2 years ago
Can you please share pointers on why SGD finds good minima?
@Bmmhable · 4 years ago
at 11:38, I think it should be da/dU = phi_prime(x).
@zelazo81 · 4 years ago
it's correct on the blackboard, because you differentiate with respect to U and then phi(x) is a 'constant'.
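For readers following along, a finite-difference check of this exchange, assuming the quantity on the board is of the form a = U * phi(x) (a guess from the discussion; the exact blackboard expression is not reproduced here):

```python
import numpy as np

# If a = U * phi(x), then phi(x) is a constant with respect to U,
# so da/dU = phi(x), not phi'(x).  Scalar finite-difference check:
phi = np.tanh                 # any fixed transition function
x, U, eps = 0.7, 1.3, 1e-6

a = lambda u: u * phi(x)
numeric = (a(U + eps) - a(U - eps)) / (2 * eps)

print(numeric, phi(x))        # both ~0.604, i.e. da/dU = phi(x)
```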
@Shkencetari · 5 years ago
Can we access practical exams?
@kilianweinberger698 · 5 years ago
Yes, actually you can download them here: www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian%20past%20Exams.zip
@Shkencetari · 5 years ago
@kilianweinberger698 Thank you very much. I really appreciate it.
@jiviteshsharma1021 · 4 years ago
@kilianweinberger698 Thank you so much
@udiibgui2136 · 3 years ago
@kilianweinberger698 Hi Kilian, thank you for these amazing lessons and resources! The files seem to have been deleted; is there a new link?
@shrishtrivedi2652 · 3 years ago
2:00
@gregmakov2680 · 2 years ago
hahah, I love to be lazy :D:D all the concepts got turned upside down :D:D:D