
25. Stochastic Gradient Descent 

MIT OpenCourseWare
85K views · Published 28 Aug 2024

Comments: 72
@elyepes19 · 3 years ago
For those of us who are newcomers to ML, it's most enlightening to learn that, unlike "pure optimization," which aims to find the most exact minimum possible, ML aims instead to get "close enough" to the minimum in order to train the ML engine; if you get "too close" to the minimum, an over-fit of your training data might occur. Thank you so much for the clarification.
@rogiervdw · 4 years ago
This is truly remarkable teaching. It greatly helps understanding and intuition of what SGD actually does. Prof. Sra's proof of SGD convergence for non-convex optimization is in Prof. Strang's excellent book "Linear Algebra & Learning From Data", p. 365.
@JatinThakur-dv7mt · 1 year ago
Sir, you were a student at Lalpani school, Shimla. You were the topper in +2. I am very happy for you. You have reached a level where you truly belong. I wish you more and more success.
@ASHISHDHIMAN1610 · 1 year ago
I am from Nahan, and I’m watching this from Ga Tech :)
@schobihh2703 · 11 months ago
MIT is simply the best teaching around. Really deep insights again. Thank you.
@rembautimes8808 · 2 years ago
Amazing for MIT to make such high-quality lectures available worldwide. Well worth the time investment to go through these lectures. Thanks Prof. Strang, Prof. Suvrit, and MIT.
@trevandrea8909 · 1 month ago
I love the way the professor teaches in this lecture and video. Thank you so much!
@sukhjinderkumar2723 · 2 years ago
Hands down one of the most interesting lectures. The way the professor showed research ideas here, there, and almost everywhere just blows me away. It was very, very interesting, and the best part is that it is accessible to non-math people too (though this is coming from a maths guy; I feel the math part was kept light, it was more toward the intuitive side of SGD).
@Vikram-wx4hg · 3 years ago
What a beautiful beautiful lecture! Thank you Prof. Suvrit!
@minimumlikelihood6552 · 1 year ago
That was the kind of lecture that deserved applause!
@BananthahallyVijay · 2 years ago
Wow! That was one great talk. Prof. Suvrit Sra has done a great job of giving examples just light enough to drive home the key ideas of SGD.
@cobrasetup703 · 2 years ago
Amazing lecture, I am delighted by the smooth explanation of this complex topic! Thanks
@georgesadler7830 · 3 years ago
Professor Suvrit Sra, thank you for a beautiful lecture on Stochastic Gradient Descent and its impact on machine learning. This powerful lecture helped me understand something about machine learning and its overall impact on large companies.
@RAJIBLOCHANDAS · 2 years ago
Really extraordinary lecture. Very lucid yet highly interesting. My research is on 'Adaptive signal processing'. However, I enjoyed this lecture the most. Thank you.
@benjaminw.2838 · 9 months ago
Amazing class!!! Not only for ML researchers but also for ML practitioners.
@notgabby604 · 1 year ago
Very nice lecture. I will seemingly go off topic here and say that an electrical switch is one-to-one when on and zero out when off. When on, 1 volt in gives 1 volt out, 2 volts in gives 2 volts out, etc. ReLU is one-to-one when its input x is >= 0 and zero out otherwise. To convert a switch to a ReLU you just need an attached switching decision x >= 0. Then a ReLU neural network is composed of weighted sums that are connected and disconnected from each other by the switch decisions. Once the switch states are known, you can simplify the weighted-sum composites using simple linear algebra. Each neuron output anywhere in the net is some simple weighted sum of the input vector. AI462 blog.
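A minimal NumPy sketch of the point above (a made-up 3-4-2 toy network, not from the lecture): once the ReLU on/off "switch" states are fixed, the network's output is exactly one weighted sum, i.e. a single linear map applied to the input vector.

    # Toy illustration: freeze the ReLU "switch" states and the whole
    # network collapses to a single weighted sum of the input.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((4, 3))   # hypothetical first-layer weights
    W2 = rng.standard_normal((2, 4))   # hypothetical second-layer weights

    x = rng.standard_normal(3)
    h = W1 @ x
    mask = (h >= 0).astype(float)      # switch decisions: 1 if on, 0 if off
    y = W2 @ (mask * h)                # ordinary ReLU forward pass

    # With the switches frozen, the net is just one matrix acting on x.
    W_eff = W2 @ (np.diag(mask) @ W1)
    print(np.allclose(y, W_eff @ x))   # True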
@gwonchanyoon7748 · 3 months ago
Beautiful classroom!
@tmusic99 · 2 years ago
Thank you for an excellent lecture! It gives me a clear track for development.
@holographicsol2747 · 2 years ago
Thank you, you are an excellent teacher and I learned a lot. Thank you.
@scorpio19771111 · 2 years ago
Good lecture. Intuitive explanations with specific illustrations.
@TrinhPham-um6tl · 3 years ago
Just a little typo that I came across throughout this perfect lecture, in the "confusion region": min(a_i/b_i) and max(a_i/b_i) should be min(b_i/a_i) and max(b_i/a_i). Generally speaking, this lecture is the best explanation of SGD I have ever seen. Again, thank you Prof. Sra and thank you MITOpenCourseWare so so much 👍👏 P.S.: Every other resource that I've read explained SGD so complicatedly 😔
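A quick numeric check of this correction, using made-up a_i and b_i: each term (a_i*x - b_i)^2 is minimized at x = b_i/a_i, so the region of confusion runs from min_i(b_i/a_i) to max_i(b_i/a_i).

    # Toy check: the minimizer of each term (a_i*x - b_i)^2 sits at b_i/a_i,
    # so the region of confusion is [min(b/a), max(b/a)].
    import numpy as np

    a = np.array([1.0, 2.0, 0.5])
    b = np.array([2.0, 1.0, 3.0])

    xs = np.linspace(-10.0, 10.0, 200001)            # fine grid for a numeric argmin
    for ai, bi in zip(a, b):
        numeric_min = xs[np.argmin((ai * xs - bi) ** 2)]
        print(numeric_min, bi / ai)                  # the two agree for every sample

    print("region of confusion:", (b / a).min(), (b / a).max())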
@anadianBaconator · 3 years ago
This guy is fantastic!
@nayanvats3424 · 5 years ago
Couldn't have been better... great lecture... :)
@jfjfcjcjchcjcjcj9947 · 4 years ago
Very clear and nice, to the point.
@pbawa2003 · 2 years ago
This is a great lecture, though it took me a little time to prove that the full gradient lies in the range of the region of confusion, with the min and max being the individual sample gradients.
@taasgiova8190 · 2 years ago
Fantastic, excellent lecture, thank you.
@hj-core · 10 months ago
An amazing lecture!
@rababmaroc3354 · 4 years ago
Well explained, thank you very much, Professor.
@NinjaNJH · 4 years ago
Very helpful, thanks! ✌️
@BorrWick · 4 years ago
I think there is a very small mistake in the graph of (a_i*x - b_i)^2: the bound of the confusion area is not a_i/b_i but b_i/a_i.
@KumarHemjeet · 3 years ago
What an amazing lecture!!
@3g1991 · 4 years ago
Does anyone have the proof he didn't have time for, regarding stochastic gradients in the non-convex case?
@josemariagarcia9322 · 4 years ago
Simply brilliant
@cevic2191 · 2 years ago
Many thanks. Great!!!
@neoneo1503 · 2 years ago
"Shuffle" in practice vs. "random pick" in theory, at 42:00.
@haru-1788 · 2 years ago
Marvellous!!!
@xiangyx · 3 years ago
Fantastic
@grjesus9979 · 1 year ago
So, when using TensorFlow or Keras, if you set batch size = 1 there are as many iterations as samples in the entire training dataset. So my question is: where does the randomness in "stochastic" gradient descent come from?
@MohanLal-of8io · 4 years ago
What GUI software is Professor Suvrit using to change the step size instantly?
@brendawilliams8062 · 3 years ago
I don't know, but it would have to transpose numbers of a certain limit, it seems to me.
@vinayreddy8683 · 4 years ago
The prof assumed all the variables are scalars, so while moving the loss downhill toward a local minimum, how is the loss function guided to the minimum without any direction (scalar property)?
@fatmaharman3842 · 4 years ago
Excellent
@fishermen708 · 5 years ago
Great.
@SHASHANKRUSTAGII · 3 years ago
Andrew NG didn't explain it in this detail. That is why MIT is MIT. Thanks, professor.
@sadeghadelkhah6310 · 2 years ago
At 10:31, the [INAUDIBLE] thing is "weight".
@mitocw · 2 years ago
Thanks for the feedback! The caption has been updated.
@akilarasan3288 · 10 months ago
I would use MCMC to compute the n-term sum, to answer 14:00.
@watcharakietewongcharoenbh6963 · 2 years ago
How can we find his 5-line proof of why SGD works? It is fascinating.
@kethanchauhan9418 · 5 years ago
What is the best book or resource to learn the whole mathematics behind stochastic gradient descent?
@mitocw · 5 years ago
The textbook listed in the course is: Strang, Gilbert. Linear Algebra and Learning from Data. Wellesley-Cambridge Press, 2019. ISBN: 9780692196380. See the course on MIT OpenCourseWare for more information at: ocw.mit.edu/18-065S18.
@brendawilliams8062 · 3 years ago
Does this view and leg of math believe there is an unanswered Riemann hypothesis?
@JTFOREVER26 · 3 years ago
Would anyone here care to explain how, in the one-dimensional example, choosing a scalar outside R guarantees that the stochastic gradient and the full gradient have the same sign? (Corresponding to roughly 30:30 - 31:00 in the video.) Thanks in advance!
@ashrithjacob4701 · 1 year ago
f(x) can be thought of as a sum of quadratic functions (each corresponding to one data point), each with its minimum at b_i/a_i. When we are outside the region R, the minima of all the functions lie on the same side of where we are, and as a result all their gradients have the same sign.
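A small numeric check of this argument, with made-up a_i and b_i: just outside the region R, every per-sample gradient 2*a_i*(a_i*x - b_i) has the same sign as the full gradient, so any single sample still points in a descent direction.

    # Toy check: outside R = [min(b/a), max(b/a)] all per-sample gradients agree in sign.
    import numpy as np

    a = np.array([1.0, 2.0, 0.5, 3.0])
    b = np.array([2.0, 1.0, 3.0, -1.5])
    R = ((b / a).min(), (b / a).max())            # region of confusion

    for x in (R[0] - 0.5, R[1] + 0.5):            # two points just outside R
        per_sample = 2.0 * a * (a * x - b)        # each sample's gradient at x
        full = per_sample.sum()                   # full-batch gradient at x
        print(x, np.sign(per_sample), np.sign(full))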
@Tevas25 · 5 years ago
A link to the Matlab simulation Prof. Suvrit shows would be great.
@techdo6563 · 5 years ago
Found it: fa.bianp.net/teaching/2018/COMP-652/
@SaikSaketh · 4 years ago
@@techdo6563 Awesome
@medad5413 · 3 years ago
@@techdo6563 thank you
@robmarks6800 · 2 years ago
Leaving the proof as a cliffhanger, almost worse than Fermat…
@papalau6931 · 1 year ago
You can find the proof by Prof. Suvrit Sra in Prof. Gilbert Strang's book titled "Linear Algebra and Learning from Data".
@ac2italy · 4 years ago
He cited images as an example of a large feature set: nobody uses standard ML for images; we use convolution.
@elyepes19 · 3 years ago
I understand he is referring to Convolutional Neural Networks as a tool for image analysis, as a generalized example.
@tuongnguyen9391 · 1 year ago
Where can I obtain Professor Sra's slides?
@mitocw · 1 year ago
The course does not have slides of the presentations. The materials that we do have (problem sets, readings) are available on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
@tuongnguyen9391 · 1 year ago
@@mitocw Thank you, I guess I just noted everything down.
@brendawilliams8062 · 3 years ago
It appears, from an engineering math view, that there's the problem.
@shivamsharma8874 · 5 years ago
Please share the slides of this lecture.
@mitocw · 5 years ago
It doesn't look like there are slides available. I see a syllabus, instructor insights, problem sets, readings, and a final project. Visit the course on MIT OpenCourseWare to see what materials we have at: ocw.mit.edu/18-065S18.
@vinayreddy8683 · 4 years ago
Take screenshots and prepare them yourself!!!
@jasonandrewismail2029 · 1 year ago
DISAPPOINTING LECTURE. BRING BACK THE PROFESSOR