This is a terrific lecture on a difficult subject. Andrej strips all the complication away, gets to the essentials, and avoids jargon and "academese". The slides are great and the presentation engaging and easy to understand. The kind of teacher we wish was teaching every class!
Having suffered and struggled a lot through Andrew Ng's cs229, I can certify that this lecture is far more accessible to the least math-oriented souls out there. The way Andrej explains concepts *in English* and connects ideas to each other is one of the reasons I did not give up studying these topics.
Can't thank Andrej, the cs231n team, and Stanford enough. I thoroughly enjoy your lectures. Knowledge is one form of addiction and pleasure, and thank you so much for providing it freely. I hope you all enjoy giving it as much as we enjoy receiving it.
This was gold to me! "... and this process is called Back Propagation: it's a way of computing, through recursive application of the chain rule in a computational graph, the influence of every single intermediate value in that graph on the final loss function." (14:48)
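That quote can be made concrete with a tiny sketch (my own toy example, in the spirit of the lecture's small (x + y) * z graph; the exact numbers are only illustrative): each node multiplies its local derivative by the gradient flowing in from above, recursively, until every intermediate value and input knows its influence on the output.

```python
# Toy computational graph: f = (x + y) * z, backprop by the chain rule.
x, y, z = -2.0, 5.0, -4.0   # illustrative values

# forward pass
q = x + y        # q = 3.0
f = q * z        # f = -12.0

# backward pass: each node multiplies its local gradient by the gradient from above
df = 1.0               # df/df
dq = z * df            # * gate: local gradient wrt q is z
dz = q * df            # * gate: local gradient wrt z is q
dx = 1.0 * dq          # + gate: local gradient wrt x is 1
dy = 1.0 * dq          # + gate: local gradient wrt y is 1

print(dx, dy, dz)      # -4.0 -4.0 3.0
```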
Andrej Karpathy's lectures and articles are basically the reason I know anything about anything. The webpage he uses near the end of the lecture to show decision boundaries is quite literally one of the reasons I got interested in machine learning at all.
Thank you so much for making these lectures public!! I had been struggling to understand backpropagation for a long time. You have made it all so simple in this lecture. I am planning to watch the entire series.
Finally, a lecture that lifts the curtain of mystery on backward propagation! Excellent delivery of the core concepts of neural networks. This is the best lecture I have watched on the topic so far (including the most popular ones that are heavily promoted on public platforms!)
After seeing a dozen videos on backpropagation (that includes CSxxx courses too), this is by far the best explanation given by anybody. Thank you Andrej, you have made life a bit simpler during COVID.
This is a beautiful lecture - it gave a very fundamental understanding of backward propagation and its concepts. I see that backward propagation corresponds to demultiplexing and forward prop corresponds to multiplexing, where we are multiplexing the input.
Starts at 6:30.
13:40 - the local gradient is computed on the forward pass: while doing the forward pass, this node can immediately know what dz/dx and dz/dy are, because it knows what function it's computing.
31:30 - when two branches come into a node, we add their gradients during backprop.
34:15 - example of backward-pass code.
37:00 - the gradient of the loss wrt x is NOT just "y" but y*z, because this z is dL/dZ and we want dL/dX: dL/dX = dL/dZ * dZ/dX, where z, the incoming gradient, is dL/dZ and y = dZ/dX.
43:30 - why are x, y, z vectors? Because, recalling the notation for forward propagation, we do forward prop for more than one input at a time; each input can be represented as a column vector or a row vector in the input matrix X?
1:10:00 - decision boundaries formed by a neural network.
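To make the 13:40 and 37:00 notes concrete, here is a minimal multiply gate as I would sketch it (my own code, not the lecture's): it caches its inputs on the forward pass so the local gradients are already known, and the backward pass multiplies them by the incoming gradient; the 31:30 point about branches is just gradient addition.

```python
# Minimal multiply gate (my own sketch, not the course starter code).
class MultiplyGate:
    def forward(self, x, y):
        # Cache the inputs: this is why the local gradients dz/dx = y and
        # dz/dy = x are already known by the time the backward pass arrives.
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # dz is dL/dz flowing in from above; chain rule, not just the local part:
        dx = self.y * dz   # dL/dx = (dz/dx) * (dL/dz) = y * dz
        dy = self.x * dz   # dL/dy = (dz/dy) * (dL/dz) = x * dz
        return dx, dy

# If a value feeds two branches, the gradients coming back from each branch are added:
# grad_x = dx_from_branch_1 + dx_from_branch_2
```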
In summary: we calculate forward to get the loss value, calculate backward to get the gradient, and then update the weights for the next step. And so on, until we're happy with the loss value.
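Spelled out as a bare-bones sketch (a toy linear least-squares problem of my own, not anything from the lecture code), the forward / backward / update cycle looks like this:

```python
import numpy as np

# Toy problem: fit y = X @ w_true with a squared-error loss.
np.random.seed(0)
X = np.random.randn(100, 3)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)          # the weights we update
lr = 0.1
for step in range(200):
    y_hat = X @ w                          # forward pass
    loss = np.mean((y_hat - y) ** 2)       # loss value
    dw = 2.0 * X.T @ (y_hat - y) / len(y)  # backward pass: dL/dw
    w -= lr * dw                           # update the weights for the next step

print(loss, w)   # loss near 0, w near w_true
```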
This video made me cry😿. This guy did his best and enthusiastically tried to make the students understand BackProp. Stanford students are so lucky. It’s not that you are smarter, but that other former children, such as me, don’t have such a good teacher.
Just one question. Why here ( ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-i94OvYb6noo.html ) do you take the input value to the gate for its own derivative, while in all the other parts you take the already computed value ( ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-i94OvYb6noo.html )? It's kinda confusing. I'd expect that at the beginning we take just 1, not the input to the (1/x) gate.
Suppose the output of (1/x) is F and the output of (+1) is Q. So the local gradient on the F gate should be (dF/dQ); why is that not the case? Here the local gradient over F is (dF/dx)... please explain? @ 18:20
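For what it's worth, here is how I read that step, with the slide's numbers as I remember them (so treat them as approximate): the local gradient of the 1/x gate is d(1/x)/dx = -1/x^2, and it is evaluated at the gate's input; the gradient of 1.0 only appears at the very output of the whole graph.

```python
# The 1/x gate at the end of the sigmoid example (numbers as I remember them
# from the slide, so treat them as approximate).
x_in = 1.37                          # input flowing into the 1/x gate on the forward pass
out = 1.0 / x_in                     # ~0.73, the final sigmoid output
d_out = 1.0                          # backprop starts with gradient 1.0 at the output
d_x_in = (-1.0 / x_in ** 2) * d_out  # local gradient d(1/x)/dx = -1/x^2, times incoming
print(out, d_x_in)                   # ~0.73, ~-0.53
```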
+Profilic Locker LOL, thank God it's not only me. He picked up a pretty accent too. Well, his recent years in the US did him good. Those 8 years changed Badmephisto's voice a lot.
Really good lectures & assignments. I could not figure out how to get dW; I worked out the other parts of the assignments. Now that Assignments 1 & 2 are graded, can someone please tell me? Thanks
The output layer has to match the dimensions of the output, so its dimensions are fixed. Also, since the output is not used for further computation, don't think about it in terms of weights and biases: output layers have no weights and biases, it's just the output. If you're using a sigmoid layer at the end, there is no bias coming in from the previous layer, since that would skew your probability estimates. All the hidden layers can have biases, though.
Around 24:00, shouldn't the gradients be switched? I.e., x0: [-1] x [0.2] = -0.2 and w0: [2] x [0.2] = 0.4. Oh wait, never mind, I see it's explained a couple of slides further!
No. dL/dx0 = df/dx0 * dL/d(w0*x0). The local gradient df/dx0 = 2.0, because x0 is multiplied by w0 = 2.0: every time we increase x0 by h, the value of w0*(x0 + h) increases by h*w0. dL/d(w0*x0) is already calculated as 0.20, so dL/dx0 = 2.0 * 0.20 = 0.40. The same applies to the calculation of dL/dw0.
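A quick numerical check of that step (my own sketch, taking dL/d(w0*x0) = 0.20 as given from the slide):

```python
w0, x0 = 2.0, -1.0
dL_dz = 0.20                 # gradient arriving at the w0*x0 node (from the slide)

# analytic chain rule
dL_dx0 = w0 * dL_dz          # 0.40
dL_dw0 = x0 * dL_dz          # -0.20

# numerical check of the local gradient d(w0*x0)/dx0
h = 1e-5
dz_dx0 = (w0 * (x0 + h) - w0 * x0) / h
print(dL_dx0, dz_dx0 * dL_dz)   # both ~0.40
```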
Someone please help. I am a pianist, so my math is weak, I know I am stupid, but... the derivative of (x + y): if I do an implicit differentiation, it turns out
(x + y) = 3
d(x + y)/dx = d(3)/dx
d(x + y)/dx = 0
1 + 1*(dy/dx) = 0
dy/dx = -1
And the result must be a positive 1! Where is my mistake? Please, someone point it out.
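One way to see it: backprop treats x and y as independent inputs to the + gate, so it takes a partial derivative with y held fixed rather than differentiating along the constraint x + y = 3. A tiny numerical sketch (values of my own, chosen so that x + y = 3):

```python
# The + gate's derivative wrt x is a partial derivative: nudge x, hold y fixed.
f = lambda x, y: x + y
x, y = -2.0, 5.0
h = 1e-5
partial_wrt_x = (f(x + h, y) - f(x, y)) / h
print(partial_wrt_x)   # ~1.0, not -1: we are not differentiating along x + y = 3
```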
Good lecture, but not a good idea to teach the next generation of data scientists that (in this case, Python) code should have no comments and maximum terseness.