
Synthetic Gradients Tutorial - How to Speed Up Deep Learning Training 

Aurélien Géron
24K subscribers
12K views

Published: Sep 13, 2024

Comments: 52
@maratkopytjuk3490 · 6 years ago
You have a huge talent for explaining things! Thank you for your time and energy. What is the reason that the synthetic models perform better than normal backpropagation? Intuitively, it should perform worse on the training dataset, because we just use approximations of the gradients.
@Chhillee · 6 years ago
Marat Kopytjuk I suspect that synthetic gradients act as a sort of regularizer. It's kind of been a weird thing for a while that some forms of regularization sometimes speed up training.
@nateamus3920 · 6 years ago
Months and months of work, study, trial and error...condensed into less than 30 minutes of concise instruction. I've found the value of your book to be the same. Once again, Mr. Géron, incredible work!!!
@mouduge · 6 years ago
Nateamus Thanks a lot! :)
@animeshkarnewar3 · 6 years ago
A video on Synthetic Gradients! You did it, thank you for taking my suggestion. I haven't watched it yet, but I'm excited to. I'm sure it'll be very insightful.
@bingeltube · 6 years ago
Just watched your video a 2nd time! Thank you very much for putting this great video and supporting information together, Aurelien! Well done!
@seanpedersen829 · 6 years ago
Really impressed by the quality of your videos! Also, I am shocked at how good your English sounds, assuming your mother tongue is French. Keep doing what you do, and may I suggest you provide people with a way to support you financially so you can keep this going.
@AurelienGeron · 6 years ago
Thanks a lot Sean, I'm really glad you enjoy my videos. My mother tongue is indeed French, but I lived in English speaking countries for a total of 12 years (Nigeria, New Zealand, U.S. and Canada). If people want to support my work, the best option is to buy or recommend my book. Thanks again! :)
@PhongNguyen-zz1ei · 6 years ago
My god! I just woke up in the morning and felt amazing watching your video in bed.
@jamespack161 · 6 years ago
Aurélien, thank you for putting together this video. This talk is the best and most approachable explanation of synthetic gradients I have seen or read. Nice job!
@Shady9 · 1 year ago
Thank you so much for this thorough and very clear explanation of a complex subject.
@ronnywing9049 · 6 years ago
Absolutely incredible work here, sir. Thank you so much for your efforts
@Arecatail · 6 years ago
Thanks for posting these videos. They are incredibly insightful. The book is great too.
@greendatadialog · 6 years ago
Great job man! I’m sharing it with the Data science community here in Hong Kong!
@kozzuli · 6 years ago
Best explanation I've seen so far. Thanks a lot! Looking forward to your next video.
@VijayKumar-fv6dx · 6 years ago
Thanks for your video... really a great explanation, not much needed beyond that. Also, Annie's painting is superb. Happy New Year to you as well!
@alzeNL · 2 years ago
Thank you for this great video, a great accompaniment to your book, which I am working through.
@bobsalita3417 · 6 years ago
Excellent clarity of thought.
@lgsoftwares7093 · 6 years ago
Thanks for explaining it so well. When is the new book coming?
@ahmedadly · 6 years ago
Wonderful explanation, the best so far for CapsNets.
@rantaoca491 · 6 years ago
Very well explained! Please do more videos like this :)
@BadriNathJK · 6 years ago
You have a great voice. Keep up the channel. Bring more videos
@Vladeeer · 6 years ago
What a great way to end 2017!
@RelatedGiraffe · 6 years ago
Great video! Very well explained. Synthetic gradients become really useful when you can't afford to store all activations in memory (which is necessary for regular backpropagation), such as for really long time chains in recurrent neural networks. But what are the benefits of using synthetic inputs? I see that they are described in the original paper, "Decoupled Neural Interfaces using Synthetic Gradients", but I don't see them mention any reason for using them. Or do you think they explored the possibility more out of curiosity, to see whether it is possible at all?
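For context on the "synthetic inputs" this question (and a similar one further down) asks about: in the paper they play the mirror role of synthetic gradients, letting a layer start its forward pass without waiting for the upstream layers. The sketch below is my own hypothetical illustration (a linear module, made-up sizes), not code from the video or the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic-input module for layer i: predicts the activations layer i would
# receive from layer i-1, directly from the raw network input x (linear here).
S = np.zeros((4, 8))

def synthetic_input(x):
    """Predict layer i's input (the upstream activations) from the raw input x."""
    return x @ S

x = rng.normal(size=(1, 4))
h_hat = synthetic_input(x)        # layer i can start its forward pass with this
# Later, once the true upstream activations arrive, S is trained to match them:
h_true = rng.normal(size=(1, 8))  # placeholder for the real upstream activations
S -= 0.01 * x.T @ (h_hat - h_true)
```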
@fanyixiao7235 · 6 years ago
Very well explained!! Thank you for your efforts :)
@Skythedragon · 6 years ago
Great explanation, you just got a new subscriber!
@PaulHobbs23 · 6 years ago
Thanks for making this video, this is a very interesting result! Does the paper explain why you would want to use synthetic inputs in addition to cDNI? It seems like cDNI already gives you the ability to train a model in a fully parallelized way.
@aa-xn5hc · 6 years ago
Thank you! This is fantastic content.
@Alex-gc2vo · 5 years ago
Would it not be better to also provide the labels as input to the synthetic gradient models? It seems like expecting a model to predict deltas without even knowing the final target is just a more complex version of gradient descent with momentum, in a way.
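Providing the labels as an extra input is essentially what the paper's conditional variant (cDNI) does. Here is a minimal sketch of that idea, my own illustration rather than code from the video; the module shape, sizes, and names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_classes = 128, 10

# Linear synthetic-gradient module conditioned on the label (cDNI-style):
# it maps [activation, one-hot label] to a predicted gradient dLoss/dh.
M = np.zeros((n_hidden + n_classes, n_hidden))

def synthetic_gradient(h, y_onehot):
    """Predict dLoss/dh from the activation h and the one-hot label."""
    return np.concatenate([h, y_onehot]) @ M

h = rng.normal(size=n_hidden)     # some layer's activations
y = np.eye(n_classes)[3]          # the sample's label
g_hat = synthetic_gradient(h, y)  # used right away to update that layer
```

Starting M at zero makes the first predicted gradients all zeros, so the early synthetic updates are harmless until the module has learned something.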
@LinkSF1 · 6 years ago
Thanks for the video! Huge fan of your channel and presentation style. I hope you don't mind a small comment regarding one part of your video. You state that truncated BP involves cutting the RNN and performing BP at a fixed point. This isn't fully correct, as truncated BP usually involves BP-ing a fixed number of time steps k back at ANY time point t (i.e. BP for k time steps from time t). Tensorflow, however, performs the style that you refer to (in the interest of computational efficiency). You can find more details in this post: r2rt.com/styles-of-truncated-backpropagation.html
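A tiny sketch to illustrate the distinction drawn in the comment above: for the loss at each time step t, which earlier steps receive gradient under the two truncation styles. This is a hypothetical illustration, not code from the video or the linked post.

```python
# Which time steps receive gradient from the loss at step t,
# under the two truncation styles (k = truncation length)?
T, k = 12, 4

# "Per-step" truncated BPTT (as described in the r2rt.com post): at EVERY
# step t, backpropagate the loss at t through the previous k steps.
per_step = {t: list(range(max(0, t - k + 1), t + 1)) for t in range(T)}

# Windowed style (what the comment says TensorFlow does for efficiency):
# split the sequence into length-k windows and backpropagate only inside each.
windowed = {t: list(range((t // k) * k, t + 1)) for t in range(T)}

for t in range(T):
    print(f"t={t:2d}  per-step: {per_step[t]}  windowed: {windowed[t]}")
```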
@abhiwins123 · 6 years ago
Your book is as amazing as your explanation 👍
@AurelienGeron · 6 years ago
Thanks Abhijith, I'm very glad you like both! :)
@thangbom4742 · 5 years ago
Brilliant idea. Is it implemented on any platform?
@bingeltube · 6 years ago
Highly recommended.
@mohamadyakteen8710 · 6 years ago
Excellent video, I'm glad I found your channel; I've just subscribed. Is there a PDF version of your book that we can buy online?
@AurelienGeron · 6 years ago
Thanks Mohamad! There's a PDF version available on ebooks.com. Here's the link: goo.gl/d9ZV3t
@mohamadyakteen8710 · 6 years ago
Thank you Aurélien, I've got the book. Can't wait to reach Chapter 12, because I started learning CUDA libraries a few months ago. Judging from an overall perspective, the book is highly recommended. Great job.
@dherbemontvictor5188 · 6 years ago
Hello Aurelien! Thank you for this video and all the past (and, I hope, future!) vids! I am surely missing something, because I don't understand why this is faster and how you can distribute the computation. I understand that you create an estimator of the gradient that you use to update the parameters, but to compute the gradient estimator for layer i you have to use the estimator of layer i+1 (to calculate the distance between the two). So somehow you have to wait until every layer j, with j > i, has computed its gradient estimator before you can compute yours (since every layer waits for the estimator of the layer after it). I am missing something, I think, so I hope you can help me with that! Merci beaucoup! Victor
@Chhillee · 6 years ago
D'herbemont Victor So, my understanding is that it isn't strictly faster, but it's parallelizable. Just like how there is still a forward lock even with DNIs, you still have to wait for the next layer to compute its gradients before the synthetic gradient layer can update. Remember that the synthetic layer's outputs are an approximation of what the gradient at layer n+1 would be, and that's what you need to compute the gradient for layer n.
@SuperBlablou · 6 years ago
No, you estimate using only the output of the i-th layer and possibly the true label of the sample. You will, as you say, later need feedback from the (i+1)-th layer to compute the true gradient and compare it with the estimate, in order to improve your estimator. The point is that you don't have to wait until the full forward pass is over to update layer i, so you can directly start computing the next sample's output in layer i.
@dherbemontvictor5188 · 6 years ago
Thank you Aloïs for your answer! So, in a certain way, the computation of the gradient estimator and the forward pass are not done synchronously? By synchronously I mean that, on one hand, you do the forward pass and update the weights with your estimator, and on the other hand you compute the loss and update the different gradient estimators for each layer or group of layers?
@SuperBlablou · 6 years ago
Yes, you got it :)
@dherbemontvictor5188 · 6 years ago
Excellent! Thank you for your time, Aloïs!
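For readers following the thread above, here is a minimal NumPy sketch of the scheme Victor and Aloïs discuss: layer 1 updates immediately from a synthetic gradient predicted from its own output, and the synthetic-gradient module is trained later, once the true gradient from layer 2 becomes available. This is my own illustration (made-up sizes, toy regression data), not code from the video.

```python
import numpy as np

rng = np.random.default_rng(42)
lr = 0.01

W1 = rng.normal(scale=0.1, size=(4, 8))   # layer 1 (tanh)
W2 = rng.normal(scale=0.1, size=(8, 1))   # layer 2 (linear output)
M  = np.zeros((8, 8))                      # DNI module: predicts dLoss/dh1 from h1

for step in range(1000):
    x = rng.normal(size=(1, 4))
    y = rng.normal(size=(1, 1))

    # Layer 1: forward pass, then update IMMEDIATELY with the synthetic gradient.
    h1 = np.tanh(x @ W1)
    g_hat = h1 @ M                            # predicted dLoss/dh1
    W1 -= lr * x.T @ (g_hat * (1 - h1 ** 2))  # backprop g_hat through the tanh

    # Layer 2: forward pass and true gradient (can run later / on another device).
    y_pred = h1 @ W2
    g_out = 2 * (y_pred - y)                  # d(MSE)/dy_pred
    g_true = g_out @ W2.T                     # true dLoss/dh1
    W2 -= lr * h1.T @ g_out

    # Train the DNI module to regress toward the true gradient it just received.
    M -= lr * h1.T @ (g_hat - g_true)
```

Starting M at zero means the first synthetic gradients are all zeros, so layer 1 is effectively frozen until the DNI module has learned a useful approximation.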
@alibaheri4614 · 6 years ago
Thanks. Is there any way to access the slides presented in this video?
@AurelienGeron · 6 years ago
Sure, here are the slides: www.slideshare.net/aureliengeron/synthetic-gradients-tutorial
@alibaheri4614 · 6 years ago
Thanks, great.
@taksirhasan3551 · 6 years ago
May I know which tools were used for the figures?
@mouduge · 6 years ago
Taksir Hasan I just use Google Slides (same as for my book). I wish they had a shortcut to toggle Help > Snap to Guides, but apart from that it's pretty easy to use.
@taksirhasan3551 · 6 years ago
Thanks :)