
Tutorial 12- Stochastic Gradient Descent vs Gradient Descent 

Krish Naik
991K subscribers
211K views

Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
Deep Learning Playlist: • Tutorial 1- Introducti...
Data Science Projects playlist: • Generative Adversarial...
NLP playlist: • Natural Language Proce...
Statistics Playlist: • Population vs Sample i...
Feature Engineering playlist: • Feature Engineering in...
Computer Vision playlist: • OpenCV Installation | ...
Data Science Interview Question playlist: • Complete Life Cycle of...
You can buy my book on Finance with Machine Learning and Deep Learning from the URL below.
Amazon URL: www.amazon.in/Hands-Python-Fi...
🙏🙏🙏🙏🙏🙏🙏🙏
YOU JUST NEED TO DO
3 THINGS to support my channel
LIKE
SHARE
&
SUBSCRIBE
TO MY YOUTUBE CHANNEL

Published: 27 Jul 2019

Comments: 97
@ravindrav1895 2 years ago
Whenever I am confused by some topic, I come back to this channel and watch your videos, and it helps me a lot. Thank you, sir, for an amazing explanation.
@BalaguruGupta 3 years ago
Amazing explanation, sir! You'll always be a hero for AI enthusiasts. Thanks a lot!
@nagesh866 3 years ago
What an amazing teacher you are. Crystal clear.
@saurabhnigudkar6115 4 years ago
Best deep learning playlist on YouTube.
@lakshminarasimhanvenkatakr3754 4 years ago
This is an excellent explanation; anyone can understand it with this granular level of detail.
@ajithtolroy5441 4 years ago
I saw many videos, but this one is quite comprehensible and informative.
@VVV-wx3ui 4 years ago
Superb... simply superb. Understood the concept now from the loss function. Well done, Krish.
@shashanktripathi3034 3 years ago
Krish sir, your YouTube channel is just like the Gita for me: as one gets all the answers to life in the Gita, I get all my doubts cleared on your channel. Thank you, sir.
@kartikdave659 3 years ago
After becoming a member, how can I get the data science material? Can you please tell me?
@fedisalhi6320 4 years ago
Excellent explanation, it was really helpful. Thank you.
@archanamaurya89 3 years ago
This video is such a light bulb moment for me :D Thank you so very much!!
@aditisrivastava7079 4 years ago
Just wanted to ask if you could also suggest some good online resources that we can read for more clarity.
@nitayg1326 4 years ago
My God! Finally I'm clear about GD, SGD and mini-batch SGD!
@bhavanapurohit2627 3 years ago
Hi, is it completely theoretical or will you code in further sessions?
@sreejus8218 3 years ago
If we use a sample of the output to find the loss, will we use its derivative to change all the weights, or only the weights of the respective output?
@ruchikalalit1304 4 years ago
Have you made videos of the practical implementation of all this work? If so, please share the links.
@khuloodnasher1606 4 years ago
Really, this is the best video I've ever seen explaining this concept, better than famous schools.
@Skandawin78 4 years ago
Your videos are an excellent reference for brushing up on these concepts.
@severnsevern1445 3 years ago
Great explanation. Very clear. Thanks!
@allaboutdata2050 4 years ago
What an explanation 🧡. Great!! Awesome!!
@sandipansarkar9211 4 years ago
Thanks Krish, good video. I want to use all this knowledge in my next batch of deep learning by ineuron.
@koustavdutta5317 3 years ago
Hi Krish, one request: like this playlist, please make long videos for the ML playlist on the loss functions and optimizers used in various ML algorithms, mainly classification algorithms.
@taranilakshmi9680 4 years ago
Explained very well. Thank you.
@chinmaybhat9636 4 years ago
Awesome @KrishNaik Sir.
@uttamchoudhary5229 5 years ago
Great video, man 👍👍. Please keep it up, I am waiting for the next videos.
@gayathrijpl 1 year ago
Such a clean way of explaining.
@gauravsingh2425 4 years ago
Thanks Krish!!! Very nice explanation.
@tonyzhang2501 3 years ago
Thank you, it is a clear explanation. I got it!
@ArthurCor-ts2bg 4 years ago
Krish, you present the subject concisely and most meaningfully.
@guytonedhai 1 year ago
How are you so good at explaining? 😭😭😭😭😭 Thanks a lot ♥♥♥
@muralimohan6974 3 years ago
How can we take k inputs at the same time?
@rabidub733 4 months ago
Thanks for this! Great explanation.
@rababmaroc3354 4 years ago
Thank you very much for your efforts. How can we solve a portfolio allocation problem using this algorithm? Please answer me.
@r7918 3 years ago
I have one question regarding this topic. This concept is applicable to linear regression, right?
@RishikeshGangaDarshan 3 years ago
Good, clearly explained. Nobody can explain it like this.
@pareesepathak7348 3 years ago
Can you share the paper for reference, and also resources for deep learning for image processing?
@Kurtmind 2 years ago
Excellent explanation, sir!
@muhammedsahalot8683 2 months ago
Which converges faster, SGD or GD?
@manojsalunke2842 4 years ago
At 9:28 you said SGD will take more time to converge than GD, so which is faster, SGD or GD?
@nikkitha92 4 years ago
Sir, your videos are amazing. Can you please explain the latest methodologies such as BERT and ELMo?
@siddharthachatterjee9959 4 years ago
Good attempt 👍. Please record with camera on manual focus.
@vinuvarshith6412 1 year ago
Top-notch explanation!
@a.sharan8876 1 year ago
py:28: RuntimeWarning: overflow encountered in scalar power, at: cost = (1/n)*sum([value**2 for value in (y-y_predicted)]). Hey bro, I am stuck here with this error and could not understand it. Could you suggest a solution? I just started practicing ML algorithms.
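That overflow usually means gradient descent is diverging: with unscaled features or too large a learning rate, m and b grow every iteration until value**2 exceeds the float range. A minimal sketch of the usual fixes (scale the feature, keep the learning rate small), with hypothetical data and names since the original script isn't shown:

```python
import numpy as np

# hypothetical data standing in for the original script's x and y
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 200)
y = 4 * x + 3 + rng.normal(0, 1, 200)

# scaling the feature keeps the gradients small enough that the
# squared-error cost stays finite instead of overflowing
x = (x - x.mean()) / x.std()

m, b, lr, n = 0.0, 0.0, 0.01, len(x)
for _ in range(2000):
    y_predicted = m * x + b
    cost = (1 / n) * sum(value**2 for value in (y - y_predicted))
    m -= lr * (-2 / n) * np.sum(x * (y - y_predicted))   # dcost/dm
    b -= lr * (-2 / n) * np.sum(y - y_predicted)         # dcost/db
print(cost)  # finite; with the raw unscaled x this loop would overflow
```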
@rohitsaini8480 1 year ago
Sir, please solve my problem. In my view we are doing gradient descent to find the best value of m (the slope in linear regression, considering b = 0), so if we use all the points then we should already know at which value of m the loss is lowest. Why do we have to use a learning rate to update the weight if we already know the best value?
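A brief note on why the step size is needed: evaluating the loss over all points only gives the cost and slope at the current m, not the location of the minimum, so the update walks downhill in small steps. A toy sketch with a made-up one-parameter loss:

```python
# toy loss L(m) = (m - 4)**2 with its minimum at m = 4; gradient descent
# only ever sees the local slope dL/dm = 2*(m - 4), never the minimizer
m, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (m - 4)
    m -= lr * grad          # step a fraction of the slope
print(m)                    # ~4.0 after many small steps

# with lr too large (e.g. 1.1) the same loop overshoots the minimum on
# every step and diverges, which is why the step size must be controlled
```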
@_JoyshreeMozumder 3 years ago
What is the source of the data points?
@akfvc8712 3 years ago
Great video, excellent effort. Appreciated!!
@ankitbiswas8380 2 years ago
When you mentioned SGD takes place in linear regression, I didn't understand that comment. Even in your linear regression videos, for the mean squared error we take the sum of squares over all data points. So how does SGD get linked to linear regression?
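For what it's worth, GD and SGD both minimize the same MSE in linear regression; the difference is only how much data feeds each update. Full-batch GD sums the squared error over all points per step, while SGD estimates that same gradient from one random point per step. A minimal sketch on synthetic data (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=500)
y = 2.5 * X + 1.0 + rng.normal(0, 0.1, 500)

w, b, lr = 0.0, 0.0, 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):      # visit points in random order
        err = (w * X[i] + b) - y[i]
        w -= lr * 2 * err * X[i]           # gradient of the one-point squared error
        b -= lr * 2 * err
print(w, b)                                # close to the true 2.5 and 1.0
```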
@nansonspunk 1 year ago
Yes, I really liked this explanation, thanks.
@rdf1616 4 years ago
Good explanation! Thanks!
@NaveenKumar-ts1om 1 day ago
Awesome KRISHHHHHH!
@alsabtilaila1923 3 years ago
Great one!
@achrafkmout9398 3 years ago
Very good explanation.
@goodnewsdaily-tamil1990 1 year ago
1000 likes for you, man 👏👍
@bijaynayak6473 4 years ago
Hello sir, could you share the link to the code you explained? This video series is very nice; in a short period we can cover so many concepts. :)
@syedsaqlainabatool3399 3 years ago
This is what I was looking for.
@rameshthamizhselvan2458 4 years ago
Excellent!
@vishaljhaveri7565 2 years ago
Thank you, sir.
@vineetagarwal18 1 year ago
Great, sir.
@Anand-uw2uc 4 years ago
Good explanation! But you did not speak much about when to use SGD, although you clarified GD and mini-batch SGD well.
@vishaldas6346 3 years ago
There is not much to explain about SGD: you just take one data point at a time, versus a dataset of 1000 data points.
@samiabidah4197 3 years ago
Please, what is the difference between GD and batch GD?
@RaviRanjan_ssj4 4 years ago
Great video!!
@ting-yuhsu4229 4 years ago
You are AWESOME! :)
@response2u 2 years ago
Thank you, sir!
@aminuabdulsalami4325 4 years ago
Great guy.
@jiayuzhou6051 2 months ago
The only video that explains it.
@SandeepKashyap-ek2hx 2 years ago
You are a HERO, sir.
@praneethcj6544 4 years ago
Perfect!!!
@khushboosoni2788 1 year ago
Sir, can you explain the SPGD algorithm to me, please?
@louerleseigneur4532 3 years ago
Thanks, buddy.
@percyjardine5724 3 years ago
Thanks, Krish.
@yukeshnepal4885 4 years ago
At 8:58, using GD it converges quickly, while using mini-batch SGD it follows a zigzag path. How?
@kannanparthipan7907 4 years ago
In mini-batch SGD we consider only some of the points, so the calculation deviates somewhat from ordinary gradient descent, where we consider all values. A simple analogy: GD is like the total population and mini-batch SGD is like a sample of the population; they will never be exactly equal, and the sample distribution always deviates a little from the population distribution. We can't use GD everywhere because of computation time, and mini-batch SGD gives an approximately correct result.
@bhargavpotluri5147 4 years ago
@kannanparthipan7907 Deviation will be there in the final output or the final converged result. The question is why we see it during the process of convergence. If for every epoch we consider different samples, then I understand there can be zigzag results during convergence. But if only one sample of k records is considered, why the zigzag during convergence?
@bhargavpotluri5147 4 years ago
OK, now I got it. For every iteration samples are picked at random, hence the zigzag. Just went through other articles.
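The sampling argument in this thread can be checked numerically: each mini-batch gradient is a noisy estimate of the full-batch gradient, and that per-step noise is exactly what makes the descent path zigzag. A small sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 3 * x + rng.normal(0, 0.5, 1000)

def grad_mse(w, xb, yb):
    # dMSE/dw for the model y ≈ w * x, computed on batch (xb, yb)
    return 2 * np.mean((xb * w - yb) * xb)

w = 0.0
print("full batch:", grad_mse(w, x, y))      # one fixed direction
for _ in range(5):
    idx = rng.choice(1000, size=32, replace=False)
    print("mini-batch:", grad_mse(w, x[idx], y[idx]))  # scatters around it
```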
@abhrapuitandy3327 4 years ago
Please do tell about stochastic gradient ascent also.
@AjanUnderscore 2 years ago
Thank you, sir 🙏🙏🙌🧠🐈
@thanicssubakar6303 5 years ago
Nice, bro.
@sathvikambati3464 1 year ago
Thanks
@phaneendra3700 3 years ago
Hats off, man.
@minakshiboruah1356 3 years ago
@12:02 Sir, it should be mini-batch stochastic GD.
@funpoint3966 4 months ago
Please work out your camera issue; it seems to be set to autofocus, resulting in a little disturbance.
@shubhangiagrawal336 3 years ago
Good video.
@jsverma143 4 years ago
Negative and positive gradients are best explained like this: on the left side of the curve the tangent makes an angle of more than 90 degrees, so the slope is negative, while on the other side the angle is less than 90 degrees, so the slope is positive.
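That sign is what drives the update w := w - lr * dL/dw in the right direction: a negative slope left of the minimum pushes w up, a positive slope right of it pushes w down. A quick numeric check on a toy loss:

```python
# toy loss L(w) = (w - 3)**2 with its minimum at w = 3
def grad(w):
    return 2 * (w - 3)   # dL/dw

print(grad(1.0))  # -4.0: left of the minimum, so w - lr*grad increases w
print(grad(5.0))  #  4.0: right of the minimum, so w - lr*grad decreases w
```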
@soheljagirdar8830 3 years ago
At 4:17 you said SGD needs a minimum of 256 records to find the error/minima, but elsewhere it's said to be one record at a time.
@pramodyadav4422 3 years ago
I read a few articles which say that in SGD one data point is picked at random from the whole dataset at each iteration. The 256 records you're talking about may be mini-batch SGD: "It is also common to sample a small number of data points instead of just one point at each step, and that is called mini-batch gradient descent."
@tejasvigupta07 3 years ago
@pramodyadav4422 Yeah, I too have read that in SGD only one data point is selected and used for the update in each iteration instead of all of them.
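In the usual terminology the replies settle on: batch GD uses all n rows per update, pure SGD uses exactly one random row, and mini-batch GD uses a small batch (sizes like 32 to 256 are common). The only difference in code is how many rows feed each gradient step; a schematic sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 1000)
w = np.zeros(3)

def gradient_step(w, Xb, yb, lr=0.05):
    # one MSE gradient update computed from whatever batch it is given
    err = Xb @ w - yb
    return w - lr * 2 * Xb.T @ err / len(yb)

w = gradient_step(w, X, y)                          # batch GD: all 1000 rows
i = rng.integers(len(y))
w = gradient_step(w, X[i:i+1], y[i:i+1])            # pure SGD: 1 row
idx = rng.choice(len(y), size=256, replace=False)
w = gradient_step(w, X[idx], y[idx])                # mini-batch: 256 rows
```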
@atchutram9894 4 years ago
Switch off the autofocus feature on your camera. It is distracting.
@shekharkumar1902 4 years ago
A confusing one!
@devaryan2201 2 years ago
Do change your method of teaching; it seems like someone has read a book and is just trying to copy its content. Use your own ideas for it :)
@chalapathinagavarmabhupath8432 4 years ago
Your videos are good, but the camera is bad.
@KKKK-jr1nm 4 years ago
Why don't you buy him a new one?
@chalapathinagavarmabhupath8432 4 years ago
Pora eri poka