Honestly didn't really help with my questions, but I didn't expect a 3 minute video to answer them. This was very well done, the visualization was great, and everything it touched on (while brief) was concise and accurate. Subbed.
Wow, you have such a talent for explaining things clearly compared to the rest of the YouTube sphere. I hope you will continue to bless us with your talents.
Could you please make a video explaining how you made this video? That would be very, VERY helpful. I've always wanted to use Blender to make animations like yours, but couldn't make heads or tails of it. Most Blender tutorials (and I've seen more than 100 videos) showcase heavy-duty animations that have nothing to do with mathematical explanations, i.e., how to make animations for math-related videos. Yours is the first video in which I've seen such a thing. Please consider my request and kindly make a video tutorial about it (for the Blender part).
Very smart. But I still need another video on derivatives to help me understand the slope/opposite-direction idea in 2D. This one is clear, though. I also like the small-step demonstration.
Hello, and thanks for this great video. However, I believe that at 2:20, the variant of gradient descent you explained is actually Mini-Batch Gradient Descent, which uses a random subset of the training dataset. Stochastic Gradient Descent is the one that uses just one training record in each iteration.
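To make the distinction concrete, here is a minimal sketch (my own toy example on a made-up linear-regression problem, not code from the video): SGD samples exactly one record per step, while mini-batch GD samples a random subset.

```python
import numpy as np

# Hypothetical data: y = X @ true_w + noise (illustrative, not from the video).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

def grad(Xb, yb, w):
    # Gradient of mean squared error over the batch (Xb, yb).
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

lr = 0.1

# Stochastic GD: exactly ONE random training record per iteration.
w_sgd = np.zeros(3)
for _ in range(1000):
    i = rng.integers(len(y))
    w_sgd -= lr * grad(X[i:i+1], y[i:i+1], w_sgd)

# Mini-batch GD: a random SUBSET (here 16 records) per iteration.
w_mb = np.zeros(3)
for _ in range(1000):
    idx = rng.choice(len(y), size=16, replace=False)
    w_mb -= lr * grad(X[idx], y[idx], w_mb)
```

Both variants approach `true_w`; the mini-batch updates are simply less noisy because each gradient is averaged over more records.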
It works just fine with multiple local minima in high dimensions. As long as you configure your hyperparameters (learning rate, batch size, etc.) correctly, you should have no problem converging to a "decent" minimum.
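A tiny 1-D illustration of that point (my own example, not from the video): on a function with two minima, gradient descent with a sensible step size still converges to *a* minimum; which one depends on where you start.

```python
# f(x) = x^4 - 3x^2 + x has two minima: a deeper one near x ≈ -1.30
# and a shallower one near x ≈ 1.13.
def df(x):
    # Derivative of f(x) = x^4 - 3x^2 + x.
    return 4 * x**3 - 6 * x + 1

def descend(x0, lr=0.01, steps=2000):
    # Plain gradient descent from starting point x0.
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

left = descend(-2.0)   # lands in the deeper basin
right = descend(2.0)   # lands in the shallower basin
```

With a learning rate that is too large the iterates can overshoot and diverge, which is exactly the hyperparameter-tuning point above.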
Great idea! Boyd's book is a good starting point (page 463 of web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf). I will try to add more references to the video description in the future.
Interesting topic and comparison. Since you are using information from past iterations, it would be very illustrative to include a quasi-Newton method in your comparison, for example BFGS.
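For anyone curious, a quick sketch of that suggestion using SciPy's built-in BFGS (the Rosenbrock test function and starting point here are my own choices, not from the video):

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    # Classic non-convex benchmark with minimum at (1, 1).
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    # Analytic gradient of the Rosenbrock function.
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

x0 = np.array([-1.2, 1.0])
res = minimize(rosenbrock, x0, jac=rosenbrock_grad, method="BFGS")
```

BFGS builds an approximate inverse Hessian from the sequence of past gradients, so it exploits curvature information that plain gradient descent (and momentum methods) never form explicitly, and typically needs far fewer iterations.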