The full Neural Networks playlist, from the basics to deep learning, is here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-CqOfi41LfDw.html Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Man, your promotions are not shameless! Actually, what you do is a gift for us; for the price that you charge and for the level of the content, we are being gifted, not buying something. You are far better than a lot of paid (and expensive) courses. Just check out your video comments to see how happy people feel when they discover your videos!! Great work as always. Thank you so much!!!👏🏻👏🏻👏🏻👏🏻
He is using the concept of reverse psychology by presenting great stuff at a good price, and as you mentioned, these promotions are not shameless… They are shameful; as you hinted, he should indeed be ashamed of giving us such a good and advantageous offer… 😅😅😅😅
You released this video just in time for my AI exam! Thank you. Sometimes I think professors use really complex notation just to feel smarter than students, it doesn't help learning. I love your content.
I have been a student my entire life and have taught college level courses myself, and I must say you are one of the finest lecturers I have ever seen. This StatQuest is a gem. Your work is so succinct and clear it's as much art as it is instruction. Thank you for this incredible resource!
I just want you to know your channel has been instrumental in helping me towards my Data Science degree, I'm currently in my last semester. I'll be forever grateful for your channel and the time you take to make these videos. Thank you so much.
I am studying for a Master's degree in bioinformatics now, and as someone who knows little about statistics, I really can't thank you enough for your videos and the effort that you have put into them.
I finished business school 25 years ago where I studied statistics and math. So happy to see that neural networks are fundamentally just a (much) more advanced regression analysis.
BAM!!! Thank you for supporting StatQuest! Yes, neural networks are a lot like regression, but now we can fit non-linear shapes to the data, and we don't have to know in advance what that shape should be. Given enough activation functions and hidden layers, the neural network can figure it out on its own.
It should be made a crime for anyone to see other videos on backpropagation before they reach StatQuest. The world is confused by teachers who tell the big story before the basics. Learn the basics and the picture falls into place, like the chain rule 😊
Love this! You've explained it far better than anywhere else I've seen, and you made it entertaining at the same time! Thank you so much for making this.
This is excellent stuff! As a visual learner, your channel is a BLESSING. Thank you so much for your fantastic work on breaking down concepts into small, bite-sized pieces. It's much less intimidating, and you deserve so much more appreciation. You also gained my subscription to your channel! Keep doing a great job, and thank you SO MUCH for having my back!
I just iterated on gradient descent and found that this is the best possible way to teach this topic; no other lecture in existence is better than this one.
May I say .... You are such a good teacher that it is most enjoyable to watch your videos. I am proficient in statistics (via university econometrics 101) ... and I did not realise all those fancy terms in machine learning are actually concepts that are common items in the stats that I learned in the 1970s, e.g., biases and weights, labels, activation functions, etc. Anyway, I can see that a lot of viewers appreciate your work and teaching. I have also 'updated' myself. Thank you.
You are the best. I wish every ML learner would find you first. I am going to do my part and tweet about you. Thanks for making these videos! Wish you more success.
Finally a proper, detailed, step-by-step explanation. This guy is absolutely AMAZING! Thank you so much for all the hard work in putting these videos together for us.
Amazing explanation! I've spent years trying to learn this and it always went too quickly into the gory mathematical details. The aha moment for me was when the green squiggle equaled the blue plus orange squiggles lol. Thank you for this Josh!!!
Omg, protect this man at all costs, this was pure gold!!! Also, thank you, sir, for talking so slowly because if my brain squiggles need to work faster they will burn up x)
Honestly, you do a much better job teaching with a pre-recorded video than my instructors do with both the written and live materials that I'm paying for.
9:00 at this moment I realised I'm watching the best math content on earth, because simple stuff like this never gets this kind of attention. Luckily I already know how the summation symbol works, but I didn't know it in the past, and nobody cared to explain it. But it's not just about the summation symbol; imagine the other 1000 small things somebody might not understand, and doesn't realise they don't understand, because they've been skimmed over.
Who else is using these videos to put together a semester project? So far, I've used Regression Trees, K-fold CV, cost complexity pruning, and now Neural Networks for my final model construction. Josh is worth a double bam every time.
Josh, this is amazing. You really make things so easy to visualise, which is crazy considering that neural networks are meant to be so hard that they are referred to as a black box! Thanks for all your videos. I have used heaps over the last twelve months. Thank you again.
JUST WOW! Thank you so much, Josh! I cannot express the feeling I had when EVERYTHING made sense!!! TRIPLE BAM! Never thought I would be extremely excited to pause the video and try to solve everything by hand before I look at the next steps
I never understood backpropagation. I knew some things from other tutorials, but as a beginner, it was very hard to understand. This video (and probably the series) is the best I could find. Thank you.
I was reading an article based on Backpropagation and I did not understand a single word. I had to watch all your videos starting from Chain Rule, Gradient Descent, NNs...I re-read the article and understood everything!!! But now I can't get the beep--boop and small/double/triple/ bam out of my head lol.
Wow! This is an incredible video. Thank you SO MUCH for making this for us. This is one of the best videos I've seen to explain this concept. The hard work you have put into this is something that I am incredibly appreciative of. Thanks, man.
Backpropagation (aka finding the w's and b's): start with b_final = 0. You'll notice that the error = (observed - predicted)^2 is really high, so you use gradient descent on the squared error with respect to b_final to find the value of b_final for which the squared error is minimum; that is your optimal b_final. Gradient descent: the derivative of the sum of squared errors with respect to b_final = the derivative of the sum of squared errors with respect to the predicted value y * the derivative of y with respect to b_final. d(y observed - y predicted)^2 / d(y predicted) = -2 * (y observed - y predicted), and d(y predicted) / d(b_final) = d(sum of all those curves obtained from each node of the layer + b_final) / d(b_final) = 0 + 0 + ... + 0 + 1 = 1. Plug the x values of the predicted curve into the derivative to get the slope. Step size = slope * learning rate. New b_final = old b_final - step size. Keep repeating until the slope is (essentially) 0. That's how gradient descent works, and you've found your optimal b_final.
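A minimal sketch of that loop in Python, assuming the hidden-layer curves are already fixed and only b_final is being optimized (the numbers below are made up for illustration, not the video's actual dose/efficacy values):

```python
import numpy as np

# Made-up example: the summed hidden-layer output (the blue + orange curves)
# at three input values, and the observed values we want to match.
hidden_sum = np.array([-1.4, 2.6, -1.6])
observed = np.array([0.0, 1.0, 0.0])

b_final = 0.0            # start the final bias at 0
learning_rate = 0.1

for step in range(500):
    predicted = hidden_sum + b_final
    # d(SSR)/d(b_final) = sum of -2 * (observed - predicted) * 1
    slope = np.sum(-2 * (observed - predicted))
    step_size = slope * learning_rate
    b_final = b_final - step_size
    if abs(step_size) < 1e-6:   # stop once the steps get tiny
        break

print("optimal b_final:", round(b_final, 3))
```

The same loop, with the appropriate derivative from the chain rule, is what gets applied to every other weight and bias in the network.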
Thank you very much for your video~ Your videos make me feel that studying English makes so much sense, otherwise I couldn't enjoy such a beautiful thing~ 👍👍👍❤❤❤
Hello, thank you for the video! This series has been really helpful for learning about deep learning. I have a couple of questions. 1. When using gradient descent and backpropagation, do we always use the SSR to measure how good a fit the parameter we are estimating gives us? Or are there other ways? 2. The second question is about using the chain rule for calculating derivatives. The first part is d SSR / d Predicted. In that first part @ 11:25, are you using the chain rule again? And when taking the derivative of the inside, Observed - Predicted, @ 11:34, where do you get the 0 and 1 from?
1. The "loss function" we use for gradient descent depends on the problem we are trying to solve. In this case, we can use the SSR. However, another commonly used "loss function" is called Cross Entropy. You can learn more about cross entropy here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-6ArSys5qHAU.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-xBEh66V9gZo.html 2. You can learn how the chain rule works (and understand the 0 and 1) here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wl1myxrtQHQ.html
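For question 2, the chain-rule step being asked about can be written out like this (a small worked version in my own notation, not the exact slides from the video):

```latex
\frac{d\,\mathrm{SSR}}{d\,\mathrm{Predicted}}
  = \frac{d}{d\,\mathrm{Predicted}}\,(\mathrm{Observed}-\mathrm{Predicted})^2
  = 2\,(\mathrm{Observed}-\mathrm{Predicted}) \times
    \left(\frac{d\,\mathrm{Observed}}{d\,\mathrm{Predicted}}
        - \frac{d\,\mathrm{Predicted}}{d\,\mathrm{Predicted}}\right)
  = 2\,(\mathrm{Observed}-\mathrm{Predicted}) \times (0 - 1)
```

The 0 appears because Observed is a constant that does not depend on Predicted, and the 1 is the derivative of Predicted with respect to itself, so the whole thing simplifies to -2 (Observed - Predicted).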
Wow 😮 I didn't know I had to watch *Gradient Descent, Step-by-Step!!!* before I could watch the video related to *Neural Networks Part 2*, which I must watch before I can watch *The StatQuest Introduction To PyTorch*... before *Introduction to coding neural networks with PyTorch and Lightning* 🌩️ (it's something related to the cloud, I understand). I am genuinely so happy to learn about this stuff with you Josh❤ I will go watch the other videos first and then I will backpropagate to this video...
Josh, finished watching. Thank you again. 1. If I as a researcher know roughly which range of inputs I am going to insert, and which range of outputs I expect to get in the end, would I want to adjust the range of the weights from the very beginning, maybe their distribution, and the same for the biases and activation functions, or do we let the algorithm do this job these days? 2. The most interesting question: let's say that while finding the prediction curve we kind of discover some "hidden truth". I think our curve might never be exact, partly because we do not know all of the independent variables which in nature affect our dependent variable. Say we know one, but there is another one which we do not know about. If so, would it be right to say that when a neural network with one input splits the input by different weights into two neurons of a hidden layer (from which the final output is calculated), it is somehow like simulating the presence of another "secret independent variable" even without knowing what it is? Thanks
I'll be honest, I'm not sure how to answer question #1. I don't know. I do know that some of the methods used for initializing the weights with random values increase the variation allowed in the values based on how many layers are in the neural network - so that might do the trick. As for the second question: Adding the second node in the hidden layer allows the squiggle to go up *and* go down. If I just had one node, I would only be able to go up *or* down. So, in some sense, that is sort of like adding a secret independent variable.
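To make that concrete, here is a tiny sketch (made-up weights, with a softplus activation like the one in the video) showing that one scaled hidden-node curve only bends one way, while the sum of two can go up and then come back down:

```python
import numpy as np

def softplus(x):
    # the bent-curve activation function used in the video
    return np.log(1 + np.exp(x))

x = np.linspace(0, 1, 5)   # five input values from 0 to 1

# One hidden node: a single scaled softplus curve only keeps going up.
one_node = 1.0 * softplus(10 * x - 2)

# Two hidden nodes: adding a second, oppositely scaled curve lets the
# combined squiggle rise and then fall again.
two_nodes = 1.0 * softplus(10 * x - 2) - 2.0 * softplus(10 * x - 6)

print(np.round(one_node, 2))   # monotonically increasing
print(np.round(two_nodes, 2))  # rises, then falls
```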
@@igorg4129 It's tempting to want to initialize weights to a target range in the hope of speeding up convergence; however, this actually might be counterproductive. The weights of individual nodes do not have to conform to the same distribution as your output. When you use an appropriate (adaptive) optimizer, it should be able to tune the weights pretty quickly, considering that the first few passes will likely have larger gradients.
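For completeness, this is the kind of layer-size-aware random initialization the earlier reply alludes to. It is a sketch of the standard He-initialization idea, not something shown in the video, and the layer sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialization: the spread of the random weights shrinks as the
    # number of inputs to the layer grows, which helps keep the signal
    # from exploding or vanishing as it passes through more layers.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Made-up sizes: 1 input -> 2 hidden nodes -> 1 output, like the tiny
# network in this series.
w_hidden = he_init(1, 2)
w_output = he_init(2, 1)
print(w_hidden)
print(w_output)
```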