NOTE: This is an updated and revised version of the Decision Tree StatQuest that I made back in 2018. It is my hope that this new version does a better job of answering some of the questions people most frequently asked about the old one. Support StatQuest by buying my book, The StatQuest Illustrated Guide to Machine Learning, or a Study Guide or Merch!!! statquest.org/statquest-store/
Hi Josh, great video! I have one question. When you are calculating the total Gini impurity based on the weighted average, why is it not (1 - weight)*Gini instead of (weight)*Gini? Since we want to minimize Gini, wouldn't we want the leaf with the largest sample size to have its contribution to the overall Gini reduced (as in (1 - weight)*Gini) rather than increased (as in (weight)*Gini)? Thanks!
@@haoyuanliu8034 The more data we have to support something, the more trust we have that that something is correct. Likewise, if I don't have much data to support something, then I should probably have less confidence that that something is correct. And that's what we're doing here. The more observations we have in a leaf, the more data we have to support the predictions made by that leaf, and the more its impurity should count toward the total. Thus, the weight amplifies the Gini value of the leaf with the most data.
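(To make the weighted average concrete, here's a minimal Python sketch; the leaf counts are made up purely for illustration.)

```python
def gini(counts):
    """Gini impurity of one leaf, given its class counts, e.g. [yes, no]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(leaves):
    """Total impurity of a split: each leaf's Gini weighted by its share of the data."""
    total = sum(sum(counts) for counts in leaves)
    return sum(sum(counts) / total * gini(counts) for counts in leaves)

# Two hypothetical leaves: the bigger leaf (12 of 16 observations)
# contributes 12/16 of the total, the smaller leaf only 4/16.
left, right = [10, 2], [1, 3]
print(weighted_gini([left, right]))  # (12/16)*gini(left) + (4/16)*gini(right) ≈ 0.302
```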
Haven't watched them all yet, but I probably will. And even though you have received, and will keep receiving, compliments, it's always worth continuing to thank you for this amazing job!
I've never seen a YouTube comment section so full of thankful, enlightened, and happy people. You must have revolutionized teaching. Thank you, Josh, for these excellent videos. You rock!
This channel helped me a loooot! It has helped me with everything from doing research to looking for a job, from reinventing myself to exploring the field of statistics and machine learning. You are the best! I can't express my gratitude in words!
I never watched Andrew Ng's OG course.... I just come back to these videos if I have any doubts or if I need to refresh my knowledge. Thanks a lot, Josh ;)
Hi, would you mind sharing which textbook you are referring to? I noticed there is a reference to a textbook in the video. I'm guessing it's An Introduction to Statistical Learning with Applications in R, but I'm not sure about the edition, and at least in the electronic versions, I can't find the relevant information on page 321.
I love this video (in the same spirit as many of your other machine learning algorithm videos) because after watching it, I actually managed to code a simple classification tree on my own to solidify the things I learned here, and all the parameters in scikit-learn's DecisionTreeClassifier now make sense to me. Most of the ML videos and many of the classes out there only talk about very generalized, high-level ideas of these models. You don't. You always do such a great job giving clear yet detailed explanations of the nitty-gritty of these models. Between the ISLP/ISLR books and your videos, I am able to gain a basic understanding beyond just making API calls to caret in R or sklearn in Python. It really made me feel like I am learning, instead of just typing formulas on the keyboard. Could never thank you enough ❤❤❤
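(A short sketch of how those DecisionTreeClassifier parameters look in practice; the tiny dataset is invented, and only the parameter names come from scikit-learn.)

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: [age, loves soda] -> loves the movie (1) or not (0)
X = [[7, 0], [12, 1], [18, 1], [35, 1], [38, 0], [50, 0]]
y = [0, 1, 1, 1, 0, 0]

clf = DecisionTreeClassifier(
    criterion="gini",     # the impurity measure discussed in the video
    max_depth=3,          # cap tree depth to limit overfitting
    min_samples_leaf=1,   # smallest leaf the tree is allowed to create
)
clf.fit(X, y)
print(clf.predict([[15, 1]]))  # predicted class for a new person
```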
More important than teaching people statistics and machine learning, you teach people that they are capable of understanding things they would have otherwise thought themselves incapable of understanding.
@@statquest Sir, can you please tell me how I should start ML as a beginner? Is this video from your tutorials the place to start? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Gv9_4yMHFhI.html
This guy is amazing! I also love how he reminds us of what we were doing, why we were doing it and how we were doing it. Usually, halfway through my lectures I have forgotten where we came from and why we are doing what we're doing. Can't see the forest for all those trees... B)
I don't know who you are, but man, you are the best instructor I have ever seen. I wish my math teacher had met you and taught us the same way you do 😍
Hey Josh, I am about to go into my last exam before I graduate, and this is the last video I'm watching, on a topic that was covered on a day I missed. I'm sure you won't see this, but thank you for all the help you've given.
Thank you, Josh. Based on the methods you provided, I tried creating a Python function that calculates the Gini impurity for each independent variable. It really helped deepen my knowledge. Thanks again.
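(A sketch of what such a function might look like; this is not the commenter's code, and it assumes binary features purely for simplicity.)

```python
from collections import Counter

def gini(labels):
    """Gini impurity of one group of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_per_feature(rows, labels):
    """Weighted Gini impurity of splitting on each (binary) column of `rows`."""
    n = len(labels)
    impurities = {}
    for j in range(len(rows[0])):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[j], []).append(label)
        impurities[j] = sum(len(g) / n * gini(g) for g in groups.values())
    return impurities

rows = [[1, 0], [1, 1], [0, 1], [0, 0]]  # made-up binary features
labels = ["yes", "yes", "no", "no"]
print(gini_per_feature(rows, labels))    # {0: 0.0, 1: 0.5} -- lower is a better split
```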
I have been sitting on the edge of my seat, rooting for the algorithm to figure out that age is the only valid indicator of whether people like an old movie, only to realize that soda is a valid indicator for deducing somebody's age, and that the final outcome, surprisingly, makes sense.
About two weeks ago I was trying to learn how the split is made on numerical data to find the best split. I was using Python for this and always set the candidate splits with np.linspace, but the way you showed, averaging adjacent values in a sorted list, is much more intuitive. If I had watched this video first, it would have saved me a few days of learning how to manually calculate information gain and find the best split to better understand how a decision tree works. Great video!
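(The averaging trick from the video, as a minimal sketch; the ages are made up.)

```python
def candidate_thresholds(values):
    """Sort the unique numeric values and average each adjacent pair."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

ages = [7, 12, 18, 35, 38, 50, 83]
print(candidate_thresholds(ages))  # [9.5, 15.0, 26.5, 36.5, 44.0, 66.5]
# Each candidate threshold is then scored (e.g. by weighted Gini impurity),
# and the one giving the lowest impurity wins.
```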
Thanks a million! Very well explained. This is the nth time I am watching this. Great content. Awesome. I couldn't find this explanation put so simply anywhere else. “Great teachers are hard to find.” Grade: A++ 💥
Bam! Josh, I keep watching many videos on your channel! You can explain things in such a simple way! And your videos are inspiring. You are the best teacher I have ever had! We need more great channels like yours! Double Bam! Could you make some videos on different distributions? Like the gamma, chi-squared, beta, and Poisson distributions, etc. I could hardly find a YouTube channel that explains them clearly. :( Triple Bam! I know you did not “officially” teach statistics and just made videos for fun, but the world needs you to create more great videos lol! Your “to-do” list would be huge! Keep it up! :D
@@statquest Yes he did. We don't really question what material professors use, but seeing the watermark, I came here to understand what the lesson was about, and I wasn't disappointed.