Correction: 10:18. The Amount of Say for Chest Pain = (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
NOTE 0: The StatQuest Study Guide is available: app.gumroad.com/statquest
NOTE 2: In statistics, machine learning and most programming languages, the default log function is log base 'e', so that is the log that I'm using here. If you want to use a different log, like log base 10, that's fine, just be consistent.
NOTE 3: A lot of people ask if, once an observation is omitted from a bootstrap dataset, it is lost for good. The answer is "no". You just lose it for one stump. After that it goes back in the pool and can be selected for any of the other stumps.
NOTE 4: A lot of people ask "Why is 'Heart Disease = No' referred to as 'Incorrect'?" This question is answered in the StatQuest on decision trees: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-_L39rN6gz7Y.html However, here's the short version: The leaves make classifications based on the majority of the samples that end up in them. So if most of the samples in a leaf did not have heart disease, all of the samples in the leaf are classified as not having heart disease, regardless of whether or not that is true. Thus, some of the classifications that a leaf makes are correct, and some are not.
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
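For anyone who wants to double-check the corrected arithmetic, here is a quick sketch in plain Python using the natural log (the log the video uses):

```python
import math

def amount_of_say(total_error):
    # Amount of Say = (1/2) * log((1 - Total Error) / Total Error), natural log
    return 0.5 * math.log((1 - total_error) / total_error)

# Chest Pain: Total Error = 3/8, so (1/2)*log(5/3) ~= 0.2554, i.e. 0.25-ish, not 0.42
print(amount_of_say(3 / 8))
```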
@@parvezaiub That's what you get when you use log base 10. However, in statistics, machine learning and most programming languages, the default log function is log base 'e'.
Hi Josh - great videos, thank you! Question on your Note 3: How do omitted observations get "back into the pool"? It seems, in the video around 16:16, that the subsequent stumps are made based on the performance of the previous stump (re-weighting observations from the previous stump)... if that's the case, when do you put "lost observations" back into the pool? How would you update the weights if the "lost observations" were not used to assess the performance of the newest stump?
Everyday is a new stump in our life. We should give more weightage to our weakness and work on it. Eventually, we will become strong like Ada Boost. Thanks Josh!
Dude... I really appreciate that you make these videos and put so much effort into making them clear. I am buying a t-shirt to do my small part in supporting this amazing channel.
I can't believe how useful your channel has been these days man! I literally search up anything ML related on youtube and there's your great video explaining it! The intro songs and BAMS make everything so much clearer dude. The only bad thing I could say about these videos is that they lack a conclusion song lol
Looking forward to Gradient Boosting Model and implementation example. Somehow I find it difficult to understand it intuitively. Your way of explaining the things goes straight into my head without much ado.
Hello Sir, I really love the simple ways in which you explain such difficult concepts. It would be really helpful to me and probably a lot of others if you could make a series on Deep Learning, i.e., neural networks, gradient descent etc. Thanks!
Hi Josh, great video as always! Questions: 1. Given there are 3 attributes, and the reiterative process of picking 1 out of the 3 attributes EACH TIME, I assume an attribute could be reused for more than 1 stump? And if so, when do we stop iterating? 2. Given the resampling is by random selection (based on the new weights, of course), I would assume that means every time we re-do AdaBoost we may get a different forest of stumps? 3. Where can we find more info on using the Weighted Gini Index? Will it yield the same model, or can it be very different? Thank you!
1) The same attribute can be used as many times as needed. Keep in mind that, due to the bootstrapping procedure, each iteration gives us a different dataset to work with. 2) Yes (so, consider setting the seed for the random number function first). 3) I wish I could tell you. If I had found a good source on the weighted gini, I would have covered it. Unfortunately, I couldn't find one.
For my future reference. 11:36: If the prediction for a sample was wrong, then increase its weight for future correction. If the Amount of Say is high (the tree is good), increase the weight more. Wrong -> increase weight. Better tree -> more amount of say -> adjust weight more.
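The re-weighting rule from the video can be sketched in plain Python. This is an illustrative sketch, not the video's exact code; it assumes the standard AdaBoost update, where misclassified samples are scaled by e^(say), correct ones by e^(-say), and everything is then normalized:

```python
import math

def update_weights(weights, misclassified, say):
    # Misclassified samples get weight * e^(say),
    # correctly classified samples get weight * e^(-say), then normalize.
    new = [w * (math.exp(say) if m else math.exp(-say))
           for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]

# 8 samples, each starting at 1/8; only sample 4 was misclassified.
# With Total Error = 1/8, Amount of Say = (1/2)*log(7) ~= 0.97.
say = 0.5 * math.log(7)
flags = [False, False, False, True, False, False, False, False]
new_weights = update_weights([1 / 8] * 8, flags, say)
# The misclassified sample jumps to ~0.5 and the rest drop to ~0.07
# (the video shows 0.49 / 0.07 because it rounds 0.97 along the way).
```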
Hello. There is a little error in the arithmetic, but AdaBoost is clearly explained! Error at 10:18: Amount of Say for Chest Pain = (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42. I also join others in asking you to talk about Gradient Boosting next time. Thank you.
Aaaaah. There's always one silly mistake. This was a copy/paste error. Oh well. Like you said, it's not a big deal and it doesn't interfere with the main ideas... but one day, I'll make a video without any silly errors. I can dream! And Gradient Boosting will be soon (in the next month or so).
@@statquest Don't worry about small errors like these, your time is GOLD and shouldn't be consumed by these little mistakes, use it to create more 'BAM'! The audience will check the errors for you! All you need to do is pin that comment when appropriate so that other people will notice. PS, how to PIN a comment (I paste it here to save your precious time ^_^): - Sign in to RU-vid. - In the comments below a video, select the comment you would like to pin. - Click the menu icon > Pin. If you've already pinned a comment, this will replace it. ... - Click the blue button to confirm. On the pinned comment, you'll see a "Pinned by" icon.
I swear this is the greatest channel about machine learning and statistics. Great job Josh! I just have a quick question: what if we have a stump where both children say yes (the left child gets 2 yes, 0 no, and the right child gets 2 yes and 1 no), and that is the best we could come up with? What should I do? I saw a different video where the left child was classified as yes and the right child as no, and he said the first stump makes 2 errors. But how?! We say that a leaf votes by majority, so both should say yes, and the right child gets 1 error, so the first stump made one error, right?
Thanks, Josh, for this great video! Just to highlight, at 10:21 your calculation should be (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log(5/3). How did you conclude that the first stump will be on Weight? Because of the minimum total error or the minimum total impurity among the three features? It might happen that total error and impurity do not rank the same for all features, though they happen to rank the same here.
I've put a note about that error in the video's description. Unfortunately RU-vid will not let me edit videos once I post them. The stump was weighted using the formula given at 7:32
Hi Josh. Awesome video again. My doubt: we choose the 1st root, say Chest Pain, based on the Gini index. Will the Gini index for the 2nd root be calculated for all features, or will we exclude Chest Pain from subsequent stumps?
Really good video. What is weighted Gini, though? I know that Gini impurity is mentioned in the decision tree series, yet I cannot find the definition of weighted Gini. Thx!
Thanks for the great explanation! One question though: for how long will stumps be created? That is, what is the termination criterion for the training stage?
Thanks Josh! I have 3 questions: 1. @3:43 you say weak learners are "almost always stumps" — is there a case where it is not a stump? 1.a. Also, what is the advantage of using stumps over bigger trees? 2. Does the boosting algorithm only use decision trees?
1) I don't know. However, it's possible and easily done (i.e. in sklearn you just change the default value for "depth"). 2) It probably depends on your data. 3) In theory, no. In practice, yes.
Excellent tutorial, thanks! I would like to ask: 1) After recalculating and updating the weights for each sample, the process repeats. In the first step we found that 'Weight' has the lowest Gini; in the next round do we exclude 'Weight' and only consider the remaining criteria, or not? I see 'Weight' is reused when bootstrapping is used, but what about if we are still using Gini? 2) If the same set of criteria is reused, then we would get a bunch of stumps with the same criteria that differ only in their amount of say. Why is that more useful than simply using each criterion once? 3) If the same set of criteria is reused, how can we determine when to stop the stump-building process and settle down to calculate which classification group has the larger amount of say? Thx~
Every stump selects from the exact same features (in this case, the features are "chest pain", "blocked arteries" and "patient weight"), however, the sample weights are always changing and this results in bootstrap datasets that contain different samples for each stump to divide. That said, while "Patient weight" might work well on the first stump, it might not work well in the second stump. This is because every sample that "Patient weight" misclassified will have a larger weight and thus, a larger probability of being included in the next bootstrapped dataset.
Hello Josh, Thank you for the amazing videos. I had a couple of questions on stumps that are created after Sample weights are updated at time 16:00. We continue with sampling from the full set assuming new weights. This means the new set will be a subset of the original dataset as you explained in 18:32 . Going forward all subsequent stumps will only work on smaller and smaller subsets, making it a bit confusing for me on how to ensure good randomness. 1. Do we also restart from the updated sample weights at 16:00 and redo sampling, thereby creating multiple different datasets but using the same sample weight values? This will probably ensure we use all the data from the original data set in some instances. 2. As a follow-up to Q1, do we perform multiple sampling at different levels to get more variations in the dataset before creating stumps? In the video, you described only one instance of sampling using new weights but I assume it needs to be performed multiple times to get variations in datasets. 3. Do we not sample from distribution at the very beginning also? All sample weights will have the same value but random sampling would mean some might not make it to the first round of ( stump creation + Amount of say + New sample weights ) itself to get unequal sample weights at 16:00. In the video you take all the samples by default hence the query.
I have a quick question: why do we still keep Weight when determining which feature the second stump would use? I thought it should be excluded from the remaining stump pool once it's been used for classification.
Wow, that was great. I love how you make it sound so simple. Just a question about the construction of the third stump without weighted gini. Suppose sample number 5 was not picked to build my new dataset to feed to stump number 2. Can sample number 5 be picked to feed to stump number 3 or do I lose it for the rest of the stumps ?
@@statquest Hi Josh, but wouldn't there be no sample weight assigned to sample number 5 if it is not included in the 2nd stump? Without a sample weight as a result of stump 2, how would it be randomly selected in the selection for the 3rd stump? Does that mean the changes to the weights from stump 2 will only be applied to the samples in stump 2, while the samples not selected retain their weights from stump 1, and the normalization of weights is then done for all samples regardless of whether they were in stump 2 or not?
@@statquest But in that case, the elements that were not picked will be "more relevant" in the next stump? Seems like a weakness/inconsistency of the method.
Hi Josh. Thank you very much for this. I had a small doubt. Suppose we have a bad stump whose say is negative. While updating sample weights, we will decrease the weights of incorrectly classified samples. This in turn will make the say even more negative, I think. So my question is: why don't we increase the weights of misclassified samples for bad classifiers too, so that my stump goes from a bad classifier to a good classifier?
Awesome explanation! So, in a nutshell, would I be right to say AdaBoost filters out correctly classified samples and carries the misclassified ones forward in each new sampling for the new stump?
@@statquest For the next stump, after sampling, any variable can be a candidate based on its Gini? For example, the "Weight" feature had the lowest Gini, which is why it was the first stump; after doing the iterations (calculating say and sample weights), the next stump will be chosen based on whichever feature has the smallest Gini, right?
Hi Josh, great explanation as always. Can you please give guidance on how AdaBoost does the 'amount of say' calculation and final prediction for regression? Also, I found something very interesting: the sklearn AdaBoost implementation allows you to choose the base learners between trees and linear models. :)
In response to Note 3 in your corrections, how exactly does this work? Is the dataset returned back to the previous dataset exactly (before the bootstrap one), or do the sample weights need adjusting?
I couldn't figure out the weighted Gini index approach (as opposed to the bootstrapping method), especially where the weight should be applied. Do we need to recalculate the weights within the node? Could you explain more of this to me? Thanks!!
Hi Josh, I am really enjoying your videos. If you could help me answer one question, I would be very grateful. In the new dataset that you created, it seems like the probability of the wrongly classified sample appearing is higher. But how does picking a random number between 0 and 1 help us put the wrongly classified sample in the new dataset more times?
@@statquest At 16:24, we use the sample weights as a probability distribution, and the 4th sample has the highest probability. The sample weights that we assigned are artificial and don't reflect the true distribution. But still, in the new dataset (at 18:02), the sample with the highest sample weight (i.e., the 4th sample) occurs more times, even though those sample weights don't reflect a true probability distribution.
@@abrahamgk9707 I'm not sure I understand what you mean by "true distribution", but, based on the sample weights at 16:24 here is how we select samples. We pick a number between 0 and 1. Since the weight for the first sample is 0.07, if that number falls between 0 and 0.07, we select the first sample. Since the second weight is 0.07, and 0.07 + the previous threshold, 0.07 = 0.14, then if that number falls between 0.07 and 0.14, we select the second sample. Since the third weight is 0.07, and 0.07 + the previous threshold, 0.14 = 0.21, then, if that number falls between 0.14 and 0.21, we select the third sample. Since the fourth weight is 0.49, and 0.49 + the previous threshold, 0.21 = 0.70, then, if that number falls between 0.21 and 0.7, we select the fourth sample, etc. etc. The range of values that select the fourth sample, 0.21 to 0.7, is 7 times greater than the range of any other individual sample.
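The selection procedure described in the reply above can be sketched in a few lines of Python. This is an illustrative sketch (not StatQuest's code); the weights are the rounded values from the video, so they sum to slightly less than 1:

```python
def pick_sample(weights, r):
    # Walk the cumulative thresholds (0.07, 0.14, 0.21, 0.70, ...) and
    # return the index of the first threshold that exceeds the random number r.
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point round-off

# Rounded sample weights from 16:24; the 4th sample (index 3) was misclassified.
weights = [0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07]
# r = 0.35 falls between 0.21 and 0.70, so the 4th sample is selected.
```

Drawing many random numbers with this rule selects the 4th sample roughly half the time, which is why it shows up repeatedly in the bootstrapped dataset at 18:02.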
A different bootstrap dataset will have different samples in it, which will, hopefully, improve (or worsen) the Total Error. If not, you can stop after a few more trees since prediction will not improve, or you can stop when the maximum number of trees are created.
Awesome video! I am wondering how many stumps should be generated. Does it depend on the number of predictors in my dataset? Or we can choose the best one with cross-validation. Also, is it possible that the same predictor will be used multiple times in different stumps even with different input sample weights?
You can figure out the number of stumps by seeing when the classifications no longer improve (this usually means that some samples are never correctly classified). And the same predictor can be used multiple times.
if lets say your goal is to maximize recall (not accuracy) where would that change be applied to choosing the next tree? or would it be in the amount of say calculation?
Hi Josh, correct me if I'm wrong: the Gini index is a measure of inequality between two populations, so if one group is much larger than the other, then the Gini should be high. So how come Weight has the lowest Gini despite having the maximum inequality out of the three features? Thanks in advance.
Unfortunately there are two Gini Indices. One used for populations that measures inequality (for details see: en.wikipedia.org/wiki/Gini_coefficient ) , and another used in decision trees, like these, that measures impurity (for details see: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity ). The leaves for "weight" have the least amount of impurity, so it has the lowest Gini. Unfortunately Gini Index is used for both terms, so I can understand why this is confusing.
Q1. 11:58 Since the amount of say can be negative as well, shouldn't the graph and x-axis extend towards the left? Q2. If a tree has a negative amount of say, then a correctly classified sample will be assigned a higher weight than an incorrectly classified sample. It looks confusing why you would assign a higher weight to a sample that has been correctly classified, even if the tree overall has a negative amount of say.
A1. Sure, you can extend the graph in the negative direction. You get values closer and closer to 0 the more negative the amount of say is. A2: If a tree has a negative amount of say, that means it said most of the patients in the training dataset with heart disease did not have heart disease, and most of the patients without heart disease had heart disease. Thus, if this tree "correctly" classifies a new sample, it grouped it with the observations with the opposite value, which means it did a bad job categorizing the samples. Thus, we need to spend more effort trying to group it with the same value.
The other thing I am not clear about is whether the subsequent stumps receive the output from the previous stumps or not. I mean, not the error values, but suppose a new observation needs to be classified: will that new observation go through stump 1, and then through stump 2, and so on, to be labelled at the end? Or will the new observation be classified as having heart disease by stump 1 (amount of say, let's assume, 0.2), then stump 2 classifies it as no heart disease with amount of say 0.4, and stump 3 classifies the new observation as having heart disease with amount of say 0.2 again? Then will the amounts of say of stumps 1 and 3 be added, or averaged, or what will happen? Also, you mentioned that stumps are created with two leaf nodes, so how do you deal with multinomial variables, both in the case of independent and dependent variables?
The amounts of say are just added at the end. In your case, you would get a tie, since the sum of stumps 1 and 3=0.4 for "Has heart disease" and stump 2=0.4 for "no heart disease". With multinomial situations, you create several adaboost "forests", one for each classification type and each forest tests one classification vs everything else lumped together. (so if you have 3 categories, you test 1 vs not 1, 2 vs not 2 and 3 vs not 3).
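The "just add the amounts of say" rule from the reply above is easy to sketch. A minimal illustration (hypothetical helper names, using the commenter's three-stump example):

```python
from collections import defaultdict

def forest_vote(stump_predictions):
    # stump_predictions: list of (classification, amount_of_say) pairs.
    # Sum the amounts of say per classification; the largest total wins.
    totals = defaultdict(float)
    for label, say in stump_predictions:
        totals[label] += say
    return dict(totals)

# The commenter's scenario: stumps 1 and 3 say "yes" (0.2 each),
# stump 2 says "no" (0.4) -- the totals tie at 0.4 each.
votes = forest_vote([("yes", 0.2), ("no", 0.4), ("yes", 0.2)])
```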
Question: When determining which features to split given updated sample weights, how did you determine the binning / ranges to assign given random draws on (0,1) ? Specifically, why bin sample weights into (0,0.07], (0.07, 0.14], (0.14, 0.21], etc? Are these based on standard deviations of a "normal" distribution of weights over (0,1)?
Ok, but I'm still confused about how the range (0,1) of possible normalized sample weights is naturally partitioned into the ones that you used? Are the bins always the same as the ranges you provided in this example? Why pick (0,0.07), (0.07,0.14) etc instead of say (0,0.25), (0.25,0.5) etc? Sorry if I am being dense
@@aipithicus The normalized weights add up to 1 ( 15:07 ), which makes them suitable for describing a discrete probability distribution. This means that each normalized weight can correspond directly to a probability. For example, the normalized weight for the first sample is 0.07 and that means that it should have a probability of 0.07 to be picked for the bootstrapped dataset. Thus, at 16:25 we pick a random number between 0 and 1. Because the probability that the random number will fall between 0 and 0.07 is 0.07, we use that range to decide if we should select the first sample. Because the next sample also has a normalized weight of 0.07, it should have a probability of 0.07 to be picked for the bootstrapped dataset. Because the probability that the random number will fall between 0.07 and 0.14 is 0.07, we use that as the range to decide if we should select the second sample. etc.
If we are already correctly classifying a sample, then we don't need to focus the next stump on correctly classifying as much as we need for the next stump to correctly classify the samples that were not correctly classified.
Hi Josh... Thank you very much, excellent teaching! I have a doubt though: once the new weights are created and the new dataset is selected @16.50, it is selected at random. So how does it ensure that samples with higher weights get more chances, given that the number chosen is random?
The weights are equivalent to the probability that an observation will be added to the new dataset. So if the weight for an observation is low, because it was correctly classified by the last tree, then it has a low probability of being added to the new dataset. If the weight for an observation is high, because it was incorrectly classified by the last tree, then it has a high probability of being added to the new dataset.
First of all, I love your videos! I think there is another minor mistake when calculating the amount of say for Weight: (1/2)*log(7) = 0.422549, not 0.97. I think.
If you use log base 10, you get 0.42. If you use log base 'e', you get 0.97. In statistics, machine learning and most programming languages, the default log function is log base 'e', so that is the log that I'm using here. If you want to use a different log, like log base 10, that's fine, just be consistent.
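The base difference in the reply above is easy to verify in Python, where `math.log` is the natural log (base 'e') by default:

```python
import math

total_error = 1 / 8  # the Weight stump's total error from the video
ratio = (1 - total_error) / total_error  # = 7

base_e = 0.5 * math.log(ratio)    # ~0.97, what the video uses
base_10 = 0.5 * math.log10(ratio)  # ~0.42, what the commenter computed
```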
Thank you very much for this video!! I have a question : at 19:13, How does a stump finally classify a patient as "Has heart disease" or "Does not have a heart disease" . I thought it depends on the value of the feature in the root node ...
The leaves of each stump make the classification based on the feature and threshold in the root. Does that make sense or are you asking about something else?
@@statquest Thank you for your answer. Yes, it does make sense, but you're saying that the leaves of each stump make the classification, and at 19:13 we're considering that the whole stump is doing the classification. I didn't understand the transition from the classification of each leaf to the classification of the whole stump!
@@ghofranezouaoui4269 I hope it's all clear now. We get some new data, and apply it to each stump. Each stump has a root that sends us to a leaf, and the leaf gives us the classification for that stump, given the data.
@@statquest Yes now I got it! I forgot that at that point we're considering new data so it's either this or that. Thank you so much for keeping things so simple and clear and for your quick responses :)
On what basis are we labeling the left child node "Yes heart disease" and the right child node "No heart disease"? Cuz we just split the data based on a condition, yeah... or is it like we train the stump and then feed the dataset into this stump to get the Correct and Incorrect values? If that's the case, how is it possible in the case of Blocked Arteries, where both child nodes are impure?
Hi, thank you for your clear explanation. I have one question: Is the learning rate set to 0.5 in the calculation of the "amount of say"? because you didn't illustrate it
The concept of a learning rate was never part of the original algorithm, which is why I didn't illustrate it. To be honest, I'm not really sure how it would work in this context. If we scaled every "amount of say" by 0.5, then we would still get the exact same results.
Thanks! What about gradient boosting? Is it used for genomics? I am aware that it has been successful used in Kaggle competitions, but don't find applications to genomics, in spite of the support of XGBoost and CatBoost for R.
You know what's really funny - I just wrote a genomics application that uses XGBoost, so I know it can work in that setting. I'm using it to predict cell type from single-cell RNA-seq data. It works better than AdaBoost or Random Forests. However, it turns out that Random Forests have some nice statistical properties that make me want to use them over gradient boost. I may pursue both methods.
Hi Josh, thanks for the awesome video. Just a query here: you are creating a new sample dataset — is it a kind of bagging (bootstrapped dataset), like in random forest? Thanks in advance :)
Einstein says "if you can't explain it simply you don't understand it well enough" and I found this AdaBoost explanation bloody simple. Thank you, Sir.
Josh, this is just awesome. The simple and yet effective ways you explain otherwise complicated Machine Learning topics is outstanding. You are a talented educator and such a bless for the entire ML / Data Science / Statistics learners all around the world.
AdaBoost: Forest of Stumps 1:30 stump: a tree with just 1 node and 2 leaves. 3:30 AdaBoost: Forest of Stumps; different stumps have different weight/say/voice; each stump takes the previous stumps' mistakes into account. (AdaBoost, short for Adaptive Boosting) 6:40 7:00 Total Error: sum of all sample weights associated with incorrectly classified samples. 7:15 Total Error ∈ [0,1] since all sample weights of the training data add up to 1. (0 means a perfect stump; 1 means a horrible stump) --[class notes]
Hi Josh, excellent video. But I am not able to understand how the weighted Gini index is calculated after I have adjusted the sample weights... Can you please help?
Take the example of Chest Pain:
Gini index = 1 - (3/5)^2 - (2/5)^2 = 0.48 for the Yes category
Gini index = 1 - (2/3)^2 - (1/3)^2 = 0.44 for the No category
Since each category has a different number of samples, we have to take the weighted average in order to get the overall (weighted) Gini index:
Yes category weight = (3 + 2) / (3 + 2 + 2 + 1) = 5/8
No category weight = (2 + 1) / (3 + 2 + 2 + 1) = 3/8
Total Weighted Gini index = 0.48 * (5/8) + 0.44 * (3/8) = 0.47
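The Chest Pain calculation above translates directly into a few lines of Python (an illustrative sketch of the same arithmetic, not StatQuest's code):

```python
def gini(counts):
    # Gini impurity of a leaf: 1 - sum of squared class proportions
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Chest Pain: Yes leaf has 3 correct / 2 incorrect,
# No leaf has 2 correct / 1 incorrect.
yes_leaf, no_leaf = [3, 2], [2, 1]
n_yes, n_no = sum(yes_leaf), sum(no_leaf)
n = n_yes + n_no

# Weight each leaf's impurity by its share of the samples.
weighted_gini = gini(yes_leaf) * n_yes / n + gini(no_leaf) * n_no / n
# ~0.47, matching the calculation above
```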
Thank you for the study guides Josh! I did not know about them and I spent 5 HOURS making notes about your videos on decision trees and random forests. I think 3 USD is worth less than 5 hours of my time. I purchased the study guide for AdaBoost and cannot wait for the rest of them (especially neural networks!)
Could you elaborate on weighted gini function? Do you mean that for computing the probabilities we take weighted sums instead of just taking the ratio, or is it something else?
I understand he calculates the Gini for every leaf, then multiplies by the number of predictions in that leaf and divides by the total number of predictions in both leaves (8), so the index is weighted by the size of the leaf. Then he sums the weighted indices from both leaves. At least I'm getting the same results when applying this formula.
In this case, you need to use "log base 'e'", or the "natural log". In machine learning and statistics, people often use "log()" to mean the natural log. I know this is confusing, but, as long as you are consistent, you can use any log base and things will work.