XGBoost Part 4 (of 4): Crazy Cool Optimizations 

StatQuest with Josh Starmer
1.3M subscribers
92K views

Published: 2 Oct 2024

Comments: 217
@statquest
@statquest 2 года назад
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@박상호-e2m
@박상호-e2m 4 года назад
I appreciate your endeavor making this video. You read and summarized the paper of XGBoost! I think you are the one who best explains XGBoost in the world!!!
@statquest
@statquest 4 года назад
Thank you very much! It took a long time to figure out all of the details, but I think it was worth it. :)
@bazejmarciniak5682
@bazejmarciniak5682 3 года назад
This is funny how I always like StatQuest videos before watching them... and never regret it :D Exactly the same way as with this comment, I am so sure it's true ;)
@rohitbansal3032
@rohitbansal3032 2 года назад
Not just XGBoost but anything
@aditya4974
@aditya4974 4 года назад
There is only one channel whose videos can be liked even before watching for one second, and that's StatQuest! Bam!
@statquest
@statquest 4 года назад
Thank you very much! :)
@baochung1751
@baochung1751 3 года назад
Great series on XGBoost! Thank you very much for making them. Now I have a clearer understanding of XGBoost, especially its boosted trees and the computational advantages that make it fast. The animations are wonderful and make things much easier to follow. I love them so much. Hope that you will make something on LightGBM and CatBoost.
@statquest
@statquest 3 года назад
I'm working on LightGBM and CatBoost.
@dandyyu0220
@dandyyu0220 2 года назад
Thank you, thank you for such a great series of XGBoost videos, so clearly explained while still in depth!
@statquest
@statquest 2 года назад
Glad you like them!
@aaryannakhat1842
@aaryannakhat1842 4 года назад
All I can say is: BAM!! Double BAM!! Triple BAM!! Mega BAM!! And finally, EXTREME BAM!!!!!! Thank You Josh for these wonderful lectures!
@statquest
@statquest 4 года назад
BAM! :)
@zhihaoxu756
@zhihaoxu756 3 года назад
Wow! How is it possible there are some downvotes??? This video is incredible. I vote for StatQuest, thank you so much!
@statquest
@statquest 3 года назад
Thank you very much! BAM! :)
@yanbowang4020
@yanbowang4020 4 года назад
Hi Josh, would you please do a series to clearly explain Bayesian & Genetic hyperparameter tuning algorithms?
@georgeli6160
@georgeli6160 4 года назад
I second this
@bhishanpoudel8707
@bhishanpoudel8707 3 года назад
I would also second this!
@dihancheng952
@dihancheng952 2 года назад
your videos never disappoint me, I feel I can click on "like this" even before watching your videos
@statquest
@statquest 2 года назад
bam! :)
@soujanyapm480
@soujanyapm480 3 года назад
OMG !!!! what an explanation ! extremely detailed explanation of extreme gradient boosting. I think no one on this planet can explain this topic like you Josh! You have literally done the autopsy of this algorithm to get into that details:) Thanks a ton for this amazing video !!!!
@statquest
@statquest 3 года назад
BAM! Glad you liked it!
@mohammadshahadathossain1544
@mohammadshahadathossain1544 4 года назад
Informative video with detailed explanation.
@statquest
@statquest 4 года назад
Thanks! :)
@samerrkhann
@samerrkhann 3 года назад
I couldn't thank you enough for the efforts that you have put into this series of lectures. Thank You JOSH!
@statquest
@statquest 3 года назад
Thank you very much! :)
@tejashshah5202
@tejashshah5202 3 года назад
Please accept my virtual standing ovation :) I finished the entire XGBoost tutorial and you made it sound super simple. Please also add LightGBM to the list. Thanks again for all your work.
@statquest
@statquest 3 года назад
Thank you! :)
@the-brick-train
@the-brick-train 4 года назад
This is pure class
@statquest
@statquest 4 года назад
Thanks! :)
@thedatascientistchannel7176
Simple but powerful!!!
@statquest
@statquest Год назад
Bam! :)
@TheShadyStudios
@TheShadyStudios 3 года назад
you are the GOAT!!
@statquest
@statquest 3 года назад
Thank you! :)
@TZarKar
@TZarKar 4 года назад
Thank you so much for doing this video ♥️. Would it be possible to do videos on ensemble machines?
@nitinsiwach1989
@nitinsiwach1989 Год назад
Finally.. graduated in tree based algos from the statquest academy. What a feeling :) :)... One absolute last hiccup-- What does "xgboost splits the data so that both drives get a unique set of data" mean? What is a unique set of data there? And why does it ensure that parallel reads can happen? Why can't parallel reads happen if the "unique set of data" isn't there on different drives?
@statquest
@statquest Год назад
The idea is that 1) we start out with a dataset that is too large to fit onto the same disk drive, so we have to split it up. We could split it so that there is some overlap between what goes on drive A and what goes on drive B, but there's no speed advantage to that. Instead, if each drive has a unique subset of the data, we can call each drive simultaneously (in parallel) to access records.
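To illustrate the idea in the reply above, here is a minimal Python sketch (the drive paths, shard file names, and CSV layout are hypothetical, not XGBoost's real block-compressed on-disk format): each drive holds a disjoint subset of the rows, so both shards can be read at the same time.

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

# Hypothetical shards: each drive holds a unique (non-overlapping) set of rows.
shards = ["/mnt/driveA/rows_0-499999.csv",
          "/mnt/driveB/rows_500000-999999.csv"]

# Because the shards are disjoint, both drives can be read in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(pd.read_csv, shards))

data = pd.concat(parts, ignore_index=True)  # no overlap, so a plain concat suffices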
@tonywang5203
@tonywang5203 3 года назад
This is dope.
@statquest
@statquest 3 года назад
Thanks!
@sureshparit2988
@sureshparit2988 4 года назад
Josh aka Heisenberg, You're Awesome. Thanks for the series.
@statquest
@statquest 4 года назад
Thank you very much! :)
@ericzhang5987
@ericzhang5987 2 года назад
Amazing!!!
@statquest
@statquest 2 года назад
Thanks!
@SasidharKhambhampati
@SasidharKhambhampati 4 года назад
Hi Josh, Thank you very much for the videos on XGBoost. You have successfully explained the topic clearly though it is complicated..Thanks a ton
@statquest
@statquest 4 года назад
Thanks! :)
@weii321
@weii321 5 месяцев назад
Hi Josh! Your XGBoost videos are great! By the way, do you have a tutorial about LightGBM?
@statquest
@statquest 5 месяцев назад
Not yet.
@lima073
@lima073 Год назад
Thanks!
@statquest
@statquest Год назад
Muito obrigado!!! Thank you so much for supporting StatQuest!!! BAM! :)
@manuelagranda2932
@manuelagranda2932 4 года назад
I meaannnn! Maybe you can see my thesis project and find your name in the acknowledgements!!! jaja thanks a lot!! Greetings from Medellín, Colombia
@statquest
@statquest 4 года назад
Hola! Good luck with your thesis! I hope it goes well. Muchas gracias. :)
@brunomaciel4135
@brunomaciel4135 4 года назад
Thanks you very much for the videos. You give the best explanation. I'm looking forward to the lightGBM video (hope there will be one, someday). That would be a HUGE BAAM!
@vidhyapc
@vidhyapc 4 года назад
So grateful for these videos. They have made understanding XGBoost so simple even though the concepts are a little complicated. Closing my eyes, I can now say how the trees are built, how they are optimized, how they are faster, etc.
@statquest
@statquest 4 года назад
Hooray! I'm glad the videos are helpful. :)
@jjj78ean
@jjj78ean 3 года назад
Hi,Josh. Thank you for exciting explanation! What about to make the same amazing series about LightGBM and Catboost?
@statquest
@statquest 3 года назад
I'll keep that in mind.
@zeusserch98
@zeusserch98 2 года назад
Thanks for the great explanation! I have two questions: 1. How large does the dataset need to be before we can use parallel learning? 2. In parallel learning, do we just make one tree so we can find the weighted quantile sketch?
@statquest
@statquest 2 года назад
1. I'm not sure. However, it is probably mentioned in the documentation. 2. Parallel learning makes it so we can find the best feature split faster for a given node in a tree.
@abrahamgk9707
@abrahamgk9707 3 года назад
Can you please do a video on expectation maximization algorithm 🙏🙏?
@statquest
@statquest 3 года назад
I hope to do that one day.
@mikhaeldito
@mikhaeldito 4 года назад
You are amazing!
@statquest
@statquest 4 года назад
Thanks! :)
@nitinsiwach1989
@nitinsiwach1989 Год назад
This is the most lucid, most comprehensive single stop shop for the tree based algos that you are going to need in any job at all. Lightgbm missing definitely leaves something to be desired. Shall we expect a Lightgbm video anytime soon, Josh?
@statquest
@statquest Год назад
Maybe. I can't make any promises, but if I have time I'll try to squeeze it in.
@ronaldgiliolucana7210
@ronaldgiliolucana7210 4 года назад
Super cool!!!, It's the best explanation in the world.
@statquest
@statquest 4 года назад
Wow, thanks!
@nitinsiwach1989
@nitinsiwach1989 8 месяцев назад
Hello Wonderful Josh, Do you think it is time for xgboost video 5? Explaining the additions brought in by xgboost 2
@statquest
@statquest 8 месяцев назад
Maybe!
@jtruler
@jtruler Год назад
I just wanted to say thanks. I really struggled in college because I always felt like the professors explained things as if we already knew everything they were teaching. The way that you break everything down is super helpful. I hope that you continue to make these sorts of videos even when you become a world famous musician. Keep it up :)
@statquest
@statquest Год назад
Ha! Thank you very much! I appreciate it. :)
@tytaifeng
@tytaifeng 2 года назад
thank you so much josh, I felt so lucky to find your videos on youtube, ultimately clearly explained and I just love it.
@statquest
@statquest 2 года назад
Thank you very much! :)
@shashankkapoor2828
@shashankkapoor2828 2 года назад
I read the paper a thousand times, and it was never this clear. Not even close. Thank you so much.
@statquest
@statquest 2 года назад
Glad it was helpful!
@marcoscosta829
@marcoscosta829 4 года назад
Hey Josh, great job, looking forward to your next video! Just out of curiosity, do you plan to make a video teaching CatBoost?
@statquest
@statquest 4 года назад
Not in the immediate future. I'm working on neural networks and deep learning right now, but I might swing back to catboost when that is done.
@marcoscosta829
@marcoscosta829 4 года назад
@@statquest thank you very much! Keep up the excellent work you've been doing, you're helping me and a lot of others learn so much!
@statquest
@statquest 4 года назад
@@marcoscosta829 Thanks! :)
@deana00
@deana00 11 месяцев назад
Hi Josh, Thank you for your great series. Btw, do you happen to know how LightGBM find its candidate splits in Histogram-based split finding?
@statquest
@statquest 11 месяцев назад
I have some notes on LightGBM, but I haven't made a video about it yet.
@deana00
@deana00 11 месяцев назад
@@statquest Would you mind explaining it to me? Does LightGBM use the same concept as XGBoost for finding the candidate splits (e.g. quantization)?
@statquest
@statquest 11 месяцев назад
@@deana00 My understanding is that the first handful of trees (I believe 10 is the default) are built like XGBoost, and then it uses a subset of the elements in a feature based on the size of the gradients.
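To make the reply above more concrete, here is a rough Python sketch of gradient-based sampling in the spirit of LightGBM's GOSS (the rates a=0.2 and b=0.1 and the (1 - a) / b re-weighting follow the GOSS description in the paper, but this is a simplification, not LightGBM's actual code):

import numpy as np

def goss_like_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
    """Keep the a-fraction of rows with the largest |gradient| plus a random
    b-fraction of the rest; the remaining rows are skipped for this tree."""
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))       # largest gradients first
    top = order[: int(a * n)]                    # always kept
    rest = order[int(a * n):]
    sampled = rng.choice(rest, size=int(b * n), replace=False)
    # GOSS re-weights the sampled small-gradient rows by (1 - a) / b
    weights = np.ones(n)
    weights[sampled] = (1 - a) / b
    keep = np.concatenate([top, sampled])
    return keep, weights[keep]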
@RajeshSharma-bd5zo
@RajeshSharma-bd5zo 3 года назад
Simply an awesome playlist on Boosting, so AWESOME BAM!!
@statquest
@statquest 3 года назад
Glad you enjoyed it!
@parijatkumar6866
@parijatkumar6866 3 года назад
Your videos make Machine Learning - Human learnable
@statquest
@statquest 3 года назад
That's awesome! :)
@lilianaaa98
@lilianaaa98 5 месяцев назад
what an amazing song in the beginning of this video!!!
@statquest
@statquest 5 месяцев назад
Bam! :)
@teetanrobotics5363
@teetanrobotics5363 4 года назад
Could you please make a course on Probability as well ?
@statquest
@statquest 4 года назад
One day I will.
@Smrigankiitk
@Smrigankiitk 3 месяца назад
amazing thank you for the hard work!
@statquest
@statquest 3 месяца назад
My pleasure!
@Raven-bi3xn
@Raven-bi3xn 4 года назад
Thanks, Josh. One question about the greedy part (4th minute): in random forest (say, regression applications), even though we use a subset of features for each tree and a bootstrapped subset of the data, we could still end up with many thresholds to examine, similar to the problem with XGBoost. Why not use quantiles for building random trees that deal with a high number of thresholds?
@statquest
@statquest 4 года назад
Great question! I don't think there is a reason why not. All you have to do is implement it. I think a lot of the "big data" optimizations that XGBoost has could be used with all kinds of other ML algorithms.
@programmer8064
@programmer8064 3 года назад
Thank you so much
@statquest
@statquest 3 года назад
Thanks!
@vigneshht2917
@vigneshht2917 Год назад
Great work. You deserve more than a million subs for your effort and dedication😀
@statquest
@statquest Год назад
Thank you! I dream of the day! :)
@matthewbetty5237
@matthewbetty5237 3 года назад
Hi Josh. I am not sure if you still check these comments, but I wanted to thank you for making these really amazing and informative videos. I am not sure I could pursue machine learning in my own time if I did not have these great resources to clearly explain the content to me. Thanks for making the maths fun and showing all the cool details, its great :)
@statquest
@statquest 3 года назад
Thank you very much! BAM! :)
@pragatiparhad6309
@pragatiparhad6309 3 года назад
Please make video on time series model also
@statquest
@statquest 3 года назад
I'll keep that in mind.
@diedrichschmidt5869
@diedrichschmidt5869 4 года назад
Thanks for the highly accessible explanations for how XGBoost performs its calculations. What about doing a few videos on tuning the hyperparameters for improved model fits? For boosted tree, and random forest, this is not so complicated, but XGBoost has many, many parameters that can be tuned, making the optimization of the model quite challenging.
@statquest
@statquest 4 года назад
I plan on doing "xgboost in R" or "xgboost in python" videos, and those will cover the hyperparameter tuning.
@surajitchakraborty1903
@surajitchakraborty1903 4 года назад
Hi Josh, thanks for the awesome video. While you are preparing the R and Python videos on XGBoost hyperparameter tuning, it would be great if you could point to some resources on XGBoost hyperparameters in the meantime.
@rrrprogram8667
@rrrprogram8667 4 года назад
MEGAAA BAMMMMM is back with BAMMMMMMM
@statquest
@statquest 4 года назад
Hooray!!! :)
@jadore801120
@jadore801120 2 года назад
This video is just too awesome
@statquest
@statquest 2 года назад
Thank you!
@yukinakamura5231
@yukinakamura5231 4 года назад
BAMMMMMMM I appreciate you
@statquest
@statquest 4 года назад
Thank you very much! :)
@hrishikeshpotdar5889
@hrishikeshpotdar5889 2 года назад
Machine Learning is more than just applied statistics - Josh Starmer
@statquest
@statquest 2 года назад
yep. This is a good example of it.
@travelwithadatascientist
@travelwithadatascientist 3 года назад
Now I can read the XGBoost paper much more easily ❤ Thanks a lot for these StatQuests 🙌
@statquest
@statquest 3 года назад
You're welcome 😊
@PasinKunamart
@PasinKunamart 3 года назад
Thank you for the detailed but easy-to-understand video. I'm also interested in LightGBM algorithm as well (seems like it was compared with xgboost a lot), so I would be happy if you made one for lgbm as well.
@statquest
@statquest 3 года назад
I hope to have videos on lightGBM and CatBoost soon.
@PasinKunamart
@PasinKunamart 3 года назад
@@statquest Looking forward to the videos!
@arjunpukale3310
@arjunpukale3310 4 года назад
Please make video on Generative Adversarial network (Gans)
@rohitrajora9832
@rohitrajora9832 3 года назад
BAAAAAAAAAAAAAAAAM !!!!!!!!!!!!!!
@statquest
@statquest 3 года назад
:)
@erfanmoosavi9428
@erfanmoosavi9428 2 года назад
TRIPLE BAM!!
@statquest
@statquest 2 года назад
YES! :)
@doyelmukherjee2769
@doyelmukherjee2769 7 месяцев назад
Hi Josh...does one on one variable correlation and multicollinearity affect ML models???
@statquest
@statquest 7 месяцев назад
It depends on the model.
@xiaotianxiaotian9974
@xiaotianxiaotian9974 Год назад
So great, I can't praise you guys enough!👍
@statquest
@statquest Год назад
Thank you!
@alessandroalbano5891
@alessandroalbano5891 3 года назад
You are a king
@statquest
@statquest 3 года назад
:)
@purandixit2384
@purandixit2384 2 года назад
Best explanation in the world. I think there is a typo at 19:12, instead of Dosage
@statquest
@statquest 2 года назад
Yep. that's a typo.
@AminBashiri28
@AminBashiri28 2 месяца назад
Can you please create one for LightGBM too?
@statquest
@statquest 2 месяца назад
I'll definitely keep it in mind.
@VBHVSAXENA82
@VBHVSAXENA82 4 года назад
Hi Josh, thanks for another great video. Would request you to make one on hyper parameter tuning as well.
@statquest
@statquest 4 года назад
OK. I'll do that soon.
@VBHVSAXENA82
@VBHVSAXENA82 4 года назад
@@statquest Thanks again :)
@rajeshnimma155
@rajeshnimma155 2 года назад
Can you please help us understanding Light Gbm , cat-boosting algorithms
@statquest
@statquest 2 года назад
I'm working on those.
@beckswu7355
@beckswu7355 3 года назад
At 19:56, when choosing the leaf for missing values, you select the left branch as the default path. It makes sense because the missing-value residuals (-3.5 and -2.5) are negative, which is similar to the non-missing-value residuals (-5.5 and -7.5). I wonder if I could select the right branch as the default path for missing values if my residuals were large and positive, e.g. 10.5 instead of -3.5 and -2.5.
@statquest
@statquest 3 года назад
You always pick the leaf that gives the optimal gain value.
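A tiny Python sketch of that rule (the residual values are hypothetical, loosely based on the numbers in the question above, and lambda is set to 0 for simplicity): try sending the rows with missing Dosage down each branch and keep whichever direction gives the larger Gain.

def similarity(residuals, lam=0.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=0.0):
    root = left + right
    return similarity(left, lam) + similarity(right, lam) - similarity(root, lam)

known_left, known_right = [-5.5, -7.5], [6.5, 8.0]   # residuals with known Dosage
missing = [-3.5, -2.5]                               # residuals with missing Dosage

gain_if_left  = gain(known_left + missing, known_right)
gain_if_right = gain(known_left, known_right + missing)
default = "left" if gain_if_left > gain_if_right else "right"
# With these numbers the left branch wins, so "left" becomes the default path.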
@rezaliswara4086
@rezaliswara4086 2 года назад
Nice explanation! I want to ask: when we use a large dataset for sparsity-aware split finding, must we also do parallel learning and the weighted quantile sketch to find the threshold?
@statquest
@statquest 2 года назад
I'm not sure you must, but if you have a large dataset, it's probably a good idea. The whole idea is to be able to train the model quickly.
@hoatrinh401
@hoatrinh401 Год назад
BAM BAM
@statquest
@statquest Год назад
:)
@myelinsheathxd
@myelinsheathxd 5 месяцев назад
Tripple BAM!
@statquest
@statquest 5 месяцев назад
:)
@shivamkaushik6637
@shivamkaushik6637 4 года назад
The best explanation to XGBoost so far.
@statquest
@statquest 4 года назад
Thank you very much! :)
@arpitshrivastav4822
@arpitshrivastav4822 4 года назад
Best explanation for XGB available!
@statquest
@statquest 4 года назад
Thank you! :)
@thamus90
@thamus90 4 года назад
Loved it! Thanks!
@statquest
@statquest 4 года назад
Thank you very much! :)
@rrrprogram8667
@rrrprogram8667 4 года назад
What do u use for making the diagrams??... Power point or any other software??
@statquest
@statquest 4 года назад
I use "keynote", which is a free product that comes with apple computers.
@anweshbiswal5180
@anweshbiswal5180 3 года назад
Hi Josh!! Thanks for this ❤️. But can you explain how you found the residual values for the missing dosages using the initial predictions in Sparsity Aware Split Finding? Or if anyone else knows, can you please help me with this? Thanks in advance.
@statquest
@statquest 3 года назад
I'm not sure I understand your question. However, in order to calculate the residuals, we only need to know the drug effectiveness. In other words, we don't need to know the dosages to calculate the residuals. So we calculate the residuals and then figure out the default direction to go in the tree for the missing values (which we never actually have to fill in).
@anweshbiswal5180
@anweshbiswal5180 3 года назад
Yes yes, I completely overlooked that!!. Maximum Bam!!!❤️ Also we all really appreciate how you still reply and clear doubts from the old videos. Thanks Josh!
@xixi1796
@xixi1796 3 года назад
Great video! But I don't understand why *Hessian* is used to serve as *weights* for quantile histogram. What is the underlying mathematical reason that the 2nd order derivative plays a role of weight?
@statquest
@statquest 3 года назад
It helps us separate poorly classified samples that belong to different classes.
@MadhurDevkota
@MadhurDevkota 3 года назад
Hessian = p̂ * (1 - p̂). You can think of the Hessian as a measure of how uncertain the current prediction is. If H = 0.9 * (1 - 0.9) = 0.09, H is very small and the classification for that observation is already good enough: the classes are well separated. If H = 0.5 * (1 - 0.5) = 0.25, H is very large (the maximum) and the classification is not yet good enough: the classes for that observation are not well separated. So perhaps you could use something like 0.15 as a "probability threshold" Hessian value. Making quantiles of these Hessian weights therefore separates correctly classified and wrongly classified observations, because of how their sums work out, exactly as shown at 14:30.
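A small Python sketch of that weighting (the predicted probabilities are hypothetical, and the bin-splitting below is a simplification of XGBoost's actual weighted quantile sketch): each sample's weight is its Hessian, p * (1 - p), and the quantile boundaries are chosen so each bin holds roughly the same total weight rather than the same number of samples.

import numpy as np

p = np.array([0.9, 0.9, 0.1, 0.1, 0.5, 0.5])  # hypothetical predicted probabilities
weights = p * (1 - p)                          # Hessians: 0.09, 0.09, 0.09, 0.09, 0.25, 0.25

# Simplified weighted quantiles: assuming the samples are already sorted by the
# feature value, cut the cumulative weight into 3 bins of roughly equal total weight.
cum = np.cumsum(weights) / weights.sum()
bin_index = np.searchsorted([1/3, 2/3], cum)   # bin assignment for each sample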
@kennethleung4487
@kennethleung4487 3 года назад
Fantastic!
@statquest
@statquest 3 года назад
Many thanks!
@nitinsiwach1989
@nitinsiwach1989 Год назад
I have yet another question on the histogram-building process. Let's say I had 1M rows and 1 feature:
1. Build the histogram. Now there are approx 33 split-points.
2. Split-point 10 gives the max gain.
3. Data from the first 10 bins goes to the left child and data from the 10th bin onwards goes to the right child.
4. To further split the left and right children on the same feature, (a) are histograms with ~33 split-points built again for the data that landed in the left and the right child, or (b) are only the 10 already-computed split-points considered to compute gain for the left child and the 23 already-computed split-points for the right child?
I think it is (b), since I think that is the only option that can result in a speedup IMO
@statquest
@statquest Год назад
Based on the manuscript alone, I would say that it is (a), but it's possible that, in practice, they reuse the splits and use (b). Theory and practice aren't always the exact same.
@nitinsiwach1989
@nitinsiwach1989 Год назад
@@statquest Thank you for the reply, Josh :). Since yesterday I have read the comparison of histogram-based split finding vs GOSS-based split finding given in the LightGBM paper, where the two algorithms are juxtaposed side by side. Based on that, I am reasonably confident that it is (b). There is also another speed-up trick in which the histogram is computed for only one child, and the histogram for the other child is simply parent_node_histogram - computed_child_histogram. This would be possible only if the histogram is computed once at the beginning of the tree and then reused throughout the tree. I got as much from the LightGBM manuscript. Looking forward to your thoughts.
@statquest
@statquest Год назад
@@nitinsiwach1989 That makes sense, because LightGBM builds trees "leaf-wise": it looks at the two leaves and selects the one with less variation to add branches to.
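A minimal Python sketch of that subtraction trick (simplified: real implementations store per-bin sums of gradients and Hessians, and the numbers here are made up): once the parent's histogram and one child's histogram are known, the sibling's histogram is just their difference, so it never has to be rebuilt from the raw rows.

import numpy as np

parent_hist = np.array([4.0, 2.5, 7.0, 1.5])  # per-bin gradient sums at the parent node
left_hist   = np.array([1.0, 2.5, 3.0, 0.5])  # built by scanning only the left child's rows

right_hist = parent_hist - left_hist          # the right child's histogram comes for free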
@mohammadelghandour1614
@mohammadelghandour1614 2 года назад
At 14:25, what if we have 5 samples at 0.1 probability instead of 2, as well as another group of 5 samples at 0.9 probability instead of 2, in addition to the last two samples with low confidence? Will the first and second groups end up in two separate quantiles with a total weight of 0.45 each? If so, then the third quantile will contain the last two samples with opposite residuals, since their sum of weights (0.48) is almost equal to that of the first two quantiles.
@statquest
@statquest 2 года назад
To be honest, I don't know.
@KnowNothingJohnSnow
@KnowNothingJohnSnow 2 года назад
I need to study operating systems...
@statquest
@statquest 2 года назад
noted
@samsimmons8370
@samsimmons8370 3 месяца назад
I feel like 15:15 deserved a triple bam, but maybe that's just me
@statquest
@statquest 3 месяца назад
I think you might be right on that one.
@pannawitathipatcharawat1585
@pannawitathipatcharawat1585 3 года назад
Thank you for the amazing video!! But I have some questions. Does it mean that all the missing values for a feature (let's say feature A) need to go to the same side (all left or all right)? If so, does that mean XGBoost treats all the missing values in feature A the same? Thank you Josh :D
@statquest
@statquest 3 года назад
Yes
@radocisar3420
@radocisar3420 4 года назад
Great, as always.
@statquest
@statquest 4 года назад
Thank you! :)
@sabrinahung5584
@sabrinahung5584 3 года назад
Thank you so much for your videos! I have learnt so much from them. Could you do a video on LightGBM and catboost as well? :)
@statquest
@statquest 3 года назад
I'll keep those topics in mind.
@virgildjogbessi
@virgildjogbessi 3 года назад
Thank you for your really helpful series about XGBoost. I've got a question: when you talk about a huge database, what do you mean? Also, can 67,000 rows x 9 columns be considered a huge database? Thank you in advance for your answer. BAM!
@statquest
@statquest 3 года назад
In this case "huge" = so big that we can not fit all of it in the available ram at the same time. On my computer 67,000 x 9 would not be huge because I can load that into memory all at once.
@virgildjogbessi
@virgildjogbessi 3 года назад
@@statquest Thank you once again. I get it now!
@Exiled1517
@Exiled1517 4 года назад
If I may ask politely, can someone give me a general yet simple explanation of how XGBoost fills in missing values in datasets? It's for my thesis, as my lecturer asked me to explain it in more detail. Thank you :)
@statquest
@statquest 4 года назад
I explain how XGBoost deals with missing data at 16:13. Is there some part that doesn't make sense to you?
@rahul-qo3fi
@rahul-qo3fi Год назад
❤❤❤❤
@statquest
@statquest Год назад
:)
@rahul-qo3fi
@rahul-qo3fi Год назад
What about categorical variables? Since quantiles won't be available, how would these variables be handled?
@statquest
@statquest Год назад
@@rahul-qo3fi XGBoost converts all categorical variables to numeric via One-hot-encoding ( ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-589nCGeWG1w.html ). XGBoost can do this efficiently for variables with lots of options by using sparse matrices (which only keep track of the non-zero values).
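A small Python sketch of that workflow (the toy data and parameters are made up; xgboost's DMatrix does accept SciPy sparse matrices, though behavior can vary by version): one-hot encode the categorical column into a sparse matrix so only the non-zero entries are stored, then hand it to XGBoost.

import numpy as np
import xgboost as xgb
from sklearn.preprocessing import OneHotEncoder

days = np.array([["Sunday"], ["Monday"], ["Tuesday"], ["Monday"]])  # toy categorical feature
y = np.array([1.0, 2.0, 3.0, 2.5])                                  # toy target

encoder = OneHotEncoder()            # returns a SciPy sparse matrix by default
X = encoder.fit_transform(days)      # mostly zeros, stored sparsely

dtrain = xgb.DMatrix(X, label=y)     # DMatrix accepts sparse input directly
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=5)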
@rahul-qo3fi
@rahul-qo3fi Год назад
@@statquest 😍
@nehamanpreet1044
@nehamanpreet1044 4 года назад
At 15:06, we have the sum of weights in one quantile as 0.18, in another 0.18, and in the last two 0.24. But as far as I understood, you explained that in weighted quantiles the sum of weights in all the quantiles is equal, yet here it is not equal across all 4??
@statquest
@statquest 4 года назад
It's not equal, but it's as close as it can be to being equal. Does that make sense? If equal isn't an option (and that is the case here), XGBoost gets as close to equal as possible.
@nehamanpreet1044
@nehamanpreet1044 4 года назад
@@statquest Okay so that means approximately equal. Thanks
@HasanAYousef
@HasanAYousef 2 года назад
Can we use XGBoost for time series forecasts?
@statquest
@statquest 2 года назад
I believe it is possible, but I haven't tried it myself.
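For what it's worth, one common approach (a hedged sketch with made-up data, not something from the video): turn the series into a supervised problem by using lagged values as features, then fit an ordinary XGBoost regressor.

import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Hypothetical daily series; lag features turn forecasting into plain regression.
series = pd.Series(np.sin(np.arange(200) / 10.0))
df = pd.DataFrame({"lag1": series.shift(1), "lag2": series.shift(2), "y": series}).dropna()

model = XGBRegressor(n_estimators=50)
model.fit(df[["lag1", "lag2"]], df["y"])
next_value = model.predict(df[["lag1", "lag2"]].tail(1))  # one-step-ahead forecast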
@huishiyang3561
@huishiyang3561 4 года назад
Hello Josh. Many thanks for another super BAM video! I have a question about the missing value part. You explained how xgboost incorporates missing values in training and makes predictions for missing values in future data. What happens when there are no missing values in training, but there are in testing/future data?
@statquest
@statquest 4 года назад
Good question! Unfortunately I do not know the answer. :(
@akshatsuwalka5759
@akshatsuwalka5759 3 года назад
We can make our training data have missing values so that if any future test data has missing values, our model can easily handle it.
@lkhhoe
@lkhhoe 3 года назад
Hi Josh, thanks for making these awesome videos for learning XGBoost in depth. But I want to ask: is the *weight* you mentioned here the same as the *cover* you mentioned in previous videos? Both of them have the same formula.
@statquest
@statquest 3 года назад
At 9:19 I say that the weights are derived from the Cover metric, and since I later say they have the exact same formula, then we can assume that they are the same.
@lkhhoe
@lkhhoe 3 года назад
@@statquest wow, thanks for the patient explanation! It's nice to learn the algorithm in depth from you.
@pandikapinata62
@pandikapinata62 4 года назад
Thanks for the XGBoost series. If a feature like Dosage is in string format, how does XGBoost handle it when building a tree? For example, time series data such as Sunday, Monday, Tuesday, etc. Thanks
@statquest
@statquest 4 года назад
Here's a tutorial on XGBoost and time series: www.kaggle.com/robikscube/tutorial-time-series-forecasting-with-xgboost
@pandikapinata62
@pandikapinata62 4 года назад
@@statquest Thanks a lot :) Greetings from Bali
@rishabhahuja2506
@rishabhahuja2506 4 года назад
Thanks Josh for this amazing video. Your explanations are really great and helpful. It would be great if you could share your approach to understanding complex algorithms like XGBoost. Currently I am digging into CatBoost, and I'm just curious what resources or plan you follow when you want to understand a new algorithm: reading research papers, understanding the maths behind it, etc.
@statquest
@statquest 4 года назад
I read everything I can about a subject and then re-read it until it starts to make sense. Then I create a simple example and play with it.
@rishabhahuja2506
@rishabhahuja2506 4 года назад
Thanks Josh for the response!!
@xinnywillwin
@xinnywillwin 4 года назад
Triple combo (like, favorite, share)!
@statquest
@statquest 4 года назад
Yes!
@alanpress3019
@alanpress3019 4 года назад
The great ML channel. Josh, are you planning to give lectures on convolutional neural networks and capsule networks for deep learning? I'm expecting those. Bam!
@statquest
@statquest 4 года назад
I'm working on neural networks right now.
@HussainAlyousif
@HussainAlyousif 4 года назад
@@statquest waiting for this. thanks alot
@ashwathshettyr5363
@ashwathshettyr5363 4 года назад
But what if one missing value's residual fits the left side of the tree and another fits the right side? Then how will you predict?
@statquest
@statquest 4 года назад
To be honest, I'm not 100% sure. However, I believe that the default, when everything else is the same, is to go to the left.
@ashwathshettyr5363
@ashwathshettyr5363 4 года назад
@@statquest thanks for the reply josh. double bammmm!
@HemangJoshi
@HemangJoshi 4 года назад
Your videos are of excellent quality, but please stop the bad music at the start of every video... It is a humble request, because it distracts the mind before the learning process starts...
@statquest
@statquest 4 года назад
Noted!
@FranciscoLlaneza
@FranciscoLlaneza 8 месяцев назад
@statquest
@statquest 8 месяцев назад
Hi!