
Understanding and Applying XGBoost Classification Trees in R 

Spencer Pao · 11K subscribers · 6K views

Published: 21 Aug 2024

Comments: 25
@user-sh5os8pd2o · 2 years ago
Great explanation, and you're very handsome too! Thank you, teacher.
@francis3321 · 2 years ago
Thank you very much
@annsu8960 · 2 years ago
Thank you for this video! Can you also show how to get variable importance?
@SpencerPaoHere · 2 years ago
Depending on the package, you can use varImp(model_name, scale = FALSE) from caret, or xgb.importance(feature_names, model = model_name) from xgboost.
@annsu8960 · 2 years ago
@@SpencerPaoHere thank you! :D
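A minimal sketch of both routes from the reply above, assuming a caret-trained model named model_caret, a native xgboost booster named bst, and a training matrix train_x (all names hypothetical):

```r
# Variable importance, two routes (model_caret, bst, train_x are hypothetical).
library(caret)
library(xgboost)

# caret route: works for models trained with caret::train(method = "xgbTree")
imp <- varImp(model_caret, scale = FALSE)
print(imp)

# native xgboost route: feature names come from the training matrix columns
importance <- xgb.importance(feature_names = colnames(train_x), model = bst)
print(importance)
xgb.plot.importance(importance)   # bar chart of importance (Gain) per feature
```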
@deannanuboshi1387 · 2 years ago
Great video! Do you know how to get prediction or confidence intervals for xgboost?
@SpencerPaoHere · 2 years ago
Prediction: pass new data to your trained model using predict(). Confidence interval: it's a little more complicated, since there really isn't a straightforward way to do this. You could, however, train 100 different xgboost models (with randomized parameters) and run the test set through each to get a range of predictions.
@deannanuboshi1387 · 2 years ago
@@SpencerPaoHere Thank you for your reply! I tried predict(), but it does not return intervals the way random forest does (I set conf_int = TRUE for rf, and predict() does return intervals there). I see that one can get a prediction interval by running xgboost 100 times, but wouldn't the prediction then be the mean of the 100 predictions instead of the original prediction? (The interval would also be an interval for that mean, not for the original prediction.) Also, I tuned my xgboost model and would like to use the hyperparameter values from the best model for prediction, so running xgboost 100 times with different hyperparameter values in each model seems to defeat the purpose of tuning. Is there a way to get a prediction interval for the original prediction (for each new observation)? Any idea will be appreciated!
@SpencerPaoHere · 2 years ago
@@deannanuboshi1387 Which xgb library are you using? The predict syntax might differ. As for the predictions: perhaps, though you can build some form of distribution/frequency curve from the runs and obtain a more accurate assessment from there. It also seems the learners in xgboost don't support the error estimates you might be looking for. This article may help, if you haven't seen it already: towardsdatascience.com/confidence-intervals-for-xgboost-cac2955a8fde
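A minimal sketch of the train-many-models idea from the thread above, assuming numeric matrices train_x/test_x and a 0/1 label vector train_y (all hypothetical). The quantiles of the prediction spread give a rough empirical band, not a formal confidence interval:

```r
# Rough prediction "interval" by training many xgboost models with
# jittered hyperparameters and collecting the spread of predictions.
library(xgboost)

set.seed(42)
n_models <- 100
preds <- matrix(NA_real_, nrow = nrow(test_x), ncol = n_models)

for (i in seq_len(n_models)) {
  params <- list(
    objective = "binary:logistic",
    eta       = runif(1, 0.05, 0.3),   # randomized learning rate
    max_depth = sample(3:8, 1),        # randomized tree depth
    subsample = runif(1, 0.6, 1.0)     # randomized row sampling
  )
  bst <- xgboost(data = train_x, label = train_y,
                 params = params, nrounds = 100, verbose = 0)
  preds[, i] <- predict(bst, test_x)
}

# Empirical 95% band per test observation
interval <- t(apply(preds, 1, quantile, probs = c(0.025, 0.5, 0.975)))
head(interval)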
@trishiasingla2168 · 2 years ago
What needs to be of matrix type: both train_x and train_y, or just train_y? I passed train_x as a data frame and train_y as a vector (because it has only one column). The model runs just fine (I ran it just like you did), but on my end it returned the following columns, with no Accuracy or Kappa: max_depth, nrounds, RMSE, Rsquared, MAE. Any pointers? My model code (grid_tune) is the same as in the video.
@SpencerPaoHere · 2 years ago
Hi! Try casting your independent variables to matrices with as.matrix().
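A minimal sketch of that cast, assuming the questioner's train_x/train_y objects. The RMSE/Rsquared/MAE columns in the output indicate that caret treated the task as regression, which happens when the outcome is numeric rather than a factor:

```r
# Cast the predictors to a numeric matrix, as suggested above
# (train_x / train_y are the questioner's hypothetical objects).
library(caret)

train_x <- as.matrix(train_x)   # xgboost expects a numeric matrix, not a data frame

# caret reports RMSE/Rsquared/MAE for numeric outcomes; for Accuracy/Kappa
# the outcome must be a factor, so that train() runs classification:
train_y <- as.factor(train_y)

model <- train(x = train_x, y = train_y,
               method = "xgbTree",
               trControl = trainControl(method = "cv", number = 3))
model$results[, c("Accuracy", "Kappa")]
```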
@wereskiryan · 1 year ago
When I run my data through predict(), I get a value between 0 and 1. I assume this is a probability. How do you decide which probability output corresponds to "50K"?
@SpencerPaoHere · 1 year ago
When you run predict(), it depends on the backend of the library you are using, and you can find which threshold best separates the class predictions. But in this case, when you link the predictions back to your dependent variable, there should be an identifier from when you converted "50K" to a category (either one-hot encoded or cast to an integer). You then use that "legend" to determine which prediction matches which class (i.e., remember which ordering relates to which category).
@wereskiryan · 1 year ago
@@SpencerPaoHere Thanks Spencer, much appreciated. Keep up the great videos, especially on R! :)
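A minimal sketch of the threshold-plus-legend idea from this thread, assuming a fitted booster bst, a test matrix test_x, and that the income labels were encoded as 0/1 before training (all names and the exact label strings are assumptions):

```r
# Map probability output back to the original labels
# (bst, test_x, and the "<=50K"/">50K" strings are hypothetical).
library(xgboost)

probs <- predict(bst, test_x)        # with objective "binary:logistic": P(class == 1)

threshold <- 0.5                     # tune this if the classes are imbalanced
pred_class <- ifelse(probs > threshold, 1, 0)

# The "legend" recorded when the labels were cast to integers:
label_legend <- c(`0` = "<=50K", `1` = ">50K")
pred_label <- unname(label_legend[as.character(pred_class)])
head(pred_label)
```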
@philippekamdem9497 · 2 years ago
Please help me. When I run the code, I get an error message like this: .xgb.DMatrix(x, 0, , drop = FALSE): unused argument (drop = FALSE)
@SpencerPaoHere · 2 years ago
I'm not sure what the issue is here. Do you have a reproducible example? Some things to be wary of: matrix sparsity and the data type of your X; make sure it is a matrix! It might also be an issue with data imputation (if you've done that on your dataset).
@philippekamdem9497 · 2 years ago
@@SpencerPaoHere 😢😢
@philippekamdem9497 · 2 years ago
It's OK now, thank you.
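For reference, a minimal sketch of a clean xgb.DMatrix construction. One common cause of an "unused argument (drop = FALSE)" error is passing a data frame where a numeric matrix is expected, which matches the "make sure it is a matrix" advice above (train_df and its "label" column are hypothetical):

```r
# Build an xgb.DMatrix from a plain numeric matrix
# (train_df and its "label" column are hypothetical).
library(xgboost)

x <- as.matrix(train_df[, setdiff(names(train_df), "label")])
storage.mode(x) <- "double"     # ensure a numeric, not character, matrix
y <- train_df$label

dtrain <- xgb.DMatrix(data = x, label = y)
```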
@Stoic_might · 2 years ago
How many models should there be in XGBoost?
@SpencerPaoHere · 2 years ago
I'm not sure what you mean. Are you referring to the number of trees within XGBoost?
@Stoic_might · 2 years ago
@@SpencerPaoHere yes
@Stoic_might · 2 years ago
Can you tell me?
@SpencerPaoHere · 2 years ago
@@Stoic_might Hi! It actually depends. I'd recommend hyperparameter tuning to find the number of trees needed for your specific use case. Some values to try: [100, 300, 500, 800, 1000, 1500]. That should give you a good range (start with 100 to see how well your model performs). You can also check out my dinosaur video, where I give a high-level overview of the tuning process (with Bayesian optimization): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-PXQgRGPl4gw.html
@Stoic_might · 2 years ago
@@SpencerPaoHere thank you
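A minimal sketch of tuning the number of trees over the range suggested above, using xgb.cv with early stopping rather than a fixed grid (train_x is a hypothetical numeric matrix, train_y a 0/1 label vector):

```r
# Find a data-driven number of trees with cross-validation and early stopping.
# train_x and train_y are hypothetical.
library(xgboost)

dtrain <- xgb.DMatrix(data = train_x, label = train_y)

cv <- xgb.cv(params = list(objective = "binary:logistic", max_depth = 6),
             data = dtrain,
             nrounds = 1500,                # upper end of the suggested range
             nfold = 5,
             early_stopping_rounds = 50,    # stop once validation stops improving
             verbose = 0)

cv$best_iteration                           # number of trees to use
```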
@adrianlandwonski9420 · 10 months ago
So confusing. Where are dtrain and dtest, created with xgb.DMatrix?
Up next
Tuning Model Hyper-Parameters for XGBoost and Kaggle
24:34
Bayesian Regression in R
19:39
13K views
Ensemble Method: Stacking (Stacked Generalization)
13:14
Preprocessing Data in R for ML with "caret" (2021)
19:24
Decision and Classification Trees, Clearly Explained!!!
18:08
Gradient Boosting : Data Science's Silver Bullet
15:48
Applied Principal Component Analysis in R
15:32
26K views
Time Series Forecasting with XGBoost - Advanced Methods
22:02
Tuning XGBoost using tidymodels
50:36
18K views