Predict childcare costs in US counties with xgboost and early stopping

Julia Silge

Published: 28 Aug 2024

Comments: 20

@michaelmahoney3806 1 year ago

I don't believe that I have ever watched one of your videos that I didn't come away with some new nugget. Thanks, Julia!

@hesamseraj 1 year ago

As always, thank you for such a great screencast.

@tofreddy 1 year ago

I stumbled into your channel. Thank you for the teachable moment.

@carvalhoribeiro 1 year ago

Very, very useful. Thank you so much, Julia!

@wilrivera2987 1 year ago

Dream job: to work at Posit.

@CaribouDataScience 1 year ago

Thanks, that was interesting!

@djangoworldwide7925 1 year ago

Hey, `rsample::validation_set()` does not exist anymore. As of 24-06-2023 we can use `validation_split()`, `validation_time_split()`, or `group_validation_split()`. I had a feeling it was `validation_split()` anyway, but I wonder: maybe I should use the dev version?
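Editor's note for readers hitting the same error: to the best of my knowledge, releases of rsample from 1.2.0 onward include `initial_validation_split()` and `validation_set()`, so which function is available depends on the installed version. The following is a minimal sketch on made-up data, assuming rsample 1.2.0 or later; the `toy` data frame and its columns are hypothetical and are not the childcare dataset from the video.

```r
library(rsample)

set.seed(123)
# Hypothetical toy data standing in for the real dataset
toy <- data.frame(x = rnorm(100), y = rnorm(100))

# Three-way split: 60% training, 20% validation, remaining 20% testing
toy_split <- initial_validation_split(toy, prop = c(0.6, 0.2))
toy_train <- training(toy_split)
toy_val <- validation(toy_split)
toy_test <- testing(toy_split)

# An rset holding a single training/validation resample,
# which can be passed to tuning functions in place of cross-validation folds
toy_val_set <- validation_set(toy_split)
```

On older versions of rsample, `validation_split()` applied to the training data plays the same role as `validation_set()` here.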

@omoniyitemitope6113 5 months ago

Hi, I have a dataset with 35 variables and want to run some regression models (RF, xgboost, etc.) on it. I am new to R and want to know if you have any special online training that I can register for?

@JuliaSilge 5 months ago

I recommend that you work through this: www.tidymodels.org/start/ And then take a look at this book: www.tmwr.org/ Good luck!

@omoniyitemitope6113 5 months ago

@JuliaSilge Thanks so much for your response. I followed one of your screencasts and got an rsq of 0.37 for the RF model. Is there anything I can do to improve the fit of my model?

@JuliaSilge 5 months ago

@omoniyitemitope6113 This definitely depends on the specifics of your situation! I recommend that you check out a resource like *Tidy Modeling with R* for digging deeper on the model building process: www.tmwr.org/

@omoniyitemitope6113 5 months ago

@JuliaSilge Thanks for your response. I will go through it. I did something whose statistical implications I don't understand: I took the log of my dependent variable and fit an RF, and to my surprise the % var explained was 99.74. This looks too good to be true to me.

@anselmekouame1913 1 year ago

Hi Julia, how might multicollinearity affect a machine learning model? If multicollinearity is found, should we remove variables that are highly correlated?

@JuliaSilge 1 year ago

If you are using a linear model, correlated features can be a big problem! In cases like that, you would want to remove features that are highly correlated with other ones, or use something like PCA. Check out feature engineering approaches like these: recipes.tidymodels.org/reference/step_corr.html recipes.tidymodels.org/reference/step_pca.html Tree-based models tend to do OK with correlated features and it often doesn't really help to handle them in a special way. Just crank it on through the model!
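Editor's note: the two feature engineering steps linked above can be sketched as follows. This is an illustration on made-up data, not code from the video; the `toy` data frame, its column names, the 0.9 threshold, and the choice of two components are all assumptions picked for the example.

```r
library(recipes)

set.seed(42)
n <- 200
x1 <- rnorm(n)
# Hypothetical data: x2 is almost an exact copy of x1, x3 is unrelated
toy <- data.frame(
  y = rnorm(n),
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.01),
  x3 = rnorm(n)
)

# Option 1: drop predictors until no pair has absolute correlation above 0.9
corr_rec <- recipe(y ~ ., data = toy) |>
  step_corr(all_numeric_predictors(), threshold = 0.9)
corr_baked <- bake(prep(corr_rec), new_data = NULL)

# Option 2: replace the predictors with principal components
# (normalize first so that no predictor dominates because of its scale)
pca_rec <- recipe(y ~ ., data = toy) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca(all_numeric_predictors(), num_comp = 2)
pca_baked <- bake(prep(pca_rec), new_data = NULL)

names(corr_baked)
names(pca_baked)
```

With `step_corr()`, one of the two near-duplicate columns is removed and the remaining predictors keep their original meaning; with `step_pca()`, the predictors are replaced by components, which is less interpretable but retains information from every original column.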

@anselmekouame1913 1 year ago

@JuliaSilge Thanks a bunch.

@geralgariza7199 1 year ago

nice work! well done!

@danielhallriggins9008 4 months ago

Thanks Julia, love your videos! To get a more accurate sense of performance, would it be helpful to use {spatialsample} to account for spatial autocorrelation?

@JuliaSilge 4 months ago

That would be a great thing to do! This dataset doesn't have explicitly spatial information in it (just county FIPS code) so you would need to join some spatial info together with the original dataset.

@konormccracken 1 year ago

Always grateful for these videos! Though the grating little economist in me screamed a bit when you discounted the fixed-effect of "county" here 🫥

@JuliaSilge 1 year ago

Ah yep! The xgboost algorithm does not have the ability to incorporate fixed effects the way that a multilevel model does, say like those from multilevelmod: multilevelmod.tidymodels.org/ However, we could still use a resampling approach that takes into account how a given county is in this dataset a bunch of times, to avoid overly optimistic performance estimates. We'd want to switch out `initial_split()` for `group_initial_split()` and `validation_split()` for `group_validation_split()`: rsample.tidymodels.org/reference/validation_split.html
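Editor's note: a minimal sketch of the grouped split described in the reply above, on made-up data. The `toy` data frame and the column name `county_fips_code` are assumptions for illustration; check the actual dataset for the real grouping column.

```r
library(rsample)

set.seed(123)
# Hypothetical stand-in: 20 counties, each observed in 5 different years
toy <- data.frame(
  county_fips_code = rep(1001:1020, each = 5),
  study_year = rep(2014:2018, times = 20),
  cost = runif(100, min = 50, max = 300)
)

# Grouped split: every row for a given county lands on the same side
county_split <- group_initial_split(toy, group = county_fips_code, prop = 0.8)
county_train <- training(county_split)
county_test <- testing(county_split)

# Counties that appear in both sets (should be none)
shared <- intersect(
  county_train$county_fips_code,
  county_test$county_fips_code
)
length(shared)

# The validation split for early stopping follows the same pattern, e.g.
# group_validation_split(county_train, group = county_fips_code)
```

Because no county appears on both sides of the split, the test set measures how well the model generalizes to counties it has never seen, rather than to new years for counties it already knows.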