Get started with random forest tuning and tidymodels using IKEA price data

Use tidymodels scaffolding functions for getting started quickly with random forests, predicting #TidyTuesday IKEA furniture prices. Check out the code on my blog: juliasilge.com...

Published: Aug 28, 2024

Comments: 49

@BigChewbowski 3 years ago

Thank you for taking the time to make these videos! They have been an immense help in my R journey!

@alelust7170 3 years ago

Thanks, Julia! You always bring some interesting library into your analysis!

@jennyhansen 1 year ago

Thank you, Julia. This was tremendously helpful for me!

@TURALOWEN 2 years ago

usemodels package is magic!

@stes5429 3 years ago

It seems there is a conflict between "step_knnimpute()" and "doParallel::registerDoParallel()". The problem can be easily solved by removing "doParallel::registerDoParallel()" and using the option "nthread" in "step_knnimpute(depth, height, width, options = list(nthread = 12))", choosing the number of cores available. Restart the R session before running. Love these tutorials! Cheers
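A minimal sketch of the workaround described above, for reference. In current recipes releases `step_knnimpute()` has been renamed `step_impute_knn()`, which is what this uses. The toy `furniture` data and the thread count are assumptions for illustration only, not the IKEA data from the video.

```r
library(recipes)

# Hypothetical toy data with missing dimensions (not the IKEA data set)
furniture <- data.frame(
  price  = c(1.5, 2.1, 2.8, 3.0, 2.2, 1.9, 2.6, 3.2),
  depth  = c(40, NA, 55, 60, 45, 38, NA, 62),
  height = c(70, 85, NA, 200, 90, 75, 180, NA),
  width  = c(50, 60, 80, NA, 65, 55, 100, 120)
)

rec <- recipe(price ~ ., data = furniture) %>%
  # nthread is handed to the nearest-neighbor search,
  # so no parallel backend needs to be registered for this step
  step_impute_knn(depth, height, width,
                  neighbors = 3,
                  options = list(nthread = 2))

imputed <- bake(prep(rec), new_data = NULL)

# Check whether any missing values remain after imputation
anyNA(imputed)
```

Pick `nthread` to match the cores actually available on your machine.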

@prod.kashkari3075 3 years ago

Thank god for your course and book, I was seriously struggling trying to learn tidymodels from the docs. One thing, in your course, do you want to maybe add how to use the “stacks” package for stacking models and building ensemble learners?

@seaniam 3 years ago

Love these videos - thanks Julia!

@MattBirch 1 year ago

This is awesome. Thanks!

@davidjackson7675 3 years ago

What about calculating the square inches of the top?

@JorgeThomasM 9 months ago

Hi @JuliaSilge! Would volume = height * width * depth be a sort of interaction / new variable? Thanks so much for all these wonderful sessions.

@JuliaSilge 9 months ago

Yeah, for sure! We'd call that "feature engineering" because you are creating a custom feature from the original variables based on your domain knowledge of how furniture works. 😄
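A small sketch of that kind of feature engineering in base R. The `furniture` data frame and its values are hypothetical. It also computes the area of the top surface asked about earlier in the thread.

```r
# Hypothetical dimensions in centimeters (not the IKEA data set)
furniture <- data.frame(
  height = c(70, 85, 200),
  width  = c(50, 60, 80),
  depth  = c(40, 45, 55)
)

# Engineered features: overall volume and the area of the top surface
furniture$volume   <- with(furniture, height * width * depth)
furniture$top_area <- with(furniture, width * depth)

furniture
```

Inside a recipe, the same expressions could go in `step_mutate()` so that the new columns are created as part of preprocessing.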

@yangyang6008 1 year ago

Hi Julia, thank you for the great tutorial! For the training set cross-validation, what is the difference between "bootstraps" and "vfold_cv"? Which method is more appropriate for training a machine learning model? Thank you.

@JuliaSilge 1 year ago

You can check out this chapter for the differences: www.tmwr.org/resampling.html#resampling-methods Also, this Cross Validated answer by Max tells you a bit about when you might choose one over the other: stats.stackexchange.com/a/18355/133241 If you have enough data, cross validation is usually the best bet.
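To see the structural difference concretely, here is a short sketch with rsample. It uses the built-in `mtcars` data as a stand-in, and the number of folds and resamples are arbitrary choices.

```r
library(rsample)

set.seed(123)
folds <- vfold_cv(mtcars, v = 10)        # 10 disjoint assessment sets
boots <- bootstraps(mtcars, times = 10)  # 10 resamples drawn with replacement

# Cross-validation: every row is held out exactly once across the folds,
# so with little data and many folds each assessment set is tiny
fold_sizes <- vapply(folds$splits, function(s) nrow(assessment(s)), integer(1))
fold_sizes
sum(fold_sizes) == nrow(mtcars)

# Bootstrap: the analysis set is as large as the original data,
# and the rows that were never drawn form the assessment set
first_boot <- boots$splits[[1]]
nrow(analysis(first_boot)) == nrow(mtcars)
nrow(assessment(first_boot))
```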

@yangyang6008 1 year ago

@@JuliaSilge Hi Julia, thank you very much for your explanation!

@psxcl9817 2 years ago

Hello Julia! Thanks for your video. I'd like to know whether I can obtain the importance values of each feature from the vip package instead of plotting them. I did not find the relevant function in vip.

@JuliaSilge 2 years ago

Do you mean you want to get the importance values as a dataframe, rather than a visualization? You can use `vi()` for that: koalaverse.github.io/vip/reference/vi.html
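A minimal sketch of `vi()` on a ranger random forest. The `mtcars` data and the choice of permutation importance are assumptions for illustration.

```r
library(ranger)
library(vip)

set.seed(123)
# Importance scores must be requested when the forest is fit
rf_fit <- ranger(mpg ~ ., data = mtcars, importance = "permutation")

# vi() returns the scores as a data frame (tibble) instead of a plot
importance <- vi(rf_fit)
importance
```

The result has one row per predictor, so it can be filtered, joined, or plotted like any other data frame.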

@artathearta 3 years ago

10:43 Why did vfold_cv give you small testing folds?

18:25 I got an error:

```
... All models failed. See the `.notes` column. ...
Warning message:
This tuning result has notes. Example notes on model fitting include:
preprocessor 1/1: Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_clean_levels', 'step')"
preprocessor 1/1: Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_clean_levels', 'step')"
preprocessor 1/1: Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_clean_levels', 'step')"
```

I followed your steps exactly, and I even tried directly copying and pasting your code from your blog post.

EDIT: I was able to fix this problem by commenting out the workflow() command and instead piping the recipe through prep() after step_knnimpute, and then setting up tune_grid to take in the model as its object and ranger_recipe as its preprocessor.

@JuliaSilge 3 years ago

With that many folds on data this small, that's just how cross-validation works! You can read about bootstrap vs. cross-validation here: stats.stackexchange.com/questions/18348/differences-between-cross-validation-and-bootstrapping-to-estimate-the-predictio I forgot to mention in the video that `step_clean_levels()` is in the development version of textrecipes, so you'll need to install from GitHub to be able to use that function: `devtools::install_github("tidymodels/textrecipes")`

@artathearta 3 years ago

@@JuliaSilge I uninstalled the copy of textrecipes I got from CRAN and installed it from GitHub, and now it doesn't work even if I use prep() 😆 It's all good, I'll still follow along and hope it eventually works on my computer, or put it on my MacBook. Still a great video! Excited to use {usemodels}!

@artathearta 3 years ago

@@JuliaSilge Okay, while I was filling out the steps for submitting a bug on textrecipes, I discovered that it works with the workflow() object if I remove `doParallel::registerDoParallel()` before running `tune_grid`.

@JuliaSilge 3 years ago

@@artathearta Hmmmm, can you make sure you have the most up-to-date version of tune from CRAN? That contains bug fixes for parallel processing on Windows.

@artathearta 3 years ago

@@JuliaSilge tune_0.1.2. I can open a github issue if you'd like

@MoCtheFirst 2 years ago

When using `predict()` at the end (24:49), I get the error: "Workflow has not yet been trained. Do you need to call `fit()`?" Any suggestions as to what went wrong? Thanks for all the input!

@JuliaSilge 2 years ago

If you want to walk through the blog post to follow along, you can call `predict()` on the fitted workflow that is inside of `final_res`: juliasilge.com/blog/sf-trees-random-tuning/ You can check out my latest blog post for a more explicit example of how to do this: juliasilge.com/blog/chocolate-ratings/
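A minimal sketch of that pattern. It assumes a recent tidymodels installation, and uses `mtcars` with a plain linear model as a stand-in for the tuned random forest; only the name `final_res` comes from the thread.

```r
library(tidymodels)

set.seed(123)
car_split <- initial_split(mtcars)

wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg())

# last_fit() trains on the training set and evaluates on the test set
final_res <- last_fit(wf, car_split)

# The trained workflow is stored inside the result; `wf` itself stays untrained,
# which is why calling predict() on it raises the "not yet been trained" error
fitted_wf <- extract_workflow(final_res)  # also available as final_res$.workflow[[1]]

preds <- predict(fitted_wf, new_data = testing(car_split))
preds
```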

@stoianandreimircea1509 3 years ago

Superb.

@panagiotischionas5828 3 years ago

Hi Julia, I really love your work. A quick question: since you take the log of price as input to your model, if you want to show the actual price predicted by the model, how would you do that?

@JuliaSilge 3 years ago

I used log10(), so you can do 10^price to get it back. 👍
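A tiny sketch of that back-transformation. The prices are hypothetical, and the "predictions" here are just the transformed values standing in for model output.

```r
# Hypothetical prices; the model in the video is trained on log10(price)
price <- c(25, 199, 1495)
log_price <- log10(price)

# Stand-in for model predictions on the log10 scale
predicted_log_price <- log_price

# Undo the transformation to get back to the original price scale
predicted_price <- 10^predicted_log_price
all.equal(predicted_price, price)
```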

@seunghoonlee5275 2 years ago

Thank you so much Julia! It's a great video. I wonder whether I can use a weight variable in random forest analysis (or in the tidymodels packages in general). Could you recommend any materials?

@JuliaSilge 2 years ago

Yes, this has been a focus of the tidymodels team this year! You can read more here: www.tidyverse.org/blog/2022/05/case-weights/ Since that post, much of the case weight work has been released to CRAN.

@seunghoonlee5275 2 years ago

@@JuliaSilge Thank you so much Julia! I will go over the link.

@JamesLee1 3 years ago

Hello Julia, thanks for the video. I'm a big fan. Could you please let me know how to make HTML/notebook outputs from R Markdown look better? When I use your TidyTuesday Rmd file from your GitHub, the resulting HTML file has the default single-spaced, very small calibre font text. But your website has ~1.5-spaced, big custom font that's pretty. I don't intend to publish my HTML report on GitHub or online because my work data is sensitive; could I still make HTML outputs with the same formatting as your website? Is Wowchemy - Academic theme only for publishing online through GitHub? I would like to change HTML text formatting locally.

@JuliaSilge 3 years ago

My website uses Hugo and I'm sure you don't want to get that set up just for individual reports. Instead, take a look at some of the styling options you have for HTML reports. There are built-in options using Bootswatch: bookdown.org/yihui/rmarkdown/html-document.html#appearance-and-style Or other contributed formats like html_pretty and html_clean: rmarkdown.rstudio.com/formats.html

@shahidraza5571 3 years ago

Can you point me to a source where I can learn the random forest algorithm for predicting a groundwater contamination map due to fluoride, using RStudio along with QGIS?

@ROCK962 3 years ago

Hi Julia! Thank you for your awesome tutorials. I am trying to replicate the Palmer Penguins episode, but I am having a problem with the bootstrapping step. When I run the bootstraps function from rsample, R is creating empty splits. Do you know what could be the issue?

@JuliaSilge 3 years ago

Wow, no, I haven't seen that before. Can you work on creating a reprex: www.tidyverse.org/help/ And then posting the problem on RStudio Community? rstd.io/tidymodels-community

@mkklindhardt 3 years ago

Hi Julia, once again thank you for your amazing videos and your great enthusiasm. I have some questions. 1) Why do you use knn imputation? You did not really explain why you did not go for linear or mean imputation. 2) Can usemodels also be used to prepare my data (recipe, workflow, prep etc.) for a linear mixed model? Ultimately I would like to use the same data setup for comparing different regression models, such as linear mixed models (stepwise AIC regression), kNN regression and random forest regression, as well as XGBoost. Is it possible to have the same data setup for all my models? I guess that's needed when comparing model performance and evaluating models? Or am I wrong? Thank you

@JuliaSilge 3 years ago

Choosing nearest neighbors for imputation over something like linear imputation or just a single value (mean/median) is similar to making that choice for modeling overall; it lets you use nonlinear, more complex relationships in the data for the imputation. I think this paper is a pretty nice discussion: www.ncbi.nlm.nih.gov/pmc/articles/PMC4959387/ You can see the models that are currently supported in usemodels here: usemodels.tidymodels.org/reference/index.html If you are interested in comparing quite a number of models, you might check out using the tidyposterior package, as described in this chapter: www.tmwr.org/compare.html

@mkklindhardt 3 years ago

Appreciated @@JuliaSilge! Is it "fair" to compare linear regression models with machine learning regression models? 1) Are there specific areas, generally, that one needs to be aware of when comparing linear mixed models with machine learning models (e.g. random forest, XGBoost and kNN)? Such as changes in predictor variables, continuous vs. factor variables, etc.? 2) Are there tidymodels ways I can deal with or prevent collinearity and high correlation between variables before I perform the linear regression modelling? Perhaps like an AIC stepwise regression? Is that the same as the vip() function? But then my predictors for the linear model will change compared to the ones in the ML regression modelling, right? Sorry for the many questions. I hope they are somewhat clear. Hope you had a good weekend, Julia. Your help is precious to me! Best regards, Kamau

@JuliaSilge 3 years ago

@@mkklindhardt Yep, there is nothing wrong with comparing linear models with models that can account for more complex, non-linear behavior. If you are thinking about comparing models, I recommend reading in detail this section, as well as Chapters 10 and 11: www.tmwr.org/software-modeling.html#model-types In tidymodels, we have preprocessing steps to filter out variables that are highly correlated or a linear combination of each other: recipes.tidymodels.org/reference/index.html#section-step-functions-filters We don't recommend using stepwise regression, for the reasons outlined here: www.stata.com/support/faqs/statistics/stepwise-regression-problems/ More on that here: stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856
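A short sketch of those filter steps in a recipe. It uses `mtcars` as a stand-in, and the 0.8 correlation threshold is an arbitrary choice for illustration.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  # Drop predictors that are exact linear combinations of others
  step_lincomb(all_numeric_predictors()) %>%
  # Drop predictors until no pairwise absolute correlation exceeds the threshold
  step_corr(all_numeric_predictors(), threshold = 0.8)

filtered <- bake(prep(rec), new_data = NULL)

# Columns that survived the filters (the outcome is always kept)
names(filtered)
```

Because the filtering happens inside the recipe, the predictors it keeps are estimated from the training data in each resample rather than chosen once by hand.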

@sjrigatti 3 years ago

Hi. This is great. I work with survival data a lot and I was wondering how an analysis like this would differ with a survival object as the outcome. Is it just a matter of changing the mode of the ranger fit?

@JuliaSilge 3 years ago

No, actually, we still have a bit of work to do for survival models. We have some notes sketched out here: github.com/tidymodels/planning/tree/master/survival-analysis And there are some proof of concepts floating around in various repos. This is something we will work more on in 2021, so look for survival support next year!

@sjrigatti 3 years ago

@@JuliaSilge this seems like something Dr. Harrell at Vanderbilt would be interested in working on. Has he contributed anything at this point?

@JuliaSilge 3 years ago

@@sjrigatti Not at this point, but an interesting idea!