Modeling hotel bookings in R using tidymodels and recipes 

Julia Silge
Subscribers: 15K
Views: 31K

Published: Sep 11, 2024

Comments: 85
@Themightyquinn1703 3 years ago
Julia, I can't thank you enough. I just submitted my dissertation in machine learning and I wouldn't have been able to do it without you. Tidymodels literally saved me. Your videos are great and you are an inspiration to budding data scientists. I hope you keep up the TidyTuesday data screencasts because they are great! Thanks again!
@abdallahel-kafrawy4114 4 years ago
Thank you Julia for this great tutorial, it means a lot to see someone doing modelling in action! Better than any online course. Good luck
@malleshamyamulla3195 4 years ago
It's a good one, and thanks for doing the screencasts every week. Looking forward to next week's lesson.
@jacnah63 4 years ago
Hi Julia - thank you for doing these vids - They really really help some of us who are trying to improve our R skills!
@Sefran12 4 years ago
Thank you, Julia. Ever since I subscribed to your channel I've become better at using R.
@markbozinovic706 4 years ago
Hi Julia, I have now watched this highly informative video twice and will watch it again and again. Very well put together and articulated with concise explanations and a very good structured sequence. Will look for more videos by yourself and the community in conjunction with my studies, Regards Mark
@FieldsDynamic 4 years ago
Pretty nice screencast, thanks so much. I'm waiting for more...🤓
@mattm9069 3 years ago
I'm so happy I found your blog. Thanks for the help. Your book is awesome! I learned this stuff in the caret package. For some reason I remember the model fitting steps and model evaluation steps getting tangled up. I could be mistaken, but caret allowed me to estimate model parameters and evaluate models (and maybe tune) almost simultaneously. And if I'm understanding correctly we fit the chosen model to the entire training data in a separate step. Tidy is great!
@JuliaSilge 3 years ago
If you want to see a more detailed walk through with this data, check it out here: www.tidymodels.org/start/case-study/
@thomasaquinas399 2 years ago
Thank you so much for this video Julia. It's been exceedingly helpful to myself in starting to use recipes and tidymodels.
@redparrot6454 2 years ago
That was a hilarious mistype at 33:18 😂
@mishmohd 2 years ago
I like the tile arrangement
@estatistico2010 4 years ago
Thank you very much for sharing this knowledge. Congratulations
@thunde7226 2 years ago
Hey, thanks Julia... great training... and tutorial... :) ... bye
@nishadseeraj7034 3 years ago
I love your videos!! so many great tips and tricks and great explanations of everything!!
@paulycdong 4 years ago
Thank you for this great tutorial. The range of steps available in recipes amazes me, and I am keen to learn about the tuning functionality.
@hesamseraj 3 years ago
As always your work is great. I am learning a lot. Thank you very much.
@carpoe541 3 years ago
That was soooooo informative! Thank you so much for this. Please keep going!
@PatrickBateman12420 4 years ago
Thanks for sharing Julia. One minor issue at 26:48, I would normalize the training and test set separately though. By normalizing first and then splitting, we "leak information." It's also incorrect in the caret documentation.
@yangyang6008 1 year ago
Hi Julia, thank you for the wonderful tutorial! Could you explain a little bit about how "class imbalance" impacts the calculation or accuracy in machine learning? Thanks.
@JuliaSilge 1 year ago
You can see more about this in this more recent blog post: juliasilge.com/blog/project-feederwatch/
@yangyang6008 1 year ago
@@JuliaSilge Thank you Julia.
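As a rough illustration of the class imbalance question above, here is a minimal base R sketch. The counts (920 stays without children, 80 with) are made up for the example and are not taken from the hotel bookings data; the "model" is a deliberately naive one that always predicts the majority class.

```r
# Hypothetical outcome: 920 stays without children, 80 with (made-up counts)
truth <- factor(c(rep("none", 920), rep("children", 80)),
                levels = c("children", "none"))

# A naive "model" that ignores its inputs and always predicts the majority class
pred <- factor(rep("none", length(truth)), levels = levels(truth))

# Accuracy looks strong because most cases belong to the majority class
accuracy <- mean(pred == truth)

# Sensitivity for the minority class: share of true "children" cases found
sensitivity <- sum(pred == "children" & truth == "children") /
  sum(truth == "children")

accuracy     # 0.92
sensitivity  # 0
```

High accuracy here says nothing about the minority class, which is one reason imbalanced problems are often evaluated with sensitivity, specificity, or ROC AUC alongside accuracy.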
@tankUpp 2 years ago
TidyTuesday is my new Netflix binge series 😂
@jstello 3 years ago
Absolutely brilliant brilliant brilliant
@420coolbro69 1 year ago
I couldn't get the code below to work unless I put "knn_spec" before "children ~ .", around the 40-minute mark. Great video!
Error in `fit_resamples()`: ! The first argument to [fit_resamples()] should be either a model or workflow.
knn_res
@user-yd5ck7dg1r 4 months ago
Dear 420coolbro69, I have the same issue as you. After switching the positions of knn_spec and the formula, it worked. Thank you very much.
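The argument order described in this thread can be sketched as follows. This is a minimal example under stated assumptions: the `toy` data frame and its columns are invented stand-ins for the hotel bookings, `folds` is a fresh `vfold_cv()` object rather than the one from the screencast, and the kknn package needs to be installed for the engine to work.

```r
library(tidymodels)

set.seed(123)
# Toy two-class data standing in for the hotel bookings (hypothetical columns)
toy <- tibble(
  children  = factor(sample(c("children", "none"), 200, replace = TRUE)),
  lead_time = rnorm(200),
  adr       = rnorm(200)
)

# Resamples must be an rset object, such as the output of vfold_cv()
folds <- vfold_cv(toy, v = 5)

knn_spec <- nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Model spec first, then the formula as preprocessor, then the resamples
knn_res <- fit_resamples(
  knn_spec,
  preprocessor = children ~ .,
  resamples = folds
)

collect_metrics(knn_res)
```

Naming the `preprocessor` and `resamples` arguments makes the call less sensitive to positional mix-ups like the one reported above.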
@duquesealand3240 3 years ago
Julia is so awesome!!!!
@AlperYilmaz1 4 years ago
Great demonstration of recipes, loved it! It would be nice to see an example of boosted trees with the xgboost engine in another video.
@alelust7170 4 years ago
Perfect as always!
@kunalbali810 4 years ago
Nice tutorial, but is there any possibility of a tutorial series on netCDF/HDF satellite files? It would be very useful.
@user-qy8lz9di4y 3 years ago
Amazing tutorial, thank you for sharing with us! At 44:25, how do you do that?
@JuliaSilge 3 years ago
It's called "reindent lines" in RStudio and on a Mac it is Cmd+I. It's one of my most-used shortcuts! You can see shortcuts in RStudio itself, but there is also a list here: support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts
@vikrantnag86 4 years ago
Thank you Julia.. It would be great if you can make one video on sentiment analysis 🙂
@azizchaqdid7181 3 years ago
Thank you very much Julia
@felixzhao9070 4 years ago
This is really great work! Thanks Julia. By the way, great book on Tidytext and it was a great read too!
@bilalarif7841 3 years ago
Great channel !! Definitely subscribing :)
@MistahKurtzASMR 4 years ago
Excellent lesson, thanks very much.
@mohsinramay 3 years ago
Wonderful!
@taiwankyh 4 years ago
Thanks for the great video. I have a question about "dummy variables". Sometimes R makes them automatically, but when do we need to make them manually?
@JuliaSilge 4 years ago
If you are using a model with a formula interface, R typically will make dummy variables for you. This is really a point of confusion in the R modeling world, though, and one we are trying to address using recipes!
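To make the contrast concrete, here is a small sketch comparing the two routes. The data frame `df` and its `meal` column are invented for illustration; the first half uses base R's `model.matrix()`, which is what formula-based modeling functions such as `lm()` rely on internally, and the second half spells out the same encoding as an explicit recipe step.

```r
library(recipes)

# Hypothetical toy data with one categorical predictor
df <- data.frame(
  y    = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9),
  meal = factor(c("BB", "HB", "SC", "BB", "HB", "SC"))
)

# Formula interface: dummy columns are created implicitly
formula_cols <- colnames(model.matrix(y ~ meal, data = df))
formula_cols

# Recipe: the dummy encoding is an explicit, inspectable step
rec <- recipe(y ~ meal, data = df) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()

recipe_cols <- names(bake(rec, new_data = NULL))
recipe_cols
```

Engines without a formula interface (those that expect a numeric matrix, for example) generally need the explicit version.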
@taiwankyh 4 years ago
Julia Silge thanks very much
@djangoworldwide7925 1 year ago
Julia, I would love to hear whether you stand behind this way of using recipes, because in a different talk Max Kuhn said it is better not to use bake unless necessary, and in general that we want to estimate changes only from training data, not test data. Thank you
@JuliaSilge 1 year ago
Ah yes, when I made this screencast, the tooling around workflows was not as robust as it is today. I know Max and I are on the same page in terms of how useful workflows are and how you don't typically need to use `bake()` except for debugging. You can read more about that here: www.tmwr.org/dimensionality.html#recipe-functions It's definitely important to carefully use the test data so that you only apply learned transformations to it and don't use it for any training. This post/screencast does stick with that and use the testing data in a correct way, but it's easier now with the tidymodels workflows infrastructure.
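A minimal sketch of the workflow-based approach mentioned in this reply, assuming the built-in `mtcars` data and a plain linear regression as stand-ins for the hotel data and the models from the screencast. The point is only that no manual `prep()` or `bake()` call is needed.

```r
library(tidymodels)

set.seed(234)
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

# The recipe is defined but not prepped by hand
rec <- recipe(mpg ~ ., data = train) %>%
  step_normalize(all_numeric_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# fit() estimates the recipe on the training data, then fits the model
wf_fit <- fit(wf, data = train)

# predict() applies the learned preprocessing to the new data automatically
preds <- predict(wf_fit, new_data = test)
preds
```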
@djangoworldwide7925 1 year ago
@@JuliaSilge Thank you so much for your elaborate response. You are such a thoughtful educator! I got this link in my favorites already ;)
@sjrigatti 4 years ago
Great video. Thanks for your effort. Question: when you apply a recipe to the test data and it includes scaling and centering, (value - mean(values))/sd(values), does the scaling applied to the test set use the mean and standard deviation from the training data, or from the test data?
@fatehbekioua 4 years ago
It uses the mean and standard deviation from the training data.
@JuliaSilge 4 years ago
As Fateh says below, the recipe uses the transformation estimates (for scaling/centering, standard deviation and such) from the *training* data and will apply it to any new dataset, such as the testing data. The prep() function estimates the parameters from the data, while the bake() function applies the learned parameters to new data.
@sjrigatti 4 years ago
Julia Silge great. Thank you both.
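The behavior described in this thread can be checked with a small sketch. The `train` and `test` data frames are invented toy data; the comparison at the end scales the test values by hand with the training mean and standard deviation.

```r
library(recipes)

# Hypothetical toy data: one numeric predictor
train <- data.frame(x = c(1, 2, 3, 4, 5))
test  <- data.frame(x = c(10, 20))

# prep() estimates the mean and sd from the training data only
rec <- recipe(~ x, data = train) %>%
  step_normalize(x) %>%
  prep(training = train)

# bake() applies those stored training estimates to new data
baked <- bake(rec, new_data = test)

# Scaling by hand with the *training* statistics gives the same values
manual <- (test$x - mean(train$x)) / sd(train$x)
all.equal(baked$x, manual)
```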
@angvl8793 1 year ago
Hi Julia, thank you again for the great tutorial! At 27:34 you applied the dummy transform and then normalized all the numeric predictors. This means that step_normalize will also normalize the dummy variables. Is this OK? I don't think we need to normalize dummy variables. So should step_normalize come before step_dummy, or does it not matter at all?
@JuliaSilge 1 year ago
That was definitely on purpose, yep! For models like k-nearest neighbors that are based on a distance metric, all the predictors need to be on the same scale. This includes predictors that have been converted to dummy or one-hot numeric variables. You can check out this vignette for some advice on ordering recipe steps: recipes.tidymodels.org/articles/Ordering.html And this appendix for advice on what kind of preprocessing is needed for different models: www.tmwr.org/pre-proc-table.html
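A short sketch of the step ordering discussed here, using an invented data frame (`hotel` and `adr` are illustrative column names, and the values are made up). Because `step_dummy()` runs first, the resulting indicator column counts as a numeric predictor and is centered and scaled along with the rest.

```r
library(recipes)

# Hypothetical toy data: one categorical and one numeric predictor
df <- data.frame(
  y     = c(1, 0, 1, 0, 1, 0),
  hotel = factor(c("City", "Resort", "City", "City", "Resort", "City")),
  adr   = c(80, 120, 95, 60, 150, 110)
)

rec <- recipe(y ~ ., data = df) %>%
  step_dummy(all_nominal_predictors()) %>%     # hotel -> hotel_Resort (0/1)
  step_normalize(all_numeric_predictors()) %>% # scales adr AND hotel_Resort
  prep()

baked <- bake(rec, new_data = NULL)

# The dummy column is no longer 0/1; it now has mean 0 and sd 1
baked$hotel_Resort
```

Swapping the two steps would leave the dummy column as 0/1 while `adr` is standardized, so the predictors would sit on different scales for a distance-based model.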
@angvl8793 1 year ago
@@JuliaSilge These 2 links are diamonds! Really helpful. Thank you very much!
@mkklindhardt 3 years ago
The happy weekend question: do you have any suggestions on how I could use tidymodels for linear mixed models? I am comparing how well different regression models perform at predicting a response ratio (continuous variable) based on many hundreds of covariates. Thank you!
@JuliaSilge 3 years ago
If you are interested in experimenting with our new package for mixed effects models, you can check that out here: github.com/tidymodels/multilevelmod If you are interested in an approach for evaluating different predictor sets, you might try something like this: workflowsets.tidymodels.org/articles/evaluating-different-predictor-sets.html
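As a rough sketch of what a mixed model looks like with multilevelmod, assuming the multilevelmod and lme4 packages are installed. The `sleepstudy` data from lme4 is used purely as a convenient stand-in; it has nothing to do with the commenter's data.

```r
library(tidymodels)
library(multilevelmod)  # registers the "lmer" engine for linear_reg()

# Example data shipped with lme4 (stand-in for the commenter's data)
data(sleepstudy, package = "lme4")

lmer_spec <- linear_reg() %>%
  set_engine("lmer")

# Random intercept per subject, written in lme4 formula syntax
lmer_fit <- lmer_spec %>%
  fit(Reaction ~ Days + (1 | Subject), data = sleepstudy)

lmer_fit
```

The random-effects term goes in the model formula passed to `fit()`, in the same syntax lme4 itself uses.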
@TURALOWEN 1 year ago
Julia, thanks for these videos. Immensely helpful. Quick question: how can we get training statistics that are used during prepping the recipe? For example, if we use step_impute_median() for a variable, somewhere in that recipe object, that training median must be stored. How can we extract that?
@JuliaSilge 1 year ago
You can extract that info by using `tidy()` with the recipe: www.tmwr.org/recipes.html#tidy-a-recipe
@TURALOWEN 1 year ago
@@JuliaSilge thanks for the reply! tidy(rec) produces a table with operations and their ids. I am looking more for the training statistics for imputation steps. It does not seem to have the statistics saved, or maybe I am missing something. I believe rec$steps has it as a list of detailed information, but I am unable to extract them. For example, say a numerical column has a few missing points, and its median for non-missing points is 10. Then when we step_impute_median, somewhere in the recipe object the sample median 10 should be saved for testing data imputations. I want to extract and see those training statistics.
@JuliaSilge 1 year ago
@@TURALOWEN Yep, keep reading in that section I linked to for tidying an individual recipe step.
@TURALOWEN 1 year ago
@@JuliaSilge I see now. Thank you! [one needs to id the step, and then extract recipe and tidy it with that id]
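The approach this thread arrives at can be sketched as follows. The toy `train` data and the step id `"impute_x"` are invented for the example; the non-missing values of `x` are chosen so that their median is 10, matching the scenario described above.

```r
library(recipes)

# Hypothetical toy data: non-missing x values are 4, 8, 10, 12, 16 (median 10)
train <- data.frame(y = 1:7, x = c(4, 8, NA, 10, 12, NA, 16))

rec <- recipe(y ~ x, data = train) %>%
  step_impute_median(x, id = "impute_x") %>%  # give the step an explicit id
  prep()

# Without an id: one row per step (operation, type, id, ...)
tidy(rec)

# With the id: the statistics learned for that step during prep()
med <- tidy(rec, id = "impute_x")
med
```

The recipe has to be prepped first; tidying an unprepped step shows placeholders rather than estimated values.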
@hesamseraj 2 years ago
Hello Julia Silge, I am trying to redo the model, but unfortunately step_downsample in the recipes package doesn't work. I searched the internet to find out why. I also tried the downsample function from the caret package, which didn't work either. I also installed the development version from GitHub (devtools::install_github("tidymodels/recipes")), but the function still didn't work. What should I do now? This is the error:
> require(recipes)
> recipe(children ~. , data = training_hotel) %>%
+ step_downsample(children) %>%
+ step_dummy(all_nominal(), -all_outcomes())
Error in step_downsample(., children) : could not find function "step_downsample"
Ah, half an hour later I found it in the themis package. Sorry.
@JuliaSilge 2 years ago
Yep, it is here: themis.tidymodels.org/reference/step_downsample.html
@King_of_carrot_flowers 1 year ago
@@JuliaSilge I'm having the same problem. It seems that step_downsample is not available in the latest version of recipes that I can find online (1.0.3). I get the error: Error in step_downsample(., sample_type) : could not find function "step_downsample". If I try recipes::step_downsample I get: Error: 'step_downsample' is not an exported object from 'namespace:recipes'
@King_of_carrot_flowers 1 year ago
I think I've figured it out: You need to install the themis package
@JuliaSilge 1 year ago
@@King_of_carrot_flowers Yep, it is here: themis.tidymodels.org/reference/step_downsample.html
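For anyone hitting the same error, a minimal sketch with themis loaded. It assumes themis is installed (for example via `install.packages("themis")`); the `train` data frame is a made-up imbalanced example with 20 minority and 180 majority rows.

```r
library(recipes)
library(themis)  # step_downsample() lives here, not in recipes

set.seed(345)
# Hypothetical imbalanced training data: 20 vs 180 rows
train <- data.frame(
  children  = factor(c(rep("children", 20), rep("none", 180))),
  lead_time = rnorm(200)
)

rec <- recipe(children ~ ., data = train) %>%
  step_downsample(children) %>%
  prep()

# After downsampling, both classes have the minority-class count
balanced <- table(bake(rec, new_data = NULL)$children)
balanced
```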
@infamousprince88 2 years ago
Not finding any info on how to load tidymodels in. I am getting an error. Restarted R and ran as an administrator to get it to work and still not happening? Any suggestions??
@JuliaSilge 2 years ago
Are you saying you are having trouble installing the tidymodels metapackage? I don't have specific suggestions based on just what you have said here, but you can set yourself up for getting effective help by creating a reprex: www.tidyverse.org/help/ And then posting on a forum like RStudio Community: community.rstudio.com/
@infamousprince88 2 years ago
@@JuliaSilge ok I’ll give those a try. Looks like someone else has suggested installing rlang from CRAN first, restarting R again and then installing and loading tidymodels. There’s a “load Name Space” error message that mentions rlang inside the error message. Just noting this here in case someone else comes across this message down the road
@infamousprince88 2 years ago
@@JuliaSilge At about the 40-minute mark I am getting this error on knn_res:
Warning: The `...` are not used in this function but one or more objects were passed: ''
Error: The `resamples` argument should be an 'rset' object, such as the type produced by `vfold_cv()` or other 'rsample' functions
@JuliaSilge 2 years ago
@@infamousprince88 Yes, this screencast is a bit older and some of the tidymodels tuning functions have changed since this time. If you would like to see a more up-to-date example with this dataset, you can check this out: www.tidymodels.org/start/case-study/
@infamousprince88 2 years ago
Thank you!
@perlaconchitacuentos 2 years ago
I'm trying to apply an SVM to this data but it overfits. How can I fix it?
@JuliaSilge 2 years ago
You might want to check out this chapter (along with the previous ones), which walks through tuning an SVM model: www.tmwr.org/iterative-search.html#svm
@perlaconchitacuentos 2 years ago
@@JuliaSilge thank you!!!
@maksim0933 4 years ago
Do we have a comprehensive book about tidymodels with more examples?
@JuliaSilge 4 years ago
Just yesterday we announced our new book, currently with eleven chapters released: www.tmwr.org/
@VinnieGaul 4 years ago
Is there an advantage to using step_downsample as part of the recipe versus using "strata =" in the initial split?
@JuliaSilge 4 years ago
Yes, when you use strata in the initial split, then both the training and testing set will have the same proportion of positive/negative cases but that proportion is still small (
@VinnieGaul 4 years ago
@@JuliaSilge of course, thank you
@mahdip.4674 4 years ago
@@JuliaSilge This cross-validation is not satisfying at all, since the validation sets coming from juice() are not representative of the real world (the original imbalanced data). Right?
@JuliaSilge 4 years ago
@@mahdip.4674 The test set has the original imbalance as the "real world". This validation set is balanced like the training set; it is in fact resamples of the training set, like you noticed, because it is being used to compute performance metrics for the training data.
@mahdip.4674 4 years ago
@@JuliaSilge Thanks for the reply and the great video. But I would like to make sure what I am doing is correct. Usually I split the data into Train and Test. Then I use the Train set for cross-validation. But in the cross-validation phase I still preserve the target distribution, mimicking the real world, for the validation segment. With this approach I try to find the best parameter space. The Test set I only use once, for final evaluation. In the case of cross-validation, for every round of modelling I can then down- or upsample the training part while the validation set is preserved as it is, and I apply bake() to each validation set. The bake() is applied to the results of prep() for each training part. Is that right? Not necessary? Thanks for the reply.
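The difference between `strata` and `step_downsample()` discussed in this thread can be sketched as follows. The `bookings` tibble is invented (80 vs 920 rows), and themis is assumed to be installed. The last lines rely on `step_downsample()` being skipped by default when baking new data, so held-out data keeps its original imbalance.

```r
library(tidymodels)
library(themis)

set.seed(456)
# Hypothetical imbalanced data: 8% of stays include children
bookings <- tibble(
  children  = factor(c(rep("children", 80), rep("none", 920))),
  lead_time = rnorm(1000)
)

# strata keeps the original class proportions in both pieces
split <- initial_split(bookings, strata = children)
train <- training(split)
test  <- testing(split)

prop.table(table(train$children))
prop.table(table(test$children))

# Downsampling changes the class balance of the training data only
rec <- recipe(children ~ ., data = train) %>%
  step_downsample(children) %>%
  prep()

train_balanced <- table(bake(rec, new_data = NULL)$children)
train_balanced

# The step is skipped for new data, so no test rows are dropped
test_baked <- bake(rec, new_data = test)
nrow(test_baked) == nrow(test)
```

When such a recipe is placed inside a workflow and passed to `fit_resamples()`, the same skipping applies to each assessment set, which is close to the per-fold procedure described in the comment above.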
@dr.chiragmalik586 2 years ago
Hi Julia, besides other best things about your videos and techniques, may I take your permission to give you a compliment......you are very beautiful.....
@shueibsharif9955 1 year ago
24:23: addressing the wrong conclusions that class imbalance might lead to, you said, "No one has children. Wow! Look at all those people without children". I thought that was funny.
@ambhat3953 3 years ago
Then.......corona happened. Model went to dogs..........