Predictive modeling in R with tidymodels and NFL attendance

Julia Silge

Published: 8 Sep 2024

Comments: 50

@BThiessen 4 years ago

Yes, this was a very useful walkthrough of tidymodels. Thanks, Julia. In case anyone else runs into an issue I had, I needed to force an update of the tidymodels package to use fit_resamples().

3 years ago

34:49 Had an error in fit_resamples(). To solve it, I just changed the order of the arguments: rf_spec, weekly_attendance ~ ., ... (swap lines 163 and 164). Great video, thanks!

@JuliaSilge 3 years ago

Yep, there was a change in the tune interface a while back, since this video was made: github.com/tidymodels/tune/blob/master/NEWS.md#breaking-changes-1

@diabbluiszegpidelgado1674 4 years ago

Excellent quality content! Looking forward to seeing more applied tidymodels, tuning, feature engineering and whatever you want to teach us.

@joel09878 3 years ago

I really enjoy the structure of 'finding a problem' - in this case, overfitting - and then fixing it (resampling to get a less biased estimate). Thanks

@johngoldin4672 4 years ago

Glad to see we got a quick flash of your black cat at the very end.

@JuliaSilge 4 years ago

HA I wondered if anyone would notice that! 😻

@arturmatysik9177 3 years ago

Great video! I didn't know about tidymodels and was coding everything by hand, now my life has changed :)

@hesamseraj 2 years ago

Once again, thank you very much for the machine learning. Looking forward to checking all of your screencasts.

@mygeorgyboy 1 year ago

Good explanation, thanks. I had to change the fit_resamples() call: the first two arguments are swapped now (2023). Regarding the results, an average attendance per team and year behaves better than "rf" and almost the same as "lm". Greetings, excellent tutorial on the use of tidymodels.

@arpitchaurasia5132 4 years ago

You are awesome, Julia. I really appreciate the effort you put into all your content. Thanks!

@jiwanheo7737 4 years ago

Thanks Julia! Big fan of your blog!!

@user-ee7sd6lt7h 4 years ago

Thanks Julia ... great demonstration

@carpoe541 3 years ago

Such a blessing. Thank you!

@aleccampanini9585 3 years ago

Hi, I just wanted to give a tip. Margin of victory is something that’s given every game. Instead of using distinct before plotting, I would’ve gone with a group_by() %>% summarise(sum(margin))

@ricjrob 2 years ago

I love this, you should provide some tips to my Lecturers on my MSc :)

@neuroling 4 years ago

This is awesome. Thank you, Julia! 🙌🏼

@salkonify 4 years ago

Awesome! Great content!

@klaldju 3 years ago

Hi Julia. Thank you for this great tutorial. I followed the code exactly as you did but I got the "Error: The first argument to [fit_resamples()] should be either a model or workflow." after this code rf_res

@JuliaSilge 3 years ago

Yep, there was a change in tidymodels packages since this video was published last year, and you need to make the first argument the model (or a workflow). You should switch the order of the first two arguments, the formula preprocessor (which should now come second) and the model spec (which should now come first).
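
A minimal sketch of the argument order described above, assuming a recent tidymodels release. It uses the built-in `mtcars` data and a linear model in place of the video's NFL data and random forest, so `lm_spec`, `car_folds`, and `lm_res` are illustrative names, not objects from the screencast.

```r
library(tidymodels)

set.seed(123)
# Illustrative stand-in: mtcars replaces the NFL attendance data
car_folds <- vfold_cv(mtcars, v = 5)

lm_spec <- linear_reg() %>%
  set_engine("lm")

# Model spec first, formula preprocessor second, resamples third
lm_res <- fit_resamples(
  lm_spec,
  mpg ~ wt + hp,
  car_folds
)

collect_metrics(lm_res)
```

Putting the formula first, as in the original video, is what triggers the "first argument ... should be either a model or workflow" error in newer versions.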

@klaldju 3 years ago

Great. Thanks!

@Insipidityy 4 years ago

Can't describe how thankful I am for this screencast AND your blog. Really excited for your future screencasts and blog posts that will remove the shroud of mystery covering tidymodels. One question: What is the purpose of drawing a geom_abline(lty = 2)?

@JuliaSilge 4 years ago

That makes a slope = 1 dashed line. Try it out to see what it does! I wanted to see visually on the graph where the predictions and true values would be the same.
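
To try this out without the video's data, here is a small sketch on simulated values. The `results` data frame and its columns are made up for illustration; only `geom_abline(lty = 2)` comes from the discussion above.

```r
library(ggplot2)

set.seed(42)
# Simulated truth and predictions, purely for illustration
results <- data.frame(truth = runif(100, 0, 10))
results$.pred <- results$truth + rnorm(100)

p <- ggplot(results, aes(truth, .pred)) +
  # With no slope or intercept given, the line has intercept 0 and slope 1;
  # lty = 2 makes it dashed
  geom_abline(lty = 2, color = "gray50") +
  geom_point(alpha = 0.5)

p
```

Points on the dashed line are cases where the prediction equals the true value; points far from it are poorly predicted.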

@rafaelcallejo8367 1 month ago

Good morning, your videos are excellent. My only request for future videos is to zoom in more on the code, since it looks very small. Apologies for the suggestion.

@circulartext 2 years ago

wow this is cool

@kevinbrahm8409 2 years ago

This may have been brought up elsewhere, but I wonder if including the away team for attendance might skew the results.

@codygoggin1097 2 years ago

This video was a great help! If you think you can improve your model by possibly removing variables, how would you see what factors are most important in the RF model and what could you possibly get rid of? Is there another video on that?

@JuliaSilge 2 years ago

Yes, you want to measure variable importance for a random forest, as shown here: juliasilge.com/blog/sf-trees-random-tuning/ You may also be interested in this chapter of TMwR: www.tmwr.org/explain.html

@faraza5161 4 years ago

Hi Julia.. This was awesome.. Please make videos on mlr too .. pretty please ???

@mattsworkstuff9506 4 years ago

Not sure why, but ~14:30 mark, I had to put quotes around the line initial_split(strata = "playoffs") (line 79) or else it would give me an object not found error. Posting in case it's useful for anyone else walking through.

@JuliaSilge 4 years ago

You might check that you have updated versions of tidymodels packages.

@mattsworkstuff9506 4 years ago

@JuliaSilge Yeah, I am on a version of Microsoft R that looks like it's a bit behind the times compared to the non-Microsoft versions. Don't need that any more anyway so going to kill it for a fresh install and pick back up. Fantastic job on the videos!

@Tdiddy9182 4 years ago

Hopefully a very easy question for you. What are you using to create those little subsections that start with '''{r} and create that kind of paragraph of code? I like how clean it makes things look.

@JuliaSilge 4 years ago

I'm using R Markdown in these screencasts, and those are R "chunks". You can read more here: rmarkdown.rstudio.com/

@wouldntyaliktono 4 years ago

Thanks for the video! Quick style question: Why do you prefer to assign x and y implicitly in your aes() statements? I find it can sometimes make code difficult to read for someone who is looking at my work for the first time.

@JuliaSilge 4 years ago

That's a great question! Using the explicit argument names is really important when teaching ggplot2, and I can see how using implicit assignment like I did here could be less clear for folks who have less familiarity with either ggplot2 or functions in R.

@mattm9069 3 years ago

Thanks for the video. Can you give a blanket statement for what variable to add as a strata and why? In this case, it was "playoffs". Is there a special reason you chose this one?

@JuliaSilge 3 years ago

I don't think I can give a blanket statement, but generally the outcome is a good way to go. In hindsight, in fact, it may have been a better move to stratify on the outcome here, `weekly_attendance`. You can read a little more about this here: www.tmwr.org/splitting.html#splitting-methods
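
As a rough illustration of stratifying on a numeric outcome, the sketch below uses the built-in `mtcars` data with `mpg` standing in for `weekly_attendance`; the object names are hypothetical.

```r
library(rsample)

set.seed(1234)
# strata takes an unquoted column name; a numeric column is binned
# (into quartiles by default) before the stratified split
car_split <- initial_split(mtcars, strata = mpg)
car_train <- training(car_split)
car_test <- testing(car_split)
```

Stratifying this way aims to keep the distribution of the outcome similar in the training and testing sets.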

@jamescutler428 3 years ago

Thanks Julia! I love these videos. I have a question though. Not sure why this is happening to me, but strangely I got an error when piping rf_fit into the predict function with the training data as the new data: "Error in predict.randomForest(object = object$fit, newdata = new_data) : New factor levels not present in the training data" The strangest thing about it is that I trained the rf model on the same dataset that it's now saying has new factor levels (the training set)! Nothing I've found online seems to explain how this could happen when it's the same dataset that it has already seen before. I should add that this only happens when I use the randomForest engine, and not when I use ranger. I wonder if that means this is a randomForest package bug?

@JuliaSilge 3 years ago

Hmmm, it's hard to say with just that much information and no code to look at. You might consider creating a reprex (can you make the same thing happen with any smaller dataset?) and posting it on RStudio Community: rstd.io/tidymodels-community

@jamescutler428 3 years ago

@JuliaSilge Thanks Julia, I'll give that a try! Thanks for responding so quickly! Another interesting development as I make my way through this video is that now I'm wondering why my RMSE from resampling is still about the same as it was before. The initial test rf RMSE (before trying resampling) was around 8600. Now it's the same, around 8600. I used the same seed and everything. I wonder how stuff like this could happen, given that yours dropped from ~8500 to ~8200.

@kevinbrahm8409 3 years ago

Hi Julia. Thank you for doing this video. I got a bit lost in the resampling piece, because I don't quite understand why resampling means the model would actually perform better if it's using the same underlying data. Also, do you have anything that shows how the model could predict attendance for a particular team, week, or something else?

@JuliaSilge 3 years ago

I haven't rewatched this video in a while but it's possible I did not word things as perfectly as possible. Today, I'd point you to this article on resampling, which explains why we use resampling to estimate model performance (look for "what happened here?" and "resampling to the rescue"): www.tidymodels.org/start/resampling/ The idea is not that the model performs better but that we get a more accurate estimate of how the model performs, instead of an overly optimistic estimate. We also discuss this here: www.tmwr.org/resampling.html

@kevinbrahm8409 2 years ago

@JuliaSilge Thanks Julia. Do you have anything for the second question?

@JuliaSilge 2 years ago

@kevinbrahm8409 You can use the `predict()` function with whatever input data you want to know about: parsnip.tidymodels.org/reference/predict.model_fit.html This post walks through how to predict with data that we create: juliasilge.com/blog/bird-baths/
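
A small sketch of predicting on hand-built input, assuming a fitted parsnip model. It uses `mtcars` and a linear model rather than the NFL data, and `new_cars` is a hypothetical data frame created only for this example.

```r
library(tidymodels)

lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observations; they need the same predictor columns
# the model was trained on
new_cars <- tibble(wt = c(2.5, 3.5), hp = c(100, 180))

preds <- predict(lm_fit, new_data = new_cars)
preds
```

The same pattern would apply to a particular team and week: build a data frame with one row per case of interest and pass it as `new_data`.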

@elOtorongo96 3 years ago

Great video! I have a question: how do her plots have a different theme than the default one if she doesn't add a theme layer?

@JuliaSilge 3 years ago

Notice how I use theme_set() (you can see on line 15 at 1:30) to do this. You can check out more details here: ggplot2.tidyverse.org/reference/theme_get.html
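
A brief sketch of how `theme_set()` works. The choice of `theme_light()` here is an assumption for illustration; the comment above does not say which theme the video uses.

```r
library(ggplot2)

# Set the default theme once; later plots in the session use it
# without needing their own theme layer
theme_set(theme_light())

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

p
```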