Partial dependence plots for Mario Kart world records

Julia Silge

Published: Aug 28, 2024

Comments: 13

@alexandroskatsiferis 3 years ago

Hey Julia, great screencast as always! You could also consider the Matthews correlation coefficient when selecting the best model. It's a nice metric, implemented as 'mcc' in yardstick. It's a bit stricter, since it produces a high score only if the predictions correctly classify a high percentage of both the negative cases and the positive ones.
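For reference, a minimal statement of the metric in standard confusion-matrix notation. TP, TN, FP, and FN (true/false positive/negative counts) are assumed notation here, not something shown in the video:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
```

The score runs from -1 to 1, so a model has to do well on both classes to score highly.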

@jordankrogmann352 3 years ago

As always, great work Julia!

@alexbajana9518 3 years ago

This is a great video, thanks Julia

@My-NaMeS_jEfF 3 years ago

I wanted to see who holds the top record for each track as of the latest date we can see:

records %>% group_by(track) %>% filter(date == max(date))

There are 16 tracks, so we should have 16 records if we're grouping by track, right? However, this is not the case. Notice there are many duplicate records that indicate shortcut yes AND shortcut no for the same exact record: time, track, date, player, duration, everything. For example, in rows 4:7 for Kalimari Desert, Dan has 2 records set on the same day, with 2 observations for each record, one with shortcut == yes and one with shortcut == no. I am pretty sure you cannot have the exact same time when using and not using shortcuts. Maybe I am wrong, but these times are identical down to the thousandth of a second. Then again, look at rows 12:13 (the No/Yes column is shortcut):

12  Wario Stadium  Single Lap  No   Dan  PAL  2021-01-26  1M 25.82S  85.82  31
13  Wario Stadium  Single Lap  Yes  Dan  PAL  2021-01-26  1M 25.82S  85.82  31

I think to have accurate predictions we have to weed out the incorrect duplicates. How do we know which is the correct observation to use in shortcut prediction? You can see right at 4:00 in your video how the single lap records are almost identical for No and Yes; they're duplicate records. This cannot be correct. It does make sense for the three lap times, where using shortcuts significantly reduces the time, but the single lap records are almost completely duplicated. Therefore, shortcut should only be predicted for records made on three lap runs, and one lap runs should be omitted. This definitely needs to be cleaned up before prediction is performed. Make sense?
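One way to check the duplication described above is to look for rows that agree on everything except shortcut. This is only a sketch: the column names are assumptions about how records is laid out, the first two rows of the toy data are the Wario Stadium rows quoted in the comment, and the third row is made up for contrast.

```r
library(dplyr)

# Toy stand-in for `records`. Column names are assumed, not confirmed.
# Rows 1-2 are the Wario Stadium rows quoted in the comment above;
# row 3 is a hypothetical record added only for contrast.
records <- tibble(
  track    = c("Wario Stadium", "Wario Stadium", "Wario Stadium"),
  type     = c("Single Lap", "Single Lap", "Three Lap"),
  shortcut = c("No", "Yes", "Yes"),
  player   = c("Dan", "Dan", "Hypothetical Player"),
  date     = as.Date(c("2021-01-26", "2021-01-26", "2021-01-26")),
  time     = c(85.82, 85.82, 20.00)
)

# Keep only groups where the same track/type/player/date/time appears
# with more than one shortcut value
suspect_duplicates <- records %>%
  group_by(track, type, player, date, time) %>%
  filter(n_distinct(shortcut) > 1) %>%
  ungroup()

suspect_duplicates
```

Separately, group_by(track) followed by filter(date == max(date)) keeps every row that falls on a track's latest date, so it can return more than one row per track even without duplicates. Grouping by track, type, and shortcut may be closer to the intent.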

@AshBlossomWorshiper 2 years ago

Is the reason you chose accuracy over a metric like precision/recall because the dataset was relatively balanced?

@hamidehmoayyed9876 2 years ago

Hi Julia, thank you so much for all your great and informative screencasts! I have a question about the partial dependence plots. In the pdp package in R, there is an option to add rugs to the x-axis that display the deciles of the distribution, because according to the package manual, "It is not wise to draw conclusions from PDPs in regions outside the area of the training data". I wonder if this option is also available in DALEX?

@JuliaSilge 2 years ago

I don't believe this is an option in the DALEX default plots, although you can explore more here: ema.drwhy.ai/partialDependenceProfiles.html You could consider opening an issue on their repo, or, if you are making a more custom PDP visualization like I did, you can add it yourself: ggplot2.tidyverse.org/reference/geom_rug.html
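As a rough illustration of the "add it yourself" route, the sketch below layers a decile rug onto a hand-built PDP-style line plot with ggplot2. Both data frames are made-up stand-ins (train_df for the training data, pdp_df for a profile computed elsewhere), and the variable names time and yhat are assumptions.

```r
library(ggplot2)

# Hypothetical stand-ins: `train_df` for the training data and `pdp_df` for a
# partial dependence profile computed elsewhere (both are made up here)
set.seed(123)
train_df <- data.frame(time = rexp(200, rate = 1 / 60))
pdp_df <- data.frame(time = seq(0, 300, by = 5))
pdp_df$yhat <- plogis(2 - pdp_df$time / 50)

# Deciles (10%, 20%, ..., 90%) of the training predictor
deciles <- data.frame(
  time = unname(quantile(train_df$time, probs = seq(0.1, 0.9, by = 0.1)))
)

pdp_plot <- ggplot(pdp_df, aes(x = time, y = yhat)) +
  geom_line() +
  # Rug on the bottom axis only; it uses its own data and x aesthetic
  geom_rug(data = deciles, aes(x = time), sides = "b", inherit.aes = FALSE)

pdp_plot
```

Regions of the x-axis beyond the outermost rug marks are where the profile is supported by little training data.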

@hamidehmoayyed9876 2 years ago

@JuliaSilge Thank you so much, Julia! I really appreciate your help.

@JJManioke 3 years ago

Julia, your videos are helping me so much in grad school, thanks so much! Is it always the case that vfold_cv has a higher ratio of training to assessment, and is there a good threshold for saying "I should really switch to bootstraps, this vfold is too small"? (I noticed you did the same thing in the IKEA furniture video.) Also, would there be much difference here if you used a different tree-based model, like a random forest (rand_forest with the ranger engine) as in the IKEA video? And how would you decide which to use?
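To make the size comparison in this question concrete, here is a small sketch with rsample on a made-up 500-row data frame. The data and the choices v = 10 and times = 10 are assumptions for illustration. With 10 folds, each assessment set holds a tenth of the rows; a bootstrap resample draws as many rows as the original with replacement, and the rows never drawn form the assessment set.

```r
library(rsample)

set.seed(123)
toy <- data.frame(x = seq_len(500))  # hypothetical data, 500 rows

folds <- vfold_cv(toy, v = 10)
boots <- bootstraps(toy, times = 10)

# Rows used for fitting vs. evaluation in the first resample of each scheme
fold_sizes <- c(
  analysis   = nrow(analysis(folds$splits[[1]])),
  assessment = nrow(assessment(folds$splits[[1]]))
)
boot_sizes <- c(
  analysis   = nrow(analysis(boots$splits[[1]])),
  assessment = nrow(assessment(boots$splits[[1]]))
)

fold_sizes
boot_sizes
```

On average roughly 37% of the rows land in a bootstrap assessment set, compared with 1/v of the rows for v-fold cross-validation.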

@JuliaSilge 3 years ago

When I used v-fold cross-validation, I noticed that the assessment sets were

@davidjackson7675 3 years ago

Very interesting.

@Dolandtromm 3 years ago

Miss Julia, where do you live and where did you study? Please let me know what you studied. I mean, I want to know what you have to do to become a data scientist. Thanks.

@JuliaSilge 3 years ago

You can read more about my background here: ropensci.org/blog/2018/06/08/rprofile-julia-silge/