Multinomial classification with tidymodels and volcano eruptions

Julia Silge

Published: 28 Aug 2024

Comments: 35

@pablotercero4860 4 years ago

Amazing!! Thanks for sharing. I learn something incredibly useful every time, even tips and tricks.

@haraldurkarlsson1147 1 year ago

Regarding the tectonic settings, it would be best to simply lump together all intraplate, all rift zone, and all subduction settings to get three factor levels. Another approach is to group by crust type into oceanic, continental, and intermediate crust. I think this would be better than simply tossing stuff.

@haraldurkarlsson1147 1 year ago

As a side comment: I worked on a project as an undergraduate to determine the type of volcano that had most likely produced a particular mix of rock types. Based on this work (around 1977) we concluded whether a particular mix of rock samples dredged off the coast of Iceland originated from a central volcano or not (there was also gravity data and possibly paleomagnetic data).

@JuliaSilge 1 year ago

That is so interesting! 👀

@hesamseraj 3 years ago

Thank you very much Julia.

@jonathanjayes 4 years ago

Thank you Julia! This was fascinating!

@haraldurkarlsson1147 3 years ago

Very interesting project! I must say, however, as a geologist, that I would have been surprised if the data correlated with latitude and longitude. Volcano types are mostly linked to their tectonic settings. Shield volcanoes are almost exclusively linked to oceanic settings and hotspots such as Iceland or Hawaii and are dominated by basalts. Stratovolcanoes, on the other hand, are typically linked with andesite (and rhyolite) and are found around subduction zones. Most active volcanoes line up with plate boundaries, and those boundaries have no relation to latitude or longitude. When I was an undergrad in Iceland I worked on volcanic rocks dredged off the seafloor near Iceland. My task was to identify the volcano type they were associated with, based on the mix of rock types we collected at each site. I would have loved to have access to the tools you are using here, but alas those did not exist. It would have been much easier to infer the origin of these rocks.

@lukasputtmann3590 4 years ago

I really enjoyed this video! Thanks a lot.

@user-ld6rv4gu2t 4 years ago

Thanks for the tutorial. Great.

@UsmanKhaliq10 4 years ago

thanks! this was a pretty cool tutorial.

@foobar4275 3 years ago

@Julia: In the volcano_rec recipe I think there is a mistake (minute mark ~21). EDIT: I thought there was a mistake, but it turns out there is none, just a different way to handle a feature matrix with continuous and dummy variables. The issue is step_zv and step_normalize on all_predictors() after creating dummy variables. In the recipe, dummy variables are created for tectonic_settings and major_rock_1. Then all variables are passed to the zero-variance and normalization steps. I ran a quick simulation on my personal machine, and the recipe as written would calculate the variance of the previously created dummy variables and standardize them. EDIT: I thought that binary variables shouldn't be standardized, but apparently there is some literature that suggests binary variables should be standardized (Tibshirani), or that describes how to standardize continuous variables to approximate the scale of a [one-hot encoded] binary variable. I haven't finished the video yet, so if you go back and correct this, I apologize. Otherwise, others be warned, those steps are wrong. One solution would be to do step_zv and step_normalize before the dummy step, as step_zv(all_numeric_predictors()) and step_normalize(all_numeric_predictors()). I've tested this and it works.

@JuliaSilge 3 years ago

Well, it's not necessarily a "no-no" to center and scale dummy variables: community.rstudio.com/t/should-i-center-scale-dummy-variables/43212

@foobar4275 3 years ago

@JuliaSilge Thank you for sharing the link! =D I wasn't aware of Tibshirani's or Gelman's views on standardizing binary variables.
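
To make the exchange above concrete, here is a minimal sketch of what centering and scaling does to a 0/1 dummy column. It is written in plain Python rather than R and is not the recipes implementation; the column names and values are made up for illustration, and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("a zero-variance column cannot be scaled")
    return [(v - mu) / sigma for v in values]


# Hypothetical toy columns: one continuous predictor and one 0/1 dummy.
elevation = [1200.0, 3400.0, 2500.0, 800.0, 4100.0]
is_subduction = [1, 0, 1, 1, 0]

elevation_z = standardize(elevation)
dummy_z = standardize(is_subduction)

# The dummy still takes exactly two distinct values after scaling;
# it is simply shifted and rescaled to mean 0 and standard deviation 1.
distinct_dummy_values = sorted(set(round(v, 6) for v in dummy_z))
```

The scaled dummy carries the same information as before, which is one reason standardizing it is a modeling choice rather than an outright error. The practical cost is that the coefficient no longer reads directly as "effect of switching the indicator on".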

@christopheraloo5121 4 years ago

I had a feeling population within kilometres would make for a good predictor, since different types of volcanoes have different footprints (used loosely).

@haraldurkarlsson1147 3 years ago

Julia, I am enjoying your videos tremendously. Currently I am focusing on tidymodels. Do you have a suggestion for the order in which they should be watched, or is each one stand-alone? Thanks. P.S. I have used skimr for some time, but recently it has stopped working. I have updated the version but there is no change. Any ideas?

@JuliaSilge 3 years ago

I unfortunately haven't invested time at this point in putting the videos "in order"; they do vary in how advanced they are and I have tried to note in the descriptions which ones are better for folks just starting out with tidymodels. Sorry about that! They have been made somewhat organically week by week using Tidy Tuesday data. I haven't had any problems with skimr lately, but if you can create a reprex showing the problem, I'm sure the maintainers would be happy to see what is happening: github.com/ropensci/skimr/issues

@haraldurkarlsson1147 3 years ago

@JuliaSilge Thanks - I understand. I do love the holistic approach, though, of working through a project from beginning to end. That has been my main issue with places like DataCamp, where you see more bite-size projects.

@taiwankyh 4 years ago

You suggested a good article on multiclass classification; could you please spell the author's name or give the hyperlink? Thanks

@haraldurkarlsson1147 3 years ago

Does step_zv remove variables with perfect correlation? Possibly confounding variables?

@JuliaSilge 3 years ago

No, just those with zero variance: recipes.tidymodels.org/reference/step_zv.html You can filter out variables that are highly correlated with step_corr(): recipes.tidymodels.org/reference/step_corr.html

@haraldurkarlsson1147 3 years ago

Ah - thanks
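
The difference between the two filters in the reply above can be sketched in a few lines. This is plain Python, not the recipes code: the column names and values are hypothetical, and the correlation filter below is a simplified greedy version (keep a column only if it is not too correlated with any column already kept), which may not match how step_corr() chooses what to drop.

```python
from math import sqrt


def variance(values):
    """Population variance of a list of numbers."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def drop_zero_variance(cols):
    """Keep only columns that actually vary."""
    return {name: v for name, v in cols.items() if variance(v) > 0}


def drop_highly_correlated(cols, threshold=0.9):
    """Greedy filter: skip a column if |r| with an already kept column exceeds threshold."""
    kept = {}
    for name, values in cols.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


elevation_m = [1200.0, 3400.0, 2500.0, 800.0, 4100.0]
columns = {
    "constant": [1.0] * 5,                                # zero variance
    "elevation_m": elevation_m,
    "elevation_ft": [v * 3.28084 for v in elevation_m],   # perfectly correlated copy
    "latitude": [64.0, -6.1, 35.4, 19.4, -1.5],
}

after_zv = drop_zero_variance(columns)         # removes only "constant"
after_corr = drop_highly_correlated(after_zv)  # additionally removes "elevation_ft"
```

The zero-variance filter leaves the duplicated elevation column untouched, because that column varies just as much as the original; only a correlation-based filter catches it.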

@clarkevansteenderen7827 4 years ago

Thank you for this awesome tutorial!! Does one only subset data into training and testing sets if there is a lot of data available? Or how do you decide whether to do that, or to just use bootstrapping on the original data as a whole, as you did in this example?

@JuliaSilge 4 years ago

Almost *always* you want to split into training/testing; this is the most important step in empirical model validation. The only time when you might not want to do this is when the available data is "pathologically" small, like this dataset of volcanoes.
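
As a rough illustration of the two strategies discussed in this exchange, the sketch below contrasts a single held-out test split with bootstrap resamples of the whole dataset. It is plain Python working on row indices only, not the rsample implementation; the function names, the 25% test fraction, the 25 resamples, and the row count of 100 are all assumptions chosen for the example.

```python
import random


def holdout_split(n_rows, test_fraction=0.25, seed=123):
    """Shuffle row indices once and set aside a fraction as the test set."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def bootstrap_resamples(n_rows, times=25, seed=123):
    """Yield (analysis, assessment) index lists for each bootstrap resample.

    The analysis set is drawn with replacement and has n_rows entries;
    the assessment set holds the rows that were never drawn (out-of-bag).
    """
    rng = random.Random(seed)
    for _ in range(times):
        analysis = [rng.randrange(n_rows) for _ in range(n_rows)]
        drawn = set(analysis)
        assessment = [i for i in range(n_rows) if i not in drawn]
        yield analysis, assessment


n_rows = 100  # hypothetical dataset size
train_idx, test_idx = holdout_split(n_rows)
resamples = list(bootstrap_resamples(n_rows))
```

With a holdout split, the test rows are used once at the very end. With bootstrapping alone, every row can appear in some analysis set, so performance is estimated only from the out-of-bag rows of each resample.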

@renanxcortes2 4 years ago

Very cool video! Very didactic and informative! I wonder where in the tidymodels code (or its dependencies) the predicted probabilities are corrected for the resampling strategy the user applies (for example, oversampling some of the minority categories), similar to what is explained here: www.knime.com/blog/correcting-predicted-class-probabilities-in-imbalanced-datasets. Also, I think the metric was good, wasn't it? In this case "naive guessing" would be a probability of 33.33%, not 50%, so an AUC higher than 60% is already good, isn't it? Thank you so much again for posting this video!
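
For readers wondering what such a correction looks like, here is a minimal sketch of the usual prior-shift adjustment: each predicted probability is reweighted by the ratio of the class's original share to its share after resampling, and the result is renormalized. This is plain Python and says nothing about whether or where tidymodels applies such a step; the function name, class shares, and predicted probabilities are made up for illustration.

```python
def correct_for_resampling(probs, original_priors, resampled_priors):
    """Reweight predicted class probabilities from resampled to original class shares."""
    weighted = {
        cls: p * original_priors[cls] / resampled_priors[cls]
        for cls, p in probs.items()
    }
    total = sum(weighted.values())
    return {cls: w / total for cls, w in weighted.items()}


# Hypothetical class shares before and after oversampling to balance.
original_priors = {"Stratovolcano": 0.5, "Shield": 0.1, "Other": 0.4}
resampled_priors = {"Stratovolcano": 1 / 3, "Shield": 1 / 3, "Other": 1 / 3}

# Hypothetical prediction from a model trained on the balanced data.
predicted = {"Stratovolcano": 0.4, "Shield": 0.4, "Other": 0.2}

corrected = correct_for_resampling(predicted, original_priors, resampled_priors)
```

In this toy case the oversampled minority class ("Shield") ends up with a lower probability after correction, because the model saw it far more often in training than it occurs in the original data.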

@flamboyantperson5936 4 years ago

You are amazing. Could you please recommend someone like you who makes videos in Python? It would be of great help.

@JuliaSilge 4 years ago

I really like Rachael Tatman's livestreams: www.twitch.tv/rctatman

@flamboyantperson5936 4 years ago

@JuliaSilge Thank you so much.

@flamboyantperson5936 4 years ago

@JuliaSilge Can I add you on Facebook?

@JuliaSilge 4 years ago

@flamboyantperson5936 HA well I'm not on Facebook, actually.

@flamboyantperson5936 4 years ago

@JuliaSilge No problem. There is a lot to learn from you, but unfortunately my company does not work in R. I wish you could share the same knowledge in Python. You are very, very talented.