
4 Reasons Non-Parametric Bootstrapped Regression (via tidymodels) is Better than Ordinary Regression

yuzaR Data Science
9K subscribers
10K views

If the assumptions of parametric models can be satisfied, parametric models are the way to go. However, there are often many assumptions, and satisfying them all is rarely possible. Data transformation and non-parametric methods are two ways around that. In this post we'll learn about non-parametric bootstrapped regression as an alternative to ordinary linear regression for cases where the assumptions are violated.
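To make this concrete, here is a minimal sketch of the workflow in R (not the exact code from the video; the built-in mtcars data and the mpg ~ wt formula are just placeholders): resample the rows many times with rsample, fit an ordinary lm() to every resample, and summarise the bootstrap distribution of the coefficients.

library(tidymodels)   # loads rsample, broom, purrr, dplyr, tidyr, ...

set.seed(2024)
boots <- bootstraps(mtcars, times = 2000)   # 2000 resamples of the rows, with replacement

boot_coefs <- boots %>%
  mutate(model = map(splits, ~ lm(mpg ~ wt, data = analysis(.x))),   # one lm() per resample
         coefs = map(model, tidy)) %>%                               # tidy coefficient table per model
  unnest(coefs)

# non-parametric (percentile) summary of the coefficient distributions
boot_coefs %>%
  group_by(term) %>%
  summarise(median_estimate = median(estimate),
            conf_low  = quantile(estimate, 0.025),
            conf_high = quantile(estimate, 0.975))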
If you only want the code (or want to support me), consider joining the channel (the join button is below any of the videos), because I provide the code upon members' requests.
Enjoy! 🥳
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds to get online master's in data science & data analytics degrees. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!

Published: 3 Oct 2024

Comments: 51
@utubeleo5037 2 years ago
This was a great watch. It was really well put together, with a good mix of visuals, code and narrative. Thank you for putting it together and sharing.
@yuzaR-Data-Science 2 years ago
Glad you enjoyed it, Leo! Thanks for your feedback!
@SergioUribe 2 years ago
Thanks for sharing! I will start using this model.
@yuzaR-Data-Science 2 years ago
👍 You can use any type of model with the bootstrap 😉
@ambhat3953 2 years ago
Thanks for this... I think I now have a direction for tackling the dataset at work which is not normally distributed.
@yuzaR-Data-Science 2 years ago
You are welcome 🙏 If non-normality is your only problem, look into non-parametric statistical tests, like Mann-Whitney or Kruskal-Wallis.
@ambhat3953 2 years ago
@@yuzaR-Data-Science Will do, thanks!
@eyadha1 2 years ago
Great video. Thank you!
@yuzaR-Data-Science 2 years ago
Thanks 🙏 glad you enjoyed it
@zane.walker 1 year ago
I recently discovered bootstrapped prediction intervals while working with mixed-effects models and was quite impressed (thank goodness for modern computing power!). You present a persuasive argument for always using bootstrapped regression when any of the linear regression assumptions are violated. Are there any situations where you would use alternative methods, such as log transforms of the data, or weighted regression, to deal with issues such as heteroscedasticity rather than bootstrapping?
@yuzaR-Data-Science 1 year ago
Surely there are different methods to solve problems! Many roads lead to Rome ;). Bootstrapping is one of them, if you have lots of data (> 10,000 observations, the more the better) and you can't fix the assumptions no matter what you do. Besides, it's personal preference. Non-normality from the Shapiro test is always there when you have lots of data, even if the residuals look perfectly normal. I personally don't like to log-transform data if the data itself is interpretable, like the weight of animals. I would never use log-weight. But I would use log-virus-load, because the spread is huge and the log shows the trend, while you would not see anything without the log. Another thing is: I'd rather trust an averaged model from the distribution of coefficients than a single coefficient from a normal "lm". I would not use the bootstrap on small datasets. Finally, it's a question of context and how you can get closest to the truth out there.
@EdoardoMarcora 6 months ago
I don't understand how bootstrapping frees you from the distributional assumptions of the linear model (normality of residuals etc.). What bootstrapping does is generate the sampling distribution free of the usual asymptotic assumptions, but the assumptions of the likelihood distribution are still there, right?
@yuzaR-Data-Science 6 months ago
Certainly it can, I am 100% sure, but please don't believe some random YouTube video, there is a lot of trash out there (most likely some of my videos are partly incorrect too). So please check it online or in a stats book yourself. For example, here is a reference from a stats book which might explain more, but even the first half page will do it, I think: www.sagepub.com/sites/default/files/upm-binaries/21122_Chapter_21.pdf
@festusattah8612 6 months ago
Thanks for this insightful video. However, I have one question: if I want to use this approach in a research paper, do you know of some papers I can cite to back up my choice of this model?
@yuzaR-Data-Science 6 months ago
In my opinion you only need some reasons to do that, for example that many assumptions are not met. I am sure there are papers, but I don't have any off the top of my head. But even if nobody has cited it yet, somebody should start. I certainly will, after I am done with my current paper on quantile regression.
@CCL-ew7pl 1 year ago
Great video, thanks Yury (the Munchausen cartoon was an unexpected special treat :))
@yuzaR-Data-Science 1 year ago
😂 I wasn't sure anyone would recognise Baron Münchausen 😁 Glad you enjoyed it!
@joaoalexissantibanezarment4766 3 months ago
This is an excellent video!! I was thinking: a non-parametric alternative to linear regression could be LOESS regression, and the bootstrap could be applied without problem. But because LOESS is non-parametric, could the means be used properly instead of the medians, or should the medians still be used in this case?
@yuzaR-Data-Science 3 months ago
While resampling allows for a better use of means, I am a big fan of medians, because if the distribution of anything after bootstrapping does not become normal, as in the case of p-values, I would trust the median but not the mean. So, I would use the median as much as I can.
@joaoalexissantibanezarment4766 3 months ago
@@yuzaR-Data-Science OK, I really thank you for the answer!
@yuzaR-Data-Science 3 months ago
You are very welcome!
@joaoalexissantibanezarment4766 3 months ago
@@yuzaR-Data-Science I had another question. Although bootstrapping is not exactly an option for handling outliers, could it be the case that the more resamples are used, the more robust the model is to outliers?
@yuzaR-Data-Science 3 months ago
Yes, because then you would resample the most frequent cases more often, so their weight in the bootstrap distribution would be higher, and the outliers ... hmm, we would not get rid of them, but they will be resampled very rarely. Hope that helps. Cheers!
@ariancorrea2711 1 year ago
Hi, how can I extract the r.squared for each model?
@yuzaR-Data-Science 1 year ago
Hey, from the "glance" function:

library(dplyr)   # for mutate()
library(purrr)   # for map()
library(broom)   # for tidy(), glance() & augment() functions

# assuming "nested_models" already holds the resampled data in a list-column called "data"
# (the name of the input object was lost in the comment formatting)
nested_models <- nested_models %>%
  mutate(models  = map(data, ~ lm(wage ~ age, data = .)),
         coefs   = map(models, tidy, conf.int = TRUE),
         quality = map(models, glance),    # glance() holds r.squared for each model
         preds   = map(models, augment))

I did a demo about it in a video on "many models".
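A small assumed extension of the chain above (not shown in the video) to pull the r.squared values out of the "quality" column afterwards:

library(tidyr)   # for unnest()

nested_models %>%
  unnest(quality) %>%   # expands the glance() output into regular columns
  pull(r.squared)       # one r.squared per fitted model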
@jeffbenshetler 4 months ago
Excellent demonstration in R.
@yuzaR-Data-Science 4 months ago
Thanks a lot, Jeff, glad you enjoyed it! :)
@chacmool2581 2 years ago
What does this resemble? Random Forests (RF). Except that RF bootstraps/samples observations as well as bootstrapping predictors. Am I seeing this correctly? Of course, one loses interpretability with RF. Great stuff as always!
@yuzaR-Data-Science 2 years ago
Sure, you lose interpretability with RF. No coefficients. And that's exactly what normal models give you. But they have assumptions. So, we bootstrap/resample the data and fit 1000 models, which relaxes most assumptions, especially distributional ones.
@rolfjohansen5376 1 year ago
How do I calculate a simple maximum likelihood for a simple non-parametric regression: y_i = b_i + e_i (number of data points = number of parameters?) Thanks!
@yuzaR-Data-Science 1 year ago
Sorry, I can't really say that with certainty, because I've never needed that until today. But if you somehow figure this out, please let me know! Thanks for watching!
@jonascruz6562 1 year ago
Great video! Any way to conduct a bootstrap regression but using robust (Huber) regression instead of the conventional linear model for data with many outliers?
@yuzaR-Data-Science 1 year ago
Sure, there is a way: just exchange the "lm" function with the "lmrob" function from library(robustbase). I actually did a video on robust regression. However, I don't think it will be necessary, because the bootstrapping will smooth out the influence of outliers. But if you still have too many, maybe they are not outliers; maybe the data has a weird distribution and you need some other type of model, like Poisson or similar. Thanks for your feedback and thank you for watching!
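As an illustration of that swap (a hedged sketch, reusing the resamples from the example near the top of the page; the formula is again just a placeholder):

library(robustbase)   # for lmrob(), a robust alternative to lm()

boot_robust <- boots %>%
  mutate(model = map(splits, ~ lmrob(mpg ~ wt, data = analysis(.x))),
         # coef() + enframe() keeps this independent of any tidy() method for lmrob objects
         coefs = map(model, ~ enframe(coef(.x), name = "term", value = "estimate"))) %>%
  unnest(coefs)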
@jonascruz6562 1 year ago
Thank you for the answer. I work with environmental contaminants, so I have a lot of outliers even after log-transforming the data. I am testing some new models. I just found the boot.pval package, which is a low-code package for bootstrap regression, including rlm. By the way, I love your low-code videos. Greetings from Brazil!
@yuzaR-Data-Science 1 year ago
Hey, Jonas, thanks for the recommendation. I'll check out the boot.pval package, because I model every day with real-world data and need robust options. Thanks also for the feedback and for watching!
@Maxwaener 1 year ago
Can you use this approach if you have a numeric predictor (change in percent) for a categorical outcome (2-4 levels)?
@yuzaR-Data-Science 1 year ago
Hey, sorry for the late reply, I was on holiday. Yes, you can! The model you apply is up to you, you just need to specify it in the "map" function. It will then be run over the bootstrapped data, so you can use any model. In your case it would be multinomial, I guess. But if it is only one predictor, I would turn it upside down and use a quasibinomial model with the percentage as the outcome and the categorical variable as the predictor. It's easier to interpret than a multinomial in my opinion. Cheers!
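As a rough illustration of the quasibinomial suggestion (my_data, proportion and group are made-up names, and the outcome must be a proportion between 0 and 1):

boot_quasi <- bootstraps(my_data, times = 1000) %>%
  mutate(model = map(splits,
                     ~ glm(proportion ~ group, family = quasibinomial(), data = analysis(.x))),
         coefs = map(model, tidy)) %>%
  unnest(coefs)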
@gonzalodequesada1981 1 year ago
Is it possible to do a bootstrap for a non-parametric multiple regression model?
@yuzaR-Data-Science 1 year ago
That's a great question! :) The short answer is yes, but it's not necessary, because the method I describe is non-parametric by itself. But my scientific curiosity says: let's do it! What kind of non-parametric regression do you mean? Write a function, like "lm()", or try it out please and post it here so everyone in the community can benefit. Thanks!
@desaiha 2 years ago
How do you apply this technique to temporal data which has a trend and/or seasonality?
@yuzaR-Data-Science 1 year ago
The "strata" argument might help in the bootstraps function. Ask R this: ?bootstraps. Or google for people who might have done something similar in tidymodels. I still haven't.
@johnsonahiamadzor7404 1 year ago
Great work. How do I get this code for practice? I'm very new to R.
@yuzaR-Data-Science 1 year ago
In the description of the video there is a link to a blog post where you can get all the R code and the explanations. If you are very new to R, don't be discouraged if not everything is clear and working now. Bootstrapping is a somewhat advanced topic. Thanks for watching!
@heshamkrr669 2 years ago
WORKING thx bro
@yuzaR-Data-Science 2 years ago
Cool 😎
@alelust7170 2 years ago
Nice, Tks!
@yuzaR-Data-Science 2 years ago
Any time!
@oousmane 2 years ago
Always excellent ❤️
@yuzaR-Data-Science 2 years ago
Thank you very much! 🙏