
Correcting Skewed Data with Scipy and Numpy 

AnalytiCode
2.9K subscribers
8K views

Skewed data can adversely affect your analysis and machine learning models. In this video, I demonstrate five methods for cleaning skewed data using the NumPy and SciPy modules. The methods include taking the square root, cube root, fourth root, log, and Yeo-Johnson transform. I also showcase the effectiveness of each method by summarizing the skewness of the data after each transformation with a bar plot.
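The five transformations named above can be sketched roughly as follows. This is not the code from the video: the synthetic log-normal sample, the seed, and the variable names are assumptions chosen for illustration, and the bar plot is replaced by a printed summary to keep the example dependency-free beyond NumPy and SciPy.

```python
import numpy as np
from scipy import stats

# Assumed example data: strictly positive, right-skewed (log-normal) sample
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

# With no lambda given, yeojohnson returns the transformed data and the fitted lambda
yeo, lam = stats.yeojohnson(x)

transforms = {
    "original": x,
    "square root": np.sqrt(x),
    "cube root": np.cbrt(x),
    "fourth root": x ** 0.25,
    "log": np.log(x),  # requires strictly positive values
    "Yeo-Johnson": yeo,
}

# Sample skewness after each transformation (0 means symmetric)
skews = {name: stats.skew(values) for name, values in transforms.items()}
for name, s in skews.items():
    print(f"{name:>12}: skewness = {s:+.3f}")
```

The `skews` dictionary is the kind of summary that could be passed to a bar plot to compare the methods side by side.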

Published: 4 Oct 2024

Comments: 37
@marcom5873 2 months ago
First time I have seen your videos. This is genuinely a very good video. Very well explained and clear. I am subscribing. The music wasn’t off-putting either!
@CJP3 2 months ago
Thank you so much!!! I really appreciate it. If there’s anything you’d like to see just let me know!
@marcom5873 2 months ago
@CJP3 Sent you an invite on LinkedIn!
@Lendemeier 6 months ago
Bro this is data science ASMR 🤤
@CJP3 6 months ago
Hahaha I didn’t mean for it to be but glad you enjoyed it (I hope) 😂
@officialscience101 1 year ago
The on-screen text is a great addition, Dr. P!
@CJP3 1 year ago
🙏🏽, I’ll incorporate more in upcoming videos! Thanks for the feedback!
@metinunlu_ 8 months ago
Thank you for the video, subscribed! YouTube needs more quality content like this.
@MikitaRashetnikau 4 months ago
Amazing video! I like its structure: motivation, overview with examples, practical advice. Thanks!
@CJP3 4 months ago
Thanks for the feedback! I’ll do more of this style!
@mushinart 1 year ago
Outstanding explanation, professor
@CJP3 1 year ago
Thank you so much!
@noobmasteroo69 10 days ago
You're doing great, please avoid background music while explaining. Thank you.
@CJP3 9 days ago
Thanks for the feedback! Glad you enjoyed the video
@nicolaslpf 1 year ago
Amazing video! I was creating a function for measuring the same thing. You forgot to mention log1p, which is the log of (x + 1); it's really useful for right-skewed data with values less than 1.
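A minimal sketch of the `log1p` suggestion, assuming a synthetic exponential sample with many values below 1 (the data, seed, and variable names are illustrative, not from the video). `np.log1p(x)` computes log(1 + x), so non-negative inputs stay non-negative, whereas a plain log sends values below 1 to negative numbers.

```python
import numpy as np
from scipy import stats

# Assumed example data: non-negative, right-skewed, most values below 1
rng = np.random.default_rng(1)
x = rng.exponential(scale=0.5, size=5_000)

logged = np.log(x)         # values below 1 become negative, tiny values strongly so
log1p_vals = np.log1p(x)   # log(1 + x): non-negative for non-negative input

print(f"original skewness: {stats.skew(x):+.3f}")
print(f"log skewness:      {stats.skew(logged):+.3f}")
print(f"log1p skewness:    {stats.skew(log1p_vals):+.3f}")
```

On data like this, the plain log tends to overcorrect into a left skew because the values near zero are stretched far into the negatives, which is one reason `log1p` can be the gentler choice.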
@pabloagogo1 3 months ago
This is interesting. If one corrects the original skewed data by doing these kinds of transformations, in the context of linear regression or multiple linear regression, will that not change the interpretation of the original data? Curious to know.
@CJP3 3 months ago
Perhaps, but that change may be for the better. I’d say it’s worth considering these transformations if you know you have skewed data. Many models, especially linear models, assume normally distributed variables. I usually build models with and without significant preprocessing and feature scaling/engineering.
@dannybee9068 1 year ago
Thank you! That was helpful! So we can basically take a root of any power? Is there a drawback to exploiting it, like continuing to increase the n value when raising a feature to the power of 1/n?
@CJP3 1 year ago
Hi Danny! Context definitely matters. For analytical chemistry, 1/n scaling is usually OK. A few downsides are that it makes the models less sensitive to potential outliers. Also, it's not suitable for certain distributions. Lastly, because 1/n scaling is non-linear, it can make data interpretation more difficult.
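One way to explore the question of ever larger n is to track the skewness of x^(1/n) as n grows. The sketch below assumes a synthetic exponential sample (an illustrative choice, not data from the video); for data shaped like this, larger roots eventually push the skewness past zero into a left skew, so more is not automatically better.

```python
import numpy as np
from scipy import stats

# Assumed example data: exponential sample (right-skewed, non-negative)
rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=100_000)

# Skewness of the n-th root for increasing n (n = 1 is the untransformed data)
root_skews = {n: stats.skew(x ** (1.0 / n)) for n in (1, 2, 3, 4, 8, 16)}

for n, s in root_skews.items():
    print(f"n = {n:>2}: skewness = {s:+.3f}")
```

Checking the sign and size of the skewness after each candidate root is a simple guard against overcorrecting.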
@thoniasenna2330 6 months ago
SUBSCRIBED! What should one do first? Or, what's the correct order: treating outliers, imputing missing values, correcting symmetry? Thanks Dr. P!
@CJP3 6 months ago
You’re not going to like the answer 😂… it depends a lot on the application. It’s best to first be aware that they exist and then evaluate their impact on your outcome. For example, if you’re trying to determine outlier samples, then outlier measurements wouldn’t be so bad... maybe. Or missing values could be useful depending on the application, so instead of imputing maybe you engineer a new feature.
@CJP3 6 months ago
Don’t unsubscribe after my answer! 😂 🤣
@AyahuascaDataScientist 4 months ago
Skewing doesn’t necessarily matter if you’re using XGBoost, correct? For classification or regression, that is.
@CJP3 4 months ago
Exactly! Skewed data doesn’t impact all model frameworks.
@prathambhatnagar8653 6 months ago
Please don't add background music.
@CJP3 6 months ago
Thanks for the feedback. Most of the newer coding tutorials don’t have background music. Have a great day!
@AyahuascaDataScientist 4 months ago
I like it. Don’t listen to this hater!
@undertaker7523 2 months ago
So what if we were to standardize using z-scoring? It seems like that would have largely the same impact, wouldn't it?
@CJP3 26 days ago
Howdy, z-scaling won’t improve the skew. The data will be mean-centered but will keep its skewed shape.
@undertaker7523 26 days ago
@CJP3 that explains it. Thanks!
@CJP3 26 days ago
@undertaker7523 I think I’ll make a video that graphically illustrates this point. Thanks for asking :)
@undertaker7523 26 days ago
@CJP3 yes that would be amazing! Thank you!
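The point about z-scoring in this thread can be checked numerically. The sketch below assumes a synthetic log-normal sample (illustrative data and names): z-scoring subtracts the mean and divides by the standard deviation, which is a linear rescaling, so the skewness is the same before and after.

```python
import numpy as np
from scipy import stats

# Assumed example data: right-skewed log-normal sample
rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

z = stats.zscore(x)  # (x - mean) / standard deviation

print(f"mean after z-scoring: {z.mean():+.3e}, std: {z.std():.3f}")
print(f"skewness before: {stats.skew(x):+.3f}")
print(f"skewness after:  {stats.skew(z):+.3f}")
```

Only a non-linear transformation (root, log, Yeo-Johnson) changes the shape of the distribution; standardizing can still be applied afterwards if a model needs centered, unit-variance features.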
@pewkaboo 1 year ago
What if my data contains a lot of useful '0' values?
@CJP3 1 year ago
Howdy! Can you explain more about the 0’s?
@pewkaboo 1 year ago
@CJP3 It is expenditure data where the budget column contains a lot of '0' (not null) values.
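The thread does not record an answer to this question, so the following is only a sketch of one possible approach, not the author's recommendation. It assumes a hypothetical `budget` column in which about 40% of the entries are exact zeros. A plain log is undefined at zero, while `np.log1p` and the Yeo-Johnson transform both accept zeros and map them to zero.

```python
import numpy as np
from scipy import stats

# Hypothetical budget column: about 40% exact zeros, the rest right-skewed amounts
rng = np.random.default_rng(4)
amounts = rng.lognormal(mean=3.0, sigma=1.0, size=5_000)
budget = np.where(rng.random(5_000) < 0.4, 0.0, amounts)

with np.errstate(divide="ignore"):
    logged = np.log(budget)          # every zero becomes -inf

log1p_vals = np.log1p(budget)        # zeros stay at 0
yeo, lam = stats.yeojohnson(budget)  # Yeo-Johnson is defined at zero as well

print(f"{np.isinf(logged).sum()} values became -inf under a plain log")
print(f"log1p skewness:       {stats.skew(log1p_vals):+.3f}")
print(f"Yeo-Johnson skewness: {stats.skew(yeo):+.3f} (lambda = {lam:.3f})")
```

Because these transformations are monotonic, the zeros remain a single spike at one value; if the zeros carry meaning of their own, a separate indicator feature for "budget is zero" may be worth considering alongside the transformed column.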
@mouhsineelqesry9446 4 months ago
Bro, you explain a concept, but do you need the music?! It’s distracting.
@CJP3 4 months ago
I 💯 understand, the newer videos don’t have the music and the audio has a better EQ :)