
Sampling Data with SMOTE, Tomek Links, and Nearmiss in R 

Spencer Pao

Have an imbalanced dataset and can't seem to get good enough predictions? Look no further! This is a detailed guide to the sampling methods available and how to use them!
Methods I go over: DownSampling, UpSampling, SMOTE, Tomek Links, NearMiss
Questions? Let me know down in the comments.
Github: github.com/SpencerPao/Data_Sc...
Since you've read this, leave a like :)
0:00 - Why do we need to Sample?
1:33 - Theory behind Sampling Algorithms
3:40 - Confusion Matrix, Important metrics, What makes a good model?
8:40 - Oversampling Algorithm (Upsampling)
10:16 - SMOTE Algorithm
11:55 - Undersampling Algorithm (Downsampling)
13:21 - Tomek Links Algorithm
15:48 - NearMiss Algorithm + Ways to improve your metrics
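
The two simplest methods in the chapter list, random upsampling and downsampling, can be sketched without any packages. This is a minimal illustration in plain Python, not the video's R code; the function names `random_oversample` and `random_undersample` and the toy data are made up for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):  # sampling without replacement
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): 8 majority rows (class 0) and 2 minority rows (class 1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

up_rows, up_labels = random_oversample(rows, labels)
down_rows, down_labels = random_undersample(rows, labels)
print(Counter(up_labels))    # both classes now have 8 rows
print(Counter(down_labels))  # both classes now have 2 rows
```

Upsampling keeps every original row but repeats minority rows, while downsampling discards majority rows, so the second trades information for balance.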

Category: Science

Published: Jul 15, 2024

Comments: 15
@gabrielmurarideandrade5755, 2 years ago
Thanks from Brasil! This helped me in my econometrics class.
@ALEX-he3fx, 2 years ago
Nice video! I'd also like to mention that positive = "1" should be added to the confusion matrix call. Without it, many of the statistics will be computed for the wrong class.
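
The point about the positive class can be shown with a small sketch. It assumes the comment refers to an argument like `positive` in caret's `confusionMatrix`; the helper `binary_metrics` below is a hypothetical plain-Python stand-in, and the labels are invented for illustration.

```python
def binary_metrics(actual, predicted, positive):
    """Confusion-matrix counts and metrics, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Hypothetical imbalanced labels: class 1 is the rare class we care about.
actual = [0] * 8 + [1] * 2
predicted = [0] * 7 + [1] + [1, 0]

as_one = binary_metrics(actual, predicted, positive=1)
as_zero = binary_metrics(actual, predicted, positive=0)
print(as_one["precision"], as_one["recall"])    # 0.5 0.5
print(as_zero["precision"], as_zero["recall"])  # 0.875 0.875
```

The same predictions look much better when the majority class is treated as positive, which is why the choice of positive class matters on imbalanced data.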
@alecmunnur5918, 3 years ago
Best video on this topic I've seen so far!
@SpencerPaoHere, 3 years ago
Much appreciated :)
@fernandobrasil1718, 2 years ago
This saved me on my academic test, thanks from Brazil!
@younesgasmi8518, 6 months ago
Thanks so much, bro. Can I use undersampling techniques to balance the dataset before splitting it into training and testing sets, given that it doesn't introduce any data leakage?
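
The thread has no reply to this question. One common recommendation is to split first and resample only the training portion, so the test set keeps the original class ratio. The sketch below shows that ordering in plain Python; it is an assumption about good practice, not the video's code, and `split_rows` and `downsample` are hypothetical helpers.

```python
import random
from collections import Counter


def split_rows(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and hold out a fraction of rows as the test set."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def downsample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    out_rows, out_labels = [], []
    for cls in sorted(set(labels)):
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical data: 80 rows of class 0 and 20 rows of class 1.
rows = [[float(i)] for i in range(100)]
labels = [0] * 80 + [1] * 20

(train_x, train_y), (test_x, test_y) = split_rows(rows, labels)
bal_x, bal_y = downsample(train_x, train_y)  # only the training data is balanced

print(Counter(bal_y))   # equal class counts for fitting
print(Counter(test_y))  # original, imbalanced ratio kept for evaluation
```

Evaluating on an untouched test set gives metrics that reflect the class ratio the model would actually face.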
@eandresricop8676, 2 years ago
Very interesting. Is there any possibility of sharing the data? I'd like to follow the tutorial step by step. Thank you.
@SpencerPaoHere, 2 years ago
Sure thing. Check out my GitHub: github.com/SpencerPao/Data_Science/tree/main/Sampling%20Methods
@joaovitordesouzafaria1357, 2 years ago
Hello! Sir, is there an alternative function to use in place of SMOTE? I'm super new to ML, and my version of R doesn't allow the installation of the DMwR package.
@SpencerPaoHere, 2 years ago
Hello! There are many packages that you can use in lieu of DMwR for SMOTE. If you have Python, you can use imblearn. You can also try 'smotefamily' in R. Otherwise, I'd search for the libraries available in your environment and hope that one of them supports SMOTE. You can also use simpler sampling methods, such as upsampling or downsampling, without the need for fancy packages.
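
For readers without a working SMOTE package, the core idea can be sketched in a few lines: pick a minority row, pick one of its nearest minority neighbours, and create a new row somewhere on the line between them. This is a simplified plain-Python illustration, not the DMwR, smotefamily, or imblearn implementation; `smote_sketch` and its parameters are hypothetical names.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen row (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows with two numeric features.
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 1.0], [1.2, 1.8]]
new_rows = smote_sketch(minority, n_new=6)
print(len(new_rows))  # 6
```

Unlike plain upsampling, the new rows are not copies of existing ones, which is the main difference the SMOTE chapter of the video covers. The sketch only handles numeric features and needs at least two minority rows.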
@xavierroy4163, 5 months ago
DMwR is no longer available for performing SMOTE. Do you have a solution to my problem?
@SpencerPaoHere, 4 months ago
Interesting. Have you tried installing DMwR through install_github()? I.e., remotes::install_github("cran/DMwR")
@dianaaraiz5150, 4 months ago
@SpencerPaoHere It doesn't work either.
@Galmion, 2 years ago
I enjoyed the explanation of the individual methods, but basically all of them sucked for this case? Not sure what to take away from that.
@SpencerPaoHere, 2 years ago
😂 Sampling methods are most useful with imbalanced datasets or large amounts of data. So, although they are not 'great' here, they are better than not using them at all. Tomek links (or SMOTE-Tomek) is most commonly used. Note that the independent variables in the dataset I was using are not very good at predicting the dependent variable. You can evaluate how well a method works with a confusion matrix. Sampling methods are good at reducing false positive or false negative entries. You can use the methods I've mentioned in this video to test out your various use cases.
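
For reference, a Tomek link is a pair of rows from opposite classes that are each other's nearest neighbour; dropping the majority member of each pair cleans up the class boundary. The sketch below is a brute-force plain-Python illustration, not the video's R code, and the function names and toy points are hypothetical.

```python
import math


def nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j) of opposite-class points that are mutual nearest neighbours."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links


def remove_majority_in_links(points, labels, majority):
    """Drop the majority-class member of every Tomek link."""
    drop = {idx for link in tomek_links(points, labels)
            for idx in link if labels[idx] == majority}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data: the points at index 2 (class 0) and index 3 (class 1) sit right next to each other.
points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.2], [3.0, 3.0], [3.1, 3.0]]
labels = [0, 0, 0, 1, 1, 1]

print(tomek_links(points, labels))  # [(2, 3)]
clean_points, clean_labels = remove_majority_in_links(points, labels, majority=0)
print(clean_labels)  # [0, 0, 1, 1, 1]
```

Tomek links usually remove only a few rows, so on their own they rarely balance a dataset; that is one reason they are often combined with SMOTE, as the reply above mentions.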