Hands-on Class Imbalance Treatment in Python | Oversampling | Undersampling | SMOTE | Data Science

🚀 In this video, we show you how to handle imbalanced datasets in Python! 🐍 It is a sequel to our previous video, which covered the theoretical aspects. Here we roll up our sleeves and apply various techniques to address class imbalance using the imblearn package.
📊 We kick off by loading the popular Breast Cancer dataset from the scikit-learn library. This dataset features a target column with two categories: Malignant (0) and Benign (1). At each step, we show the imbalance present in the data and then apply a technique to rebalance it.
🔄 Explore Random Undersampling, Random Oversampling, Tomek Links, SMOTE (Synthetic Minority Over-sampling Technique), SMOTETomek, and ADASYN. See firsthand how each technique changes the class distribution of the dataset, which can help in training more robust machine learning models.
🔍 Ready to bridge the gap between theory and application? Join us on this hands-on journey to learn how to handle imbalanced datasets in Python.
Happy Learning!

Published: Aug 21, 2024

Comments: 10

@amazing_performances 6 months ago

Great videos! Thank you for sharing!!!

@prosmartanalytics 6 months ago

Glad you like them!

@DanielTok-bs5mn 3 months ago

Awesome, but what about stratify when splitting?

@prosmartanalytics 3 months ago

Thank you! Stratify maintains the same proportion of 0s and 1s in both train and val/test sets as that of the overall data, but it won't resolve the class imbalance issue. We may stratify at the time of split to maintain whatever imbalance we have, and then apply imbalance treatment only to the train set.

@user-tk9jl8wm1t 5 months ago

Awesome presentation. Kindly make a presentation on Hybrid Sampling/Ensemble Systems as well. Thanks

@prosmartanalytics 5 months ago

Thank you! We'll keep these suggestions in mind.

@younesgasmi8518 7 months ago

Thanks for the presentation. Can I use SMOTE before splitting the dataset into training and testing datasets?

@prosmartanalytics 7 months ago

Welcome! Good question. Any imbalance treatment should be applied only to the train data, i.e. the data used for training the model. Because the test data represents future data, it is not supposed to be treated for imbalance.

@younesgasmi8518 7 months ago

@@prosmartanalytics I mean when we use oversampling on the whole dataset (before splitting). When I did it this way, I got a good confusion matrix and better metrics (accuracy, recall, F1, precision), and there was no overfitting problem.

@prosmartanalytics 7 months ago

Yes, but there is a leakage problem. The results so obtained won't be considered reliable. Test data is supposed to represent the future. So if we are predicting defaults for a bank where the historical default rate is only 2%, the test data should reflect this value and not 50%. If we use the entire data for imbalance treatment, the data that we are going to use as test later has already participated in the training process, because the resampled data was generated using it too.
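
The leakage described in this reply can be illustrated with a small toy experiment. This is a sketch under stated assumptions, not the video's code: the data is synthetic (1000 rows with a 2% positive rate, echoing the bank example), random oversampling by duplication stands in for "oversampling", and the helpers `oversample`, `split`, `leaked`, and `positive_rate` are hypothetical names. Only the Python standard library is used.

```python
import random


def oversample(rows, rng):
    """Random oversampling: duplicate minority rows until both classes are equal."""
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def split(rows, rng, test_frac=0.2):
    """Shuffle a copy of the rows and return (train, test)."""
    rows = rows[:]
    rng.shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]


def leaked(train, test):
    """Count test rows whose original row id also appears in the train set."""
    train_ids = {row_id for row_id, _ in train}
    return sum(1 for row_id, _ in test if row_id in train_ids)


def positive_rate(rows):
    return sum(label for _, label in rows) / len(rows)


rng = random.Random(0)
# Toy data: (row_id, label) with a 2% positive rate
data = [(i, 1 if i < 20 else 0) for i in range(1000)]

# Wrong order: oversample the whole dataset, then split
train_w, test_w = split(oversample(data, rng), rng)
leaked_wrong, rate_wrong = leaked(train_w, test_w), positive_rate(test_w)

# Right order: split first, then oversample the train set only
train_r, test_r = split(data, rng)
train_r = oversample(train_r, rng)
leaked_right, rate_right = leaked(train_r, test_r), positive_rate(test_r)

print("oversample then split: leaked rows =", leaked_wrong, "| test positive rate =", round(rate_wrong, 3))
print("split then oversample: leaked rows =", leaked_right, "| test positive rate =", round(rate_right, 3))
```

In the wrong order, copies of the same minority rows land on both sides of the split and the test positive rate is pushed toward 50%. In the right order, no test row has a copy in the train set and the test positive rate stays near the original 2%.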