5 ways to work with imbalanced data | Imbalanced dataset machine learning | Imbalanced data 

Unfold Data Science
Subscribers: 89K
Views: 18K

#ImbalancedDataClassification #UnfoldDataScience
Hello,
My name is Aman and I am a Data Scientist.
About this video:
In this video, I explain how to work with imbalanced data in a machine learning classification use case. I cover several ways to handle imbalanced data and train a better machine learning model.
The following topics are explained in this video:
1. 5 ways to work with imbalanced data
2. Imbalanced dataset machine learning
3. Imbalanced data in classification
4. Undersample and oversample
5. Undersample majority class
6. SMOTE meaning
7. SMOTE in Python
imblearn page link - imbalanced-learn.org/stable/r...
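The undersampling and SMOTE topics listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the video: the helper names `undersample_majority` and `smote_like` and the toy data are hypothetical. For real work, the imbalanced-learn library linked above ships tested implementations such as `RandomUnderSampler` and `SMOTE`.

```python
import random


def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    smallest = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, smallest):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 8 majority rows (label 0) and 3 minority rows (label 1).
X = [(float(i), float(i % 3)) for i in range(8)] + [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]
y = [0] * 8 + [1] * 3

# Option 1: shrink the majority class.
X_under, y_under = undersample_majority(X, y)

# Option 2: grow the minority class with interpolated points.
minority = [row for row, label in zip(X, y) if label == 1]
new_points = smote_like(minority, n_new=5)
X_over = X + new_points
y_over = y + [1] * len(new_points)
```

Undersampling throws information away, while the interpolation step invents points that were never observed, so either choice is worth checking against an untouched test set.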
About Unfold Data Science: This channel helps people understand the basics of data science through simple examples. Anybody without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos are not very technical in nature, so viewers from different backgrounds can follow them easily.
If you need Data Science training from scratch, please fill in this form (please note: training is chargeable):
docs.google.com/forms/d/1Acua...
Book recommendation for Data Science:
Category 1 - Must Read For Every Data Scientist:
The Elements of Statistical Learning by Trevor Hastie - amzn.to/37wMo9H
Python Data Science Handbook - amzn.to/31UCScm
Business Statistics by Ken Black - amzn.to/2LObAA5
Hands-On Machine Learning with Scikit Learn, Keras, and TensorFlow by Aurelien Geron - amzn.to/3gV8sO9
Category 2 - Overall Data Science:
The Art of Data Science by Roger D. Peng - amzn.to/2KD75aD
Predictive Analytics by Eric Siegel - amzn.to/3nsQftV
Data Science for Business by Foster Provost - amzn.to/3ajN8QZ
Category 3 - Statistics and Mathematics:
Naked Statistics by Charles Wheelan - amzn.to/3gXLdmp
Practical Statistics for Data Scientists by Peter Bruce - amzn.to/37wL9Y5
Category 4 - Machine Learning:
Introduction to Machine Learning with Python by Andreas C. Müller - amzn.to/3oZ3X7T
The Hundred Page Machine Learning Book by Andriy Burkov - amzn.to/3pdqCxJ
Category 5 - Programming:
The Pragmatic Programmer by David Thomas - amzn.to/2WqWXVj
Clean Code by Robert C. Martin - amzn.to/3oYOdlt
My Studio Setup:
My Camera : amzn.to/3mwXI9I
My Mic : amzn.to/34phfD0
My Tripod : amzn.to/3r4HeJA
My Ring Light : amzn.to/3gZz00F
Join Facebook group :
groups/41022...
Follow on medium : / amanrai77
Follow on quora: www.quora.com/profile/Aman-Ku...
Follow on twitter : @unfoldds
Get connected on LinkedIn : / aman-kumar-b4881440
Follow on Instagram : unfolddatascience
Watch Introduction to Data Science full playlist here : • Data Science In 15 Min...
Watch python for data science playlist here:
• Python Basics For Data...
Watch statistics and mathematics playlist here :
• Measures of Central Te...
Watch End to End Implementation of a simple machine learning model in Python here:
• How Does Machine Learn...
Learn Ensemble Model, Bagging and Boosting here:
• Introduction to Ensemb...
Build Career in Data Science Playlist:
• Channel updates - Unfo...
Artificial Neural Network and Deep Learning Playlist:
• Intuition behind neura...
Natural Language Processing playlist:
• Natural Language Proce...
Understanding and building recommendation system:
• Recommendation System ...
Access all my codes here:
drive.google.com/drive/folder...
Have a different question for me? Ask me here : docs.google.com/forms/d/1ccgl...
My Music: www.bensound.com/royalty-free...

Published: 13 Apr 2022

Comments: 45
@enchanted_swiftie 2 years ago
I ran into the same imbalance problem in my dataset and researched the different methods available. Here is the shortlist I put together, which might help you.
Possible solutions:
1. Make some changes in the algorithm
• Adjust the class weights so the model becomes sensitive to the minority class
• Adjust the decision threshold (we can check this with the PR curve)
• Penalize the algorithm by setting class_weight='balanced'
2. Set aside the class labels and treat the minority examples as anomalies
• Here we can treat the problem as an "anomaly detection" problem instead of classification. For anomaly detection, Isolation Forest tends to give promising results
3. Balance the dataset by sampling
• Undersample
• Oversample and SMOTE
4. Ensemble learning with downsampling
• It bootstraps different samples, each time balancing the classes by undersampling the majority class, and then aggregates the results by voting
5. Use other techniques
• Resampling methods such as Tomek links (which remove the majority-class member of each pair of nearest neighbours from opposite classes, to sharpen the class boundary)
• Focal loss
I have also looked through Kaggle notebooks, where people have found that XGBoost slightly outperforms other algorithms, even though it requires setting different class weights.
That was my cheat sheet of the 5 ways. Share your thoughts!
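To make the class-weight idea in point 1 concrete, here is a minimal plain-Python sketch. It assumes the heuristic that scikit-learn documents for class_weight='balanced', namely n_samples / (n_classes * count of each class). The function names, the toy labels, and the constant prediction of 0.1 are hypothetical and only illustrate how the weights change the loss.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Binary log loss where each example is scaled by its class weight."""
    total, weight_sum = 0.0, 0.0
    for y_i, p_i in zip(y_true, p_pred):
        w = weights[y_i]
        total += -w * (y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i))
        weight_sum += w
    return total / weight_sum


# Hypothetical 90/10 split between the majority (0) and minority (1) class.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)

# A lazy model that predicts "10% chance of class 1" for every example.
p = [0.1] * len(y)
plain = weighted_log_loss(y, p, {0: 1.0, 1: 1.0})
balanced = weighted_log_loss(y, p, weights)
```

With balanced weights both classes contribute the same total weight, so the lazy model is penalised much more heavily for ignoring the minority class than it is under the unweighted loss.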
@UnfoldDataScience 2 years ago
Very good explanation, and thanks for sharing what you learned here. I will pin this comment to the top for others' benefit. My view: data science is all about trying, experimenting, failing, and learning. Then something very good comes up.
@enchanted_swiftie 2 years ago
@@UnfoldDataScience Won't lie, but when I started watching your videos, your explanations made things much simpler. You know, I used to freak out (sorry for the words) on hearing about DBSCAN, hierarchical clustering and what not, but when I see those topics explained by you I feel comfortable that I will understand them. You explain so simply yet accurately, without missing the important things. PS: I was introduced to the assumptions of linear regression by your channel. Before that I knew the model, and only then came to know that there is something called "assumptions" and how important they are!! The online courses I took missed this completely! Your channel is a huge contribution to the data science community on YT.
@sreebvmcreation9388 2 days ago
Thank you sir, I was searching for methods for imbalanced data, and I finally found them in your video. Thank you so much once again. Of all these methods, which one is the best?
@ayushparihar5989 1 year ago
Good explanation
@dd3371 2 years ago
Thanks very much for sharing and explaining. What are your thoughts on logistic regression? Would imbalanced data still be a problem if you build the model as a GLM using logistic regression?
@KastijitBabar 2 months ago
You are the best Data Science And Machine Learning Teacher I have ever seen. Thanks a lot!!
@UnfoldDataScience 2 months ago
You are welcome!
@karthebans248 2 years ago
Learned new things about balancing imbalanced data sets. Thanks.
@UnfoldDataScience 2 years ago
Welcome.
@zahedinima732 2 years ago
Such a clear and concise explanation. Thank you, Aman!
@UnfoldDataScience 2 years ago
Thanks a lot.
@nivednambiar6845 2 years ago
An important concept when dealing with classification. Thanks for sharing, Aman 👍👍
@UnfoldDataScience 2 years ago
Thanks Nived.
@atod2572 1 year ago
Awesome explanation. Can you please tell us when to use which technique? I mean with an example dataset and the choice of sampling technique.
@sadhnarai8757 2 years ago
Very nice Aman
@UnfoldDataScience 2 years ago
Thank you
@avikdinda7827 19 days ago
Does oversampling the whole dataset cause data leakage? And if I apply SMOTE to the training data after the train-test split, I get poor precision for the minority class although recall is OK. What can I do to improve the precision of the minority class?
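One option raised elsewhere in this thread is to adjust the decision threshold: raising it usually trades some recall for precision. The sketch below is only an illustration under assumptions. The labels, the scores, and the `precision_recall` helper are hypothetical, and the scores stand in for the predicted probabilities of whatever model is being used.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall of the positive class at a given threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical validation labels and predicted minority-class probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.55, 0.60, 0.30, 0.45, 0.70, 0.90]

# Sweep a few thresholds and compare the precision/recall trade-off.
results = {t: precision_recall(y_true, scores, t) for t in (0.3, 0.5, 0.65)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the lowest threshold catches every minority example but with many false positives, while the highest threshold removes the false positives at the cost of missing one minority example. The threshold should be chosen on validation data, not on the test set.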
@younesgasmi8518 6 months ago
Can I use oversampling or undersampling before splitting the dataset into training and testing sets?
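A common recommendation is to split first and resample only the training portion, because oversampling before the split can place copies of the same row in both sets. The sketch below shows that effect on hypothetical data. The row ids, the deterministic every-4th-row split, and the helper names are assumptions chosen so the result is reproducible, not a prescribed workflow.

```python
from itertools import cycle, islice


def oversample_minority(rows):
    """Repeat minority rows (label 1) until they match the majority count."""
    majority = [r for r in rows if r[1] == 0]
    minority = [r for r in rows if r[1] == 1]
    extra = list(islice(cycle(minority), len(majority) - len(minority)))
    return rows + extra


def split(rows, every=4):
    """Deterministic hold-out split: every 4th row goes to the test set."""
    test = [r for i, r in enumerate(rows) if i % every == 0]
    train = [r for i, r in enumerate(rows) if i % every != 0]
    return train, test


def leaked_ids(train, test):
    """Row ids that appear in both the training and the test set."""
    return {r[0] for r in train} & {r[0] for r in test}


# Hypothetical rows as (row_id, label): 12 majority rows and 3 minority rows.
rows = [(i, 0) for i in range(12)] + [(i, 1) for i in range(12, 15)]

# Resample first, then split: duplicated minority rows land on both sides.
train_bad, test_bad = split(oversample_minority(rows))
leak_bad = leaked_ids(train_bad, test_bad)

# Split first, then resample only the training rows: the test set stays unseen.
train_raw, test_good = split(rows)
train_good = oversample_minority(train_raw)
leak_good = leaked_ids(train_good, test_good)

print("leaked ids when resampling before the split:", sorted(leak_bad))
print("leaked ids when resampling after the split:", sorted(leak_good))
```

The same reasoning applies to SMOTE: synthetic points built from rows that later end up in the test set would make the evaluation look better than it is.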
@bijaynayak6473 2 years ago
Very nice explanation, kudos
@UnfoldDataScience 2 years ago
Thanks for liking, Bijay
@dilshadmuhammed8224 7 months ago
In my case I have more than 2 classes, and the class labels are text, e.g. well being, business analytics, etc. How do I balance such classes?
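Random oversampling works the same way with more than two classes, and string labels need no special handling because rows are only grouped by label. The sketch below is a minimal illustration under assumptions: the "sports" class, the document texts, and the `oversample_to_largest` helper are hypothetical. Interpolation methods such as SMOTE would additionally require the text to be converted to numeric features first.

```python
import random
from collections import Counter


def oversample_to_largest(texts, labels, seed=0):
    """Randomly duplicate rows so every class reaches the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(group) for group in by_class.values())
    out_texts, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for text in group + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical training set with three text-labelled classes.
labels = ["well being"] * 6 + ["business analytics"] * 3 + ["sports"] * 1
texts = [f"document {i}" for i in range(len(labels))]

new_texts, new_labels = oversample_to_largest(texts, labels)
print(Counter(new_labels))
```

As with the binary case, this should be applied to the training split only.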
@NeeRaja_Sweet_Home 1 year ago
Hi Aman, most videos cover imbalanced datasets for classification problems, but how do we check for and handle an imbalanced dataset in a regression problem? Thanks,
@mamataparab9803 2 years ago
Hello Aman, this is the third time I have watched this video, simply to learn your way of explaining things. Is it possible for you to create a video or give us some notes so we can find all the important questions for ensembling techniques?
@UnfoldDataScience 2 years ago
Thanks Mamata, I keep sharing on Instagram; please follow "unfolddatascience" on Instagram.
@mamataparab9803 2 years ago
Sure, Aman. Thank you
@maasahebbiustad8514 1 year ago
Hello sir, how do I solve a classification problem in which the training data has only one class? I get this error: 'This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1'. Please help me out.
@riva.4484 1 year ago
Thank you so much! This video helped me a lot. I have a question: how can we decide which approach is the best fit for our imbalanced dataset?
@UnfoldDataScience 1 year ago
It's always trial and error.
@snehalvaidya5843 2 years ago
Thanks for sharing knowledge 🙂, please share how to explain PCA in front of an interviewer.
@UnfoldDataScience 2 years ago
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-osgqQy9Hr8s.html
@tharindumadusanka3038 2 years ago
I am doing market basket analysis (MBA) with the Apriori algorithm in Google Colab. The problem is that when the CSV transaction data has more than 20 rows, an error is displayed. With fewer than 20 rows, I get the expected result.
@UnfoldDataScience 2 years ago
That's not a number-of-rows problem; there is probably some hidden issue with row number 21. I am just guessing.
@nagarajsundar7931 2 years ago
Hi Aman, thanks for explaining the various methods. One question: when should we use which method?
@UnfoldDataScience 2 years ago
Thanks Naga, there can't be a one-to-one go-to rule. There are some pointers, which I can cover in a different video. Thanks for asking.
@dhanushraj3697 1 year ago
The video was good, but I request you to add some extra information and explanation for each method.
@chalmerilexus2072 2 years ago
Which method is preferable?
@UnfoldDataScience 2 years ago
This is discussed towards the end.
@ratnajyotibhowmick9801 2 years ago
Please share the source of the notebook. Thanks.
@UnfoldDataScience 2 years ago
drive.google.com/drive/u/0/folders/13pZrCIqk1XN6W4I95A07bK8YRHBB3btt
@mihretdesta9153 1 year ago
Hey sir, how about imbalanced image data for deep learning?
@UnfoldDataScience 1 year ago
Data augmentation is one option.
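As a rough illustration of that option, minority-class images can be given extra, slightly altered copies. The sketch below uses a horizontal flip on tiny images stored as nested lists. The images, the labels, and both helper names are hypothetical, and it assumes a mirrored image is still a valid example of its class, which is not true for every dataset (text or digits, for instance).

```python
def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def augment_minority(images, labels, minority_label):
    """Append one flipped copy of every minority-class image."""
    aug_images, aug_labels = list(images), list(labels)
    for image, label in zip(images, labels):
        if label == minority_label:
            aug_images.append(horizontal_flip(image))
            aug_labels.append(label)
    return aug_images, aug_labels


# Hypothetical 2x3 grayscale images: three of class 0 and one of class 1.
images = [
    [[0, 0, 0], [0, 0, 0]],
    [[1, 1, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 2, 2]],
    [[1, 2, 3], [4, 5, 6]],
]
labels = [0, 0, 0, 1]

aug_images, aug_labels = augment_minority(images, labels, minority_label=1)
```

Deep learning frameworks provide richer transforms such as rotations, crops, and colour changes, and these are normally applied to the training images only.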
@hasantalib6254 10 months ago
Hello, I'm interested to know how I can deal with unbalanced panel data. How can I transform the data when a year is missing?
@PalaSheshu111 1 year ago
github link