
This is why you should care about unbalanced data... as a data scientist

ritvikmath
Subscribers: 160K
Views: 16K

What do you do when your data has lots more negative examples than positive ones?
Link to Code : github.com/ritvikmath/RU-vid...
My Patreon : www.patreon.com/user?u=49277905

Published: 30 Jul 2024

Comments: 25
@jessibenzel243 2 years ago
We just talked about this in my machine learning course this week!! Great timing! This video is very helpful.
@haneulkim4902 2 years ago
Great content; this practical content is gold. Thank you :)
@pgbpro20 2 years ago
ritvikmath coming with a video of one of my favorite topics - instant like!
@tech-n-data 1 year ago
Thank you so much for all you do.
@JessWLStuart 9 months ago
Well presented!
@igorbreeze3734 2 years ago
Hi! Great video. Is there any chance you would create a full in-depth CatBoost tutorial on some random data? It would be super useful.
@davidzhang4825 1 year ago
Great video. For other ML algorithms like logistic regression, SVM, KNN, etc., can we implement the first method (upweighting the minority class)? Or is this only applicable to decision trees?
@Sameerahmed373 2 years ago
Can we customise the loss function? For example, more weight for misclassifying the true minority class and less weight for the other error?
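One way to picture the idea in this question is a class-weighted log loss. The sketch below is only an illustration and is not taken from the video or its linked code: the function name `weighted_log_loss`, the weights `w_pos` and `w_neg`, and the toy labels are all hypothetical. It assumes binary labels (1 = minority class) and predicted probabilities for class 1.

```python
import math

def weighted_log_loss(y_true, p_pred, w_pos=1.0, w_neg=1.0):
    """Mean binary cross-entropy with a separate weight per class.

    Assumption: the sum is divided by the number of rows, not by the
    sum of the weights; other normalisations are equally reasonable.
    """
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        if y == 1:
            total += -w_pos * math.log(p)      # error on the minority class
        else:
            total += -w_neg * math.log(1 - p)  # error on the majority class
    return total / len(y_true)

# Hypothetical toy data: one positive, three negatives.
y_true = [1, 0, 0, 0]
miss = [0.1, 0.1, 0.1, 0.1]         # the positive is confidently missed
false_alarm = [0.9, 0.9, 0.1, 0.1]  # one negative is confidently flagged

unweighted_miss = weighted_log_loss(y_true, miss)
unweighted_alarm = weighted_log_loss(y_true, false_alarm)
weighted_miss = weighted_log_loss(y_true, miss, w_pos=5.0)
weighted_alarm = weighted_log_loss(y_true, false_alarm, w_pos=5.0)
```

With equal weights the two mistakes cost the same in this toy case; raising `w_pos` makes the missed positive the more expensive error, which is the behaviour the question asks about.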
@joelrubinson9973 2 years ago
Very interesting. AdTech modeling of conversions caused by advertising always suffers from imbalance (conversion rates are usually in the low-to-mid single digits).
@d.a.k.o.s9163 1 year ago
Great video! But don't you think that with such an unbalanced dataset it would be better to use an anomaly detection algorithm instead of a classification algorithm?
@bmebri1 2 years ago
Excellent video! One question though: are certain classification models immune to class imbalance? Thanks!
@LanNguyen-eq6lf 2 years ago
To my knowledge, no classification model is immune to imbalanced datasets, because they are all data-driven. However, you can still get very good accuracy from an imbalanced dataset. This happens when inter-class separability is very high; for example, detection of water bodies (often a minority class) over a large area is often quite accurate.
@aghazi94 2 years ago
You are seriously so underrated
@danielwiczew 2 years ago
Okay, but with oversampling, how do you use cross-validation? Because if you use it on the oversampled dataset, you'll have a data leak.
@ritvikmath 2 years ago
I think you'd want to define the folds on the original data and then oversample while holding some folds fixed. Example: 3-fold CV.
- Split the original data into 3 folds (A, B, C).
- Consider (A, B) as training data -> oversample that data -> validate using C.
- Repeat with A and then B as the validation set.
- Note that there is no data leak in this case.
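The fold-then-oversample procedure described in this reply can be sketched in plain Python. This is a minimal illustration, not the code linked from the video: the function name `oversampled_folds` is hypothetical, the oversampling is simple random duplication of minority rows, and the folds are stratified by class, which is an extra assumption the reply does not mention.

```python
import random

def oversampled_folds(labels, k=3, seed=0):
    """Yield (train_idx, val_idx); only the training part is oversampled."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Folds are defined on the ORIGINAL rows (stratified: an assumption).
    folds = [pos[i::k] + neg[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]  # held-out fold, never oversampled
        train_idx = [j for f in range(k) if f != i for j in folds[f]]
        train_pos = [j for j in train_idx if labels[j] == 1]
        train_neg = [j for j in train_idx if labels[j] == 0]
        minority, majority = sorted([train_pos, train_neg], key=len)
        # Duplicate minority rows (with replacement) until classes match.
        n_extra = len(majority) - len(minority)
        extra = [rng.choice(minority) for _ in range(n_extra)] if minority else []
        yield train_idx + extra, val_idx

# Hypothetical toy labels: 6 positives, 54 negatives.
labels = [1] * 6 + [0] * 54
splits = list(oversampled_folds(labels, k=3))
```

Because the duplicated rows are drawn only from the training folds, no copy of a validation row can end up in the training set, which is the leak the question above is worried about.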
@mrirror2277 2 years ago
Hi, just wondering if SMOTE is applicable to image data? I saw only one article on it online, so I am not sure if it even works, since generating synthetic images is likely much harder.
@shahrinnakkhatra2857 6 months ago
That's where image augmentation comes into play. You can create different variations of an image by applying transformations such as rotating and flipping.
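As a rough illustration of the flips and rotations mentioned in this reply, the sketch below treats an image as a nested list of pixel values. The helper names (`flip_horizontal`, `rotate_90_clockwise`, `augment`) are hypothetical; a real pipeline would normally rely on an image library rather than plain lists.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the pixel grid by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus two transformed variants."""
    return [img, flip_horizontal(img), rotate_90_clockwise(img)]

# Hypothetical 2x2 "image" of pixel intensities.
image = [[1, 2],
         [3, 4]]
variants = augment(image)
```

Each minority-class image yields several training examples this way, which plays a role similar to oversampling without copying rows exactly.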
@zahrashekarchi6139 1 year ago
Great demo! Just one thought: why did you not talk about downsampling the majority class and what its impact would be?
@douwe7493 4 months ago
This is something I am wondering about too!
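For readers wondering what downsampling the majority class looks like in practice, here is a minimal sketch using random undersampling. It is not from the video; the function name `undersample_majority` and the toy labels are hypothetical, and it assumes binary 0/1 labels.

```python
import random

def undersample_majority(labels, seed=0):
    """Return row indices with the majority class cut to the minority size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep a random subset of majority rows, without replacement.
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)

# Hypothetical toy labels: 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
balanced_idx = undersample_majority(labels)
```

The trade-off is that the discarded majority rows are simply thrown away, so the model trains on far less data than with upweighting or oversampling.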
@Septumsempra8818 2 years ago
Are you familiar with Latent vectors in network analysis? s/o from South Africa
@chenxiaodu2557 2 months ago
It should be "imbalanced data" instead of "unbalanced data"
@brenoingwersen784 1 month ago
Lol 😂
@bernardfinucane2061 2 years ago
You could predict that aircraft engines NEVER fail and almost always be right.
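The point of this comment can be checked with a little arithmetic. The numbers below are made up for illustration (5 failures among 1,000 engines); they are not from the video.

```python
# Hypothetical data: 1 = engine failed, 0 = engine did not fail.
y_true = [1] * 5 + [0] * 995
y_pred = [0] * len(y_true)  # "engines NEVER fail"

# Accuracy: fraction of predictions that match the truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of real failures that were caught.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

The always-negative predictor scores 99.5% accuracy while catching none of the failures, which is why accuracy alone is a misleading metric on data this imbalanced.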
@junkbingo4482 2 years ago
Hi. When people have problems with unbalanced data, it is just proof that they do not understand what they are doing. When I was young (a long time ago), our teachers wanted us to do things step by step, to be (nearly) sure we knew what we were calculating. As that is no longer the case, yes, people don't get the methodology and the maths but still practice data science, which is sad.
@junkbingo4482 2 years ago
Oops, Nuance wrote 'yes'!! Thanks to LSTM, I did not check my post, sorry! ;-)