What do you do when your data has lots more negative examples than positive ones? Link to code: github.com/ritvikmath/RU-vid... My Patreon: www.patreon.com/user?u=49277905
Great video! For other ML algorithms like logistic regression, SVM, KNN, etc., can we use the first method (upweighting the minority class), or is it only applicable to decision trees?
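For what it's worth, scikit-learn exposes upweighting directly for several of these models via the `class_weight` parameter (logistic regression and SVMs support it; KNN does not, though distance-weighted voting or resampling can stand in). A minimal sketch on a synthetic imbalanced dataset:

```python
# Sketch: class weighting beyond decision trees, using sklearn's class_weight.
# The dataset here is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.svm import SVC

# ~95% negative / ~5% positive toy dataset
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# "balanced" reweights each class inversely to its frequency,
# so minority-class errors cost more during fitting
log_reg = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
svm = SVC(class_weight="balanced").fit(X, y)

# recall on the minority class is usually the metric of interest here
minority_recall = recall_score(y, log_reg.predict(X))
```

KNN is the odd one out: it has no loss function to reweight, so for neighbor methods you would typically resample the data instead.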
Very interesting. AdTech modeling of conversions caused by advertising always suffers from imbalance (conversion rates are usually in the low-to-mid single digits).
Great video! But don't you think that with such an unbalanced dataset it would be better to use an anomaly detection algorithm instead of a classification algorithm?
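For readers curious what that framing looks like in practice, here is a minimal sketch using scikit-learn's IsolationForest: treat the rare positives as outliers rather than a second class. The dataset and the contamination value are illustrative assumptions, not from the video.

```python
# Sketch: anomaly-detection framing of an imbalanced problem.
# Fit an IsolationForest and flag rare points as outliers.
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

# ~97% negative / ~3% positive synthetic data (illustrative)
X, y = make_classification(n_samples=500, weights=[0.97, 0.03], random_state=0)

# contamination ~ the expected fraction of positives (an assumption you must supply)
iso = IsolationForest(contamination=0.03, random_state=0).fit(X)
pred = iso.predict(X)  # +1 = inlier (majority), -1 = outlier (candidate positive)
```

The trade-off: anomaly detectors ignore the labels entirely, so if you do have labeled positives, a class-weighted or resampled classifier can usually exploit them better.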
To my knowledge, no classifier is immune to imbalanced datasets, because they are all data-driven. However, you can still get very good accuracy on an imbalanced dataset when inter-class separability is very high; for example, detection of water bodies (often a minority class) over a large area is usually quite accurate.
I think you'd want to define the folds on the original data and then oversample, holding the validation fold fixed. Example with 3-fold CV:
- split the original data into 3 folds (A, B, C)
- treat (A, B) as training data -> oversample that data -> validate on C
- repeat using A and then B as the validation fold
Note that there is no data leak in this case.
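The scheme above can be sketched as follows, assuming scikit-learn and plain random duplication as the oversampler (the dataset is synthetic). The key point is that oversampling happens strictly inside the training folds, after the split:

```python
# Sketch: fold first, oversample second, so no duplicated minority
# point can leak into the validation fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)

scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # random oversampling of the minority class, training folds only
    minority_pos = np.where(y_tr == 1)[0]
    n_extra = (y_tr == 0).sum() - len(minority_pos)
    extra = rng.choice(minority_pos, size=n_extra, replace=True)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    # validation fold is untouched original data -> no leakage
    scores.append(model.score(X[val_idx], y[val_idx]))
```

Oversampling before splitting would instead let copies of the same minority point land in both train and validation, inflating the CV estimate.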
Hi, just wondering: is SMOTE applicable to image data? I've only seen one article on it online, so I'm not sure it even works, since generating synthetic images is likely much harder.
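One way to see why it is awkward for images: SMOTE interpolates in feature space, which for images means blending flattened pixel vectors. A hand-rolled sketch of that interpolation step on toy "images" (numpy only; the real algorithm, e.g. imbalanced-learn's SMOTE, also restricts interpolation to k nearest neighbours):

```python
# Sketch of SMOTE's core step applied to flattened images.
# Synthetic points are pixel-wise blends of two minority samples.
import numpy as np

rng = np.random.default_rng(0)
minority = rng.random((5, 8 * 8))  # five flattened 8x8 toy minority-class "images"

def smote_interpolate(a, b, rng):
    """Synthetic point on the segment between two minority samples."""
    lam = rng.random()  # lam in [0, 1)
    return a + lam * (b - a)

synthetic = smote_interpolate(minority[0], minority[1], rng)
# every pixel of the synthetic "image" lies between the two parents:
# a ghostly overlay rather than a plausible new natural image
```

That blending behaviour is why image work usually reaches for augmentation (flips, crops, jitter) or generative models instead of SMOTE on raw pixels.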
Hi. When people have problems with unbalanced data, it's just proof that they don't understand what they're doing. When I was young (a long time ago, sure), our teachers made us do things step by step so we would be (nearly) certain we knew what we were calculating. That's no longer the case: people don't get the methodology or the maths, yet they practice data science, which is sad.