What do you do when your data has lots more negative examples than positive ones? Link to code: github.com/ritvikmath/RU-vid... My Patreon: www.patreon.com/user?u=49277905
Great video! For other ML algorithms like logistic regression, SVM, KNN, etc., can we use the first method (upweighting the minority class), or is it only applicable to decision trees?
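For what it's worth, scikit-learn exposes upweighting directly for several of these models via the `class_weight` parameter (logistic regression and SVMs support it; KNN does not, though distance-weighted voting or resampling can stand in). A minimal sketch on a synthetic imbalanced dataset:

```python
# Sketch: class weighting beyond decision trees, using sklearn's class_weight.
# The dataset here is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.svm import SVC

# ~95% negative / ~5% positive toy dataset
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# "balanced" reweights each class inversely to its frequency,
# so minority-class errors cost more during fitting
log_reg = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
svm = SVC(class_weight="balanced").fit(X, y)

# recall on the minority class is usually the metric of interest here
minority_recall = recall_score(y, log_reg.predict(X))
```

KNN is the odd one out: it has no loss function to reweight, so for neighbor methods you would typically resample the data instead.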
Very interesting. AdTech modeling of conversions caused by advertising always suffers from imbalance (conversion rates are usually in the low-to-mid single digits).
Great video! But don't you think that with such an unbalanced dataset it would be better to use an anomaly detection algorithm instead of a classification algorithm?
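For readers curious what that framing looks like in practice, here is a minimal sketch using scikit-learn's IsolationForest: treat the rare positives as outliers rather than a second class. The dataset and the contamination value are illustrative assumptions, not from the video.

```python
# Sketch: anomaly-detection framing of an imbalanced problem.
# Fit an IsolationForest and flag rare points as outliers.
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

# ~97% negative / ~3% positive synthetic data (illustrative)
X, y = make_classification(n_samples=500, weights=[0.97, 0.03], random_state=0)

# contamination ~ the expected fraction of positives (an assumption you must supply)
iso = IsolationForest(contamination=0.03, random_state=0).fit(X)
pred = iso.predict(X)  # +1 = inlier (majority), -1 = outlier (candidate positive)
```

The trade-off: anomaly detectors ignore the labels entirely, so if you do have labeled positives, a class-weighted or resampled classifier can usually exploit them better.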
To my knowledge, no classifier is immune to imbalanced datasets, because they are all data-driven. However, you can still get very good accuracy on an imbalanced dataset when inter-class separability is very high; for example, detection of water bodies (often a minority class) over a large area is usually quite accurate.
I think you'd want to define the folds on the original data and then oversample, holding the validation fold fixed. Example with 3-fold CV:
- split the original data into 3 folds (A, B, C)
- treat (A, B) as training data -> oversample that data -> validate on C
- repeat using A and then B as the validation fold
Note that there is no data leak in this case.
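The scheme above can be sketched as follows, assuming scikit-learn and plain random duplication as the oversampler (the dataset is synthetic). The key point is that oversampling happens strictly inside the training folds, after the split:

```python
# Sketch: fold first, oversample second, so no duplicated minority
# point can leak into the validation fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)

scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # random oversampling of the minority class, training folds only
    minority_pos = np.where(y_tr == 1)[0]
    n_extra = (y_tr == 0).sum() - len(minority_pos)
    extra = rng.choice(minority_pos, size=n_extra, replace=True)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    # validation fold is untouched original data -> no leakage
    scores.append(model.score(X[val_idx], y[val_idx]))
```

Oversampling before splitting would instead let copies of the same minority point land in both train and validation, inflating the CV estimate.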
Hi, just wondering: is SMOTE applicable to image data? I've only seen one article on it online, so I'm not sure it even works, since generating synthetic images is likely much harder.
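One way to see why it is awkward for images: SMOTE interpolates in feature space, which for images means blending flattened pixel vectors. A hand-rolled sketch of that interpolation step on toy "images" (numpy only; the real algorithm, e.g. imbalanced-learn's SMOTE, also restricts interpolation to k nearest neighbours):

```python
# Sketch of SMOTE's core step applied to flattened images.
# Synthetic points are pixel-wise blends of two minority samples.
import numpy as np

rng = np.random.default_rng(0)
minority = rng.random((5, 8 * 8))  # five flattened 8x8 toy minority-class "images"

def smote_interpolate(a, b, rng):
    """Synthetic point on the segment between two minority samples."""
    lam = rng.random()  # lam in [0, 1)
    return a + lam * (b - a)

synthetic = smote_interpolate(minority[0], minority[1], rng)
# every pixel of the synthetic "image" lies between the two parents:
# a ghostly overlay rather than a plausible new natural image
```

That blending behaviour is why image work usually reaches for augmentation (flips, crops, jitter) or generative models instead of SMOTE on raw pixels.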
Hi. When people have problems with unbalanced data, it's just proof that they don't understand what they're doing. When I was young (a long time ago, sure), our teachers made us do things step by step so we would be (nearly) certain we knew what we were calculating. That's no longer the case: people don't get the methodology or the maths, yet they practice data science, which is sad.