
Why We Don't Use the Mean Squared Error (MSE) Loss in Classification 

DataMListic

In this video we discuss why the mean squared error (MSE) loss is not used for classification problems. We look at three important aspects: (1) the MSE assumes a Gaussian prior, (2) the MSE applied to classification problems results in a non-convex loss function, and (3) the MSE doesn't penalise classification errors strongly enough compared to the binary cross-entropy (BCE) loss.
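A quick numerical sketch of point (3), not from the video itself: for a confidently wrong prediction, the MSE loss can never exceed 1, while the BCE loss grows without bound.

```python
import math

def mse(y, p):
    """Squared error for a single example with true label y and predicted probability p."""
    return (y - p) ** 2

def bce(y, p):
    """Binary cross-entropy for a single example."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confidently wrong prediction: true label 1, predicted probability 0.01.
y, p = 1.0, 0.01
print(round(mse(y, p), 4))  # 0.9801 -- bounded above by 1
print(round(bce(y, p), 4))  # 4.6052 -- and it keeps growing as p -> 0
```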
References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Gaussian distribution explained: • Multivariate Normal (G...
Binary cross entropy prior for Bernoulli distribution: towardsdatascience.com/where-...
Demonstration that the binary cross entropy loss for classification is convex: towardsdatascience.com/why-no...
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why neural networks are universal functions approximators: • Why Neural Networks Ca...
Why we need activations in neural nets: • Why We Need Activation...
Bias-variance trade-off: • Why Models Overfit and...
Neural networks on tabular data: • Why Deep Neural Networ...
Why we divide by N-1 in the sample variance: • Why We Divide by N-1 i...
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro - MSE for classification
01:12 - Reason 1 - MSE assumes a Gaussian prior
04:15 - Reason 2 - MSE non-convexity
08:03 - Reason 3 - MSE weak penalisation
08:42 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic
📸 Instagram: @datamlistic
📱 TikTok: @datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: / datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#mse #bce #classification #stats

Published: 2 Aug 2024

Comments: 18
@datamlistic · 2 days ago
**Video correction:** BCE should be plus infinity at the end (thanks @guilhermethomaz8328)
@ShahFahad-ez1cm · 1 month ago
I would like to suggest a correction: in linear regression, the data itself is not assumed to come from a normal distribution; rather, the errors are assumed to be normally distributed.
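To make the connection concrete (my own sketch, not from the thread): if the errors are assumed to be N(0, σ²), the negative log density of each error is a constant plus e²/(2σ²), so maximizing the likelihood over the data is the same as minimizing the sum of squared errors.

```python
import math
from statistics import NormalDist

# If regression errors e ~ N(0, sigma^2), then
#   -log p(e) = 0.5 * log(2*pi*sigma^2) + e^2 / (2*sigma^2),
# a constant plus a scaled squared error. Summed over the data,
# maximizing the likelihood is equivalent to minimizing the MSE.
sigma = 2.0
for e in (-1.5, 0.0, 0.7, 3.2):
    nll = -math.log(NormalDist(0.0, sigma).pdf(e))
    const_plus_sq = 0.5 * math.log(2 * math.pi * sigma**2) + e**2 / (2 * sigma**2)
    print(f"{e:5.1f}  {nll:.6f}  {const_plus_sq:.6f}")  # the two columns agree
```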
@datamlistic · 21 days ago
Agreed, sorry for the novice mistake. I've corrected myself in my latest video. :)
@guilhermethomaz8328 · 4 days ago
Excellent video. The BCE should be plus infinity at the end.
@datamlistic · 2 days ago
Thanks for the correction!
@user-sp6yu9rb6q · 10 months ago
It's very helpful! Many thanks.
@datamlistic · 10 months ago
Thanks! Glad it was helpful! :)
@shahulrahman2516 · 6 months ago
Great lecture!
@datamlistic · 6 months ago
Thanks! Glad you liked it! :)
@ramirolopezvazquez4636 · 6 months ago
Thanks for the wonderful video. Could anybody be so kind as to comment on (or share a reference about) why the MSE loss assumes a Gaussian distribution for the underlying data?
@datamlistic · 6 months ago
You're welcome! Here's a link that explains in much more detail why the MSE loss assumes a Gaussian prior: towardsdatascience.com/why-using-mean-squared-error-mse-cost-function-for-binary-classification-is-a-bad-idea-933089e90df7. Hope it helps! :)
@ramirolopezvazquez4636 · 6 months ago
Thanks a lot for your kind answer and awesome work! @datamlistic
@atendragautam4925 · 11 months ago
Summary: Q) Why can't we use the MSE loss in logistic regression instead of the binary cross-entropy loss? A: 1. When maximizing the likelihood, if you assume the output comes from a Gaussian distribution, it can be shown that this is equivalent to minimizing the MSE loss; if instead you model the output as Bernoulli, the BCE loss follows. So there is a mismatch in the assumed output distribution. 2. If you use the MSE as the loss in logistic regression, the loss becomes a non-convex function (which can be shown by taking the second derivative), whereas with the BCE it is convex. 3. The MSE doesn't penalise misclassification enough; the BCE does.
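Point 2 of this summary can be checked numerically (my own sketch, in terms of the logit rather than the model weights): with a sigmoid output and true label 1, the second derivative of the MSE changes sign, so the loss is not convex in the logit, while the BCE's second derivative, σ(z)(1-σ(z)), is positive everywhere.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_of_logit(z, y=1.0):
    """MSE applied to a sigmoid output, as a function of the logit z."""
    return (y - sigmoid(z)) ** 2

def bce_of_logit(z):
    """BCE for true label 1, as a function of the logit z."""
    return -math.log(sigmoid(z))

def second_derivative(f, z, h=1e-4):
    """Central-difference estimate of f''(z)."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / h**2

# MSE: the curvature changes sign across z -> non-convex in the logit.
print(second_derivative(mse_of_logit, -3.0))  # negative
print(second_derivative(mse_of_logit, 3.0))   # positive
# BCE: curvature is sigmoid(z) * (1 - sigmoid(z)) > 0 everywhere -> convex.
print(second_derivative(bce_of_logit, -3.0), second_derivative(bce_of_logit, 3.0))
```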
@datamlistic · 11 months ago
Thanks for the summary! :)
@anamitrasingha6362 · 10 months ago
So I tried out the math. Am I correct to say that in the interval 0 to 1 the loss function is neither convex nor concave? Hence it becomes hard to optimize via methods that assume the function is either convex or concave.
@datamlistic · 10 months ago
@anamitrasingha6362 That's really nice. Would you mind sharing your calculation? :)
@user-ot2vx8pd5u · 10 months ago
Wonderful video 😢
@datamlistic · 10 months ago
Thanks!