
Lei Wu - Understanding the implicit bias of SGD: A dynamical stability perspective 

One world theoretical machine learning

Abstract: In deep learning, models are often over-parameterized, which raises the concern that algorithms may pick solutions that generalize poorly. Fortunately, stochastic gradient descent (SGD) always converges to solutions that generalize well, even without any explicit regularization, suggesting that a certain "implicit regularization" is at work. This talk explains this striking phenomenon from a stability perspective. Specifically, we show that a stable minimum of SGD must be flat, as measured by various norms of the local Hessian. Furthermore, these flat minima provably generalize well for two-layer neural networks and diagonal linear networks. In contrast to popular continuous-time analyses, our stability analysis respects the discrete nature of SGD and can explain the effects of finite learning rate and batch size, as well as why SGD often generalizes better than GD.
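
The flatness criterion in the abstract can be made concrete in its simplest form: on a quadratic approximation around a minimum with Hessian H, gradient descent with learning rate eta is linearly stable only if eta * lambda_max(H) <= 2, so training at a large learning rate can only settle at sufficiently flat minima. (The SGD criterion developed in the talk additionally accounts for batch size and gradient noise.) The sketch below is a hypothetical illustration, not code from the talk: it estimates lambda_max by power iteration on PyTorch Hessian-vector products and checks the GD stability condition on a toy quadratic.

import torch

def sharpness(loss_fn, params, n_iters=100):
    # Estimate lambda_max of the Hessian of loss_fn at params (a flat
    # 1-D tensor) via power iteration on Hessian-vector products.
    # At a minimum the Hessian is PSD, so this is the sharpness.
    v = torch.randn_like(params)
    v = v / v.norm()
    for _ in range(n_iters):
        loss = loss_fn(params)
        (grad,) = torch.autograd.grad(loss, params, create_graph=True)
        (hv,) = torch.autograd.grad(grad @ v, params)  # H @ v
        v = hv / hv.norm()
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)
    (hv,) = torch.autograd.grad(grad @ v, params)
    return (v @ hv).item()  # Rayleigh quotient ~ lambda_max

# Toy quadratic with known curvature: lambda_max = 10.
theta = torch.zeros(2, requires_grad=True)
H = torch.diag(torch.tensor([10.0, 1.0]))
loss_fn = lambda p: 0.5 * p @ H @ p

lam = sharpness(loss_fn, theta)
eta = 0.3
print(f"lambda_max ~ {lam:.2f}; GD stable at eta={eta}: {eta * lam <= 2.0}")

Here eta * lambda_max = 3 > 2, so GD at this learning rate cannot remain at this minimum; it would only be attracted to minima with lambda_max <= 2 / eta, which is the sense in which the learning rate implicitly selects flat solutions.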

Science

Published: 7 Jun 2024
