
Aditya Varre - On the spectral bias of two-layer linear networks 

One world theoretical machine learning
Abstract: In this talk, we analyze the behaviour of two-layer fully connected networks with linear activations trained with gradient flow on the square loss. We show how the optimization process carries an implicit bias on the parameters that depends on the scale of the initialization. The main result of the paper is a variational characterization of the loss minimizers retrieved by the gradient flow for a specific initialization shape. This characterization reveals that, in the small-scale initialization regime, the linear neural network's hidden layer is biased toward having a low-rank structure. To complement our results, we showcase a hidden mirror flow that tracks the dynamics of the singular values of the weight matrices and describe their time evolution. Towards the end, we discuss the implications for stochastic gradient descent and show some empirical evidence beyond linear networks.
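A minimal numerical sketch (my own, not from the talk or paper) of the setting the abstract describes: gradient descent on a two-layer linear network f(x) = W2 W1 x with square loss, started from a small-scale random initialization, with a rank-2 teacher so the bias of the hidden layer toward low rank can be observed in the singular values of W1. All dimensions, the learning rate, the initialization scale, and the teacher are illustrative assumptions.

# Sketch only: small-scale initialization of a two-layer linear network,
# plain gradient descent on the square loss, then inspect the singular
# values of the hidden-layer weights W1.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 10, 100            # input dim, hidden width, number of samples
scale, lr, steps = 1e-3, 0.05, 5000

# Rank-2 teacher, so a low-rank solution to the regression problem exists.
A_star = rng.standard_normal((d, 2)) @ rng.standard_normal((2, d))
X = rng.standard_normal((n, d))
Y = X @ A_star.T

W1 = scale * rng.standard_normal((k, d))   # hidden layer (small-scale init)
W2 = scale * rng.standard_normal((d, k))   # output layer (small-scale init)

for _ in range(steps):
    residual = X @ W1.T @ W2.T - Y         # n x d prediction error
    grad_W2 = residual.T @ (X @ W1.T) / n  # gradient of (1/2n)||residual||^2 w.r.t. W2
    grad_W1 = W2.T @ residual.T @ X / n    # gradient w.r.t. W1
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

# With a small initialization scale, only a few singular values of W1
# become large; the rest stay near zero.
print("singular values of W1:", np.round(np.linalg.svd(W1, compute_uv=False), 4))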

Science

Published: 26 Mar 2024

Comments: 2
@oncedidactic · 3 months ago
Great talk, thanks! Very interested in more about the effect of discretization on SGD.
@javier2luna · 3 months ago
You say two hidden layers in a neural network, right?