
Simon Du - How Over-Parameterization Slows Down Gradient Descent 

One world theoretical machine learning

Abstract: We investigate how over-parameterization impacts the convergence behaviors of gradient descent through two examples. In the context of learning a single ReLU neuron, we prove that the convergence rate shifts from $\exp(-T)$ in the exact-parameterization scenario to an exponentially slower $1/T^3$ rate in the over-parameterized setting. In the canonical matrix sensing problem, specifically for symmetric matrix sensing with symmetric parametrization, the convergence rate transitions from $\exp(-T)$ in the exact-parameterization case to $1/T^2$ in the over-parameterized case. Interestingly, employing an asymmetric parameterization restores the $\exp(-T)$ rate, though this rate also depends on the initialization scaling. Lastly, we demonstrate that incorporating an additional step within a single gradient descent iteration can achieve a convergence rate independent of the initialization scaling.
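The contrast between the $\exp(-T)$ and $1/T^3$ regimes for the single-neuron example can be illustrated empirically. Below is a minimal numerical sketch, not taken from the talk, assuming a Gaussian-input student-teacher setup with squared loss: one student ReLU neuron for the exact-parameterized run versus four for the over-parameterized run. The dimension, sample size, step size, and initialization scale are illustrative choices.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the talk's exact experiment):
# teacher y = relu(w_star . x) with Gaussian inputs; the student is a sum of
# k ReLU neurons trained by gradient descent on the empirical squared loss.
rng = np.random.default_rng(0)
d, n, T, lr = 10, 5000, 2000, 0.05

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)

def run(k, init_scale=0.1):
    """Gradient descent with k student ReLU neurons; k=1 is exact parameterization."""
    W = init_scale * rng.standard_normal((k, d))
    losses = []
    for _ in range(T):
        pre = X @ W.T                          # (n, k) pre-activations
        resid = np.maximum(pre, 0.0).sum(1) - y
        losses.append(0.5 * np.mean(resid ** 2))
        grad = ((resid[:, None] * (pre > 0)).T @ X) / n  # d(loss)/dW, row per neuron
        W -= lr * grad
    return np.array(losses)

exact = run(k=1)  # exact parameterization
over = run(k=4)   # over-parameterization

for t in (100, 500, 1999):
    print(f"t={t:4d}  exact-param loss: {exact[t]:.3e}  over-param loss: {over[t]:.3e}")
```

The talk's rates concern the population loss; in this finite-sample sketch one should only expect the qualitative gap (fast decay for the exact-parameterized run, markedly slower decay for the over-parameterized one), and a single randomly initialized neuron can occasionally start in an unfavorable region.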

Science

Published: 25 Jun 2024
