
Mufan Li - Infinite-Depth Neural Networks as Depthwise Stochastic Processes 

One world theoretical machine learning
901 views

Abstract: Recent advances in neural network research have predominantly focused on infinite-width architectures, yet the complexities inherent in modelling networks with substantial depth call for a novel theoretical framework. In this presentation, we explore a unique approach to modelling neural networks using the proportional infinite-depth-and-width limit.
In fact, naively stacking non-linearities in deep networks leads to problematic degenerate behaviour at initialization. To address this challenge and achieve a well-behaved infinite-depth limit, we introduce a fundamentally novel framework: we treat neural networks as depthwise stochastic processes. Within this framework, the limit is characterized by a stochastic differential equation (SDE) that governs the feature covariance matrix. Notably, this framework yields a very accurate model of finite-size networks. Finally, we will briefly discuss several applications, including stabilizing gradients for Transformers, saving computational costs in hyperparameter tuning, and a new result on the spectrum of products of random matrices.
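
To make the degeneracy mentioned in the abstract concrete, below is a minimal NumPy sketch, not the speaker's exact construction: two inputs are pushed through a deep random MLP at initialization and we track the cosine similarity of their hidden features. With a plain ReLU stack the similarity drifts toward 1 as depth grows, so distinct inputs become indistinguishable; with a "shaped" ReLU whose slopes 1 ± c/sqrt(width) shrink toward the identity (this particular scaling is an assumption made in the spirit of the proportional depth-and-width limit), the similarity instead fluctuates stochastically with depth, which is the kind of behaviour a depthwise SDE would describe.

```python
# Minimal sketch (assumed construction, not the talk's exact one): compare the
# feature correlation of two inputs across depth for a plain ReLU network versus
# a "shaped" ReLU whose slopes are 1 +/- c/sqrt(width).
import numpy as np

rng = np.random.default_rng(0)


def shaped_relu(x, width, c=1.0):
    """Leaky-ReLU-like nonlinearity with slopes 1 +/- c/sqrt(width)."""
    a = c / np.sqrt(width)
    return np.where(x > 0, (1.0 + a) * x, (1.0 - a) * x)


def final_correlation(depth, width, shaped):
    """Cosine similarity of two inputs' features after `depth` random layers."""
    x1 = rng.standard_normal(width)
    x2 = rng.standard_normal(width)
    for _ in range(depth):
        if shaped:
            # weight variance 1/width keeps the shaped layers near the identity
            W = rng.standard_normal((width, width)) / np.sqrt(width)
            x1, x2 = shaped_relu(W @ x1, width), shaped_relu(W @ x2, width)
        else:
            # He initialization so that plain ReLU preserves the norm on average
            W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
            x1, x2 = np.maximum(W @ x1, 0.0), np.maximum(W @ x2, 0.0)
    return x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2))


# Proportional limit: depth grows together with width (here depth == width).
for n in (16, 64, 256):
    naive = final_correlation(depth=n, width=n, shaped=False)
    shaped = final_correlation(depth=n, width=n, shaped=True)
    print(f"depth = width = {n:4d}   plain ReLU corr = {naive:+.3f}   shaped corr = {shaped:+.3f}")
```

Run with depth equal to width, the plain-ReLU correlation saturates near 1 while the shaped version stays spread out across runs; the depth-to-width ratio, held fixed at 1 here, is what plays the role of the time variable in the limiting SDE.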

Science

Published: 9 Apr 2024

Comments: 1
@Kram1032 · 2 months ago
Very interesting talk! Seems like a surprisingly easy shaping method you found. That should be huge!

This is perhaps only indirectly related, but I wonder if we could somehow go from a "discrete" layer-for-layer evaluation to one that's akin to various Monte Carlo techniques which just sample from the continuum solution. Not sure how you'd take care of or replace the discrete parameters of an NN in such a setting, but I'm kinda picturing the difference between "radiosity" and "path tracing" when it comes to rendering.

In path tracing, if done correctly, you can kinda directly and unbiasedly approximate the continuum limit of the distribution of light in a scene, and it's all built on stochastic processes. You can even take care of "infinitely deep paths" correctly by stochastically cancelling paths at a *finite* depth through a Russian Roulette procedure, and you can combine many sampling procedures optimally through multiple importance sampling. More recently, that's even possible for a *continuum* of sampling methods in the form of *stochastic* importance sampling.

I'd imagine something similar could be used for *actually* training/evaluating *"infinite"* (both in width and depth) NNs by simply evaluating them to some finite but task-dependent depth. The main question to me is how to even set or store weights in such a setting in a finite amount of memory. I'm guessing you'd somehow have weights be defined through like a Gaussian mixture process or the like, but it's probably much easier said than done.