
Mufan Li - Infinite-Depth Neural Networks as Depthwise Stochastic Processes 

One world theoretical machine learning
901 views

Abstract: Recent advances in neural network research have predominantly focused on infinite-width architectures, yet the complexities inherent in modelling networks with substantial depth call for a novel theoretical framework. In this presentation, we explore a unique approach to modelling neural networks using the proportional infinite-depth-and-width limit.
In fact, naively stacking non-linearities in deep networks leads to problematic degenerate behaviour at initialization. To address this challenge and achieve a well-behaved infinite-depth limit, we introduce a fundamentally novel framework: we treat neural networks as depthwise stochastic processes. Within this framework, the limit is characterized by a stochastic differential equation (SDE) that governs the feature covariance matrix. Notably, this framework yields a very accurate model of finite-size networks. Finally, we will briefly discuss several applications, including stabilizing gradients for Transformers, saving computational costs in hyperparameter tuning, and a new result on the spectrum of products of random matrices.
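
To make the degeneracy mentioned in the abstract concrete, below is a minimal NumPy sketch, not the speaker's exact construction: two inputs are pushed through a deep random MLP at initialization and we track the cosine similarity of their hidden features. With a plain ReLU stack the similarity drifts toward 1 as depth grows, so distinct inputs become indistinguishable; with a "shaped" ReLU whose slopes 1 ± c/sqrt(width) shrink toward the identity (this particular scaling is an assumption made in the spirit of the proportional depth-and-width limit), the similarity instead fluctuates stochastically with depth, which is the kind of behaviour a depthwise SDE would describe.

```python
# Minimal sketch (assumed construction, not the talk's exact one): compare the
# feature correlation of two inputs across depth for a plain ReLU network versus
# a "shaped" ReLU whose slopes are 1 +/- c/sqrt(width).
import numpy as np

rng = np.random.default_rng(0)


def shaped_relu(x, width, c=1.0):
    """Leaky-ReLU-like nonlinearity with slopes 1 +/- c/sqrt(width)."""
    a = c / np.sqrt(width)
    return np.where(x > 0, (1.0 + a) * x, (1.0 - a) * x)


def final_correlation(depth, width, shaped):
    """Cosine similarity of two inputs' features after `depth` random layers."""
    x1 = rng.standard_normal(width)
    x2 = rng.standard_normal(width)
    for _ in range(depth):
        if shaped:
            # weight variance 1/width keeps the shaped layers near the identity
            W = rng.standard_normal((width, width)) / np.sqrt(width)
            x1, x2 = shaped_relu(W @ x1, width), shaped_relu(W @ x2, width)
        else:
            # He initialization so that plain ReLU preserves the norm on average
            W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
            x1, x2 = np.maximum(W @ x1, 0.0), np.maximum(W @ x2, 0.0)
    return x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2))


# Proportional limit: depth grows together with width (here depth == width).
for n in (16, 64, 256):
    naive = final_correlation(depth=n, width=n, shaped=False)
    shaped = final_correlation(depth=n, width=n, shaped=True)
    print(f"depth = width = {n:4d}   plain ReLU corr = {naive:+.3f}   shaped corr = {shaped:+.3f}")
```

Run with depth equal to width, the plain-ReLU correlation saturates near 1 while the shaped version stays spread out across runs; the depth-to-width ratio, held fixed at 1 here, is what plays the role of the time variable in the limiting SDE.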

Science

Published: 9 Apr 2024

Comments: 1
@Kram1032 · 2 months ago
Very interesting talk! Seems like a surprisingly easy shaping method you found. That should be huge!

This is perhaps only indirectly related, but I wonder if we could somehow go from a "discrete" layer-for-layer evaluation to one that's akin to various Monte Carlo techniques which just sample from the continuum solution. Not sure how you'd take care of or replace the discrete parameters of an NN in such a setting, but I'm kinda picturing the difference between "radiosity" and "path tracing" when it comes to rendering.

In path tracing, if done correctly, you can kinda directly and unbiasedly approximate the continuum limit of the distribution of light in a scene, and it's all built on stochastic processes. You can even take care of "infinitely deep paths" correctly by stochastically cancelling paths at a *finite* depth through a Russian Roulette procedure, and you can combine many sampling procedures optimally through multiple importance sampling. More recently, that's even possible for a *continuum* of sampling methods in the form of *stochastic* importance sampling.

I'd imagine something similar could be used for *actually* training/evaluating *"infinite"* (both in width and depth) NNs by simply evaluating them to some finite but task-dependent depth. The main question to me is how to even set or store weights in such a setting in a finite amount of memory. I'm guessing you'd somehow have weights be defined through like a Gaussian mixture process or the like, but it's probably much easier said than done.