What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory, a theory of Bayesian statistics broad enough in scope to encompass deep neural networks, may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us.
Patreon: / axrpodcast
Ko-fi: ko-fi.com/axrp...
Topics we discuss, and timestamps:
0:00:26 - What is singular learning theory?
0:16:00 - Phase transitions
0:35:12 - Estimating the local learning coefficient
0:44:37 - Singular learning theory and generalization
1:00:39 - Singular learning theory vs other deep learning theory
1:17:06 - How singular learning theory hit AI alignment
1:33:12 - Payoffs of singular learning theory for AI alignment
1:59:36 - Does singular learning theory advance AI capabilities?
2:13:02 - Open problems in singular learning theory for AI alignment
2:20:53 - What is the singular fluctuation?
2:25:33 - How geometry relates to information
2:30:13 - Following Daniel Murfet's work
The transcript: axrp.net/episo...
Daniel Murfet's twitter/X account: / danielmurfet
Developmental interpretability website: devinterp.com
Developmental interpretability YouTube channel: / @devinterp
Main research discussed in this episode:
Developmental Landscape of In-Context Learning: arxiv.org/abs/...
Estimating the Local Learning Coefficient at Scale: arxiv.org/abs/...
Simple versus Short: Higher-order degeneracy and error-correction: www.lesswrong....
Other links:
Algebraic Geometry and Statistical Learning Theory (the grey book): www.cambridge....
Mathematical Theory of Bayesian Statistics (the green book): www.routledge....
In-context learning and induction heads: transformer-ci...
Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity: arxiv.org/abs/...
A mathematical theory of semantic development in deep neural networks: www.pnas.org/d...
Consideration on the Learning Efficiency of Multiple-Layered Neural Networks with Linear Units: papers.ssrn.co...
Neural Tangent Kernel: Convergence and Generalization in Neural Networks: arxiv.org/abs/...
The Interpolating Information Criterion for Overparameterized Models: arxiv.org/abs/...
Feature Learning in Infinite-Width Neural Networks: arxiv.org/abs/...
A central AI alignment problem: capabilities generalization, and the sharp left turn: www.lesswrong....
Quantifying degeneracy in singular models via the learning coefficient: arxiv.org/abs/...
15 Oct 2024