
Neural Networks as Quantum Field Theories (NNGP, NTK, QFT, NNFT)

Nikolaj-K
3.4K subscribers
17K views

Introductory breakdown of Neural Network Gaussian Processes (NNGP), Neural Tangent Kernel (NTK) theory, Quantum Field Theory (QFT), and Neural Network Field Theory (NNFT). Links: gist.github.com/Nikolaj-K/88e...
A good starting point is
en.wikipedia.org/wiki/Neural_...
The paper discussed is at
arxiv.org/pdf/2307.03223.pdf
And so in the video we get into Feynman diagrams for studying deep neural network initialization and QFT observable prediction via neural networks; in particular, architectures and sampling schemes for quartic interaction Lagrangians.
All that good stuff...
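For reference, the quartic interaction mentioned above is, in standard Euclidean notation (a textbook form, not a formula quoted from this page):

    S[\phi] = \int d^d x \left[ \tfrac{1}{2} (\partial_\mu \phi)^2 + \tfrac{1}{2} m^2 \phi^2 + \tfrac{\lambda}{4!} \phi^4 \right],

where the quadratic part corresponds to the Gaussian (free, NNGP-like) theory and the \lambda \phi^4 term produces the non-Gaussian corrections that the Feynman diagrams organize.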

Published: 27 Dec 2023

Comments: 55
@MuonRay · 7 months ago
I am actually working in this field of quantum complex networks, and once you go down this rabbit hole you truly never see the world the same way again: quantum tensor networks explain many-body entanglement, percolation is a feature of quantum networks that displays similarities to classical Kuramoto-style synchronization of oscillators, and Lagrangians can be used to give a sum-over-histories description of entanglement. Renormalization group theory also has a history of use in complex networks, so the mathematical fuel is there, and the phenomenology of emergent adaptive behavior in quantum networks is a growing field of study.
@enricobianchi4499 · 7 months ago
Recently I've been reading Carlo Rovelli's popular-science work on loop quantum gravity and his relational interpretation of quantum mechanics (RQM). Indeed, in LQG the structure of spacetime is a big graph where every node is a quantum of space, and events in the world only occur through the mediation of the links in the graph. Do you think there's a connection to be made?
@thejswaroop5230 · 6 months ago
Jargonification much eh?😂
@RoboticusMusic · 6 months ago
Can you explain in English without referencing jargon?
@pik910 · 6 months ago
That's what she said
@eqwerewrqwerqre · 6 months ago
@everyone It's literally impossible to explain this in a YouTube comment without jargon. The OP is giving a survey of the largest and most interesting results in a very large field that you _might_ understand after completing a graduate degree focused on the subject. There are years and years of focused study required to understand this; you would obviously do well to get a degree in physics followed by another, higher degree in physics, but I'd bet even the vast majority of people who fit that criteria wouldn't understand every word, and might take months of careful study to fill in the gaps.

I'm not trying to say you shouldn't try to understand things, but don't expect that everything can be un-jargoned. This is edge-of-human-knowledge shit. My advice for learning: pick a word and Google it, find the Wikipedia article or a YouTube video and try to understand it; if you come across a word you don't understand in that learning, repeat. Take notes, build understanding, and learn things; it can be incredibly rewarding to finally piece together something strange like this.

Besides, even if we could un-jargon it, so much information would be lost that the result would either be completely different and meaningless, or just a different set of incredibly difficult concepts no one will understand, but without the elegance imposed by the initial choice of terms. All those terms have very specific meanings that depend on a wide base of knowledge; some of these terms require a very specific and tall base of knowledge. I'm not trying to be a dick or anything, but this is the actual truth about that particular set of jargon.
@ChaoticNeutralMatt · 7 months ago
Fascinating to see this come out now.
@vtrandal · 7 months ago
Thank you for sharing your interest and ideas regarding this paper. Really, thank you!
@NikolajKuntner · 7 months ago
Glad you enjoy it. Do you come from the NN or the QFT angle?
@Harsooo · 6 months ago
Servus from Linz! Amazing what the YouTube algorithm keeps dishing up for me; fascinating.
@NikolajKuntner · 6 months ago
Servas.
@alejandrorosales2863 · 6 months ago
This was a great video! I personally enjoyed the formatting, and the subject matter was super interesting! Great content :)
@NikolajKuntner · 6 months ago
Thanks. Thanks. What does 'formatting' denote here?
@drdca8263 · 7 months ago
Thanks for the video and links, I hadn’t heard of this connection
@NikolajKuntner · 7 months ago
hot off the press
@digguscience · 7 months ago
one of the amazing applications of the concept of artificial neural networks
@NikolajKuntner · 7 months ago
Glad you enjoy it also. And in your case, do you come from the NN or the QFT angle?
@onnilattu9138 · 6 months ago
What an amazing connection. I don't know much about neural nets, but I do know about QFT; excited to see whether this idea gets explored further.
@NikolajKuntner · 6 months ago
Well you're in luck then, the subject of QFT is a fair bit harder to learn. That said, I suppose both are tedious if you want to get anywhere new.
@onnilattu9138 · 6 months ago
@@NikolajKuntner Both are challenging for sure. I've been meaning to get more into deep learning once I graduate in physics.
@lucmccutcheon6703 · 7 months ago
Really interesting, would love to see more stuff like this. Maybe something on Reinforcement Learning with path integrals, such as MPPI, could be interesting too. As an AI scientist with a CS background, it's great to get some explanation of the maths.
@NikolajKuntner · 7 months ago
Mhm, do you have a scenario in mind where such concepts would come together? The field theory part developed so far doesn't seem to be tied much to the learning process.
@anthonygraybosch2202 · 7 months ago
Great video
@NikolajKuntner · 7 months ago
Thanks!
@travisporco · 7 months ago
What do you think is the minimal background needed for making some sense of this?
@NikolajKuntner · 7 months ago
If you know how a neural network encodes functions, and if you know how an auto-correlation function is defined, then it's probably already worth a try.
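For concreteness, the auto-correlation function referred to here is the two-point function of the network output over the ensemble of random initializations; in standard NNGP-style notation (notation mine, not the video's), assuming a zero-mean weight prior,

    K(x_1, x_2) = \mathbb{E}_{\theta \sim p(\theta)} \big[ f_\theta(x_1)\, f_\theta(x_2) \big],

which in the infinite-width limit becomes the Gaussian-process kernel, i.e. the free propagator of the associated field theory.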
@philipm3173 · 7 months ago
You should learn matrix algebra; 3Blue1Brown describes Jacobian matrices.
@vfwh · 6 months ago
Question about the initial weights in the NN: isn't it the case that with very large layers, in the billions or trillions of nodes, the initial weights don't matter so much? You are not sampling a small number of weights from a very large probability space; you are pretty much taking all the values of weights up to a trillion, or something. It's obviously not that simple, but I'm wondering whether the influence of the specific sample of initial weights decreases as the total number of nodes increases. In other words, the larger the number of nodes, the larger the entropy of the initial weights, whatever they are. And the larger the initial entropy, the more likely it is that the training will eventually land on a very similar result in the end. Whereas with a low-entropy, small selection of initial weights, the training has less freedom to explore the whole probability space. Do you think there's something there?
@physbuzz · 6 months ago
Loved the video! It's a really interesting concept. This is a bit of a naysayer thing to do, but I took some notes on my skepticism of deep meaningful connections between NNs and QFTs. These are things I'm typing out so that I can concretely be proven wrong, or see how they're addressed in papers!

1. If you start at a Gaussian theory, that is fine, but the interesting things in ML should happen far away from this Gaussian theory. Conversely, for using NNs to get at near-Gaussian theories, it seems unlikely that there is any deep understanding/advancement to be made in terms of NNs that doesn't already look better in terms of perturbative QFT or lattice QFT.

2. About gradient descent, it seems like steps not being continuous is a good thing in ML. Right, stochastic gradient descent makes sure that we aren't actually just flowing downhill. I'm not sure what the graph looks like in a real NN (meaning: if we're doing MNIST with a dense NN and we're in the middle of training, what does f(t) = Error( theta(t0) - (t - t0) grad(Error(theta(t0))) ) look like, on the scale of a good step size h? Is it smooth, or are we already skipping a lot of maxima and minima?)

3. The gradient descent flow theta'(t) = -grad Error(theta(t)) doesn't have interesting symplectic structure, does it? So the tools of dynamics from classical and quantum mechanics don't directly apply.

4. In QFTs, we can do Feynman diagrams in the position space representation, and the interesting thing is propagators between different points labeled by space. If we're not labeling nodes in the NN by position (so if it's densely connected or, like a human neuron, branches out into many arbitrary dendrites) then any propagator will be much less well-behaved and much less useful. This goes out the window for CNNs though.

5. What you say at 35:30 sounds like a really great way to describe quantum field theories to someone who knows machine learning well and is used to working in these large parameter spaces. So forming this pedagogy and dictionary is nice, but there's the notion of "conservation of difficulty" (Terence Tao, "Generalised solutions") where it doesn't seem like translating between a QFT and an NN exploits any essential features. Perturbative QFT exploits being near Gaussianity and translation invariance. Renormalization exploits how your theory can be specified given only several free parameters even though it may look initially like there are infinitely many free parameters. But I thought the beauty of large NNs is that they exploit having an enormous number of free parameters which are all important, rather than being set by a few mass scales! I don't know though; maybe after building the dictionary, other concepts like those from gauge theories could be applied to NNs to great effect.
@NikolajKuntner · 6 months ago
As for the question, I don't think there's a nice symplectic geometry to be found in any neural network of practical interest. The cost function (or "potential") based on actual training data will also be hella messy.
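As an aside, the probe physbuzz describes in point 2 above is easy to eyeball numerically. Here is a minimal sketch using a toy logistic-regression loss on synthetic data instead of an MNIST network; everything in it is hypothetical and only meant to make f(t) concrete:

import numpy as np

# Sketch of f(t) = Error(theta0 - t * grad Error(theta0)), scanned over step sizes t.
# Toy setup: logistic loss on random synthetic data, not the dense MNIST net from the comment.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 20))
y = rng.integers(0, 2, size=256).astype(float)

def error(theta):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))          # sigmoid predictions
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(theta, eps=1e-5):
    # crude central-difference gradient; fine for a 20-dimensional toy problem
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (error(theta + e) - error(theta - e)) / (2 * eps)
    return g

theta0 = rng.normal(size=20)
g0 = grad(theta0)
for t in np.linspace(0.0, 5.0, 11):
    # if f(t) stops decreasing early on, the landscape is already rough
    # on the scale of a typical step size
    print(f"t={t:.1f}  f(t)={error(theta0 - t * g0):.4f}")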
@RoboticusMusic · 6 months ago
I don't understand any of the terms or the math. Help.
@onnilattu9138 · 6 months ago
Yeeeaaa you have some reading ahead of you if you wanna make sense of qft
@capitalistdingo · 7 months ago
As someone who is a complete layman but finds both neural networks and field theory interesting, it is unfortunate that the YouTube algorithm keeps showing me videos about the potential overlap of these subjects, since I am never going to understand any of it. 😆 It's like having rabies and being thirsty but not being able to swallow while someone keeps presenting you with a tall cool glass of ice water with a garnish of lemon. 😝
@NikolajKuntner · 7 months ago
I'm sorry Sir, you now have 2 weeks to live.
6 months ago
Khan Academy, 3blue1brown, study.
@mastershooter64 · 7 months ago
Woah, Matthew D. Schwartz? Is that the QFT book Schwartz?
@NikolajKuntner · 6 months ago
Looks like it, yes.
@CalculationConsulting · 6 months ago
You can also use techniques from quantum field theory to compute the capacity of a NN layer. See my NeurIPS talk on the subject: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ZT0yZ-wFIaE.html
@NikolajKuntner · 6 months ago
Will check it out
@grimsk · 6 months ago
This is crazy. Roger Penrose would love this.
@NikolajKuntner · 6 months ago
The man is still alive. If you're going to ask him, please report back what he answered.
@RoboticusMusic · 6 months ago
Why is Gaussian special here? Bell curve shows up in everything everywhere.
@NikolajKuntner · 6 months ago
Central limit theorem (Zentraler Grenzwertsatz). The sum it requires arises in the standard network architecture when you combine the outputs of the nodes of one layer into a new node in the next layer.
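A quick numerical illustration of that sum argument; this is a sketch of my own, assuming the standard 1/sqrt(width) initialization scaling, not code from the video:

import numpy as np

# Draw many random single-hidden-layer networks and look at the output at one fixed input.
# By the central limit theorem, the sum over the N hidden nodes makes that output
# approximately Gaussian as the width N grows.
rng = np.random.default_rng(0)
d_in, N, n_nets = 5, 10_000, 2000
x = rng.normal(size=d_in)                 # one fixed input

def network_output(rng, N, x):
    W1 = rng.normal(scale=1.0 / np.sqrt(len(x)), size=(N, len(x)))   # input -> hidden
    w2 = rng.normal(scale=1.0 / np.sqrt(N), size=N)                  # hidden -> output
    return w2 @ np.tanh(W1 @ x)           # a sum of N roughly independent terms

samples = np.array([network_output(rng, N, x) for _ in range(n_nets)])
print("mean ~ 0:", samples.mean(), "  std:", samples.std())
# A histogram of `samples` looks like a bell curve; the same experiment with a small
# width (e.g. N = 2) is visibly non-Gaussian.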
@user-ni2we7kl1j · 6 months ago
The math here is completely out of my league and I'm just a humble CS student, but I have to ask: what the hell could delta raised to the power of delta even mean conceptually, at (1.11) and (4.20)? I understand "(d^3)r" being a shortcut for dx dy dz, but "(d^d)x" is some next-level math wizardry.
@NikolajKuntner · 6 months ago
It's called dimensional regularization and has its own Wikipedia page to read up on. The tl;dr is that the d in ^d is an auxiliary parameter. You try to compute the generic integral and look at the behaviour of the result as d goes to the dimension you're interested in. Using two different d's is not helpful, but sadly that's the standard notation. Btw. I wouldn't call either of them "delta."
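As a textbook illustration (not a formula taken from the paper), the kind of integral where that auxiliary d shows up is

    \int \frac{d^d p}{(2\pi)^d}\, \frac{1}{(p^2 + m^2)^2} \;=\; \frac{\Gamma(2 - d/2)}{(4\pi)^{d/2}}\, (m^2)^{d/2 - 2},

evaluated for generic d and then studied as d approaches the dimension of interest; the pole of the Gamma function at d = 4 is where the divergence of the original four-dimensional integral reappears.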
@authenticallysuperficial9874 · 7 months ago
This feels like it has metaphysical implications
@NikolajKuntner · 7 months ago
So to speak. Or physical ones.
@marksmod · 7 months ago
determine is pronounced like Determinante without the ante
@NikolajKuntner · 7 months ago
Yeah let's see how well I can catch myself the next time the word comes around - my hopes are not high.
@hambonesmithsonian8085 · 7 months ago
This can all be understood through the lens of Modal Homotopy Type Theory. EDIT: When I say understood, I more so mean explained.
@NikolajKuntner · 7 months ago
Hide your theories, Urs is on the loose.