Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained) 

Yannic Kilcher
261K subscribers
46K views

Published: 28 Aug 2024

Comments: 85
@IproCoGo · 4 years ago
Yannic, thank you for presenting these papers. Your efforts are helpful, to understate the matter. With the number of videos you produce, this project must be time consuming. Know that it is worthwhile.
@YannicKilcher · 4 years ago
Thanks, I'm enjoying it so far :)
@annestaples7503 · 4 years ago
This was a great walk-through of the paper. Thank you for taking the time to do this. I wish all papers had similar walk-throughs!
@freemind.d2714 · 3 years ago
Hope future ML will do this automatically.
@sorontar1 · 4 years ago
I have found your channel by accident and I got my mind blown. Thank you for the effort!
@KyleCranmer · 4 years ago
Wow, what a surprise... this is great... thank you!
@jasdeepsinghgrover2470 · 4 years ago
Amazing paper... Did you guys try it on physical systems like the double pendulum?
@smnt · 4 years ago
Great work Kyle, I followed you when I worked on the LHC. Always loved your work. I almost applied to post-doc with you but was too scared, haha.
@zh4842 · 4 years ago
Good job Kyle. Where can I find the git repos to do some experiments? Thanks in advance.
@randommm-light · 4 years ago
This is so awesome! I’m very curious to see how the latent vector space derived by neural graphs is shaped and this really is first light in that direction for me. Thx!
@welcomeaioverlords · 4 years ago
Very clear explanation of a fascinating application area. Thanks Yannic!
@mau_lopez · 4 years ago
Very interesting paper and an excellent explanation. Thanks a lot for sharing!
@carllammy8020 · 4 years ago
Thanks for walking me through differing disciplines.
@bautistapeirone6345 · 4 years ago
Thanks for your hard work and excellent explanation. It really clears things up to see them graphically. Keep it up!
@Basant5911 · 11 months ago
Amazing explanation, bro. Enjoyed it more than Netflix.
@sahandajami3171 · 3 years ago
Many thanks Yannic. Your explanations are really helpful.
@DistortedV12 · 4 years ago
oooo I was just going to ask you about this!! Thanks Yannic!
@MyU2beCall · 3 years ago
Thx Yannic. Yet another great vid!
@edeneden97 · 2 years ago
If we have 2 vertices and an edge between them, a single force (in 2 dimensions) is not enough to update both vertices, since the force direction is opposite for each vertex. Therefore we either need more than 2 hidden dimensions to represent the force, or we use 2 directed edges for each pair, where each edge only applies to its destination vertex, for example.
@__Somebody__Else__ · 4 years ago
Hey Yannic. I am a big fan of your channel, thank you very much. You really hit the mark with your content: current research papers reviewed with a focus on the core principles of the contributions and with some context added to the respective strand of literature. This is perfect for practitioners like me, both for staying up to date and for getting inspiration on where else you can apply deep/machine learning. A question regarding the current paper: I don't get why the notion of a graph (network) is important here. They just seem to stack and combine neural networks and other computation in an intelligent way. For me the graph seems only to be a nice illustration of the independence of the forces, but I don't get why graph principles are relevant here. Probably I am missing a point. Do you have an explanation?
@YannicKilcher · 4 years ago
You're right, ultimately it's just a weight-sharing scheme and a computation flow. But that's conveniently described as a graph, so people call it a graph network.
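To make the weight-sharing point concrete, here is a minimal sketch of one message-passing step (illustrative only, not the authors' code; all names and sizes are made up): one edge MLP is shared across every pair of connected nodes, and one node MLP is shared across every node.

```python
import torch
import torch.nn as nn

class TinyGraphNet(nn.Module):
    def __init__(self, node_dim=6, msg_dim=100):
        super().__init__()
        # One MLP shared across ALL edges -- this is the weight sharing.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, 128), nn.ReLU(), nn.Linear(128, msg_dim))
        # One MLP shared across ALL nodes.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + msg_dim, 128), nn.ReLU(), nn.Linear(128, node_dim))

    def forward(self, x, edge_index):
        # x: (num_nodes, node_dim); edge_index: (2, num_edges), source -> destination.
        src, dst = edge_index
        messages = self.edge_mlp(torch.cat([x[src], x[dst]], dim=-1))
        # Sum the messages arriving at each destination node.
        agg = torch.zeros(x.size(0), messages.size(-1))
        agg.index_add_(0, dst, messages)
        return self.node_mlp(torch.cat([x, agg], dim=-1))
```

The "graph" is really just the edge_index bookkeeping; everything else is ordinary shared MLPs.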
@sun1908 · 2 years ago
Thank you Yannic. Nicely explained.
@niraj5582 · 3 years ago
Interesting paper indeed. You explained it really well. Thanks a lot.
@jabowery · 4 years ago
Discovery of the planet Neptune was a latent identity (not to be confused with the article's "latent vector") imputed from gravitational perturbations. This, of course, would require regressing the topology of the GNN itself.
@spsharan2000 · 4 years ago
Really good video! Something new every time :) Which device and app are these recorded on, btw? 🤔
@YannicKilcher · 4 years ago
I do the recording with OBS.
@ScriptureFirst · 2 years ago
41:30 It looks like the basic structure of this equation was determined in the NN setup & the coefficients were the NN output. Gold!
@cmtro · 4 years ago
Excellent! Good explanations.
@crimythebold · 4 years ago
Very interesting. Very nice summary of the paper
@user-kg7ex2dm8g · 4 years ago
Great Explanation👍
@frun · 4 months ago
Physicists could do this for QM if the wavefunction did not represent an ensemble of similarly prepared systems.
@youngjin8300 · 4 years ago
You’re something 👏
@AaronKarper · 4 years ago
That still sounds like symbolic regression with extra assumptions. Is the neural network actually necessary? It seems that instead of fitting the symbolic equation against the NN, one could run symbolic regression against the data directly. Or am I missing something?
@marcovirgolin9901 · 4 years ago
I am wondering the same, but I'd say it's smart to use gradient descent to have NNs encode the edges, so as to provide info on how symbolic regression should find the inner formulas. This work also reminds me of Schmidt and Lipson's Science paper "Distilling Free-Form Natural Laws from Experimental Data". Still gotta read this paper myself though.
@malse1234 · 4 years ago
Thanks, this is a great question. It comes down to the fact that symbolic regression (with current techniques) scales very badly with an increasing number of features, so breaking the problem down with a NN makes it tractable. I explain this in more detail in the reddit thread: www.reddit.com/r/MachineLearning/comments/hfmqnx/d_paper_explained_discovering_symbolic_models/. Cheers! Miles
@malse1234 · 4 years ago
@@marcovirgolin9901 Thanks for the question. We actually use Schmidt and Lipson's algorithm "eureqa" for our symbolic regression. One way of thinking of this work is as an extension of their algorithm to high-dimensional spaces.
@marcovirgolin9901 · 4 years ago
@@malse1234 Thank you for your answer Miles, and congrats on this beautiful work.
@malse1234 · 4 years ago
@@marcovirgolin9901 Thank you!
@ifernandez08 · 4 years ago
Thank you!
@eladwarshawsky7587 · 5 months ago
I was thinking that we could try to learn a graph neural net that relates X, Y coordinates to the corresponding RGB values. If we could capture the relationship between the two as a single simple equation, that could make for huge compression gains. Please tell me why it would or wouldn't work.
@freemind.d2714 · 3 years ago
Very good paper!!! And a very good video as well!!!
@drdca8263 · 4 years ago
I’m a bit confused about how the directionality of the information on the edges works. If the information produced for an edge is the force, then, when adding it on one object or the other, the force vector should be in opposite directions, right? How does this system account for that? Does it subtract the edge’s value in one of the sums and add it in the other?
@rcpinto · 4 years ago
Each bidirectional edge is actually 2 unidirectional edges, so there is no ambiguity.
@drdca8263 · 4 years ago
Rafael Pinto thanks!
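To sketch that answer in code (illustrative only, not from the paper): with two directed edges per pair, the same edge function automatically produces equal and opposite messages, so no extra sign bookkeeping is needed.

```python
import torch

# Two particles, one undirected pair, stored as two directed edges.
edge_index = torch.tensor([[0, 1],   # source nodes
                           [1, 0]])  # destination nodes
x = torch.tensor([[0.0, 0.0],        # position of particle 0
                  [1.0, 0.0]])       # position of particle 1

def edge_message(x_src, x_dst):
    # Stand-in for the learned edge MLP: a unit-stiffness spring that
    # pulls the destination node toward the source node.
    return x_src - x_dst

src, dst = edge_index
msgs = edge_message(x[src], x[dst])
print(msgs)  # tensor([[-1., 0.], [ 1., 0.]]) -- equal and opposite
```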
@n.lu.x · 4 years ago
Thanks! Any chance of going through "Memory transformers" soon?
@ScriptureFirst · 2 years ago
QUESTION: Does the graph somehow make the solution more examinable? Are graphs potentially an answer to black-box NNs? Cross-reference the paper that LMs are graphs. Thank you for considering.
@cameron4814 · 4 years ago
Damn, that's some crazy shit.
@dreznik · 4 years ago
What tool do you use to draw next to the paper? And how do you capture your screen?
@YannicKilcher · 4 years ago
I use OneNote and OBS
@sarvagyagupta1744 · 4 years ago
Hey. Great video as always. I have a question though: the dataset they are using is itself a simulator that uses known formulas to simulate the particles. And, in the end, the neural networks output the same formulas that are already used in these simulators, against which we calculate the loss function. So we are not discovering anything new here; it's more of a reinventing-the-wheel problem. Right?
@EyeIn_The_Sky · 4 years ago
I believe he said that the loss function was from actual observed data in the real world rather than simulations by other neural networks or some other technology.
@YannicKilcher · 4 years ago
I think the simulations are run with other equations than what comes out.
@sarvagyagupta1744 · 4 years ago
@@EyeIn_The_Sky Did he? I think it's the simulation.
@sarvagyagupta1744 · 4 years ago
@@YannicKilcher Really? Two different equations lead to the same simulation? It could very well be possible, but that's interesting. So do you think that with different initial settings, we might be able to get different results from the GNN?
@uyenhoang1780 · 3 years ago
Sorry, but I find the most important problem is that the components in L1 are not clear yet, and the details of applying the standard deviation are a bit confusing and not clearly described. Can you explain that part?
@elsayedmohamed9706 · 4 years ago
Thank you ❤
@jonathansum9084 · 3 years ago
At 42:05, I think (r-1) cannot become (1-1/r).
@vishalpachpande5921 · 2 years ago
Sir, where can I find these types of papers?
@herp_derpingson · 4 years ago
The "From Graph to Equation" part of this paper is a bit disappointing. I was expecting some differentiable method. Also, I doubt much of this work can be generalized to non-particle system problems. . Isnt reducing a neural network to a equation, an extreme form of network pruning and distillation? . Symbolic regressions are more than just equations. Technically computer code is also a symbolic regression from user input to computer output. If we can build an automation to automate all automatons, that would be AI complete. It would also make every human unemployable.
@malse1234 · 4 years ago
These are good questions, thanks.

"[symbolic] differentiable method": I wish there was a differentiable method of symbolic regression as well. Currently it seems like embedding discrete search spaces in a differentiable function is difficult. For now it seems best to learn subcomponents of the model using a NN, then approximate those subcomponents with traditional genetic programming.

"[doubt generalization] to non-particle system problems": The dark matter simulation is not particle-based; it is a grid of densities. We look for "blobs" of dark matter (dark matter halos) and consider those to be the nodes in the GNN, where the integrated density of a blob is its mass. More generally (which our work in the near future will show), the symbolic regression strategy can be applied to NNs other than graph networks, so long as you have a separable internal structure. We try to explain this in a bit more detail in the paper.

"network pruning": The symbolic form gives you a few advantages: (i) interpretability, (ii) generalization, (iii) compactness. I think pruning could arguably only give you (ii)? Though using ReLU activations, you still only have linear extrapolation. We do have L1 regularization in our GNN, yet the symbolic form still generalizes better. It's very curious how simple symbolic equations generalize so well.

Let me know if you have any other questions. Thanks, Miles
@herp_derpingson · 4 years ago
@@malse1234 It is rare for paper authors to visit. Thanks for clarifying.
@malse1234 · 4 years ago
@@herp_derpingson No problem, feel free to email me if you have any other questions. Cheers!
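For readers who want the shape of the recipe Miles describes above, here is a hedged sketch (heavily simplified; the names are made up, and the paper's actual training and eureqa fit are more involved): an L1 penalty on the edge messages during training, then selection of the few surviving message components for symbolic regression.

```python
import torch

def loss_with_message_sparsity(pred, target, messages, lam=1e-2):
    # Prediction loss plus L1 on the edge messages: the penalty pushes
    # unused message components toward zero, leaving a low-dimensional
    # signal that is far easier to fit symbolically afterwards.
    return ((pred - target) ** 2).mean() + lam * messages.abs().mean()

def most_active_components(messages, k=2):
    # After training, keep the k message components with the largest
    # variance across edges; these become the targets one regresses
    # symbolically against pair features (e.g. dx, dy, r, masses).
    return messages.var(dim=0).topk(k).indices
```

The paper hands those selected components to eureqa; in principle any symbolic regressor could stand in at that step.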
@teslaonly2136 · 4 years ago
Please review this paper: Locally Masked Convolution for Autoregressive Models
@cw9249 · 1 year ago
this is insane
@Tehom1 · 4 years ago
Is co-author David Spergel the astronomer David Spergel? I thought it must be two guys with the same name until the topic of dark matter came up (astronomer Spergel's specialty).
@YannicKilcher · 4 years ago
I can imagine, but I don't know
@bluesky3149 · 4 years ago
Who helps you to read these papers?
@YannicKilcher · 4 years ago
I have a bunch of undergrads in the basement and they get a cookie for each video-script they produce
@bluesky3149 · 4 years ago
@@YannicKilcher Haha, I meant: are some professors involved who can give you more context or fill in some knowledge gaps?
@jryanconnelly · 4 years ago
Simply awesome, thank you. Tangential thought.... Evolution of ideas in humanity is an ordered heuristic dialogue (yeah I just made that up) that is sort of Bayesian in nature...? I dunno, seems like a way to frame Hegel's dialectic in a mathematical way sorta...
@herp_derpingson · 4 years ago
Are you a GPT?
@Phobos11 · 4 years ago
Herp Derpingson exactly 🤣
@jryanconnelly · 4 years ago
@@herp_derpingson GPT?
@herp_derpingson · 4 years ago
​@@jryanconnelly Generative Pre-trained Transformer
@jasdeepsinghgrover2470 · 4 years ago
Did anyone try this on a double pendulum?
@herp_derpingson · 4 years ago
It won't work on a double pendulum, as the acceleration components are not independent.
@jasdeepsinghgrover2470 · 4 years ago
@@herp_derpingson Right... But maybe the NN learns some approximation instead, as normal reactions at joints are also forces and free-body diagrams consider them. Maybe it learns some combined force instead.
@AbgezocktXD · 4 years ago
Your Deltas are an abomination xD
@aBigBadWolf · 4 years ago
Just before the NeurIPS blind-review process starts, the authors of this paper go to great lengths to publicize their work on social media with pretty pictures, an interactive demo, a nice blog post, and lots of vitamin-B people retweeting or sharing their work. Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho, you are trying to sway and influence the reviewers, and you are doing this shamelessly and with full intention. This is unscientific and I will remember your names.
@malse1234 · 4 years ago
It is very true that double-blind review and preprint servers seem to be at odds, despite it being such a common practice to post to arXiv after submission. I wouldn't say preprint submission by researchers is generally meant to sway reviewers, but just to publicize work earlier. It's important to note that with only ~20% of papers accepted to big ML conferences, annual submission deadlines, and the very fast pace of research, work might be out of date if one waits until it is finally published. I think this is why posting to arXiv before acceptance is so common in ML research. And in posting we would like many people to read it, hence the blog/etc. I'm not sure if there are solutions to the preprint trend in ML given the slow publication process contrasted against the fast research pace, but I'm curious if there are options.
@YannicKilcher · 4 years ago
Just my personal opinion: Double-blind review is half broken and I would like to see it completely broken and move to a new world where research happens in the open. So I'm more than happy when authors publicize their work and I appreciate them for sharing it as soon as it's ready, rather than six months in the future after three random people looked at it for 5 minutes on the toilet and decided on "weak accept"
@IamLupo · 4 years ago
I use Eureqa. This software was made a long time ago and uses evolution to find formulas from data. en.wikipedia.org/wiki/Eureqa
@jabowery · 4 years ago
They actually used Eureqa: "We score complexity by counting the number of occurrences of each operator, constant, and input variable. We weight ^, exp, log, IF(·, ·, ·) as three times the other operators, since these are more complex operations. eureqa outputs the best equation at each complexity level, denoted by c."
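As a worked illustration of that scoring rule (my sketch, not the paper's code):

```python
# Per the quoted rule: each operator, constant, and input variable
# counts 1, except ^, exp, log, and IF, which count 3.
HEAVY = {"^", "exp", "log", "IF"}

def complexity(tokens):
    return sum(3 if t in HEAVY else 1 for t in tokens)

# Example: a * exp(-r), tokenised as ["*", "a", "exp", "neg", "r"].
print(complexity(["*", "a", "exp", "neg", "r"]))  # 1 + 1 + 3 + 1 + 1 = 7
```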
@BlakeEdwards333 · 4 years ago
Thank you!