
Diffusion and Score-Based Generative Models 

MITCBMM
55K subscribers
70K views

Yang Song, Stanford University
Generating data with complex patterns, such as images, audio, and molecular structures, requires fitting very flexible statistical models to the data distribution. Even in the age of deep neural networks, building such models is difficult because they typically require an intractable normalization procedure to represent a probability distribution. To address this challenge, we consider modeling the vector field of gradients of the data distribution (known as the score function), which does not require normalization and therefore can take full advantage of the flexibility of deep neural networks. I will show how to (1) estimate the score function from data with flexible deep neural networks and efficient statistical methods, (2) generate new data using stochastic differential equations and Markov chain Monte Carlo, and even (3) evaluate probability values accurately as in a traditional statistical model. The resulting method, called score-based generative modeling or diffusion modeling, achieves record performance in applications including image synthesis, text-to-speech generation, time series prediction, and point cloud generation, challenging the long-time dominance of generative adversarial networks (GANs) on many of these tasks. Furthermore, score-based generative models are particularly suitable for Bayesian reasoning tasks such as solving ill-posed inverse problems, yielding superior performance on several tasks in medical image reconstruction.

Science

Published: 30 Jun 2024

Comments: 43
@Blueshockful
@Blueshockful 9 months ago
Recommended to anyone who wants to understand diffusion models beyond the mere "noising/denoising" type explanations.
@opinions8731
@opinions8731 1 year ago
What an amazing explanation. I wish there were an AI, or the authors themselves, explaining every paper this clearly.
@moaidali874
@moaidali874 10 months ago
This is one of the best presentations I have ever attended.
@StratosFair
@StratosFair 1 year ago
Thank you guys for making this talk available on your RU-vid channel. This is pure gold
@jacksonyan7346
@jacksonyan7346 7 months ago
It really shows how good the explanation is when even I can follow along. Thanks for sharing!
@binjianxin7830
@binjianxin7830 1 year ago
What a pleasant insight to think of the gradient of the log-density as the score function! Thank you for sharing the great idea.
@simong1666
@simong1666 1 month ago
This is the best summarizing resource I have found on the topic. The visual aids are really helpful, and the nature of the problem, the series of steps leading to improved models, and the sequence of logic are all so clear. What an inspiration!
@LifeCodeGame
@LifeCodeGame 1 year ago
Amazing insights into generative models! Thanks for sharing this valuable knowledge!
@xuezhixie1640
@xuezhixie1640 10 months ago
Really amazing explanation of the entire diffusion model. Clear, great, wonderful work.
@ck1847
@ck1847 1 year ago
Extremely insightful lecture that is worth every minute. Thanks for sharing it.
@user-ux3wg1xj9s
@user-ux3wg1xj9s 7 months ago
9:52 It is intractable to compute the integral of the exponential of a neural network.
12:00 Desiderata of deep generative models.
19:00 The goal is to minimize the Fisher divergence between ∇_x log p_data(x) and the score model s(x). We don't know the ground-truth ∇_x log p_data(x), but score matching equals the Fisher divergence up to a constant, so the two are the same from the optimization perspective.
23:00 However, score matching is not scalable, largely because of the Jacobian term, which requires many backpropagation passes. So before computing the Fisher divergence, project each term onto a random vector v so that the full Jacobian disappears and the objective becomes scalable; this is called sliced score matching.
29:00 Denoising score matching. The objective is tractable because we design the perturbation kernel by hand (so the kernel is easy to compute). However, because of the added noise, denoising score matching cannot estimate the noise-free distribution. Also, the variance of the denoising score matching objective grows and eventually explodes as the magnitude of the noise gets smaller.
31:20 With a Gaussian perturbation kernel, the denoising score matching problem takes a much simpler form. Optimize the objective with stochastic gradient descent, and be careful to choose an appropriate magnitude for sigma.
36:00 Sampling with Langevin dynamics: initialize x0 from a simple distribution (Gaussian, uniform) and z from N(0, I), then repeat the update.
37:20 The naive version of Langevin dynamics sampling does not work well in practice because of low-density regions.
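A rough sketch of the Gaussian denoising score matching objective summarized at 29:00 and 31:20 above (the network, names, and noise ladder here are placeholders, not the talk's code): perturb data with N(0, sigma^2 I); the target score of the hand-designed perturbation kernel is -(x_tilde - x)/sigma^2, so no ground-truth data score is required.

```python
# Assumed minimal setup: a small score network conditioned on sigma and the
# sigma^2-weighted denoising score matching loss (a common weighting choice).
import math
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Tiny MLP s_theta(x_tilde, sigma); purely illustrative."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, dim),
        )

    def forward(self, x, sigma):
        return self.net(torch.cat([x, sigma], dim=-1))

def dsm_loss(model, x, sigmas):
    """Denoising score matching with a Gaussian perturbation kernel."""
    sigma = sigmas[torch.randint(len(sigmas), (x.shape[0], 1))]   # one noise level per example
    z = torch.randn_like(x)
    x_tilde = x + sigma * z                                       # sample from N(x, sigma^2 I)
    s = model(x_tilde, sigma)
    # Target score of the kernel is -(x_tilde - x)/sigma^2 = -z/sigma;
    # weighting the squared error by sigma^2 gives ||sigma * s + z||^2.
    return ((sigma * s + z) ** 2).sum(dim=-1).mean()

# Example usage on stand-in data with a geometric ladder of noise levels.
model = ScoreNet(dim=2)
x = torch.randn(64, 2)
sigmas = torch.exp(torch.linspace(math.log(1.0), math.log(0.01), 10))
loss = dsm_loss(model, x, sigmas)
loss.backward()
```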
@CohenSu
@CohenSu 1 year ago
16:54 all papers referenced... This man is amazing
@mm_Tesla_mm
@mm_Tesla_mm 10 months ago
I love this talk! Amazing and clear explanation!
@peterpan1874
@peterpan1874 8 months ago
A very accessible and amazing tutorial that explained everything clearly and thoroughly!
@coolguy69235
@coolguy69235 4 months ago
What a formidable guy!! Very good explanation!!! Together with all, progress for all.
@xiaotongxu1528
@xiaotongxu1528 5 months ago
Very clear! Thanks for this amazing lecture!
@johnini
@johnini 1 month ago
I am 44 seconds into the talk and already want to say thank you! :)
@hasesukkt
@hasesukkt 5 months ago
Amazing explanation; it really helped me understand the diffusion model!
@uxirofkgor
@uxirofkgor 1 year ago
Damn, straight from Yang Song...
@YashBhalgat
@YashBhalgat 1 year ago
46:30 was a true mic-drop moment from Yang Song 😄
@dy8576
@dy8576 5 months ago
Such an incredible talk. I'm just curious how everyone here keeps track of all this knowledge; would love to hear from you all.
@tomjiang6831
@tomjiang6831 2 months ago
Literally the best explanation!!!
@chartingwithliv
@chartingwithliv 1 year ago
Crazy this is available for free!! ty
@hoang_minh_thanh
@hoang_minh_thanh 7 months ago
This is amazing!
@user-zr4ns3hu6y
@user-zr4ns3hu6y 2 months ago
Such a great explanation
@huseyintemiz5249
@huseyintemiz5249 10 months ago
Amazing tutorial
@twistedsector
@twistedsector 4 months ago
actually such a fire talk
@gaspell
@gaspell 5 months ago
Thank you!
@adrienforbu5165
@adrienforbu5165 6 months ago
Thank you so much, Song.
@user-ki1jl1fb5d
@user-ki1jl1fb5d 9 months ago
Oh, very clear explanation! Would it be possible to share the slides?
@JuliusSmith
@JuliusSmith 3 months ago
Excellent overview of excellent work, thank you! I am worried about simplified CT scans, however. Wouldn't we get a bias toward the priors when we're looking instead for anomalies? There needs to be a way to detect all abnormalities with 100% reliability while still reducing radiation. Is this possible?
@beluga.314
@beluga.314 10 months ago
If we can obtain an equivalence between DDPM (training the network to predict the noise) and score-based training, then shouldn't both give the same kind of results?
@RoboticusMusic
@RoboticusMusic 10 months ago
What is using this now, and is it still SoTA? Did this improve OpenAI, MJ, Stability, etc.? It sounds promising, but I need updated information.
@julienblanchon6082
@julienblanchon6082 8 months ago
Wow, that was crystal clear.
@pengjunlu
@pengjunlu 1 month ago
Why is solving the "maximize likelihood" problem equivalent to solving the "explicit score matching" problem? For example, once you get s(x; theta) you do get a corresponding p(x; theta), but is it the same p(x; theta) whose theta maximizes the likelihood?
@mariuswuyo8742
@mariuswuyo8742 3 months ago
I have a question about the "probability evaluation" part: does it mean using the ODE to calculate the likelihood p_theta(x_0)? And how do you feed the original data x_0 into the diffusion model?
@yuxinzhang4228
@yuxinzhang4228 11 months ago
Why run annealed Langevin dynamics from the highest noise level down to the lowest, instead of running Langevin dynamics directly on the score model at the lowest noise level?
@DrumsBah
@DrumsBah 6 months ago
You use the noise-perturbed scores to traverse toward, and converge to, high-density areas via Langevin dynamics. Due to the manifold hypothesis, large areas of the data space have essentially no density and thus no gradient for Langevin dynamics to follow. The large noise levels are what let you traverse these regions; once closer to high density, the noise level can be decreased and the process repeats.
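A rough sketch of that annealed procedure (assumed interface: score_fn(x, sigma) approximates the score of the sigma-perturbed density; the step-size rule eps proportional to sigma^2 is one common choice, not necessarily the talk's exact schedule). The large sigma levels supply gradients in the empty regions; the later, smaller levels refine samples near the data manifold.

```python
# Annealed Langevin dynamics sketch: run Langevin updates at each noise level,
# sweeping from the largest sigma down to the smallest.
import numpy as np

def annealed_langevin(score_fn, shape, sigmas, steps_per_level=100,
                      base_step=2e-5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)                     # start from a simple prior
    for sigma in sigmas:                               # sigmas sorted high -> low
        eps = base_step * (sigma / sigmas[-1]) ** 2    # shrink step with the noise level
        for _ in range(steps_per_level):
            z = rng.standard_normal(shape)
            x = x + eps * score_fn(x, sigma) + np.sqrt(2.0 * eps) * z
    return x
```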
@johnini
@johnini 1 month ago
Now I need some GPUs
@jimmylovesyouall
@jimmylovesyouall 6 months ago
Stanford is still awesome, thank you.
@timandersen8030
@timandersen8030 3 months ago
How can one get as good at math as Dr. Song?
@user-xb8xz2oo5c
@user-xb8xz2oo5c 2 months ago
27:44
@clairewang8370
@clairewang8370 21 days ago