
Why Does Diffusion Work Better than Auto-Regression? 

Algorithmic Simplicity
Subscribe · 24K subscribers
109K views

Have you ever wondered how generative AI actually works? Well, the short answer is: in exactly the same way as regular AI!
In this video I break down the state of the art in generative AI - Auto-regressors and Denoising Diffusion models - and explain how this seemingly magical technology is all the result of curve fitting, like the rest of machine learning.
Come learn the differences (and similarities!) between auto-regression and diffusion, why these methods are needed to perform generation of complex natural data, and why diffusion models work better for image generation but are not used for text generation.
The following generative models were featured as demos in this video:
Images: Adobe Firefly (www.adobe.com/products/firefl...)
Text: ChatGPT (chat.openai.com)
Audio: Suno.ai (suno.ai)
Code: Gemini (gemini.google.com/app)
Video: Lumiere (Lumiere-video.github.io)
Chapters:
00:00 Intro to Generative AI
02:40 Why Naïve Generation Doesn't Work
03:52 Auto-regression
08:32 Generalized Auto-regression
11:43 Denoising Diffusion
14:19 Optimizations
14:30 Re-using Models and Causal Architectures
16:35 Diffusion Models Predict the Noise Instead of the Image
18:19 Conditional Generation
19:08 Classifier-free Guidance

Published: May 27, 2024

Comments: 155
@doku7335
@doku7335 4 days ago
At first I thought "oh, another random video explaining the same basics and not adding anything new", but I was so wrong. It's an incredibly clear explanation of diffusion, and the start with the basic makes the full picture much clearer. Thank you for the video!
@jupiterbjy
@jupiterbjy 6 days ago
kinda sorry to my professors and seniors, but this is the single best explanation of the logic behind each of these models. A dozen-minute vid > 2 years of confusion in univ
@algorithmicsimplicity
@algorithmicsimplicity 3 months ago
Next video will be on Mamba/SSM/Linear RNNs!
@benjamindilorenzo
@benjamindilorenzo 2 months ago
great! Also maybe think about the tradeoff between scaling and incremental improvements, in case your perspective is that LLMs also always approximate the data set and therefore memorize, rather than having any "emergent capabilities" - so that ChatGPT also does "only" curve fitting.
@harshvardhanv3873
@harshvardhanv3873 10 days ago
I am student who is pursuing a degree in ai and we want more of your videos for even simplest of the concepts in ai, trust me this channel will be a huge deal in the near future, good luck!!
@QuantenMagier
@QuantenMagier 11 hours ago
Well take my subscription then!!1111
@user-my3dd4lu2k
@user-my3dd4lu2k 1 month ago
Man I love the fact that you present the fundamental idea with an Intuitionistic approach, and then discuss the optimization.
@pseudolimao
@pseudolimao 4 days ago
this is insane. I feel bad for getting this level of content for free
@user-fh7tg3gf5p
@user-fh7tg3gf5p 3 months ago
This genius only makes videos occasionally, and they are not to be missed.
@justanotherbee7777
@justanotherbee7777 3 months ago
absolutely true
@pw7225
@pw7225 5 days ago
The way you tell the story is fantastic! I am surprised that all AI/ML books are so terrible at didactics. We should always start at the intuition, the big picture, the motivation. The math comes later when the intuition is clear.
@yqisq6966
@yqisq6966 11 days ago
The clearest and most concise explanation of diffusion model I've seen so far. Well done.
@Veptis
@Veptis 3 days ago
This is a great explanation of how image decoders work. I haven't seen this approach and narrative direction yet. This is now my reference for explaining it to people who have no idea!
@jasdeepsinghgrover2470
@jasdeepsinghgrover2470 16 days ago
This is a much better explanation than the diffusion paper itself. They just went all around variational inference to get the same result!
@rafa_br34
@rafa_br34 13 days ago
Such an underrated video, I love how you went from the basic concepts to complex ones and didn't just explain how it works but also the reason why other methods are not as good/efficient. I will definitely be looking forward to more of your content!
@Jack-gl2xw
@Jack-gl2xw 10 days ago
I have trained my own diffusion models and it required me to do a deep dive of the literature. This is hands down the best video on the subject and covers so much helpful context that makes understanding diffusion models so much easier. I applaud your hard work, you have earned a subscriber!
@RicardoRamirez-dr6gc
@RicardoRamirez-dr6gc 11 days ago
This is seriously one of the best explainer videos i've ever seen. I've spent a long time trying to understand diffusion models and not a single video has come close to this one
@benjamindilorenzo
@benjamindilorenzo 2 months ago
Very good job. My suggestion is that you explain more about how it actually works, that the model learns to understand complete sceneries just from text prompts. This could fill its own video. Also it would be very nice to have a video about Diffusion Transformers like OpenAIs Sora probably is. Also it could be great to have a Video about the paper "Learning in High Dimension Always Amounts to Extrapolation". best wishes
@algorithmicsimplicity
@algorithmicsimplicity 2 months ago
Thanks for the suggestions, I was planning to make a video about why neural networks generalize outside their training set from the perspective of algorithmic complexity. That paper "Learning in High Dimension Always Amounts to Extrapolation" essentially argues that the interpolation vs extrapolation distinction is meaningless for high dimensional data, and I agree, I don't think it is worth talking about interpolation/extrapolation at all when explaining neural network generalization.
@benjamindilorenzo
@benjamindilorenzo 2 months ago
@@algorithmicsimplicity yes true. It would be great also because this links back to the LLM discussions about whether scaling up Transformers actually brings "emergent capabilities", or whether this is more simply and less magically explainable by extrapolation. In other words: people tend to believe either that Deep Learning architectures like Transformers only approximate their training data set, or that seemingly unexplainable or unexpected capabilities emerge while scaling. I believe that extrapolation alone explains really well why LLMs work so well, especially when scaled up, AND that LLMs "just" approximate their training data (curve fitting). This is why i brought this up ;)
@Frdyan
@Frdyan 2 days ago
I have a graduate degree in this shit and this is by far the clearest explanation of diffusion I've seen. Have you thought about doing a video running over the NN Zoo? I've used that as a starting point for lectures on NN and people seem to really connect with that paradigm
@shivamkaushik6637
@shivamkaushik6637 5 hours ago
Never knew YouTube could randomly suggest videos like these. This was mind blowing. The way you teach is a work of art.
@HD-Grand-Scheme-Unfolds
@HD-Grand-Scheme-Unfolds 15 days ago
You truly understand how to simplify... to engage our imagination... to use naive thoughts and ideas to make comparisons that bring across deeper, more core principles and concepts, making the subject far easier to grasp and build an intuition for. Algorithmic Simplicity indeed... thank you for your style of presentation and teaching. Love it, love it... you make me realize what question I want to ask but didn't know I wanted to ask. YouTube needs your contribution in ML education. Please don't forget that.
@karlnikolasalcala8208
@karlnikolasalcala8208 9 days ago
This channel is gold, I'm glad I've randomly stumbled across one of your vids
@justanotherbee7777
@justanotherbee7777 3 months ago
A person with very little background can understand what he describes here.. commenting so YouTube recommends it to others.. wonderful video! really good one
@CodeMonkeyNo42
@CodeMonkeyNo42 7 days ago
Great video. Love the pacing and how you distilled the material into such an easy-to-watch video. Great job!
@MeriaDuck
@MeriaDuck 1 day ago
This must be one of the best and concise explanations I've seen!
@Matyanson
@Matyanson 6 days ago
Thank you for the explanation. I already knew a little bit about diffusion but this is exactly the way I'd hoped to learn. Start from the simplest examples (usually historical) and progressively advance, explaining each optimisation!
@jcorey333
@jcorey333 3 months ago
This is an amazing quality video! The best conceptual video on diffusion in AI I've ever seen. Thanks for making it! I'd love to see you cover RNNs.
@banana_lemon_melon
@banana_lemon_melon 7 days ago
bruh, I loved your contents. Other channel/video usually explain general knowledge that can be easily found on internet. But you're going deeper to the intrinsic aspects of how the stuff works. This video, and one of your video about transformer, are really good.
@anthonybernstein1626
@anthonybernstein1626 21 days ago
I had a good idea how diffusion models work but I still learned a lot from this video. Thanks!
@mrdr9534
@mrdr9534 6 days ago
Thanks for taking the time and effort of making and sharing these videos and Your knowledge. Kudos and best regards
@JordanMetroidManiac
@JordanMetroidManiac 6 days ago
I finally understand how models like Stable Diffusion work now! I tried understanding them before but got lost at the equation (17:50), but this video describes that equation very simply. Thank you!
@xaidopoulianou6577
@xaidopoulianou6577 10 days ago
Very nicely and simply explained! Keep it up
@iestynne
@iestynne 6 days ago
Wow, fantastic video. Such clear explanations. I learned a great deal from this. Thank you so much!
@ecla141
@ecla141 2 days ago
Awesome video! I would love to see a video about graph neural networks
@abdelhakkhalil7684
@abdelhakkhalil7684 8 days ago
This was a good watch, thank you :)
@tkimaginestudio
@tkimaginestudio 1 day ago
Great explanations, thank you!
@1.4142
@1.4142 3 months ago
Some2 really brought out some good channels
@RobotProctor
@RobotProctor 11 days ago
I like to think of ML as a funky calculator. Instead of a calculator where you give it inputs and an operation and it gives you an output, you give it inputs and outputs and it gives you an operation. You said it's like curve fitting, which is the same thing, but I like thinking the words funky calculator because why not
@user-yj3mf1dk7b
@user-yj3mf1dk7b 9 days ago
nice explanations, although I already knew about diffusion. The examples from simplest to final diffusion were a really nice touch.
@sanjeev.rao3791
@sanjeev.rao3791 2 days ago
Wow, that was a fantastic explanation.
@iancallegariaragao
@iancallegariaragao 3 months ago
Great video and amazing content quality!
@akashmody9954
@akashmody9954 3 months ago
Great video....already waiting for your next video
@ShubhamSinghYoutube
@ShubhamSinghYoutube 1 day ago
Love the conclusion
@anatolyr3589
@anatolyr3589 1 month ago
Great explanation!👍👍, I personally would like to see a video observing all major types of neural nets with their distinctions, specifics, advantages, disadvantages etc. the author explains very well 👏👏
@user-er9pw4qh6j
@user-er9pw4qh6j 18 days ago
Soooo Good!!! Thanks for making it!!!!
7 days ago
I think it would help to mention that the auto-regressors may be viewing the image as a sequence of pixels (RGB vectors). Overall excellent video, extremely intuitive.
@algorithmicsimplicity
@algorithmicsimplicity 7 days ago
In general, auto-regressors do not view images as a sequence. For example, PixelCNN uses convolutional layers and treats inputs as 2d images. Only sequential models such as recurrent neural networks would view the image as a sequence.
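For readers curious how a convolutional layer can be made auto-regressive without flattening the image, here is a minimal sketch of the PixelCNN-style kernel mask this reply alludes to (the function name and the 3x3 size are illustrative, not code from the video):

```python
import numpy as np

def causal_conv_mask(k: int, mask_type: str = "A") -> np.ndarray:
    """Build a PixelCNN-style mask for a k x k convolution kernel.

    Kernel positions at or after the centre pixel in raster-scan order
    are zeroed, so each output pixel only sees pixels "generated" before
    it. Type "A" also masks the centre (first layer); type "B" keeps it
    (later layers).
    """
    mask = np.ones((k, k), dtype=np.float32)
    c = k // 2
    start = c if mask_type == "A" else c + 1
    mask[c, start:] = 0.0   # rest of the centre row
    mask[c + 1:, :] = 0.0   # all rows below the centre
    return mask

mask_a = causal_conv_mask(3, "A")
# multiplying a conv kernel elementwise by mask_a before applying it
# makes the convolution causal in raster order
```

Multiplying each kernel by such a mask before the convolution is what lets the model stay fully 2D while still defining a valid pixel ordering.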
7 days ago
@@algorithmicsimplicity of course, but I feel mentioning it may help with intuition as you’re walking through pixel by pixel image generation
@Mhrn.Bzrafkn
@Mhrn.Bzrafkn 11 days ago
It was so easy to understand 👌🏻👌🏻
@paaabl0.
@paaabl0. 7 days ago
Great video! Focus on the right elements.
@vijayaveluss9098
@vijayaveluss9098 9 days ago
Great explanation
@RobotProctor
@RobotProctor 11 days ago
Thank you. This video is wonderful
@zephilde
@zephilde 13 days ago
Great visualisation! Good job! Maybe next video on LoRA or ControlNet ?
@algorithmicsimplicity
@algorithmicsimplicity 13 days ago
Great suggestions, I will put them on my TODO list.
@banana_lemon_melon
@banana_lemon_melon 7 days ago
+1 for LoRA
@marcinstrzesak346
@marcinstrzesak346 16 days ago
Very good video. Thank you
@khangvutien2538
@khangvutien2538 9 days ago
Thank you very much. I enjoyed the first part, the first 10 seconds. After that, there are too many shortcuts in the explanations, so I struggled to understand and be able to explain it again to myself. Still, I subscribed. As for suggestions for other videos, I'll check whether you have explained the U-Net already. If not, I'd appreciate the same kind of explanation about it.
@psl_schaefer
@psl_schaefer 1 day ago
Amazing video!
@joaosousapinto3614
@joaosousapinto3614 14 days ago
Great video, congrats.
@ollie-d
@ollie-d 3 days ago
Solid video!
@mojtabavalipour
@mojtabavalipour 7 days ago
Well done!
@demohub
@demohub 8 days ago
Just subscribed. Great video
@meanderthalensis
@meanderthalensis 9 days ago
Great video!
@AurL_69
@AurL_69 11 days ago
thanks for explaining
@hmmmza
@hmmmza 3 months ago
what a great rare content!
@pon1
@pon1 10 days ago
Still feels like magic to me 🙌🙌
@ArtOfTheProblem
@ArtOfTheProblem 13 days ago
great work
@johnbolt2686
@johnbolt2686 7 days ago
I would recommend reading about active inference to possibly understand the role of generative models in intelligence.
@oculuscat
@oculuscat 10 days ago
Diffusion doesn't necessarily work better than auto-regression. The "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" paper introduces an architecture they call VAR that upscales noise using an AR model and this currently out-performs all diffusion models in terms of speed and accuracy.
@winstongraves8321
@winstongraves8321 10 days ago
Great video
@ChristProg
@ChristProg 15 days ago
Thank you so much, Sir. Really interesting video. But I would like you to create a video on how the generative model uses the text prompt during training. Thank you, Sir. I subscribed!😊
@infographie
@infographie 8 days ago
Excellent.
@mallow610
@mallow610 9 days ago
Video is a banger
@aydr5412
@aydr5412 7 days ago
Thank you for the video. Imo "curve fitting" is an oversimplification; it distracts us from the real problem: what is being optimized and how. Also, there is a different perspective on cases where we prefer computational efficiency over training quality: with efficiency you can train a model on more data and for longer using the same amount of computational resources, which actually results in a better model
@johnmorrell3187
@johnmorrell3187 5 days ago
Curve fitting is optimization so I'd say the two explanations are equivalent. While it's true that a more efficient method -> longer training -> better behavior, it's also true that if compute and time really were not a limiting factor then these less efficient methods would give better final performance.
@craftydoeseverything9718
@craftydoeseverything9718 16 hours ago
This was genuinely such a great video. I honestly feel like I could come away from this video and implement an image generator myself :) /gen
@kubaissen
@kubaissen 3 months ago
Nice vid thx
@zacklee5787
@zacklee5787 5 days ago
Not sure I agree with some of your analysis here. The strength of diffusion models doesn't come from the lower dependence of objects/pixels the model generates at once. In fact, as you mention, the model actually predicts a whole image, in practice, at every step. Even when you use the trick of predicting the noise, the noise is unintuitively not random, that is, not randomly generated, but actually depends completely on the noise or lack thereof in the input. It is after all equivalent to predicting the whole image. The real strength comes from the incremental nature, that is, a step of the model further down the line can "fix" a mistake it made previously by interpreting the previous generation as noise. In the space of all, say, 1024x1024 pixel value combinations, there is a manifold (essentially a subset of close-together images) of all target images we want to generate. The diffusion model learns to take incremental steps toward that subset of "reasonable" images from any random starting point.
@algorithmicsimplicity
@algorithmicsimplicity 5 days ago
The noise is absolutely randomly generated. The reason the model can predict the noise (or equivalently image) is because it receives both the noise and image as input. If it was the case that the incremental nature helped, then I would expect diffusion models to generate higher quality outputs than auto-regressors, but this isn't the case. Auto-regressors generate higher quality outputs (e.g. arxiv.org/abs/2205.13554 ), they just take longer to run. If it was the case that NN are unable to give correct predictions on the first go, we would see the opposite, that diffusion models can correct previous generations and thereby achieve higher quality. Also see LLM which have no difficulty generating perfect outputs in one pass. Diffusion models only learn to take steps toward the data distribution starting at the standard normal distribution (origin).
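The training setup this reply describes (the model receives the noisy image as input and is trained to predict the randomly generated noise) can be sketched as follows; the shapes and the schedule value are illustrative assumptions, not the video's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def ddpm_training_pair(x0: np.ndarray, alpha_bar_t: float):
    """Build one (input, target) pair for noise-prediction training.

    x0 is a clean training image; alpha_bar_t in (0, 1] comes from the
    noise schedule. The noise eps is sampled at random, and the network
    is trained so that model(x_t, t) approximates eps.
    """
    eps = rng.standard_normal(x0.shape)  # randomly generated noise target
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

x0 = rng.standard_normal((8, 8))            # stand-in for a clean image
x_t, eps = ddpm_training_pair(x0, alpha_bar_t=0.5)
# given a model prediction eps_hat, the training loss would be
# mean((eps_hat - eps) ** 2); here a trivial zero predictor for scale:
loss = np.mean((np.zeros_like(eps) - eps) ** 2)
```

Because x_t is a known mix of x0 and eps, predicting eps and predicting x0 are equivalent up to rescaling, which is the point being made above.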
@frommarkham424
@frommarkham424 10 days ago
That was exactly how i guessed they did
@IceMetalPunk
@IceMetalPunk 9 days ago
And the newest/upcoming models seem to be tending more towards diffusion Transformers, which from my understanding is effectively a Transformer autoencoder with a diffusion model plugged in, applying diffusion directly to the latent space embeddings. Is that correct?
@HyperFocusMarshmallow
@HyperFocusMarshmallow 8 days ago
A funny thing about watching a video like this is that you see an artificial neural network produce an image and then you have another layer of neural network in the brain that tries to figure out if it was a good match or not. The so called “blurry noise” could in principle look like a good match to someone and a bad match to someone else depending on how their own categorization works. It could also be good for everyone and bad for everyone of course or some arbitrary mix along that scale. The point is that “looks like blury noise” risks being a quite unobjective statement. I mean, people see images in the clouds and so on.
@Blooper1980
@Blooper1980 11 days ago
Finally I understand!
@MilesBellas
@MilesBellas 10 days ago
via Pi: "Diffusion models and auto-regressive (AR) models are two popular approaches for generating images and other types of data. They differ in their fundamental techniques, generation time, and output quality. Here's a brief comparison:

**Diffusion Models:**
* Approach: Diffusion models are based on the idea of denoising images iteratively, starting from a noisy input and gradually refining it into a high-quality output.
* Generation Time: Diffusion models are generally faster than AR models for image generation, especially when using optimizations like "asymmetric step" or Cascade models.
* Output Quality: Diffusion models are known for generating high-quality and diverse images, especially when trained on large datasets like Stable Diffusion or DALL-E 2. They can capture various styles and generate coherent images with intricate details.

**Auto-Regressive (AR) Models:**
* Approach: AR models generate images pixel by pixel, conditioning each new pixel on previously generated pixels. This sequential approach makes AR models computationally expensive, especially for large images.
* Generation Time: AR models tend to be slower than diffusion models due to their sequential nature. The generation time can be significantly longer for high-resolution images.
* Output Quality: While AR models can produce high-quality images, they may struggle with capturing diverse styles or maintaining coherence across different image regions. They might require additional techniques, like classifier-free guidance or super-resolution, to achieve better results.

In summary, diffusion models generally offer faster generation times and better output quality compared to AR models. However, both approaches have their strengths and limitations, and the choice between them depends on the specific use case, available computational resources, and desired generation speed and output quality."
@yk4r2
@yk4r2 1 day ago
Hey, could you kindly recommend more on causal architectures?
@algorithmicsimplicity
@algorithmicsimplicity 1 day ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
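The masking trick described in this reply can be sketched in a few lines of numpy (shapes, names, and the single-head layout are illustrative; real decoder-only transformers apply this inside each attention head):

```python
import numpy as np

def causal_attention(q, k, v):
    """Self-attention where token i may only attend to tokens j <= i.

    q, k, v: (seq_len, d) arrays. Scores strictly above the diagonal
    are set to -inf before the softmax, which zeroes those attention
    weights, exactly the masking described in the comment above.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq, seq) similarity scores
    mask = np.triu(np.ones_like(scores), 1)        # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)  # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = k = v = np.eye(3)  # toy 3-token sequence with 3-dim vectors
out = causal_attention(q, k, v)
# out[0] can only depend on v[0]; out[2] may mix v[0], v[1], v[2]
```

Because token i's output contains no information from tokens after i, every position can be trained to predict its successor in one parallel pass.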
@muhammadaneeqasif572
@muhammadaneeqasif572 4 days ago
can you please share the code that you used for generating the images in the demo? It would be very helpful
@recklessroges
@recklessroges 2 days ago
Could you explain why the YOLO image classifier is/was so effective? Thank you.
@hjups
@hjups 3 months ago
Do you have a citation that supports your claim for eps vs x0 prediction? It's true that the first sampling step with x0 tends to produce a blurry / averaged result, but that's a result of the loss function used when training DDPMs. If you were to use something more complex or another NN, then you'd have a GAN, which don't produce blurry or averaged results on a single forward pass. Also, if you examine the output of x0 = noise - eps for the first step, it's both mathematically and visually equivalent to the first x0 prediction sample - a blurry / averaged result. The same thing is also true when predicting velocity, but velocity is arguably harder for a network to predict due to the phase transition.
@alirezaghazanfary
@alirezaghazanfary 1 day ago
thanks for the very good video. I have a question: couldn't we make a model that decreases the resolution of a picture (for example a 4x4 picture to a 2x2 and then to a 1x1 picture) and run it in reverse (generate a 2x2 from a 1x1 and a 4x4 from a 2x2)? Would this model work?
@algorithmicsimplicity
@algorithmicsimplicity 1 day ago
Yes you absolutely could, and according to this paper: arxiv.org/abs/2404.02905v1 it works pretty well.
@alex65432
@alex65432 3 months ago
Can you make a video about the loss landscape.Like what effects do different weight inits. Optimizers or architectures like resnet have.
@algorithmicsimplicity
@algorithmicsimplicity 3 months ago
Thanks for the interesting suggestion! I was already planning to do a video about why neural networks generalize outside of their training set, I should be able to talk about the loss landscape in that video.
@IsaOzer-lx7sn
@IsaOzer-lx7sn 19 hours ago
I want to learn more about the causal architecture idea for auto regressors, but I can't seem to find anything about them anywhere. Do you know where I can read more about this topic?
@algorithmicsimplicity
@algorithmicsimplicity 19 hours ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
@iwaniw55
@iwaniw55 16 hours ago
Hi @algorithmicsimplicity, I am curious which papers/material did you reference for the general autogressor? I cannot seem to find any info on using random spaced out pixels to predict the next batch of pixels. Any help would be appreciated. Also great videos!!!
@algorithmicsimplicity
@algorithmicsimplicity 15 hours ago
It is more widely known as "any-order autoregression", see e.g. this paper arxiv.org/abs/2205.13554
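A minimal sketch of how one any-order training example might be constructed (the layout, names, and zero-fill convention are illustrative assumptions, not taken from the linked paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def any_order_ar_example(image: np.ndarray):
    """Build one training example for any-order autoregression.

    Sample a random generation order, reveal a random-length prefix of
    pixels in that order, and hide the rest; the hidden pixels are the
    prediction targets. A boolean mask tells the model which is which.
    """
    flat = image.reshape(-1)
    order = rng.permutation(flat.size)            # random generation order
    n_visible = int(rng.integers(0, flat.size))   # reveal a random-length prefix
    visible = np.zeros(flat.size, dtype=bool)
    visible[order[:n_visible]] = True
    model_input = np.where(visible, flat, 0.0)    # hidden pixels zeroed out
    targets = flat[~visible]                      # pixels the model must predict
    return model_input.reshape(image.shape), visible.reshape(image.shape), targets

img = rng.standard_normal((4, 4))   # stand-in for a training image
inp, vis, tgt = any_order_ar_example(img)
```

Training on many such random subsets is what lets a single network act as an auto-regressor for every possible pixel ordering at once.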
@iwaniw55
@iwaniw55 15 hours ago
@@algorithmicsimplicity Thank you so much! This is exactly what I was missing.
@EricPham-gr8pg
@EricPham-gr8pg 8 days ago
Use lens projector and -zoom will save all the mathematical brain picking In video we use ccd cell in camera instantly illuminate LED pixel then zoom it down to tiny dot then send to ram and display on monitor by zoom factor correspond to resolution allowed and zoom it back down when store it in time line of each coordinate and add all up with address and time then when unfold all we need is tiny dot first frame and last frame then start by last frame unfold into buffer subtract time but must adjust to phase angle of time at closest to last frame and just less time drive with appropriate speed of each time axis so memory is so small
@quickdudley
@quickdudley 5 days ago
My brain misinterpreted the title as "Why diffusers work better than autoencoders" (I believe because the noising process works rather like data augmentation)
@duytdl
@duytdl 9 days ago
So why isn't diffusion better for text? Also are you saying that auto-regression is only bad because it's expensive to do (serially)? Or is diffusion fundamentally better for images?
@algorithmicsimplicity
@algorithmicsimplicity 9 days ago
Auto-regression is only bad because it is slow, it produces better generations for both text and images. For text, there aren't that many tokens that you need to generate, so you can just use auto-regression: it gives better results. For images, you are forced to use something faster, and diffusion is much faster while producing nearly as good generations.
@turhancan97
@turhancan97 12 days ago
Is the idea at the beginning of the video (auto regression image generation) self supervised learning?
@algorithmicsimplicity
@algorithmicsimplicity 12 days ago
Technically yes, self supervised learning just means that the labels used to train the model were created automatically from the data itself, instead of by a human. So yes both auto-regression and diffusion are self-supervised learning, since they automatically create masked/noised inputs and use the clean image as labels. Though usually when people refer to self-supervised learning specifically they mean self-supervised but not generative, so things like simCLR or contrastive learning.
@turhancan97
@turhancan97 11 days ago
@@algorithmicsimplicity I understand. Thanks a lot :)
@hamzaumair7909
@hamzaumair7909 1 month ago
I love your explanations, especially transformers. Although this one imo could have been better; I think you are missing some ideas that should have been explained.
@algorithmicsimplicity
@algorithmicsimplicity 1 month ago
Thanks for the feedback, any ideas in particular that you think should have been explained?
@agustinbs
@agustinbs 9 days ago
This video is better than going to MIT for a machine learning degree. Man, this is gold, thank you so much
@akashmody9954
@akashmody9954 3 months ago
Can you recommend some sources that i can follow if i want to do deeper into diffusion models and transformers?
@akashmody9954
@akashmody9954 3 months ago
I tried to go through the research papers but the math is overwhelming
@algorithmicsimplicity
@algorithmicsimplicity 3 months ago
​@@akashmody9954 If you just want to learn how to train/use them, I'd highly recommend the fast.ai course by Jeremy Howard, it will give you practical experience using them. If you want to do research/develop new methods then I'm afraid there isn't any better option than just reading the papers. Although if code is available I sometimes find it easier to just read the code than the paper lol.
@akashmody9954
@akashmody9954 3 months ago
@@algorithmicsimplicity alright.....thanks a lot man, and loving your videos as always
@joshjohnson259
@joshjohnson259 3 дня назад
If this explanation is too advanced for me how would you recommend I learn enough to be able to grasp these concepts? Can you direct me to some content that is one level down in complexity so I can see if that would be my starting point in understanding how these models work? I don’t really have any CS background.
@algorithmicsimplicity
@algorithmicsimplicity 3 days ago
If you just want to learn how to train/use these models, I would highly recommend the fast.ai course by Jeremy Howard (course.fast.ai/ ). You can also look at 3blue1brown's videos on neural networks and transformers which are aimed at a general audience, and Andrej Karpathy's videos on implementing a transformer from scratch for a more detailed walkthrough of the models.
@bj_
@bj_ 4 minutes ago
Wait... so that meme of the missile guidance system that only knows where it is by first calculating all the places it isn't, actually applies to diffusion image generation too?
@klaushermann6760
@klaushermann6760 7 days ago
Now we know they're not only predictors.
@sichengmao4038
@sichengmao4038 6 days ago
can you explain why for diffusion model, there's no causal architecture? 16:26
@algorithmicsimplicity
@algorithmicsimplicity 6 days ago
Basically it's because NN layers accumulate information from multiple input features into one feature's vector. By making the layer only take in information from features before it in the AR order, you get a causal architecture with the same size as the original model. For diffusion, you could in principle make a causal architecture, but you would need to make a feature vector for every feature in every step of the noising process, i.e. the size of the model would need to be increased by a factor equal to the number of denoising steps, which isn't practical.
@sichengmao4038
@sichengmao4038 6 days ago
@@algorithmicsimplicity I don't quite understand why "the model size is increased by the number of denoising steps". What I imagine is: if we make an analogy to a language model like a Transformer, we now have a series of tokens (where each token is indeed a noisy image in the noising process); then we can still parallelize along the sequence dimension, can't we?
@algorithmicsimplicity
@algorithmicsimplicity 6 days ago
@@sichengmao4038 You could do that, the problem is how you convert the entire image into a token. Usually in order to convert an image into a feature vector, you need to apply a full-sized neural network. So to get your noisy image tokens you need to apply a NN for each noising step.
@akashmody9954
@akashmody9954 3 months ago
Can you make a video on how SORA by OpenAI works, what kind of architecture does it follow
@algorithmicsimplicity
@algorithmicsimplicity 3 months ago
Unfortunately OpenAI does not publicly release details on their architectures, they only said it was a transformer based diffusion model. This thread had some speculation on the exact architecture though: threadreaderapp.com/thread/1758433676105310543.html
@assgoblin3981
@assgoblin3981 2 months ago
Assgoblin approves of this content
@JoeJoeTater
@JoeJoeTater 1 day ago
18:10 This is wrong. The average of a bunch of noisy images is a less-noisy image. (See "regression towards the mean") You'd have to normalize that averaged image.
@algorithmicsimplicity
@algorithmicsimplicity 1 day ago
Right, I should have been more careful with my usage of the word "noisy". If you average a bunch of samples from a normal distribution, the result is a sample with less variance (i.e. less noisy). What I meant to say was the probability of the average under the normal distribution is higher (i.e. the result is closer to the origin). So the average still lies within the data manifold (as opposed to images, where the average moves outside the data manifold).
@fayezsalka
@fayezsalka 13 hours ago
Yes, that was very confusing to me too. The average of a bunch of random noise samples is 0.5, which is the mean. You would literally get a smooth grey image. Not “noise” image as shown in the video
@craftydoeseverything9718
@craftydoeseverything9718 16 hours ago
17:58 btw, you wrote "nose", instead of "noise"
@algorithmicsimplicity
@algorithmicsimplicity 16 hours ago
So I did. Surprised no-one else mentioned it yet lol.
@dubfather521
@dubfather521 7 days ago
So denoising models work by predicting the clean image, and then to get the next step you noise its already clean output??? That doesn't make any sense. If it predicts the final image already why do you have to keep predicting.
@algorithmicsimplicity
@algorithmicsimplicity 6 days ago
The first time it predicts the clean image, it will not produce a good image, it will produce a blurry mess (because it will average over all of the training images). You then add noise to this blurry mess and you get an image that is almost pure noise, with a little bit of structure from the original blurry mess. Then you use that as input and predict a clean image again; this time the produced image will be slightly sharper, because now the model is only averaging over all inputs which are consistent with the blurry structure from the first step. You repeat this many times; at each step the produced image gets sharper because more detail is left from the previous step.
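The loop described in this reply (predict a clean image, mix noise back in, repeat with less noise each time) might be sketched like this; the linear schedule and the stand-in "model" are illustrative assumptions, not the DDPM sampler's exact update rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(predict_x0, shape, steps=50):
    """Sampling loop matching the description above.

    At each step the model predicts a clean image from the current
    noisy one, then a decreasing amount of fresh noise is mixed back
    in; on the last step no noise is added. predict_x0(x_t, t) stands
    in for the trained denoiser.
    """
    x = rng.standard_normal(shape)          # start from pure noise
    for t in range(steps, 0, -1):
        x0_hat = predict_x0(x, t)           # predicted clean image
        noise_level = (t - 1) / steps       # shrinks to 0 on the final step
        x = x0_hat + noise_level * rng.standard_normal(shape)
    return x

# stand-in "model" that just pulls the input toward its mean
fake_model = lambda x, t: np.full_like(x, x.mean())
out = sample(fake_model, (8, 8))
```

With a real trained denoiser in place of `fake_model`, each pass sharpens the prediction because more structure survives the re-noising, which is the repetition argument made above.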
@dubfather521
@dubfather521 6 days ago
@@algorithmicsimplicity ohhhhhh
@glaubherrocha2935
@glaubherrocha2935 19 hours ago
a fixed pixel with random color wouldn't make it work?
@algorithmicsimplicity
@algorithmicsimplicity 19 hours ago
I'm not sure what you are asking, can you elaborate?
@cognitive-carpenter
@cognitive-carpenter 10 days ago
Enjoyed I think is the wrong output