How Diffusion Models Work 

Leo Isikdogan · 27K subscribers
6K views · Published 13 Sep 2024

Comments: 31
@user-uc8nn9kf8l · 6 months ago
You have explained this so well. I have seen so many videos, but the way you explain it from start to end maps closely to what we are learning. A very good explanation of the Stable Diffusion workflow.
@hardslaps · 11 months ago
I'm sure you get a lot of comments like this, but I've been binging MKBHD vids and saw him recommend your video about compression. I watched it and was so impressed by how well you explained it, and I'm equally impressed with this one, especially as someone with a very basic understanding of the concepts. Can't wait to binge more of your videos now! Subscribed 😃
@leoisikdogan · 11 months ago
Thanks for your kind words and for subscribing! Being recommended by MKBHD was such a nice surprise. I hope you enjoy the rest of the content! 😃
@nguyengiorno9026 · 9 months ago
@leoisikdogan How can I donate to support you?
@LilianBoulard · 1 year ago
Crystal clear explanation, thanks a lot! With the recent release of Meta's SAM, I was wondering whether it would be feasible to build an improved text embedding model (i.e., CLIP) by, instead of classifying the whole image with a sentence, creating bounding boxes and applying a mask with different weights to indicate exactly where a specific object is in the image. For example, in the image with the white dog on the beach, for the description "samoyed dog", the pixels making up the dog would have a weight of 1.0, while the others would have a weight of 0. I'd be interested to know what you think; I'm quite unfamiliar with how these embedding models work :)
@leoisikdogan · 1 year ago
Thanks! That's an interesting idea! Given the scale of the training data, my guess is that it wouldn't make much of a difference for CLIP. It may be useful when training domain-specific models with limited data, though. With dense segmentation labels, we would get more information from fewer images.
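A minimal sketch of the masked-weighting idea discussed above, assuming a CLIP-style setup where the image is encoded as per-patch features; the function and the weighted pooling here are hypothetical illustrations, not part of how CLIP is actually trained:

```python
import torch
import torch.nn.functional as F

def mask_weighted_image_embedding(patch_features: torch.Tensor,
                                  patch_mask: torch.Tensor) -> torch.Tensor:
    """Pool per-patch image features into a single embedding, weighting
    each patch by how much of the target object it contains.

    patch_features: (N, D) features for N image patches
    patch_mask:     (N,)   weights in [0, 1], e.g. 1.0 on the dog, 0.0 elsewhere
    """
    weights = patch_mask / (patch_mask.sum() + 1e-8)        # normalize to sum to 1
    pooled = (weights.unsqueeze(-1) * patch_features).sum(dim=0)
    return F.normalize(pooled, dim=-1)                      # unit norm, CLIP-style

# Toy usage: 16 patches with 64-dim features; the first 4 patches cover the object.
features = torch.randn(16, 64)
mask = torch.zeros(16)
mask[:4] = 1.0
image_emb = mask_weighted_image_embedding(features, mask)
# This embedding could then be contrasted against the text embedding of
# "samoyed dog" with the usual CLIP contrastive loss.
```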
@Roman-ki9dv · 1 year ago
Awesome stuff, man! I really wanted to know how these work but was too lazy to look it up myself. The video makes it much easier.
@leoisikdogan · 1 year ago
Thanks! I'm glad you found the video helpful!
@ShrirangKanade · 4 months ago
LOVED THE EXPLANATION
@muhamadmagableh2891 · 8 days ago
Leo, you are absolutely amazing. Could you make a video mathematically explaining text classification: single-task vs. multitask classification, and multitask transfer learning? Please!
@Opinionman2 · 4 months ago
Awesome explanation dude.
@ge926 · 1 year ago
Added to watch later! I'm sure it's amazing as usual.
@leoisikdogan · 1 year ago
I hope you enjoy the video when you get a chance to watch it!
@aloglute · 8 months ago
You explained it really well; honestly, I'm sad I didn't watch it sooner ❤
@Hirenpatel-cx8xh · 1 year ago
Great Explanation!!
@leoisikdogan · 1 year ago
Thanks!
@karthik8972 · 1 year ago
Thanks, Leo, for the video. The concept of converting a noised image into a clear image is understood, but how does it create an image that doesn't exist in its training data? I understand that the model doesn't grasp the concepts in an image and only focuses on patterns, but how are the operations below performed?
1. Creating a cartoon image of a cat from a caption, e.g. "Place a hat on top of a cat": How does it create a cartoon image of a cat? How does it know the exact location of the cat's head? How does it know to place the hat exactly on the head?
2. "A closeup shot of a dog facing the sun": How does it know to create a close-up shot of a dog? How does it know to place the sun in the background? How does it make the subject turn towards the sun?
No videos exist that explain this concept. It would be a great help if you could make a video on it.
@leoisikdogan · 1 year ago
Sure :) The short answer to your questions is cross-attention. The U-Net-based generator is conditioned on text embeddings: a spatial cross-attention layer, softmax(QKᵀ/√d)V, determines where each input text token (e.g. "cat", "hat") attends in the image.
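A minimal sketch of the cross-attention described above, assuming image features act as queries and text-token embeddings as keys and values; the shapes and names are illustrative, not taken from any particular Stable Diffusion implementation:

```python
import torch
import torch.nn.functional as F

def cross_attention(image_feats, text_feats, w_q, w_k, w_v):
    """Let each spatial image position attend over the text tokens.

    image_feats: (P, D)  P spatial positions (flattened H*W), D channels
    text_feats:  (T, D)  T text-token embeddings (e.g. from a CLIP text encoder)
    w_q, w_k, w_v: (D, D) learned projection matrices
    """
    q = image_feats @ w_q                       # (P, D) queries from the image
    k = text_feats @ w_k                        # (T, D) keys from the text
    v = text_feats @ w_v                        # (T, D) values from the text
    scores = q @ k.T / (q.shape[-1] ** 0.5)     # (P, T) image-text affinities
    attn = F.softmax(scores, dim=-1)            # each position distributes over tokens
    return attn @ v                             # (P, D) text-conditioned features

# Toy usage: an 8x8 feature map (64 positions) attending over 5 text tokens.
D = 32
img = torch.randn(64, D)
txt = torch.randn(5, D)
wq, wk, wv = (torch.randn(D, D) / D**0.5 for _ in range(3))
out = cross_attention(img, txt, wq, wk, wv)

# High attention weights between the "hat" token and head-region positions are
# what would let the model place the hat in the right spot.
```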
@baluasonnhavlog7267 · 10 months ago
Hello, thank you for sharing, and I wish you good health. Lúa wishes everyone a fun time watching the video; good health to the whole family ▶️👍👉🔔👈🤝🥰🥰🥰🥰🥰🥰🥰🥰🥰
@peterl7175 · 1 year ago
Great explanation, Leo.
@leoisikdogan · 1 year ago
Thanks!
@canxkoz · 1 year ago
Great explanation! Would you consider covering "DreamFusion: Text-to-3D using 2D Diffusion" in your next video?
@leoisikdogan · 1 year ago
Thanks! DreamFusion is indeed a good one. I probably won't have time to make new videos anytime soon though.
@north1037 · 1 year ago
Hello professor, I have a question for you: would it make sense to study software engineering at a state university in Turkey and move abroad right after graduating (I have a passport)? Does it have a future, or would dentistry be the more sensible choice?
@samabd4998 · 1 year ago
Your videos are amazing and well prepared. Could you please guide me on how I can become an expert in computer vision?
@leoisikdogan · 1 year ago
Thanks! There are many paths to becoming an expert in computer vision or any other field, including taking courses, reading books and research papers, and practicing coding. If you already have some background in the field, I would recommend looking at papers with code. Good luck!
@samabd4998 · 1 year ago
@leoisikdogan Thank you for your response. Since you are an expert, when you have time, please prepare videos about future trends and applications of deep learning. Thanks!
@user-wr4yl7tx3w · 1 year ago
I think more context would be helpful. Some parts could use more explanation.
@leoisikdogan · 1 year ago
Thanks for the feedback. This video was indeed a bit denser than usual since I tried to fit a lot of information in 10 minutes. You can check out my Deep Learning Crash Course and Image and Video Processing series for more introductory videos.
@SnoopyDoofie · 1 year ago
This video doesn't show up on your YouTube channel. I got here from a web page that embeds your video, so I assume you have the video set to "unlisted". If you want more views, you should change it to "public"; otherwise very few people will find it.
@leoisikdogan · 1 year ago
Yes, I unlisted it for the time being due to a situation outside my control. Hopefully, I’ll be able to make it public soon.