Stable Diffusion Deep Dive Notebook Run-through

11K views

In this video/notebook Johno shows us what is happening behind the scenes when we create an image with Stable Diffusion, looking at the different components and processes and how each can be modified for further control over the generation process.
The notebook is available in this repository: github.com/fastai/diffusion-nbs
00:00 - Introduction
00:40 - Replicating the sampling loop
01:17 - The Auto-Encoder
03:55 - Adding Noise and image-to-image
08:43 - The Text Encoding Process
15:15 - Textual Inversion
18:36 - The UNET and classifier free guidance
24:41 - Sampling explanation
36:30 - Additional guidance
This video was made by Jonathan Whitaker (his channel: @datasciencecastnet) as a companion to lesson one of the new fast.ai 2022 Part 2 course (aka Lesson 9).
Errata: the model inputs for the UNet demo in cell 49 (about 19 minutes in) should be scaled; see scheduler.scale_model_input in all the sampling loops for the missing code. Also, in the autoencoder part the 'compression' isn't exactly 64 times, since the latent representation has 4 channels while the input has only 3 (counting values, it is 64 × 3 / 4 = 48 times).
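A minimal sketch of what the missing scaling step does. It assumes an LMS-style scheduler, such as the LMSDiscreteScheduler in diffusers, where a noisy latent is formed as x₀ + σ·ε and scale_model_input divides it by √(σ² + 1) so the UNet sees inputs with roughly unit variance. In diffusers-style loops the real call typically looks like scheduler.scale_model_input(latent_model_input, t). The helper below, scale_model_input_lms, is a hypothetical pure-Python stand-in for illustration, not the diffusers implementation, and the σ value is an arbitrary illustrative choice.

```python
import math
import random


def scale_model_input_lms(sample, sigma):
    """Hypothetical stand-in for an LMS-style scheduler.scale_model_input.

    Assumes the noisy latent was built as x0 + sigma * eps. If x0 has unit
    variance, that sum has variance of about sigma**2 + 1, so dividing by
    sqrt(sigma**2 + 1) brings it back to roughly unit variance.
    """
    scale = math.sqrt(sigma ** 2 + 1)
    return [x / scale for x in sample]


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(0)
sigma = 10.0  # illustrative noise level (assumption, not taken from the notebook)
clean = [random.gauss(0.0, 1.0) for _ in range(100_000)]     # stand-in latents with unit variance
noisy = [x + sigma * random.gauss(0.0, 1.0) for x in clean]  # x_t = x_0 + sigma * eps
scaled = scale_model_input_lms(noisy, sigma)

print(f"unscaled variance ~ {variance(noisy):.1f}")   # close to sigma**2 + 1 = 101
print(f"scaled variance   ~ {variance(scaled):.3f}")  # close to 1.0
```

Under the usual correspondence σ² = (1 − ᾱ)/ᾱ between the LMS noise level and the DDPM cumulative product ᾱ, dividing by √(σ² + 1) is the same as multiplying by √ᾱ. That turns x₀ + σε back into the √ᾱ·x₀ + √(1 − ᾱ)·ε form the UNet was trained on.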

Published: 3 Aug 2024

Comments: 14

@reneeliu1648 1 year ago

Thank you, Jonathan! This is so amazing. I really appreciate the way you break it down into easy-to-digest pieces. The walkthrough is truly badass black magic.

@yunijkarki9088 8 months ago

WOW, Johno's video is great. This video deserves at least 1 million views.

@jean-michelperraud4899 1 year ago

This is a great deep dive after Jeremy's live whiteboard drawings conveying the key ideas, thanks. I am starting to "grok" things a bit better after two sessions. Fascinating.

@sheikhshafayat6984 1 year ago

Wow! Wow!! This was a hell of a walkthrough! I rarely comment on videos, but I just wanted to stop here and say thanks for making this.

@melonkernel 1 year ago

Awesome explanations. Great work. Thanks.

@datasciencecastnet 1 year ago

Errata: there should be some scaling done to the model inputs for the unet demo in cell 49 (19 minutes in) - see scheduler.scale_model_input in all the loops for the code that is missing.

@datasciencecastnet 1 year ago

Oh and in the autoencoder part the 'compression' isn't exactly 64 times since there are 4 channels in the latent representation and only 3 in the input :)

@ssw4m 1 year ago

I was also thinking that the input would be 3 channels of bytes (24-bit colour), while the latent might use 4 channels of 32-bit floats. That would be 12 times compression overall. Still pretty good, and I think we can optionally use float16 for the latents, which would give 24 times compression. I read that JPEG typically achieves 10 times compression.
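To make the compression arithmetic in the two comments above concrete, here is a small sketch. It assumes a 512×512 RGB input stored as 8-bit values and a VAE that downsamples by 8 in each spatial dimension to 4 latent channels, which is Stable Diffusion's usual setup. The constants and the compression_ratio helper are illustrative and can be changed for other sizes.

```python
# Assumptions (illustrative): 512x512 RGB input stored as uint8,
# VAE downsampling factor of 8 per spatial dimension, 4 latent channels.
HEIGHT = WIDTH = 512
IN_CHANNELS = 3
LATENT_CHANNELS = 4
DOWNSAMPLE = 8


def compression_ratio(pixel_bytes, latent_bytes):
    """Ratio of input size to latent size, given bytes per stored value."""
    input_size = HEIGHT * WIDTH * IN_CHANNELS * pixel_bytes
    latent_size = (HEIGHT // DOWNSAMPLE) * (WIDTH // DOWNSAMPLE) * LATENT_CHANNELS * latent_bytes
    return input_size / latent_size


by_values = compression_ratio(1, 1)  # number of values only: 786432 / 16384
vs_fp32 = compression_ratio(1, 4)    # uint8 pixels vs float32 latents
vs_fp16 = compression_ratio(1, 2)    # uint8 pixels vs float16 latents

print(by_values, vs_fp32, vs_fp16)  # 48.0 12.0 24.0
```

Counting values alone, the factor is 48 rather than 64. Counting bytes gives the 12× (float32) and 24× (float16) figures from the comment.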

@californiaxfresh 1 year ago

This was indeed a run-through, haha. I think I'll need to walk through this notebook line by line :) Nevertheless, this video, and especially the sampling portion, helped a lot with my intuition behind some of the concepts, so thank you!

@philtoa334 10 months ago

Very good.

@asheeshmathur 11 months ago

Outstanding explanation; it cleared the mist around the process. However, I can't find where $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\,x_{t-1}, \beta_t \mathbf{I})$ comes from. Every discussion starts with it as a given, but I would like to understand how the mean and variance were derived. Could you please demystify this formula?
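A sketch of the usual answer, assuming the standard DDPM forward process. The mean and variance are largely a design choice rather than something derived from deeper principles. One noising step is defined by scaling the previous sample by √(1 − βₜ) and adding Gaussian noise with variance βₜ. Those coefficients are picked so that each step preserves the overall variance, which keeps the data from blowing up over many steps and lets the chain end near pure noise.

```latex
% One forward step, written with the reparameterization trick:
x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, \mathbf{I})

% Conditioned on x_{t-1}, the only randomness is \epsilon, so
\mathbb{E}[x_t \mid x_{t-1}] = \sqrt{1-\beta_t}\, x_{t-1},
\qquad
\operatorname{Cov}[x_t \mid x_{t-1}] = \beta_t \mathbf{I}
\;\Rightarrow\;
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

% Why these coefficients: if Cov[x_{t-1}] = I and \epsilon is independent of x_{t-1},
\operatorname{Cov}[x_t] = (1-\beta_t)\,\mathbf{I} + \beta_t\,\mathbf{I} = \mathbf{I}

% Chaining steps with \alpha_t = 1-\beta_t and \bar\alpha_t = \prod_{s=1}^{t} \alpha_s:
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t)\,\mathbf{I}\right)
```

The last line is what lets training jump straight to any noise level t without simulating every intermediate step.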

@seanriley3121 5 months ago

When approaching a manifold, what would happen if the approach were aligned with the normal of the manifold surface?

@zhshen7981 1 year ago

👍

@manindermaan3695 3 months ago

Hello, I'm confused about one topic in diffusion models. Could I get your contact info, Jonathan, such as your email address or any other way to reach you? I'm working on my final-year project, so anything would help.