This is one of the most intuitive lectures on VAEs that I have come across recently; the visualizations were really great and conveyed concisely what needed to be shown visually. Thanks a lot!!!
This was an intuitive explanation yet grounded in math. Such a delicate balance! Thanks for doing this! Also, @24:48 I agree the bubble-of-bubbles is indeed cute. 😄
Thank you for the excellent class. I am curious why there is such a significant difference in the distribution between epoch 5 and epoch 10 in the last Jupyter notebook cell (at 43:40 in the video). Is this variation a result of the t-SNE or of the VAE training?
Wow, this is probably the best lecture for understanding VAEs to a very good extent. A little more about the derivation of the loss function and backpropagation would have been fantastic. Thanks a lot for this lecture series!
I'm glad it was helpful. Every other VAE explanation out there is so confusing for me, so I tried to explain it in simpler terms, without neglecting too much.
@@rakshithv5073 that was a question on the exam. I recommend trying it out yourself; you won't gain much understanding by just seeing it. Moreover, there are two approaches: one involves integrals, the other manipulates implicit operators. Let me know if you need more suggestions to get started.
Hi, my doubt was specific to the expansion of the relative entropy between z and N(0, 1), which leads to the penalty term. As you suggested, I gave it a try, but I'm stuck at a point; can you give me a hint to move forward? drive.google.com/file/d/1p8MWL9B60h6nTuRNkGo6ZHTPOpNvWhkp/view?usp=drivesdk
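For reference, one possible starting point (a sketch, assuming a single latent dimension with the encoder outputting mean μ and standard deviation σ, and the standard normal prior used in the lecture):

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,1)\big)
= \int \mathcal{N}(z;\mu,\sigma^2)\,\log\frac{\mathcal{N}(z;\mu,\sigma^2)}{\mathcal{N}(z;0,1)}\,\mathrm{d}z
= \mathbb{E}_{z\sim\mathcal{N}(\mu,\sigma^2)}\!\left[\frac{z^2}{2}-\frac{(z-\mu)^2}{2\sigma^2}-\log\sigma\right].
$$

From here, using $\mathbb{E}[z^2]=\mu^2+\sigma^2$ and $\mathbb{E}[(z-\mu)^2]=\sigma^2$ collapses the expectation into the penalty term discussed in the lecture, and summing over the independent latent dimensions gives the full term.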
I loved the way you are using the concepts of Linear Algebra (@19:32) - in the end it's all vectors and transformations :) You are a great mentor! Note that I did not say "coach", because you are equipping each of us with the skills that can solve most problems, not just one :) Huge fan of your lectures & advice :)
Glad you like it! I generally have to "fight 👊🏻 y'all" at the beginning. But if you trust me, then we can move forward together. 🤗 I'm simply trying to "rotate 🔄" you such that you can view things from my perspective. 😋
@@alfcnz Haha, that's true and you are succeeding at it :) Can I use this model on my own image dataset? If yes, how? I really want to see unique images being generated from the images around me. I saw the training data being loaded from the torch library. Is there a way to use my own data?
@@songsbyharsha of course you can use your images. Check out the ImageFolder data set provided by TorchVision. pytorch.org/docs/stable/torchvision/datasets.html#imagefolder
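A minimal sketch of how that could look (the folder path and the 28×28 grayscale resizing are placeholder assumptions, chosen to match the MNIST-shaped model from the notebook):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# one reasonable preprocessing pipeline; adjust the size / channels to your model
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(),   # remove this line if your network expects RGB
    transforms.ToTensor(),    # scales pixels to [0, 1], as BCE expects
])

# ImageFolder expects root/class_a/xxx.png, root/class_b/yyy.png, ...
dataset = datasets.ImageFolder(root='path/to/your/images', transform=transform)
loader = DataLoader(dataset, batch_size=256, shuffle=True)
```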
@@alfcnz This feeling is priceless. I just used my own image dataset and could see stunning results. Machine learning is love! I can't thank you enough!!! I have just one doubt, though it might be trivial: the images I have are RGB, but when I read them into PyTorch, why are they turning into lemonish and dark grey shades? Am I reading them wrong? I observed this with the MNIST dataset you showed us as well. Please please let me know :)
Hey Alfredo, awesome lecture! I needed to brush up on my VAE knowledge. Combining the paper with the online resources is the way to go, since the original paper is super abstract. It would probably be beneficial to note that we don't have to deal with Gaussians in the general case. The original paper just gave some examples using Gaussian and Bernoulli distributions but mentioned many other possibilities. I'm afraid people will start "overfitting" to that particular form of the KL divergence that appears when dealing with Gaussian priors - although I am strongly in favour of giving a concrete example as you did! Just noting the abstraction (so the opposite approach to how the paper introduced it). Secondly, did you do any simple ablation studies on how computing logvar, instead of directly regressing the s.d., impacts performance (properties of the latent space and the reconstruction loss)?
This was a great explanation; however, I don't understand why at 52:00 we don't want to do the reparameterization trick during testing and only return mu. I would assume we would always want to sample from the latent distribution before passing it to the decoder? Making the encoder give a deterministic output (just mu) during testing would defeat the purpose of variational autoencoders, right?
This is honestly the best explanation of VAEs. So what I understand is: instead of just training one latent distribution z as in an AE (which has a mean and variance), we are training two parameters E and V which represent features of the images (digit 1 will have its own E and V, and digit 2's will be different), and using e (epsilon), we are able to interpolate different shapes of a digit. Did I get it right?
I'm glad you liked the video. I *highly* recommend checking out the 2021 version ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-PpcN-F7ovK0.html Back to your question: we don't have labels… so we don't have a specific distribution per digit. Nevertheless, the network will indeed have a continuous set of distributions per given input. For example, there will be a smooth probability family modelling anything that goes from a 1 to a 7 (since they look _very_ similar, especially if handwritten). The decoder will do its best to try to reproduce the input, while the latent space is packed with internal representations / codes of the input data, which live on a manifold embedded in ℝⁿ.
Thank you for your lecture. May I ask, in the reparameterise function, in eps = std.data.new(std.size()).normal_(), why do we need the .data? If we only want noise of std's size, why would you copy the data from std before overriding it?
The behaviour is not clear and may depend on the version of PyTorch itself. If you access the data, then you know for sure you won't mess with the gradients. Anyhow, I think the recommended way now is to pass std.type() to torch.rand's dtype. This didn't exist when I created the notebook.
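For anyone landing here later, a sketch of how that sampling could be written with current PyTorch, using torch.randn_like so the noise automatically matches std's shape, dtype and device (the training/evaluation split mirrors what's discussed above; this is not the exact notebook code):

```python
import torch

def reparameterise(self, mu, logvar):
    if self.training:
        std = torch.exp(0.5 * logvar)   # logvar = log(sigma^2), so this is sigma
        eps = torch.randn_like(std)     # noise with std's shape, dtype and device
        return mu + eps * std
    # at evaluation time, skip sampling and return the mean
    return mu
```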
You have given a great lecture, thanks for that. I have a question: if my dataset has numerical features (like a housing-price dataset), then how should I construct the loss and the network?
Assume that my batch size is 256 and the latent space dimension is 2, so the size of the learnt mu is [256, 2]. Why is the model learning 256×2 different mu when all my training examples belong to the same distribution? I understand that it learns 2 different mu because my latent space dimension is 2 and each mu corresponds to one of the learnt features, but why 256?
Hi Alfredo, amazing lecture. One question: for an image-retrieval task, should I go for the feature space extracted from VGG or the latent space trained with a variational autoencoder?
@@alfcnz Thanks for the suggestion. One last question: linear interpolation between two images in the input (image) domain does not work well, as it produces a cross-dissolve effect between the intensities of the two images. Can you suggest an interpolation approach that doesn't have this issue?
Alfredo, this is amazing! I've been trying to learn about VAEs on my own and this is by far the best lecture and implementation I have found. One question: in your definition of the loss function, you have the β term as in the paper from DeepMind. In the code, is your β defined as 0.5 when computing the KLD in loss_function()? Thank you for the epic tutorial!!
Yeah, I've already replied there. «Hi u/yupyupbrain, thanks for asking. Indeed we should have had `return BCE + β * KLD`, and added `β=1` in the function definition. Feel free to send a pull request on GitHub with such correction. Yes, cross-validation is how hyperparameters are selected.» Next time, plz include the time stamp, so that I can easily see what you're talking about here.
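Concretely, the corrected function could look something like this (a sketch with β exposed as an argument; the KLD line is the standard closed form for a diagonal Gaussian versus N(0, I), and the 784 assumes flattened MNIST images as in the notebook):

```python
import torch
import torch.nn.functional as F

def loss_function(x_hat, x, mu, logvar, β=1):
    # reconstruction term, summed over pixels and over the batch
    BCE = F.binary_cross_entropy(x_hat, x.view(-1, 784), reduction='sum')
    # KL divergence between N(mu, exp(logvar)) and N(0, I)
    KLD = 0.5 * torch.sum(logvar.exp() - logvar - 1 + mu.pow(2))
    return BCE + β * KLD
```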
Great, just like the previous lecture in the series. I must say that your visualization skills are just superb! The way you elucidated the reconstruction loss is really elegant and intuitive at the same time. I have a minor question: was it necessary to use nn.Sigmoid() at the end? I mean, you could have used BCEWithLogitsLoss and KLD, which uses the log-sum-exp trick and is said to be more numerically stable than using a sigmoid followed by BCE + KLD. I understand that the image pixels were in (0, 1), but they could be rescaled, right?
I'm glad you've enjoyed the lecture. I believe you're correct. Let's continue this discussion on an issue / PR on GitHub. As I keep saying, I'm always up to improving the material I've been making! Thanks for your feedback! ❤️
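For anyone who wants to try that variant: only the reconstruction term changes with respect to the loss sketched above, assuming the final nn.Sigmoid() is dropped so the decoder returns raw logits (again, a sketch rather than the notebook's code):

```python
import torch
import torch.nn.functional as F

def loss_function_logits(logits, x, mu, logvar, β=1):
    # the sigmoid is folded into the loss via the log-sum-exp trick
    BCE = F.binary_cross_entropy_with_logits(logits, x.view(-1, 784), reduction='sum')
    KLD = 0.5 * torch.sum(logvar.exp() - logvar - 1 + mu.pow(2))
    return BCE + β * KLD
```

When generating samples, you would then apply torch.sigmoid to the decoder output yourself, since the network no longer does it.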
Since I am new to the AI domain: I have studied that the sigmoid suffers from the vanishing gradient problem, and the same applies to tanh. So I have a doubt: which other activation functions do you prefer to use apart from those two? Do you think we can use ReLU and rescale the pixel values between 0 and 1, or are there other activation functions that I haven't heard of?
These lectures are heaven-sent for those who are looking to upskill! I had a question about the encoder: can the encoded vectors (mu + epsilon * std_dev) for images be used to measure image similarity using cosine distance? Thanks a ton! Cheers!
Hi Alfredo, the bubbles intuition was mind blowing. Thanks a ton for making this so intuitive! One question, can we take any other distributional assumption for the hidden space, like exponential etc.? If yes, how will it affect the hidden representations and the final decoder output?
The only other option I'm aware of is using a categorical one, where the 1 embedding (out of K) closest to the encoded input is sent to the decoder. I'm talking about the VQ (vector quantised) VAE.
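A toy illustration of that quantisation step, just to make the "closest of K embeddings" idea concrete (the codebook size, dimensions and tensors here are made up; a real VQ-VAE also needs a commitment loss and a straight-through gradient estimator):

```python
import torch

K, d = 512, 64                          # hypothetical codebook size and code dimension
codebook = torch.randn(K, d)            # the K learnable embedding vectors
z_e = torch.randn(8, d)                 # encoder outputs for a batch of 8 inputs

distances = torch.cdist(z_e, codebook)  # (8, K) pairwise Euclidean distances
indices = distances.argmin(dim=1)       # index of the nearest embedding per input
z_q = codebook[indices]                 # (8, d) quantised codes passed to the decoder
```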
Superb lecture! By the way, I am curious: could we capture the relationship between z and the output of the decoder? For example, a higher z value producing an image with darker colours. Besides, I think we can put a classifier on top of the encoder's output in an autoencoder, but I'm not sure we can do the same for a variational autoencoder.
Actually, after training, we can attach a classifier instead of the decoder to find out whether the test image fits into the latent neighbourhood distribution or not, and hence classify it as normal or abnormal.
Great video! Thanks a lot! I have a question about the visualization of z as bubbles: as I understand it, z follows a normal distribution with mean E(z) and variance V(z), so the spread of those bubbles is actually "infinite", right (there is no clear separation between "bubbles")? In that case, wouldn't they overlap anyway?
In stats.stackexchange.com/a/60699/228453 there is a −d term, but the formula at atcold.github.io/pytorch-Deep-Learning/en/week08/08-3/ uses −1. Are there any tips on how to get from the general formula for the multivariate Gaussian relative entropy to the more specific one on your site?
We can use sums because the d dimensions of the latent space are independent, since the covariance matrix of z is diagonal. In the image you draw just z1, a single dimension of the latent space, but in fact there should be z1, z2, …, zd dimensions of the latent space. Right?
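Spelling that out as a sketch: for $\Sigma=\operatorname{diag}(\sigma_1^2,\dots,\sigma_d^2)$, the general multivariate formula from the Stack Exchange answer reduces term by term,

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu,\Sigma)\,\|\,\mathcal{N}(0,I_d)\big)
= \frac{1}{2}\big(\operatorname{tr}\Sigma+\mu^\top\mu-d-\log\det\Sigma\big)
= \frac{1}{2}\sum_{i=1}^{d}\big(\sigma_i^2+\mu_i^2-1-\log\sigma_i^2\big),
$$

so the single $-d$ of the general formula is just the $-1$ contributed by each of the $d$ independent dimensions, which matches the per-dimension form on the course page.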
@@alfcnz you were able to compress an 80-page paper arxiv.org/pdf/1906.02691.pdf into a few nice diagrams and 30 minutes of theoretical talk that makes sense. Possibly the only thing missing (maybe you have it in some other videos) is connecting the paper's terminology with your work. That would be awesome! Say what the prior distribution would be, what the posterior distribution is, the encoder distribution, the decoder distribution, what that ELBO is. I guess ϵ is the prior, but I am not sure. w are the NN parameters (one part for the decoder and the other for the encoder). In the paper they used φ and θ: q_φ(z|x) for the encoder and p_θ(x|z) for the decoder. A hard paper to swallow. 🙂
@@Учитељица haha, I'm glad you found it useful. Regarding the notation, it's irrelevant. I'm not explaining a given paper. I'm explaining its contributions. Once you understand how it works, the way it's written is irrelevant. I would also highly recommend watching the 2021 edition of this lecture (coming out in two weeks). It's rather different. The version 2022 will have the prior / posterior nomenclature and derivation. As you've already noticed, I did not use anything from the variational inference field. Just explained how the model works.
I do not understand why at 28:04 N(0, I_d) has 0 as E(z). According to the picture, you said L_KL is enforcing z to be in small bubbles with different centres, but a centre must be at some specific point, not at 0. So your KL loss should construct something like in the picture ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-bbOFvxbMIV0.html but actually without bubbles.
The 0-vector is the mean of what we choose as prior, namely a standard normal distribution. The bubbles' centres are attracted to the origin, but the bubbles won't overlap, since the reconstruction error would increase otherwise. Furthermore, a "bubble" is simply a d-dimensional Gaussian whose parameters are given to you by the encoder. You didn't share any picture in your comment, so I'm not sure what you're talking about.