It's really nice, but it would be a great addition to include variational autoencoders and generative adversarial networks as well :). Maybe they could be helpful to the many people struggling with class imbalance during classification.
Hey Patrick, a really informative and concise video! Thoroughly enjoyed it :DD Just a small correction at 12:51: you used the word "dimension" while explaining the Normalize transform, whereas the two arguments are actually the mean and standard deviation of the resulting normalized data.
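To illustrate the correction: the two arguments are statistics of the data, not dimensions. A minimal sketch of what Normalize computes per channel (the MNIST mean/std values here are the commonly quoted ones, not necessarily the ones in the video):

```python
import torch

# The two arguments are data statistics, not dimensions.
# 0.1307 and 0.3081 are the commonly quoted MNIST mean and std.
mean, std = 0.1307, 0.3081
img = torch.rand(1, 28, 28)   # fake single-channel image in [0, 1]

out = (img - mean) / std      # exactly what Normalize does per channel

print(out.shape)              # the shape (dimensions) is unchanged
```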
It should be noted that the performance difference between the linear and CNN autoencoders shown here comes from the chosen compression factor. The linear model compresses each image to 12 bytes, while the CNN uses 256 bytes, where an original image is 784 bytes. So the CNN code does not compress enough, actually less than PNG! You would need two more linear layers to compress the 64 values down to 16 and then 4.
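A sketch of one way to do that: append two linear layers (64 → 16 → 4) after the conv bottleneck. The conv layer shapes below are assumed from a typical MNIST conv autoencoder, not necessarily the exact video code:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
            nn.Conv2d(32, 64, 7),                       # 7x7 -> 1x1, 64 channels
        )
        # Two extra linear layers to squeeze 64 down to 16 and then 4,
        # so the CNN compresses comparably to the linear autoencoder.
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 16),
            nn.ReLU(),
            nn.Linear(16, 4),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

enc = Encoder()
z = enc(torch.randn(8, 1, 28, 28))
print(z.shape)  # torch.Size([8, 4]) -- a 4-value code per image
```

The decoder would then need the mirrored linear layers (4 → 16 → 64) before the transposed convolutions.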
Great video! Could you provide the same walkthrough for a variational autoencoder? Or point me to a good walkthrough of the theory and implementation of one?
Great animations! My suggestion is to add more animations, not only for the theory but also for the walkthrough of the code. Just my suggestion, but great video; thanks for your teaching.
Hey Patrick, I used your exact code to train the CNN-based autoencoder but couldn't get it to converge without batch normalization. After adding BatchNorm2d after every ReLU it works fine, but without it, it doesn't; I tried different values for lr from 1e-2 to 1e-5. I was training on the MNIST dataset only; the loss becomes NaN or stays between 0.10 and 0.09.
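For anyone hitting the same issue, a sketch of what that change looks like (layer sizes assumed from a typical MNIST conv encoder, not the exact video code):

```python
import torch
import torch.nn as nn

# Conv encoder with BatchNorm2d inserted after each ReLU, as described above.
# Note: BatchNorm needs batches larger than 1 in training mode.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.BatchNorm2d(16),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
    nn.BatchNorm2d(32),
    nn.Conv2d(32, 64, 7),                       # 7x7 -> 1x1
)

out = encoder(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 64, 1, 1])
```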
I don't quite understand the syntax. Why do you define the method 'forward' but never call it explicitly? Maybe the line "recon = model(img)" is where you are using it, but I didn't know it could be done like this. I would have written "recon = model.forward(img)"; is it the same?
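For reference: `nn.Module.__call__` dispatches to `forward` (after running any registered hooks), so `model(img)` and `model.forward(img)` compute the same result, but calling the module directly is the recommended form. A minimal sketch:

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = Tiny()
x = torch.randn(3, 4)

# model(x) invokes nn.Module.__call__, which runs any registered hooks
# and then calls forward(x); with no hooks, the result is identical.
print(torch.equal(model(x), model.forward(x)))  # True
```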
Hi, I have a question: if we pass the image as input to the model, couldn't it just learn weights that make the output exactly the same as the input image? So why is the image given as the input to the model? It doesn't make sense to me. Could you explain this?
If you normalize the input image, which is also the label, the values will be between -1 and +1, but your output, since it's passed through a sigmoid, will be between 0 and 1. How will you decrease the loss for pixels between -1 and 0, given that your predictions will never be less than 0?
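One common way to resolve this mismatch (a sketch, with an assumed final decoder layer): either drop the Normalize transform so targets stay in [0, 1], or swap the final Sigmoid for Tanh so the output range matches [-1, 1]:

```python
import torch
import torch.nn as nn

decoder_head = nn.Linear(16, 784)  # hypothetical last decoder layer
z = torch.randn(2, 16)

# Option A: keep targets in [0, 1] (no Normalize) and use Sigmoid.
out_sigmoid = torch.sigmoid(decoder_head(z))

# Option B: if inputs are normalized to [-1, 1], end with Tanh instead,
# so the reconstruction can actually reach negative pixel values.
out_tanh = torch.tanh(decoder_head(z))

print(float(out_sigmoid.min()) >= 0.0)   # sigmoid range is (0, 1)
print(float(out_tanh.min()) >= -1.0)     # tanh range is (-1, 1)
```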
You used recon and img as the inputs to the loss function; however, if we want to train and test the model, shouldn't we use "recon" and the labels as inputs to the loss function? But the labels are 3D; how can we do that?
Since the autoencoder is an unsupervised technique, recon and img are used as inputs to the loss function. But in semi-supervised or supervised methods you have labels, so we use them against the predicted values in the loss function.
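Concretely, that difference looks like this (a sketch with dummy tensors):

```python
import torch
import torch.nn as nn

recon  = torch.rand(8, 1, 28, 28)    # dummy reconstruction
img    = torch.rand(8, 1, 28, 28)    # dummy input image
logits = torch.randn(8, 10)          # dummy classifier output
labels = torch.randint(0, 10, (8,))  # dummy class labels

# Unsupervised autoencoder: compare the reconstruction to the input itself.
ae_loss = nn.MSELoss()(recon, img)

# Supervised classifier: compare predictions to the class labels instead.
clf_loss = nn.CrossEntropyLoss()(logits, labels)

print(ae_loss.item(), clf_loss.item())
```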