
Week 8 - Lecture: Contrastive methods and regularised latent variable models 

Alfredo Canziani
14K views

Published: 28 Aug 2024

Comments: 21
@AbhijitGuptamjj 3 years ago
One year after I last completed this course, I'm revisiting it to watch specific portions and refresh my understanding. These lectures have been immensely helpful both professionally and academically. Prof. YLC's tutorial on energy-based learning (2006) gave me an entirely new perspective on why certain loss functions work well and why some don't.
@alfcnz 3 years ago
Watch the new edition! I explain everything from the energy perspective! 🤓🤓🤓
@chinghoelee9031 1 year ago
This course is very good! Thanks for making it public. It would be nice if you could make other material public as well! Thanks!
@alfcnz 1 year ago
I’m editing and uploading the Fall 2022 edition these days! 😀
@anrilombard1121 1 year ago
Super fun studying this course as a Sophomore!
@alfcnz 1 year ago
🥳🥳🥳
@adithyakamath640 3 years ago
At 1:03:00 I don't understand how z and z_bar can be made similar using the D function. Are they distributions to which you're applying a KL divergence? I mean, if z is a latent variable, how is its value known for comparison?
@user-we5so5xm5y 4 years ago
Thanks
@alfcnz 4 years ago
No problem 🤗
@guillaumevermeillesanchezm2427 4 years ago
When using methods like Contrastive Divergence (or GANs, for that matter), how can we make sure that the low-energy points found aren't actually legit?
@YannLeCunPhD 3 years ago
We don't. In the end, the learning will stabilize when the low-energy contrastive points are on the data manifold: the loss will push down and push up on them equally hard.
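A minimal PyTorch sketch (not from the thread) of that push-down / push-up dynamic, assuming a network energy_net that maps a batch of samples to scalar energies; the margin value and the way the contrastive points are produced are illustrative assumptions.

import torch

def contrastive_energy_loss(energy_net, x_data, x_contrastive, margin=1.0):
    # Push down the energy of real training samples...
    e_pos = energy_net(x_data)
    # ...and push up the energy of contrastive (low-energy) points,
    # but only while they still sit below the margin. Once those points
    # lie on the data manifold, the two terms pull with comparable force
    # and learning stabilizes.
    e_neg = energy_net(x_contrastive)
    return e_pos.mean() + torch.relu(margin - e_neg).mean()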
@AbhijitGuptamjj 4 years ago
I have a question: when we do transfer learning with a model trained on ImageNet (1000 classes) and apply it to a smaller multiclass classification problem, prolonged training with the early layers unfrozen will, unless we are careful, distort the features learnt for the new task. If we then intend to use these features for other purposes, like interpretability, we get distorted features that become degenerate as we keep training. How do we circumvent this problem besides using a very small learning rate and freezing the earlier layers?
@YannLeCunPhD 4 years ago
A simple trick is to use an L2 regularizer that attracts the weight vector W towards the solution W0 obtained from ImageNet pre-training: R(W) = alpha * ||W - W0||^2. The system will then try to adapt to the new dataset while minimizing the deviation from the initial configuration. A slightly refined version is to make alpha a diagonal matrix, so that lower layers are more regularized than higher layers.
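A minimal PyTorch sketch of the regularizer described above; the helper name, the per-parameter alphas mapping, and the usage comment are illustrative assumptions, not part of the course code.

import torch

def pretrained_l2_penalty(model, pretrained_state, alphas):
    # R(W) = sum over layers of alpha_l * ||W_l - W0_l||^2, where W0 are
    # the ImageNet pre-trained weights. Giving lower layers a larger
    # alpha regularizes them more strongly than higher layers.
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        w0 = pretrained_state[name].to(param.device)
        penalty = penalty + alphas.get(name, 0.0) * (param - w0).pow(2).sum()
    return penalty

# Usage sketch: snapshot the pre-trained weights once, then add the
# penalty to the task loss at every training step.
#   w0 = {k: v.detach().clone() for k, v in model.state_dict().items()}
#   loss = task_loss + pretrained_l2_penalty(model, w0, alphas)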
@AbhijitGuptamjj 4 years ago
Yann LeCun Thanks professor. I’ll try that.
@sushilghimire2406 4 years ago
I am following all the lectures and have really enjoyed every video, but I got lost many times in this one. Is there any alternative source for learning the same content? Thank you, sir, for uploading the course and open-sourcing the notebooks.
@alfcnz 4 years ago
All the videos are transcribed in English and translated into several languages. Have you checked the video description here at all? There's *plenty* of material I've been uploading and have linked there.
@alikassem812 3 years ago
I have a question: can we take the learned filters (kernels) of an AE (autoencoder) or VAE and use them as filters in a ConvNet to start training, instead of training from scratch or from pre-trained weights like ImageNet?
@alfcnz 3 years ago
Of course, training an AE is one possibility for learning feature extractors that are useful for later tasks. Recently, we've seen that joint-embedding techniques are also effective strategies for self-supervised learning.
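A minimal sketch (my own, not from the course) of that idea, assuming the autoencoder's encoder and the classifier's backbone share the same architecture so the learned filters can be copied directly:

import torch.nn as nn

def make_encoder():
    # Hypothetical conv encoder shared by the autoencoder and the classifier.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

# Step 1 (elsewhere): train encoder + decoder as an autoencoder on unlabeled data.
ae_encoder = make_encoder()  # pretend these weights come from the trained AE

# Step 2: reuse the AE filters to initialize the classifier backbone, then fine-tune.
classifier = nn.Sequential(make_encoder(), nn.Linear(64, 10))
classifier[0].load_state_dict(ae_encoder.state_dict())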
@alikassem812 3 years ago
@alfcnz Thanks
@1potdish271 2 years ago
Great lecture! Where can we find the BibTeX file to cite this lecture?
@alfcnz 2 years ago
On the course website! atcold.github.io/pytorch-Deep-Learning/en/faq/