05L - Joint embedding method and latent variable energy based models (LV-EBMs) 

Alfredo Canziani
39K subscribers · 24K views
Published: 28 Aug 2024

Comments: 67
@damnit258 · 2 years ago
Gold. I've watched last year's lectures and I'm filling the gaps with this year's.
@alfcnz · 2 years ago
💛🧡💛
@oguzhanercan4701 · 5 months ago
The most important video on the whole internet for computer-vision researchers. I rewatch it several times a year.
@alfcnz · 5 months ago
😀😀😀
@COOLZZist · 2 years ago
Love the energy from Prof. Yann LeCun: his excitement about the topic, and the small smiles when he talks about how fresh this content is, are amazing. Thanks a lot, Prof. Alfredo!
@alfcnz · 2 years ago
😄😄😄
@gonzalopolo2612 · 2 years ago
Again, @Alfredo Canziani, thank you very much for making this public; this is amazing content. I have several questions (I refer to the timestamps in the video):

16:34 and 50:43: an unconditional model is when the input is partially observed but you don't know exactly which part.
- What is test/inference for these unconditional EBMs? Is there a proper split between training and inference/test in the unconditional case?
- How do models like PCA or K-means fit here; what are the partially observed inputs Y? In K-means, for example, you receive all the components of Y, so I don't see how they are partially observed.

25:10 and 1:01:50, with the joint embedding architecture:
- What would inference be with this architecture: inferring a Y from a given X by minimizing the cost C(h, h′)? I know you could run gradient descent on Y backward through the Pred(y) network, but the purpose of inferring Y given X in this architecture is not clear to me.
- What does the "Advantage: no pixel-level reconstruction" in green mean? (I suspect this relates to my question above.)
- Can this architecture also be trained as a latent-variable EBM, or is it always trained contrastively?
@lucamatteobarbieri2493 · 1 year ago
A cool thing about prediction systems is that they can also be used to predict the past, not only the future. For example, if you see something falling, you intuitively predict both where it is going and where it came from.
@RJRyan · 2 years ago
Thank you so much for making these lectures public! The slides are very difficult to read because they are overlaid on Yann's face and the background image. I imagine this could be an accessibility issue for anyone with vision impairments, too.
@alfcnz · 2 years ago
That's why we provide the slides 🙂🙂🙂
@buoyrina9669 · 2 years ago
I guess I need to watch this many times to get what Yann was trying to explain :)
@alfcnz · 2 years ago
It's alright. It took me ‘only’ 5 repetitions 😅😅😅
@hamedgholami261 · 2 years ago
So that's what contrastive learning is all about!
@alfcnz · 2 years ago
It seems so 😀😀😀
@kalokng3572 · 2 years ago
Hi Alfredo, thank you for making the course public. It is super useful, especially for those who are self-learning cutting-edge AI concepts, and I've found EBMs fascinating. I have a question about EBMs: how should I describe "overfitting" in the context of an EBM? Does it mean the energy landscape has very small volume surrounding the training data points?
@alfcnz · 2 years ago
You're welcome. And yes, precisely. Underfitting would be having a flat manifold.
@user-co6pu8zv3v · 3 years ago
Thank you, Alfredo! :)
@alfcnz · 3 years ago
You're welcome 🥰🥰🥰
@cambridgebreaths3581 · 3 years ago
Perfect. Thank you so much :)
@alfcnz · 3 years ago
😇😇😇
@anondoggo · 2 years ago
Dr. Yann only mentioned this in passing at 20:00, but I just wanted to clarify: why does an EBM offer more flexibility in the choice of scoring and objective functions? It's on page 9 of the slides. Thank you!
@anondoggo · 2 years ago
Never mind, I should have just watched on: at 1:04:27 Yann explains how probabilistic models are EBMs whose objective function is the NLL.
@anondoggo · 2 years ago
Then, by extension, the scoring function of a probabilistic model is restricted to being a probability.
@anondoggo · 2 years ago
The info at 18:17 is underrated.
@MehranZiadloo · 7 months ago
Not to nitpick, but I believe there's a minus sign missing at 49:22, in the denominator of P(y|x) at the far end (right side of the screen), behind the β.
@alfcnz · 7 months ago
Oh, yes indeed! Yann is a little heedless when crafting slides 😅
@MehranZiadloo · 7 months ago
@alfcnz These things happen. I just wanted to make sure I was following the calculations correctly. Thanks for the confirmation.
@alfcnz · 7 months ago
Sure sure 😊
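For reference, here is the Gibbs expression the thread refers to, with the minus sign behind β in place (standard EBM notation, not a transcription of the slide):

```latex
P_w(y \mid x) \;=\; \frac{e^{-\beta F_w(x,\,y)}}{\int_{y'} e^{-\beta F_w(x,\,y')}\,dy'}
```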
@hamidrezaheidarian8207 · 7 months ago
Hi Alfredo, which book on DL do you recommend that has the same sort of structure as this course?
@alfcnz · 7 months ago
The one I’m writing 😇
@hamidrezaheidarian8207 · 7 months ago
@alfcnz Great. I think it will be a great companion to these lectures; looking forward to it.
@my_master55 · 2 years ago
Hi, Alfredo 👋 Am I missing something, or does this lecture not cover the "non-contrastive joint embedding" methods Yann was talking about at 1:34:40? I also briefly checked the next lectures but didn't find anything related. Could you please point me to it? 😇 Thank you for the video, btw, brilliant as always :)
@anondoggo · 2 years ago
If you open the slides for lecture 6, you can find a whole page on non-contrastive embeddings.
@arcman9436 · 3 years ago
Very interesting.
@alfcnz · 3 years ago
🧐🧐🧐
@arashjavanmard5911 · 2 years ago
Great lecture, thanks a lot. It would also be great if you could point us to a reference book or publications for this lecture. Thanks in advance.
@alfcnz · 2 years ago
I'm writing the book right now. A bit of patience, please 😅😅😅
@Vikram-wx4hg · 2 years ago
@alfcnz Looking forward to the book, Alfredo. Can you give a ballpark estimate of the 'patience' here? :-)
@alfcnz · 2 years ago
The first draft will see the light by the end of summer '22.
@anondoggo · 2 years ago
@@alfcnz omg, I'm so excited
@iamyouu · 2 years ago
Is there any book I can read to learn more about these methods? Thank you.
@alfcnz · 2 years ago
I'm writing the book. It'll take some time.
@iamyouu · 2 years ago
@@alfcnz thank you so much!
@alfcnz · 2 years ago
❤️❤️❤️
@SnoSixtyTwo · 2 years ago
Thanks a whole bunch for this lecture; after two viewings I think I'm starting to grasp it :) One thing that confuses me, though: at the very beginning it's mentioned that x may or may not be adapted when searching for the optimum location. I can't quickly come up with an example where I would want that. Wouldn't that mean I'm just discarding the information in x, and, in the case of modeling with latent variables, my inference becomes a function of z exclusively?
@alfcnz · 2 years ago
You need to write the timestamp in minutes:seconds if you want me to address a particular part of the video.
@SnoSixtyTwo · 2 years ago
@alfcnz Thanks for taking the time to respond! Here we go: 15:20.
@vageta008 · 2 years ago
Interesting: energy-based models do something very similar to metric learning. (Or am I missing something?)
@alfcnz · 2 years ago
Indeed, metric learning can be formulated as an energy-based model. I'd say energy models are a large umbrella under which many conventional models can be recast.
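As a toy sketch of that umbrella view (the linear encoder, the function names, and the margin value here are all illustrative, not from the lecture), a pairwise-distance energy plus a contrastive hinge loss reproduces classic metric learning:

```python
import numpy as np

def embed(w, x):
    """A stand-in encoder: a plain linear map."""
    return w @ x

def energy(w, a, b):
    """E(a, b): squared distance between the two embeddings."""
    d = embed(w, a) - embed(w, b)
    return float(d @ d)

def contrastive_loss(w, a, b, same, margin=1.0):
    """Push energy down for positive pairs, up to a margin for negative pairs."""
    e = energy(w, a, b)
    return e if same else max(0.0, margin - e)

w = np.eye(2)
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
# identical inputs give zero energy; here the distinct pair has energy 2.0,
# so the hinge on a negative pair is already satisfied
```

Minimizing this loss over pairs is exactly contrastive metric learning, viewed as shaping an energy landscape.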
@pratik245 · 2 years ago
The French language seems better suited to music; it has a sweet tonality.
@alfcnz · 2 years ago
🇫🇷🥖🗼
@ShihgianLee · 2 years ago
I spent some time deriving the step mentioned at 1:07:44. I made my best effort to get the final result, but I'm not sure my steps are correct; I hope my fellow students can help point out any mistakes. Due to the lack of LaTeX support in comments, I've tried to make the steps as clear as possible. I use the partial derivative of the log to get to the second step, then the Leibniz integral rule to move the derivative inside the integral in the third step. The rest is pretty straightforward, hopefully. Thank you!

∂/∂w (1/β) log[∫y′ exp(−βFw(x, y′))]
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∂/∂w ∫y′ exp(−βFw(x, y′))
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ ∂/∂w exp(−βFw(x, y′))
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ exp(−βFw(x, y′)) (−β) ∂/∂w Fw(x, y′)
= −[1 / ∫y′ exp(−βFw(x, y′))] ∫y′ exp(−βFw(x, y′)) ∂/∂w Fw(x, y′)
= −∫y′ [exp(−βFw(x, y′)) / ∫y′ exp(−βFw(x, y′))] ∂/∂w Fw(x, y′)
= −∫y′ Pw(y′|x) ∂/∂w Fw(x, y′)
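A LaTeX rendering of the same derivation (assuming F_w is smooth enough to swap differentiation and integration, per the Leibniz rule):

```latex
\begin{aligned}
\frac{\partial}{\partial w}\,\frac{1}{\beta}\log \int_{y'} e^{-\beta F_w(x,y')}\,dy'
&= \frac{1}{\beta}\,
   \frac{\int_{y'} \frac{\partial}{\partial w}\, e^{-\beta F_w(x,y')}\,dy'}
        {\int_{y'} e^{-\beta F_w(x,y')}\,dy'} \\
&= -\int_{y'} \frac{e^{-\beta F_w(x,y')}}
                   {\int_{y''} e^{-\beta F_w(x,y'')}\,dy''}\,
              \frac{\partial F_w(x,y')}{\partial w}\,dy' \\
&= -\int_{y'} P_w(y' \mid x)\,\frac{\partial F_w(x,y')}{\partial w}\,dy'
\end{aligned}
```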
@hamedgholami261 · 2 years ago
Can you post a link to a LaTeX file? I did the derivative and may be able to help.
@aljjxw · 3 years ago
What are the research papers from Facebook mentioned around 1:30?
@alfcnz · 3 years ago
All references are written on the slides. At that timestamp I don't hear Yann mentioning any paper.
@bmahlbrand · 3 years ago
How do you use autograd in pytorch for "nonstochastic" gradient descent?
@shiftedabsurdity · 3 years ago
Probably conjugate gradient.
@alfcnz · 3 years ago
If the function I have is not approximate (unlike the per-batch approximation of the dataset loss), then you're performing non-stochastic GD. The stochasticity comes from the approximation of the objective function.
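A minimal sketch of that distinction, using a toy one-parameter linear model (all names and values here are illustrative): computing the loss over the entire dataset on every step makes the autograd gradient exact, i.e. non-stochastic.

```python
import torch

# Toy dataset: y = 2x. The WHOLE dataset is used at every step,
# so the gradient of the loss is exact rather than a mini-batch estimate.
x = torch.linspace(-1.0, 1.0, 100)
y = 2.0 * x
w = torch.zeros(1, requires_grad=True)

for _ in range(300):
    loss = ((w * x - y) ** 2).mean()  # full-dataset loss, no sampling
    loss.backward()                   # autograd computes the exact gradient
    with torch.no_grad():
        w -= 0.5 * w.grad             # plain (non-stochastic) GD step
        w.grad.zero_()

print(w.item())  # converges to ~2.0
```

Swapping the full-dataset loss for a loss on a random subset of `(x, y)` is precisely what reintroduces the stochasticity.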
@mpalaourg8597 · 3 years ago
I tried to calculate the derivative Yann mentioned (1:07:45), but I'm probably missing something, because my final result doesn't have the integral (only −P_w(·) ...). Is there any supplementary material with these calculations? Thanks again for your amazing, hard work!
@alfcnz · 3 years ago
Uh… can you share your calculations? I can have a look. Maybe post them in the maths room of the Discord server, so others can help as well.
@mpalaourg8597 · 3 years ago
@alfcnz It was my bad. I misunderstood the formula for P_w(y|x) and thought there was an integral in the numerator (over all y's), but that didn't make any sense to me, so I checked your notes again and, voilà, I got the right answer. Is the Discord open to us too? I thought it was only for NYU students. I'll definitely join then (learning alone isn't fun :P).
@alfcnz · 3 years ago
Discord is for *non*-NYU students. I have another communication system set up for them.
@pratik245 · 2 years ago
It seems Yannic Kilcher is asking questions.
@alfcnz · 2 years ago
Where, when? 😮😮😮
@pratik245 · 2 years ago
Meta helicopter
@Vikram-wx4hg · 2 years ago
:-))