
TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained) 

Yannic Kilcher
261K subscribers
33K views

Published: Aug 27, 2024

Comments: 88
@Ronnypetson · 3 years ago
Yannic is like that nurse that mashes the potato with a spoon and gives it to you so that you toothless nerds can get fed
@YannicKilcher · 3 years ago
This made me laugh so hard :D
@cerebralm · 3 years ago
LOL
@theoboyer3812 · 3 years ago
That's a funny summary of what a teacher is
@cerebralm · 3 years ago
@theoboyer3812 I heard a concise explanation described as "you have to assemble IKEA furniture without using all the pieces, but it still has to be sturdy when you're done"
@swordwaker7749 · 3 years ago
Ahh... more like a chef. The papers in the original form can be hard to digest without... some help. BTW, the paper is like dragon meat.
@finlayl2505 · 3 years ago
Relationship ended with conv nets, transformers are my best friend now
@hoaxuan7074 · 3 years ago
A Fast Transform fixed filter bank neural network trained as an autoencoder works quite well as a GAN. Noise in, image out. I guess with "filter" in the title...
@lunacelestine8574 · 3 years ago
That made my day
@dasayan05 · 3 years ago
25:57 convolutions are for losers, we're all for locally applied linear transformations .. 😂
@xtli5965 · 3 years ago
They actually updated the paper: they no longer use super-resolution co-training and locality-aware initialization, but instead use relative positional embeddings and a modified normalization. They also tried larger images with local self-attention to reduce the memory bottleneck. The most confusing part of this paper for me is the UpScale and AvgPool operations, since outputs from a transformer are supposed to be global features, so it feels strange to directly upsample or pool them as we do with convolutional features.
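For reference, the UpScale step the comment mentions is a pixel-shuffle applied to the token grid: tokens are viewed as a 2-D feature map and channel groups are moved into new spatial positions. A minimal numpy sketch of the idea (the channel-block ordering here is illustrative, not the paper's exact layout):

```python
import numpy as np

def pixel_shuffle_tokens(tokens, h, w, r=2):
    """Upsample an (h*w, C) token grid to (h*r*w*r, C // r**2) tokens:
    view tokens as a 2-D grid, move channel groups into an r x r
    neighbourhood of new spatial positions, flatten back to tokens."""
    n, c = tokens.shape
    assert n == h * w and c % (r * r) == 0
    grid = tokens.reshape(h, w, r, r, c // (r * r))  # split channels
    grid = grid.transpose(0, 2, 1, 3, 4)             # (h, r, w, r, c')
    return grid.reshape(h * r * w * r, c // (r * r))

tokens = np.random.randn(4, 16)          # a 2x2 grid of 16-dim tokens
up = pixel_shuffle_tokens(tokens, 2, 2)  # -> 4x4 grid of 4-dim tokens
print(up.shape)                          # (16, 4)
```

AvgPool in the discriminator is the reverse direction: neighbouring tokens are merged, which only makes sense once the tokens are interpreted as a spatial grid again.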
@puneetsingh5219 · 3 years ago
Yannic is on fire 🔥🔥
@dasayan05 · 3 years ago
1:11 "which bathroom do the TransGANs go to?"
@wilsonthurmanteng9 · 3 years ago
Hi Yannic, fast reviews as usual! I would just like your thoughts on the loss functions of the recent Continuous Conditional GAN paper that was accepted into ICVR 2021.
@MightyElemental · 1 year ago
I was attempting to build a TransGAN for a university project and ended up with a very similar method. Only thing that was missing was the localized attention. No way was I gonna get that 💀
@rallyram · 3 years ago
Why do you think they go with the wgan grad penalty instead of the spectral normalization as per Appendix A.1?
@minhuang8848 · 3 years ago
Dang, you learned some Chinese phonemes, didn't you? Pronunciation was pretty on point!
@dasayan05 · 3 years ago
YannicNet has been trained for several years now on AuthorName dataset. No wonder output quality is good
@minhuang8848 · 3 years ago
@dasayan05 That's the good-ass Baidu language models lol
@hk2780 · 3 years ago
So why should we not use convolutions if we are using a locally linear function anyway? I don't get that point. Also, why do they use those 16 crops? To be honest, it is almost the same as a convolution with a 16x16 kernel and stride 16. And then they say they don't use convolution, while doing the same thing a convolution does. Sounds like it becomes more of a con artist thing.
@dl9926 · 2 years ago
but that would be so expensive, wouldn't it?
@florianhonicke5448 · 3 years ago
I like your jokes a lot. It is much easier for me to learn something when it is fun!
@tnemelcfljdsqkf9529 · 3 years ago
Thank you a lot for your work, it's helping me a lot! Which software are you using to take notes on top of the paper like this? :)
@Snehilw · 3 years ago
Great explanation!
@nguyenanhnguyen7658 · 3 years ago
There is no high-res benchmark for TransGAN vs StyleGANV2 so we do not know if it is worth trying.
@raunaquepatra3966 · 3 years ago
I didn't get the point of data augmentations for generators. Isn't the number of input samples practically infinite? I mean, I can feed as many random vectors as I want and get as many samples as needed?
@romanliu4629 · 3 years ago
An arithmetic question: what is the parameter size of the linear transforms before the "Scaled Dot-Product Attention" which produce Q, K, and V when synthesizing 256^2 images? If we reverse the roles of the "flattened" spatial axes and the channel axis, how is it related to or different from a 1×1 convolution? And why flatten and reshape features and upscale images via pixel-shuffle, which can disrupt spatial information and lead to checkerboard artefacts?
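On the 1×1-convolution comparison raised above: a token-wise Q/K/V projection applied to flattened tokens is arithmetically the same operation as a 1×1 convolution on the unflattened map, since both mix channels independently at each spatial position. A small numpy check (shapes and weights are illustrative, not the paper's configuration):

```python
import numpy as np

H, W, C, D = 4, 4, 8, 8
x = np.random.randn(H * W, C)          # flattened tokens, one per pixel
Wq = np.random.randn(C, D)             # Q-projection weights (illustrative)

q_linear = x @ Wq                      # transformer-style projection
# "1x1 conv" view: apply the same weights per spatial position of the grid
q_conv = (x.reshape(H, W, C) @ Wq).reshape(H * W, D)

print(np.allclose(q_linear, q_conv))   # True
```

So for a C-channel, D-dimensional projection the parameter count is C·D (plus bias) regardless of image size; what grows with 256² images is the sequence length, and with it the attention matrix, not the projection weights.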
@syslinux2268 · 3 years ago
What is your opinion on MIT's new "Liquid Neural Network" ?
@YannicKilcher · 3 years ago
Haven't looked at it yet, but I will
@syslinux2268 · 3 years ago
@YannicKilcher Similar to an RNN, but instead of scaling into billions of parameters, it focuses on higher-quality neurons. Fewer parameters, but with results as good as or even better than average-sized neural networks.
@dasayan05 · 3 years ago
@syslinux2268 paper link? Is it public yet?
@shivamraisharma1474 · 3 years ago
Do you mean the paper liquid time constant neural networks?
@syslinux2268 · 3 years ago
@shivamraisharma1474 Yep. It's just too long to type.
@WhatsAI · 3 years ago
Hey Yannic, love the video! May I ask what tools you are using to read this paper, highlight the lines, and record it? Thanks! :)
@CosmiaNebula · 2 years ago
Likely a Microsoft Surface with a pen. Then any PDF annotator would work; even Microsoft Edge's PDF reader has that. As for screen recording, try OBS Studio. Would you make your own paper reading videos?
@WhatsAI · 2 years ago
@CosmiaNebula Thank you for the answer! I would indeed like to try that style of video sometime, but my initial question was mainly because I would love to use something similar in meetings to show explanations, math, etc.
@tylertheeverlasting · 3 years ago
What would have been the issue with an ImageGPT like Generator? .. Would it be too slow to train due to serial generation?
@YannicKilcher · 3 years ago
Apparently, transformer generators have just been unstable for GANs so far
@tedp9146 · 3 years ago
How exactly is the classification head attached to the last transformer head?
@chuby35 · 3 years ago
Could this be used with VQ-VAE-2, so the lower-res "images" fed into this TransGAN are actually the latent space representations produced by VQ-VAE-2?
@Kram1032 · 3 years ago
Can't help but think that the upsampling stuff is kinda like implicit convolutions... Not that it'd be particularly reasonable to not do this but it's setting up a similar localized attention type deal.
@MrMIB983 · 3 years ago
Great video
@user-on9bl2gi5h · 1 year ago
Sir, can you please provide the TransGAN training and testing code?
@etiennetiennetienne · 3 years ago
Cool bag of tricks! Instead of this hardcoded mask, could it just be an initialization problem? E.g. if the probability of predicting a positional-encoding vector agreeing with a far-away vector is low at the beginning of training?
@G12GilbertProduction · 3 years ago
12:51 Wait... 3 samples for the 1×156-pixel upsampled patch of data correlates between the r² (alpha) and r² (beta) + ... r² (omega) channel transformers, or even a 156-layer architecture base to finitely decode what it was recreating, up to 9 samples, right?
@lucasferreira7654 · 3 years ago
Thank you
@raunaquepatra3966 · 3 years ago
How is the LR image computed in the super-resolution auxiliary task? Especially, how is the number of channels matched to the network's input channels? E.g. SR image 64x64x3, LR image 8x8x3? (But the network needs 8x8x192.)
@raunaquepatra3966 · 3 years ago
Couldn't find anything in the paper either 😔
@user-qu2oz2ut2h · 3 years ago
What if we change the feedforward layer in the transformer to another transformer? Like a nested transformer.
@simonstrandgaard5503 · 3 years ago
Impressive
@JFIndustries · 3 years ago
The joke about the name was really unnecessary
@pastrop2003 · 3 years ago
For the generator network, do I understand correctly that when you use the example of a 4-pixel image that starts the generation and then say that every pixel is a token going into a transformer, you imply that each of these tokens has an embedding with dimensionality equal to the number of channels? I.e. if one starts with a 2x2 image with 64 channels, every pixel (token) has a 64-dimensional embedding going into the transformer?
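That reading matches how the flattening works in vision transformers: each spatial position becomes one token, and the token's embedding dimension is the channel count. A tiny numpy illustration using the commenter's 2x2x64 example (shapes are the example's, not the paper's exact config):

```python
import numpy as np

# A 2x2 feature map with 64 channels: 4 pixels, each a 64-dim vector.
image = np.random.randn(2, 2, 64)   # H x W x channels
tokens = image.reshape(-1, 64)      # flatten: 4 tokens, one per pixel
print(tokens.shape)                 # (4, 64)
```

The transformer then sees a length-4 sequence of 64-dimensional embeddings, exactly as with word tokens in NLP, plus positional embeddings to recover the 2-D layout.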
@nasenach · 3 years ago
Actually "FID score" is also kinda wrong, since the D already stands for distance here... Ok, nerding out.
@timoteosozcelik · 3 years ago
How do they give the LR image as input to Stage 2? What I've understood so far is that the number of channels decreases over the stages (which means there will be more than 3 channels at Stage 2), but the LR image will have only 3 channels.
@chuby35 · 3 years ago
Probably using the same trick as the upsample, the other way around: scaling down the image by moving the information into more channels. (Since this aux task is only used to teach the upsample to work properly, I don't think the LR images lose any information; it's just rearranged into these "super-pixels".) But I haven't looked at the code yet, so my guess is as good as any. :)
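The channel-rearranging trick described above (pixel-unshuffle: each r×r neighbourhood moves into channels, losslessly) can be sketched in numpy as follows; this illustrates the idea only, and is not the paper's code:

```python
import numpy as np

def pixel_unshuffle(img, r=2):
    """Downscale an (H, W, C) image to (H/r, W/r, C*r*r) by rearranging
    each r x r neighbourhood into channels -- a lossless reshuffle."""
    h, w, c = img.shape
    x = img.reshape(h // r, r, w // r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h // r, w // r, c * r * r)

img = np.random.randn(8, 8, 3)
small = pixel_unshuffle(img, 2)
print(small.shape)        # (4, 4, 12)
```

Applied three times to a 64x64x3 image this would give 8x8x192, which would explain how a 3-channel LR image could match a 192-channel network input without discarding information.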
@timoteosozcelik · 3 years ago
@chuby35 That makes sense, thanks. But directly applying what you said made me question the necessity (meaning) of Stage 3 for such cases. I checked the code, but couldn't see anything about that.
@pratik245 · 2 years ago
AI papers are like news articles now... so many and so similar.
@phoenixwithinme · 3 years ago
Yannic gets them, the ml papers, like in the targeted distance. 😂
@paiwanhan · 3 years ago
Gan actually means the act of copulation in Mandarin. So TransGAN is even more unfortunate.
@array5946 · 3 years ago
17:15 - is cropping a differentiable operation?
@udithhaputhanthri2002 · 3 years ago
I think what he says is, if cropping is a differentiable operation, we can use it.
@seonwoolee6396 · 3 years ago
Wow
@bluestar2253 · 3 years ago
Convnets are dead, long live transformers! -- reminded me of the late 80s "AI is dead, long live neural nets!" Karma is a bitch.
@SequoiaAlexander · 3 years ago
Thanks for the video about this paper. Just what I was looking for. I will kindly suggest that the comment about the bathrooms would likely make some trans people uncomfortable. It is an unfortunate name for this research. Maybe best to leave it at that. Cheers and thanks for your work.
@UglyFatDwarf · 3 years ago
Yannic, we're transformers inside and out at this point, man. Isn't there any other enjoyable theory paper? You're bumming us out.
@alpers.2123 · 3 years ago
Ahahaha
@pyroswolf8203 · 3 years ago
transformers have finished off the industry
@G12GilbertProduction · 3 years ago
Discriminator sounds even more cancel-worthy. xD
@SHauri-jb4ch · 3 years ago
I like your videos, but if you read the word "Trans" and your first association is that it has something to do with bathrooms, you should think about your prejudices. Reaching as many people as you do, you should be aware that you 'could' have a diverse audience. Unfortunately we are not that far along in ML yet, and in my opinion microaggressions like this contribute to it staying that way.
@siquod · 3 years ago
But what will transGANs do to the ganetics of organic crops if their pollen gets into the wild?
@ihoholko9522 · 3 years ago
Hi, what program are you using for paper reading?
@alpers.2123 · 3 years ago
OneNote
@ihoholko9522 · 3 years ago
@alpers.2123 Thanks
@xanderx8289 · 3 years ago
TransGen
@circuitguy9750 · 3 years ago
For the sake of your colleagues and students, I hope you realize how your "trans bathroom" joke is harmful, disrespectful, and unprofessional.
@tarmiziizzuddin337 · 3 years ago
"Convolutions are for losers".. 😅
@panhuitong · 3 years ago
"convolutions are for losers"..... feeling sad about that
@GiannhsPanop · 2 years ago
Very nice transphobic joke! #unsub
@FreakFolkerify · 3 years ago
Can it turn my male dog into a female?