
Generate long form video with Transformers | Phenaki from Google Brain explained 

AI Coffee Break with Letitia
49K subscribers
11K views

Published: 18 Sep 2024

Comments: 26
@automatescellulaires8543 · 1 year ago
This will revolutionize the meme market.
@DerPylz · 1 year ago
Thank you for also explaining Phenaki! I was curious about a non-diffusion model for video generation! 🎊
@barberb · 1 year ago
Thank you, Letitia
@googIe.com. · 1 year ago
Phenaki video generation looks like a dream that was recorded & replayed
@Handelsbilanzdefizit · 1 year ago
Maybe there will be a way to visualize memories and dreams, by using Electroencephalography (EEG) and Neural Networks. So you can see what others think. Or see what others see, through their eyes.
@mrinmoybanik5598 · 1 year ago
Good luck collecting the training dataset 🙂
@johnkintner · 1 year ago
Researchers have already used fMRI to do something similar! This was a while ago :D
@davidyang102 · 1 year ago
Because it resumes generation from only a few frames, it will lose context. Imagine generating a paragraph and then generating the next one using only the last word you wrote. Luckily, images capture a lot of information, so it's not that obvious. But, for example, you can't make a video that looks around 360 degrees if it's generated in two iterations. Very dreamlike.
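The context loss described in this comment can be sketched in a few lines of Python. This is a toy stand-in, not Phenaki's model: a deterministic "generator" that conditions only on the frames handed over, so any two histories ending in the same few frames continue identically, and whatever was only visible earlier is lost.

```python
def continue_video(last_k_frames, n):
    # Toy "generator": each new frame is a function of the previous
    # two frames only, mimicking conditioning on a short hand-over.
    frames = list(last_k_frames)
    for _ in range(n):
        frames.append((frames[-1] + frames[-2]) % 100)
    return frames[len(last_k_frames):]

hist_a = [10, 20, 30, 4, 5]  # camera panned left earlier
hist_b = [70, 80, 90, 4, 5]  # camera panned right earlier

# Both histories end in the same 2 frames, so the continuations are
# identical: the earlier pan can no longer influence the future.
print(continue_video(hist_a[-2:], 4) == continue_video(hist_b[-2:], 4))  # True
```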
@rewixx69420 · 1 year ago
I really want infinite video generation with diffusion models.
@AICoffeeBreak · 1 year ago
Soon. Just give Google some time to mount more TPUs in their racks. 😅
@AICoffeeBreak · 1 year ago
twitter.com/_akhaliq/status/1595645248243650560?t=PHepVXOP40pPdc5q3upUbQ&s=19 what about this? Didn't look into it.
@elev007 · 1 year ago
Great explanation, thank you 🙏
@federicolusiani7753 · 1 year ago
Thank you for your video, great content as always! One question: in the video, you say that the video encoder is auto-regressive, so that it can be used on an arbitrary number of video patches. But aren't standard transformer encoders already able to process inputs of arbitrary length? Usually the auto-regressive architecture is used in the decoder, because at inference time we need it to generate the output causally. Am I missing something?
@AICoffeeBreak · 1 year ago
Thanks for this great question. Transformer sequence length is an interesting topic, which we've discussed here already: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Xxts1ithupI.html Basically, even if the model can generate / take in variable-length input, it still has a predefined maximum input / output length due to practical limitations (compute time and memory). You are asking whether a causal model could not generate infinitely long video, and, for practical reasons, the answer is no. Unmodified causal attention means that one attends to the whole generated past, so the attention window grows linearly while computation time and memory grow quadratically. Because of limited compute time and memory, we cannot generate indefinitely, unless one applies tricks like the Phenaki authors do with MaskGIT, attending to only a small fraction of the tokens of the previously generated output.
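The quadratic blow-up mentioned in this reply can be made concrete with a back-of-envelope count of query-key pairs. This is an editor's sketch, not Phenaki code; the fixed window `w` stands in loosely for MaskGIT-style attention to only a small subset of past tokens.

```python
def causal_pairs(T):
    # Full causal attention: token t attends to its t predecessors
    # plus itself, so the total grows quadratically in T.
    return sum(t + 1 for t in range(T))

def windowed_pairs(T, w):
    # Attending to at most w past tokens caps the per-token cost,
    # so the total grows only linearly in T.
    return sum(min(t, w) + 1 for t in range(T))

for T in (1_000, 10_000):
    print(T, causal_pairs(T), windowed_pairs(T, w=256))
```

Going from 1,000 to 10,000 tokens multiplies the full-attention count by roughly 100, but the windowed count by only roughly 10.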
@sadface7457 · 1 year ago
Hello MsCB 👋
@AICoffeeBreak · 1 year ago
Hello Sad Face! Haven't seen you in the comments in a long time! 👋
@DerPylz · 1 year ago
Woohooo! Sad Face is back! 🎊
@summary7428 · 1 year ago
Great video, but I think it was wrongly placed in your (awesome) diffusers playlist =)
@AICoffeeBreak · 1 year ago
You are right, it is not a diffusion model; it's about content generation. 😅 I was more comfortable with it being in this playlist (especially as the last video in the row) rather than nowhere close to its fellow competition. But sure, I don't have the Paella video in the list, although Paella can be argued to be a diffusion model. I need to clean up.
@TheGatoskilo · 1 year ago
I wonder how they pad the video tensors with variable sequence lengths.
@AICoffeeBreak · 1 year ago
Do you see this as problematic?
@TheGatoskilo · 1 year ago
Just at the implementation level: for these padding values, as well as for masking the tokens, did someone decide that we fill these tensors with 0s? Does it matter what we fill those vectors with? And if these padded/masked values of 0 overlap with actual data, how do we effectively instruct the model to disentangle masked values from 0s in the actual data?
@TheGatoskilo · 1 year ago
@@AICoffeeBreak No, I just wonder how it works in the implementation
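On the padding question in this thread: in typical transformer implementations the pad value itself does not matter, because a separate boolean mask removes padded positions from the attention softmax; real tokens whose values happen to be 0 are unaffected. A minimal NumPy sketch of this usual mechanism (an editor's illustration, not Phenaki's code):

```python
import numpy as np

scores = np.array([2.0, 0.0, 1.0, 0.0])       # raw attention scores
mask   = np.array([True, True, True, False])  # last position is padding

# Padded positions get -inf, so they receive exactly 0 weight after
# the softmax; the *value* stored in the padded slot never matters.
masked = np.where(mask, scores, -np.inf)
weights = np.exp(masked - masked[mask].max())
weights /= weights.sum()

print(weights)  # index 3 (padding) gets weight 0; index 1 (a real
                # score of 0.0) still gets a normal, nonzero weight
```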
@TimScarfe · 1 year ago
Amazeballs ❤
@AICoffeeBreak · 1 year ago
Thanks, Tim! 😅 Happy to see MLST release content after what feels like a long time!
@VivaLaRevoMW3PS3 · 1 year ago
Boring video, there are already like 100 of this kind. People want to know where or how they can use it.