The second episode of Hidden Layers, “Text to video models explained,” maintains the same high standard as the first episode. Many thanks once again to Laurence Moroney and Google Research! Any chance we could cover Google’s LaMDA next? Perhaps there is another breakthrough conversational model you might touch on as well. The whole idea of RLHF (Reinforcement Learning from Human Feedback) would be a great topic to dive into.
Awesome! I'm reacting to this live. I feel that these 2 Hidden Layers videos raise the question: have we tried the autoregressive approach for text-to-video?
That's pretty cool, though the last few stages of upscaling and time-lengthening sound very inefficient. Like, it would be much better to have a single model that directly outputs the video at resolution X×Y @ Z FPS.
Models are generally very static in their operation: data of one shape in, data of one shape out. So using multiple models, each specialized for one task, and pipelining them together is generally more efficient than trying to build a single model that handles everything.
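That cascaded design can be sketched in a few lines. Below is a minimal, hypothetical illustration (the function names and shapes are made up, and the "models" are just placeholder transforms): a base generator emits a short low-res clip with a fixed shape, then a spatial upsampler and a temporal interpolator each take a fixed shape in and emit a fixed shape out, chained together.

```python
import numpy as np

# Hypothetical cascaded text-to-video pipeline. Each stage is a fixed
# shape-in / shape-out transform, standing in for a learned model.

def base_generator(prompt: str) -> np.ndarray:
    # Placeholder for a base model emitting a short, low-res clip:
    # (frames, height, width, channels) = (8, 64, 64, 3)
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((8, 64, 64, 3)).astype(np.float32)

def spatial_upsampler(video: np.ndarray, scale: int = 4) -> np.ndarray:
    # Nearest-neighbour upscaling as a stand-in for a super-resolution model.
    return video.repeat(scale, axis=1).repeat(scale, axis=2)

def temporal_interpolator(video: np.ndarray, factor: int = 3) -> np.ndarray:
    # Linear blending between consecutive frames as a stand-in for a
    # learned frame-interpolation model (raises the effective FPS).
    frames = [video[0]]
    for prev, nxt in zip(video[:-1], video[1:]):
        for k in range(1, factor + 1):
            t = k / factor
            frames.append((1 - t) * prev + t * nxt)
    return np.stack(frames)

def pipeline(prompt: str) -> np.ndarray:
    # Chain the specialized stages; each only ever sees its expected shape.
    return temporal_interpolator(spatial_upsampler(base_generator(prompt)))

clip = pipeline("a dog surfing")
print(clip.shape)  # (22, 256, 256, 3): 8 frames -> 22, 64px -> 256px
```

The point of the sketch is that each stage can be trained, scaled, and swapped independently, precisely because its input and output shapes are fixed; a single end-to-end model would have to relearn all three jobs at once.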