The second episode of Hidden Layers, “Text to video models explained,” maintains the same high standard as the first episode. Many thanks once again to Laurence Moroney and Google Research! Any chance we could cover Google’s LaMDA next? Perhaps there is another breakthrough conversational model you might touch on as well. The whole idea of RLHF (Reinforcement Learning from Human Feedback) would be a great topic to dive into.
Awesome! I'm reacting to this live. I feel that these 2 Hidden Layers videos raise the question: have we tried the autoregressive approach for text-to-video?
That's pretty cool, though the last few stages of upscaling and time-lengthening sound very inefficient. Like, it would be much better to have a single model that directly outputs the video at resolution X×Y @ Z FPS.
Models are generally very static in their operation: data of one shape in, data of one shape out. So using multiple models, each specialized for one task, and pipelining them together is generally more efficient than trying to build a single model that handles everything.
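That cascaded design can be sketched in a few lines. Below is a minimal, hypothetical illustration (the function names and shapes are made up, and the "models" are just placeholder transforms): a base generator emits a short low-res clip with a fixed shape, then a spatial upsampler and a temporal interpolator each take a fixed shape in and emit a fixed shape out, chained together.

```python
import numpy as np

# Hypothetical cascaded text-to-video pipeline. Each stage is a fixed
# shape-in / shape-out transform, standing in for a learned model.

def base_generator(prompt: str) -> np.ndarray:
    # Placeholder for a base model emitting a short, low-res clip:
    # (frames, height, width, channels) = (8, 64, 64, 3)
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((8, 64, 64, 3)).astype(np.float32)

def spatial_upsampler(video: np.ndarray, scale: int = 4) -> np.ndarray:
    # Nearest-neighbour upscaling as a stand-in for a super-resolution model.
    return video.repeat(scale, axis=1).repeat(scale, axis=2)

def temporal_interpolator(video: np.ndarray, factor: int = 3) -> np.ndarray:
    # Linear blending between consecutive frames as a stand-in for a
    # learned frame-interpolation model (raises the effective FPS).
    frames = [video[0]]
    for prev, nxt in zip(video[:-1], video[1:]):
        for k in range(1, factor + 1):
            t = k / factor
            frames.append((1 - t) * prev + t * nxt)
    return np.stack(frames)

def pipeline(prompt: str) -> np.ndarray:
    # Chain the specialized stages; each only ever sees its expected shape.
    return temporal_interpolator(spatial_upsampler(base_generator(prompt)))

clip = pipeline("a dog surfing")
print(clip.shape)  # (22, 256, 256, 3): 8 frames -> 22, 64px -> 256px
```

The point of the sketch is that each stage can be trained, scaled, and swapped independently, precisely because its input and output shapes are fixed; a single end-to-end model would have to relearn all three jobs at once.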