You can try out Luma AI's Dream Machine here! luma.1stcollab.com/bycloudai I am really good at having great timing. MovieGen came out when I had nearly finished the video. I'm sad. So here's a quick definition of DiT: A diffusion transformer (DiT) is a model that combines elements of diffusion models and transformers to generate data, for tasks like image synthesis, audio generation, or text generation. Diffusion models are a class of probabilistic generative models that create data by iteratively denoising a latent variable, which starts from pure noise and is gradually transformed into a coherent sample. Transformers, on the other hand, are neural network architectures known for their ability to model long-range dependencies in data, primarily through self-attention mechanisms. You could ultimately say that a diffusion transformer is just a transformer with the goal of denoising. Yum. Here's MovieGen's paper: arxiv.org/abs/2410.13720 It contains a better rundown of how to craft the latest near-SoTA video generation.
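To make the "start from pure noise and iteratively denoise" idea above concrete, here is a toy sketch in plain Python. It is not how DiT or MovieGen is implemented: `toy_denoiser` is a hypothetical stand-in for the trained transformer (it cheats by knowing the target), and the step schedule is an assumption chosen only to keep the loop easy to follow.

```python
import random

def toy_denoiser(x, target):
    """Hypothetical stand-in for a trained transformer.

    A real diffusion transformer would predict the noise from x and the
    timestep alone; this toy version cheats by comparing against the target.
    """
    return [xi - ti for xi, ti in zip(x, target)]

def sample(target, steps=50, seed=0):
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per output element.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        predicted_noise = toy_denoiser(x, target)
        remaining = steps - step
        # Remove a fraction of the predicted noise; the fraction grows
        # until the final step removes whatever noise is left.
        x = [xi - ni / remaining for xi, ni in zip(x, predicted_noise)]
    return x

if __name__ == "__main__":
    print(sample([1.0, -2.0, 0.5]))
```

In an actual DiT, the denoiser is a transformer operating on patches of a latent, and the noise schedule comes from the diffusion formulation rather than this simple `1 / remaining` rule.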
Thanks a lot for this video, it was really helpful to start to understand how all this AI technology works. All the people working behind this are literal geniuses.
This video is so cool, a literal gold mine of information on how modern AI models work. The bread analogy was extra nice - I finally understand why diffusion models struggle with different resolutions.
You are my fav AI channel, so it would be great to hear your take on Yann LeCun's idea of how to build human-level intelligence. He held a talk about this at the Hudson forum recently. Instead of LLMs, he wants to build models that truly model the world by predicting the state of the world given some action. I can see how that would be a very effective model, but I suspect it will be easier to get around all the shortfalls of LLMs than to build this fancy model LeCun suggests. What do you think?
> I am really good at having great timing - cloud, By, on making videos about an area with research speed bonus modifiers correlated to the number of YouTube videos about it =P
@raspberryjam well, it's easy to kind of guess! It's clearly an LLM and maybe some TTS like sovits... The LLM will prolly be something like Mistral, as Qwen needs a commercial license and Llama needs the 'Built with Llama' label etc. He said there is an LLM as a filter and a way for the AI to feel emotions. He said something about watching movies and having feelings.