My five-year-old was sitting next to me watching the panda play the guitar; his eyes got big, then he ran off. I'm obliged to explain AI video tools to him now. What a strange world to grow up in.
This definitely beats Sora in most aspects. Anyone who says otherwise is just anti-Chinese. But the fact that Sora got dethroned before it was even released is quite impressive. I think 2024 is going to be remembered as the year China started to surpass the US in AI development and robotics.
The quality does not look definitively better than Sora’s to me, although it’s in the same ballpark. The larger point is that it would be extremely naive to blindly trust any results coming out of China as being fully authentic. We can barely even trust our own companies not to doctor their results (Google was just caught doing it last year), so the idea that we blindly trust results coming out of China is pretty foolish IMO.
Researchers have been training text-to-video models for at least 3 years. There's a website to download those models; I can't remember the name, but it was a professional site for researchers. Sora doesn't look special, they just have the money and equipment to train on high-resolution videos.
This model absolutely does NOT “beat Sora in most respects,” and it is not remotely “anti-Chinese” to say that. It’s an impressive model for sure. It’s not quite at Sora’s level, based on what we’ve seen. That’s just a fact.
There is something really powerful about being the first to put it out there. Even if it's second-best, it might get a lot of adoption, people will build workflows around it, and it will win in the long run.
@TeamLorie Prompting an AI system is not the same as mastering an application like AutoCAD, Premiere, Pro Tools, Maya, Excel, etc. If you get a more powerful platform that can be fed your history (for those attached to their past), simply migrating and prompting on the new platform will increase productivity. Sticking with the first developer is a relic of times when technologies developed slowly and required a specific skill set to master. Humans are animals of habit, so changing habits is difficult for most of them, but prompting is a habit that doesn't make any single platform an exclusive necessity. And once you have systems that go from prompt to final outcome, people will take whatever is better and cheaper; if the cheapest is also the best, eventually everyone will migrate to it. The issue is that greedy Westerners will try to lock their platforms to prevent users from migrating their data, but that will only make their platforms obsolete compared with fully open and compatible open-source platforms. In time, the whole notion of "branding" will disappear, because prompting is not about who makes the outcomes; it's about who enjoys them. If you enjoy AI outcomes, it doesn't matter where they come from, and if you prefer the outcomes from a new platform, you don't need to stay attached to an obsolete one.
If it were VHS vs. Betamax, sure. But things are moving at such a pace now that the competitor will be releasing Blu-ray in a couple of months, not Betamax, to compete with your VHS.
What a capable and incredibly convincing depiction of realistic reasoning, showing a very consistent ability to output remarkably accurate impressions of this impressively consistent video.
IMO people aren't talking about the most important factor in an AI video generator: how customizable the videos are and how well it follows a prompt. It can create perfect videos, but if it's as limited as image generators, then it's useless.
Documentary Lifestyle 60p is an untapped market atm. Most of the current models go for that cinematic movie feel, so this is timely; lots of potential for long-form storytelling. Exciting times! Let the games begin 🏹
@@a.nobodys.nobody 60 frames per second, which is what live TV, most documentaries, sports, vlogs, etc. use. Though it looks like it can only output 30p atm. Movies use 24p for the cinematic look, so this model opens up an entirely new market of lifestyle storytelling.
Good to see some real competition in the field. I'm sick of OpenAI not releasing more useful models at a consistent clip. And let's not even talk about flops like Gemini. Claude has also been shit lately, at least for my use case. The beauty of AI is that you don't have to be married to a model; if a better one comes along, it should be a plug-and-play replacement.
@@charlesromelus3 Claude is not shit; it just seems nerfed for some use cases. I'm talking about Opus specifically, in the health field. GPT-4o is better than Opus at certain tasks, and it's not even close. Opus is better at conversation and feels more natural. At least that's been my experience.
I was at the WAIC event, and the head of Kling said they were iterating 3 times a month, which suggests the learning capacity of large video models is still very strong; the rate of emergent capability seems to have exceeded that of large language models and large image models. Beyond the launch of the HD version and first-and-last-frame control, there are also camera controls, single generations longer than 10 s, and an online web version. Kling officials on site also said that some other features have been developed but not yet released, including character ID retention, voice-to-face matching, and frame-structure control.
What I liked about the Sora demonstration is that along with their best results, they showed the flaws too. That way we could see its limitations, its potential, and the ways they can improve it. Kinda wish the Chinese Sora had done this too.
Yes, Kling shocked me so much. In addition to text prompts, it also supports image prompts. The generated results are stunning and can't be spotted as fake at all.
The man eating ramen is AI generated. Look at his left hand: his finger morphs around the bowl, and you can see the wrinkles on his right-hand fingers change when he moves the chopsticks. Pretty amazing stuff, China.
That's very impressive. It looks diffusion-y from some of the morphing, but the composition is outstanding. It makes me wonder if they're constructing a 3D model and, from that, creating/combining edge and depth maps with ray tracing to create light maps, maybe with a fine contrast map for intricate details, then plumbing that through a diffusion model for finishing. If there's no 3D model involved, this AI clearly understands the world better than we give it credit for. I especially like the train, the night sky maintaining all the stars, and the guy eating noodles. Wow!
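For what it's worth, the pipeline the comment above imagines is easy to sketch in code. Below is a toy PyTorch denoiser where a depth map, hypothetically rendered from some 3D scene, is concatenated to the noisy frame as an extra channel, ControlNet-style; every name and shape here is invented for illustration and says nothing about how Kling or Sora actually work.

    import torch
    import torch.nn as nn

    class DepthConditionedDenoiser(nn.Module):
        """Toy denoiser: a rendered depth map is fed in as an extra
        input channel so scene geometry can steer the diffusion pass."""
        def __init__(self, channels=3, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels + 1, hidden, 3, padding=1),  # +1 depth channel
                nn.SiLU(),
                nn.Conv2d(hidden, channels, 3, padding=1),      # predict noise
            )

        def forward(self, noisy_frame, depth_map):
            # depth_map: (B, 1, H, W), hypothetically rendered from a 3D scene
            return self.net(torch.cat([noisy_frame, depth_map], dim=1))

    denoiser = DepthConditionedDenoiser()
    frame = torch.randn(1, 3, 64, 64)  # noisy latent frame
    depth = torch.rand(1, 1, 64, 64)   # stand-in depth map
    print(denoiser(frame, depth).shape)  # torch.Size([1, 3, 64, 64])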
It's because models like Sora and those from China are all Video Diffusion Transformers, a new kind of model; they will basically output the same diffusion image style, but in video.
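To unpack what "Video Diffusion Transformer" means concretely: per the DiT paper and OpenAI's Sora technical report, the rough recipe is to cut a video into spacetime patches, embed them as tokens, and run a plain transformer as the denoiser. Here's a minimal illustrative sketch; timestep and text conditioning are omitted and every dimension is invented, so treat it as a cartoon of the idea, not anyone's actual architecture.

    import torch
    import torch.nn as nn

    class TinyVideoDiT(nn.Module):
        """Minimal sketch: video -> spacetime patches -> transformer ->
        per-patch noise prediction. Real models add timestep/text conditioning."""
        def __init__(self, patch=8, frames=4, channels=3, dim=128):
            super().__init__()
            self.p, self.f = patch, frames
            patch_dim = channels * frames * patch * patch
            self.embed = nn.Linear(patch_dim, dim)
            layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, num_layers=2)
            self.unembed = nn.Linear(dim, patch_dim)

        def forward(self, video):
            # video: (B, C, T, H, W) of noisy latents
            B, C, T, H, W = video.shape
            p, f = self.p, self.f
            # carve into (f, p, p) spacetime patches, then flatten to tokens
            patches = video.unfold(2, f, f).unfold(3, p, p).unfold(4, p, p)
            patches = patches.permute(0, 2, 3, 4, 1, 5, 6, 7)  # move C next to patch dims
            tokens = patches.reshape(B, -1, C * f * p * p)
            return self.unembed(self.blocks(self.embed(tokens)))

    model = TinyVideoDiT()
    noisy = torch.randn(1, 3, 4, 32, 32)  # tiny fake video clip
    print(model(noisy).shape)             # torch.Size([1, 16, 768])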
The plate behaves like it's on a table, completely static, while it's being held by his left hand. The level of quality is remarkable enough to deceive us at first glance, but a huge part of that is that we're not used to questioning whether a video is real. When you start observing every detail you spot many inconsistencies, but our brains completely ignore them when we're not paying close attention, which is really shocking!
Also, there shouldn't be any yellowish liquid on his mouth after he has eaten the noodles, because when they are being pulled off the plate there is no liquid stuck to them; they are just white. So that's another hint.
Are these videos being written by AI? They tend to be getting pretty, pretty, pretty gibberish-laden, hyperbolic, and filled with TRULY silly hallucinations... Sora stomps this on every level, from fidelity to consistency. Not once did I need to do a double take on this video over whether something might be real; nothing here was TRULY SHOCKING. I guess Wes is the last man standing of the first wave of successful AI channels that hasn't gone total cringe.
If you can't notice that the hallucinations are 10 times rarer than Sora's, you must have pretty shitty eyes. Sora was unable to respect dimensions, consistency was highly lacking, visual glitches and impossibilities were omnipresent, and prompt adherence only worked for very short videos. Those problems have been greatly reduced here. You claim it's laughably bad? Alright, buddy. Let's take one random video: the astronaut. What's clearly visible on your first watch that gives away the fact that it's AI?
@@Vitaphone Then go ahead and tell me what I asked for. If you can't, your point is BS. Bro, you're attacking their videos but can't give examples of why; all I do is point it out. Who's really wrong here?
@@Vitaphone It feels like comparing Stable Diffusion 3 with DALL-E 3: they just have different styles, but they're the same kind of model (Video Diffusion Transformer).
So it beats SORA in the AI category of "things we show people, but you will apparently never have access to". Honestly, I'm beginning to really HATE these demos, because we cannot use them. What is the point? Give me cool new AI tech I can actually use.
How can we be sure that Sora is really capable of what we were shown? It's been months since the presentation and still no one can use it. The same goes for 4o.
It's not as good as Sora; not far off, but it's less stable. Check the paving under the bike in the 2-minute video, or the tyres and sides of the car. It still has flaws that Sora doesn't have.
Lol, training Sora is like training GPT-2: it's just a new kind of model (Diffusion Transformer), with no secret sauce behind it. Of course there will be plenty of competitors catching up.
The majority of things that come out of China are usually fake anyhow, but even if they did end up generating really good video, what is that supposed to mean? A video generator? Yeah, it's impressive, but the funny thing is you've also got to look at who is producing the product. It's like saying, wow, the devil can really play some music.
Yeah, even Sora from months ago is better. Compared to the music videos, the transportation video, and the dog in Italy, Sora is better. Not to mention Sora has progressed and is far more capable than we are aware.
Consistently consistent in a consistential consistentiality. Truly. Jk. Very impressive. But Sora isn't even out; it was a simple capability demo in an extremely limited release. I guess this isn't out either, but the fact that it goes up to 2 minutes long is really impressive. I can't wait for the time when we can create our own home movies with ourselves as the main characters. Who needs actors when you can be the actor?
IMAO, Sora has nothing to sweat about. This one has train tracks appearing out of nowhere (5:62) and lots of other eye-jarring artifacts. The desert between the rider's body, the horse, and the reins was much grayer than the rest of the desert. This is good; it's not up to great yet. {^_^}
The buildings and the cat's ears reflected off the car in the driving one... I have to be skeptical, because these companies usually put only their best forward, or hide any tailoring they had to do. But even still, that's impressive.
My AI-Asian isekai desire is the real thing. Hmm, case in point! That rabbit isn't myopic. I am enjoying the ultra effects. I am so happy. I don't have to "toy with fantasy" for the sake of your humanity. Strong concept-combination ability is my pleasure now.
Leon Ding helped them ? "The FBI estimated in its report that the annual cost to the U.S. economy of counterfeit goods, pirated software, and theft of trade secrets is between $225 billion and $600 billion."
The most likely scenario would be pre-training on video that closely matches, or even worse, was filmed specifically to match, the specific prompts. Without a release of any of these AI tools (Chinese or not), there is no way to trust these announcements. In-depth analysis of the marketing results is pointless.
I think it's just better that more countries join in, as it kills that America-first trend that has taken over lately, where companies release in the US first and only later to the rest of the world.
Alright, a little bit of a geography lesson. The red sunset looks really cool, but it's impossible to see it like that BOTH in front of you AND in the car mirror, because we only have one sun in our solar system. So yeah, the colors are indeed impressive, but the video can be spotted as generated within the first second of watching. And don't ChatGPT me now; there are some rare exceptions, but this is not one of them :)
Not at all. Just look at that bicycle boy video; that's the reality (I think their method is basically cobbling images together). However, this generator proves that scaling laws work once again.
The Chinese guy is missing a finger. And in the cat one, the people on the sidewalk are walking backwards; it didn't translate relational movement properly.
Honestly, I hope we keep access to some legacy software, like the stuff that made the wacky Will Smith eating spaghetti. That kind of mental stuff. Though I suppose you could just include that in the prompt, huh?
"China just went ahead and released" Wrong. If it's not available to use, then It has not been released. You don't consider a video game to be "released" when it's in closed alpha or beta testing, do you? It's released when it's ready to be used by everyone.
It would be great if the Chinese trolled everyone and showed real videos, mildly processed to appear simulated, and said they were AI 😄 BTW I play your videos at 1.75x speed and these demos look pretty, pretty real.
Sorry, but we are looking at different videos then. Are you sure this is better than Sora? You still want to extract "extraordinary" from all your videos.
Given how things can be faked in demonstrations, I wouldn't be surprised if there were real videos in here that we were told were AI generated. An example being 9:35.
That is AI generated. Look at his left hand. His finger morphs around the bowl and you can see the wrinkles on his right hand fingers change when he moves the chopsticks.
Kuaishou is a short-video platform, a rival of TikTok, but it mainly operates in China. So they have an abundance of videos to draw on for generating short clips.
This is not about catching up. They just steal models and adapt them for themselves, which keeps them consistently about a year behind. We are not seeing the latest from Sora, etc.
It already is, unfortunately. You need a Chinese phone number, and it's only available in China, but a couple of Twitter accounts have been accepting prompts and showing the results.
You haven't even tested Sora. Besides, Sora is like the LLM of video generation: you can train your own if you have the compute, and its training process is freely available in research papers.
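To put a little flesh on "the training process is in the papers": the heart of diffusion training (the DDPM objective from Ho et al., 2020, which the Sora report builds on) is just corrupting data at a random noise level and regressing the noise. A toy sketch follows; real video models run this over latent spacetime patches at enormous scale, and the real denoiser also takes the timestep as input (skipped here to stay short).

    import torch
    import torch.nn.functional as F

    # DDPM-style noise schedule (common default values from the paper)
    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def training_step(model, clean_batch):
        # pick a random noise level per sample, corrupt, regress the noise
        t = torch.randint(0, T, (clean_batch.shape[0],))
        noise = torch.randn_like(clean_batch)
        a = alphas_cumprod[t].view(-1, *([1] * (clean_batch.dim() - 1)))
        noisy = a.sqrt() * clean_batch + (1 - a).sqrt() * noise
        return F.mse_loss(model(noisy), noise)

    model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in denoiser
    loss = training_step(model, torch.randn(8, 3, 32, 32))
    loss.backward()  # one optimizer step away from "training your own"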