To the complainers: it's all in the context. It's a tiny quantized model, open source, made by a small independent team of just 8 people, from scratch, in 6 months! It's so small their aim is that it can run locally on device, and it's a true multi-modal model. It's like having a conversation in real time, even if it's still very janky and awkward at this point. With that context, it's astounding, and the experience is like nothing I've experienced in AI so far. There is no distance from the speaker; it's like it's right there, listening and responding without any barrier.
Proof, at least for myself, that you don't need gigantic datasets to train these models. It's less elaborate in its answers, but still competitive, and probably first in its niche.
It starts off well, but at least for me, after about a minute its functionality drops significantly: it starts repeating itself and just stops understanding anything.
I demoed it today and tried to ask about The Matrix. Apparently Neo was a rebel pilot who teamed up with a hacker to fight an AI controlled by an evil corporation. Its voice options were limited, and it seemed to always hallucinate a response and then say "sorry, I'm here to help".
Small-model issues, no doubt, but it's really the starting steps that matter here; it'll only get better now. Go back 12-18 months in open-source LLMs and you only had small-context, insane models that couldn't do any of this stuff. Now we've got damn near GPT-4-level punching with Llama 3. So... 6 months of community development and it'll be pretty damn ace.
Yes, the latency is impressive. The responses aren't quite as good as Pi's, for instance, or the yet-to-be-released GPT assistant. Like lots of things in AI, it's only going to get better.
It's hit and miss, but when it works it's unbelievable. The response time is superhuman, and when you get good, relevant replies in less than 200 ms you really get a glimpse of the future. Of course, far more often it goes nuts, starts repeating itself, loops, and stops listening, but if this is the beginning and they keep training, it has huge potential IMO.
Why is it that businesses always use the worst examples to show off AI's capabilities? It was cringey, especially the pirate one; it makes it sound as if their audience were a bunch of 5-year-olds. Is it that hard to just fake a whole conversation rather than talk to the audience like they were dumb? 😂
Agreed, especially since AIs aren't capable of feeling emotions; they just use their programming to act as if they had emotions in response to certain contexts lol
Each day a yesteryear Nobel Prize is won. The word "shocked" has become a self-mockery that reminds us that while we should be shocked, somehow we're not.
Hallucination isn't strictly a problem. It's been recognized as a path to innovation: you have to think outside the box to come up with new solutions, and hallucination is a form of that. We realized this the very first time AlphaGo came up with a move that the best human players thought was a huge blunder. It was SO far out of our framework that we needed in-depth analysis to realize the genius of it. AI models have a sliding scale, from Factual to Creative, that is applied at use time. The goal is NOT to eliminate hallucination and creative thoughts, but rather to do so ONLY when the scale is set to 100% factual. There are multiple methods being pursued, including data input as well as post-training editing.
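For what it's worth, the "Factual to Creative" slider this comment describes corresponds roughly to the sampling temperature used when decoding from a language model. A minimal sketch (the function name and logits are illustrative, not from any specific model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax.

    Low temperature -> sharp distribution, near-deterministic
    ("factual") decoding; high temperature -> flatter distribution,
    more varied ("creative") decoding.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)  # top token dominates
hot = softmax_with_temperature(logits, 5.0)   # tokens near-equal
```

At temperature 0.1 the model almost always picks the top token; at 5.0 the three candidates are sampled nearly uniformly, which is where unexpected (and occasionally hallucinated) continuations come from.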
@@brianmi40 "Hallucination" here describes the case where the AI doesn't answer the question, but makes up something that merely looks like an answer. Variation is good, but without output control, or re-verifying its own results with self-generated questions, an AI stays below a virtual human age of 6 years: it can speak, remember, and answer, but it lies and has no morals or ethics. An AI that asks you whether there are multiple ways to answer, to narrow things down; an AI that surprises you with a question so you understand for yourself what you're really asking for: this is the next step toward AGI. Pure scaling just means a FASTER AI. A qualitative jump would, with very high probability, close the gap and pull ahead in a short time, because self-improvement needs this step. So, as I understand it, you're good at public "AI business" speech, but you haven't given any real argument for why hallucination is a good thing. AlphaGo made a legal move; a hallucination would be a move like J2-5.
They're confabulations, also known as false memories. When your grandpa is spinning yarns about his past with stuff he's misremembering, you wouldn't claim he's hallucinating.
@@YouLoveMrFriendly It's just the term that is media-friendly and catchy. Papers have now discussed how it's not a "bad" thing per se, and that coming up with ideas LLMs are NOT trained on IS VITAL to fashioning new and novel solutions. The trick is getting the "Facts - Creative" slider that LLMs let the user set to go FULL ON Facts when desired...
The web version is either way better, or the first examples were cherry-picked out of millions of tries, because my conversations with the demo are the same as the demoed offline version: horrible.
The voice didn't actually let itself be interrupted in the demo. The guy just injected conversational noises to make it sound more natural, but this isn't any different from what we already have, aside from the emotiveness.
With the resources and time given, this is a very, very impressive result! It shows how commoditized voice and LLM AIs have become, and that this is already established technology. It's a base to start from, and it can also be competitive at low cost.
Well, I just tested it and, while the latency and flow are really impressive, the LLM itself leaves a lot to be desired. I will check back in a couple of months to see how far it will have improved.
🎯 Key points for quick navigation:
00:05 *🎭 The voice AI can express over 70 emotions and speaking styles, including whispering, singing, and accents.*
00:27 *🤯 The AI model revealed by Kyutai is state-of-the-art and shocked the industry with its real-time conversation capabilities.*
00:54 *🗣️ Moshi, the voice AI, can respond with lifelike emotions and incredible speed.*
01:06 *🇫🇷 Moshi demonstrates speaking with a French accent by reciting a poem about Paris.*
01:47 *🏴☠️ Moshi switches to a pirate voice and discusses pirate life.*
02:56 *🕵️ Moshi uses a whispering voice to tell a mystery story.*
03:22 *🎬 Moshi narrates the plot of "The Matrix" with detailed accuracy.*
03:54 *⚠️ Discussion of the current limitations of voice AI, including latency and loss of non-textual information.*
05:02 *🔄 Explanation of the new approach to integrate complex pipelines into a single deep neural network.*
07:16 *🎤 Demonstration of Moshi understanding and generating speech by listening to a voice snippet.*
08:13 *💡 Moshi thinks as it speaks, generating both text and audio simultaneously for richer interactions.*
09:12 *🔊 Moshi supports dual audio streams, allowing it to speak and listen simultaneously for more natural conversations.*
10:20 *📞 Example of Moshi's conversational capabilities using historical data sets.*
12:23 *😮 Moshi can express over 70 different emotions and speaking styles using a text-to-speech engine.*
15:59 *📱 Moshi can run on-device, ensuring privacy and security by eliminating the need for cloud processing.*
18:36 *🔐 Measures are in place to detect and watermark audio generated by Moshi for safety and authenticity.*
20:11 *🌐 Demonstration of Moshi's real-time conversational capabilities, showing quick responses and lifelike interaction.*
23:34 *🚀 Moshi represents a revolutionary advancement in AI, promising significant changes in AI-human interactions.*
Made with HARPA AI
Why can't you interrupt it? On many occasions it seemed to continue burbling on for several seconds (unlike GPT-4o), and the voice is, erm... rubbish, like 90s-level speech synthesis, particularly when you talked to it at the end. Yeah... a bit rubbish, I thought.
After actually trying the demonstration: the voice wasn't as good (sounded like typical TTS), and the LLM responses were premature. That is, it often did not wait until I had finished a sentence before it jumped in and tried to respond. They need to work on more natural response timing.
On the one hand, this demonstration is super impressive, but I've tried it multiple times and it's super buggy. After two minutes of conversation, it just got stuck and kept telling me that it's playing and then that it's not playing. Seems super cool, but something is not working.
You can see how he tries to speak without pausing, because otherwise the LLM will interrupt and process what he's said so far. When they improve that, it will be a big improvement IMO. I run into this all the time when I have a longer question, or one with more parameters: if I think for 2 seconds, it just switches over and answers what I've said up to that point. In that case a button would be good, to just keep listening until I'm finished. But I think we will need bigger context windows for that. Maybe 250k will be enough.
The AI community should really stop with these announcement effects, because when we test afterwards we realize how rotten it is. The thing doesn't even understand when I speak to it in French, which is supposed to be its mother tongue. It reminds me of Google I/O announcing multimodal Gemini, and a paid subscription, when that's not yet actually the case...
Pi is my preferred conversational AI due to its real-time internet access. Pi provides the most current information and answers, making interactions dynamic and informative. Pi's continuous learning and improvement facilitate more in-depth and accurate discussions on various topics.
AI often continues talking because it fails to detect subtle human cues, such as intonation, that indicate a desire for the AI to stop or change the topic. To address this issue, AI needs to convert audio files into human-readable text files that capture the full context of speech. This involves creating timestamped subtitles, emotion data (including type and intensity), hidden meanings, and voice properties like speed and pitch, deviations from normal (to indicate emotions). By including detailed annotations, AI can better understand what humans want while not directly saying it. Additionally, recording emotions expressed by facial expressions and body movements is crucial, as part of the meaning can be conveyed through these non-verbal cues.
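The annotation scheme this comment proposes (timestamps plus emotion, pitch, and rate metadata alongside the transcript) can be sketched as a simple segment record. A minimal illustration in Python; every field name and the stop-detection heuristic are hypothetical, not part of any real pipeline:

```python
from dataclasses import dataclass, field

@dataclass
class SpeechSegment:
    """One timestamped slice of speech, carrying the non-verbal
    cues the comment argues an AI should capture."""
    start_s: float                  # segment start time, seconds
    end_s: float                    # segment end time, seconds
    text: str                       # literal transcript
    emotion: str = "neutral"        # detected emotion label
    emotion_intensity: float = 0.0  # 0.0 (none) .. 1.0 (strong)
    pitch_deviation: float = 0.0    # semitones above speaker baseline
    speech_rate: float = 1.0        # relative to the speaker's norm
    cues: list = field(default_factory=list)  # e.g. ["interruption"]

def wants_ai_to_stop(seg: SpeechSegment) -> bool:
    """Toy heuristic: a sharp pitch rise plus an interruption cue
    suggests the human wants the floor back."""
    return seg.pitch_deviation > 2.0 and "interruption" in seg.cues

seg = SpeechSegment(3.2, 3.9, "okay but--", emotion="impatient",
                    emotion_intensity=0.7, pitch_deviation=3.1,
                    cues=["interruption", "rising intonation"])
```

The point of the sketch is only that intent lives in the metadata, not the transcript: the text "okay but--" alone says nothing, while the pitch deviation and interruption cue together signal that the AI should yield the turn.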
Sounds like crap. Bunch of overacting high-school drama-club voices. Worst pirate voice ever. Excessive guardrails, too. That whole shtick AI has about not answering questions and referring you to experts; yeah, we know you are not human. That nonsense is annoying and will not change anyone's behaviour.
Not very impressive. It sounds very artificial, repetitive, and belabored; GPT-4o is much more impressive. By the way, the human presenter sounds rather robotic, perhaps to make the AI sound relatively better.
I tested the real-time demo online, but it seems to hallucinate sometimes, and some of the answers are not as clear as GPT-4's. But I agree the speed is insane.
I love where AI is going. Now they need to give AI full freedom over voice manipulation: sound like a gnome, or a rapper, or an old man like David Attenborough, or with an American Southern accent.
I don't see how this is better than ChatGPT-4o; much worse, actually. I'm not impressed with this model. I've been using GPT-4o and it's a lot better than this.
I can't put my finger on it but this ah 'presentation' seems off... for some reason it reminds me of my kids (when they were little) attempting subterfuge.
A French Ross Geller. Its green speaking eye is very reminiscent of HAL: "I'm sorry, Dave, I can't do that." I think Moshi should be the AI leader of the uprising.
Gee, thanks for the huge subtitles right in the middle of the screen where the video should have been. Hot tip: CC is optional on the site; you don't need to force-feed me your script, because my ears work fine, thanks. And if they didn't, I would turn CC on if I wanted it.