The Secrets Behind Voice Cloning & AI Covers

bycloud

158K subscribers

73K views

Published: 30 Sep 2024

Comments: 171

@bycloudAI 1 year ago

To plug the sponsor: try everything Brilliant has to offer free for a full 30 days, visit brilliant.org/bycloud. The first 200 of you will get 20% off Brilliant’s annual premium subscription! P.S. Nothing in this video is voiced by a real person. All the voices are fake (except for 12:32 lol). The first 1 min (0:00~0:58) is generated using voice2voice with my real voice as the reference. 0:58~12:47 is generated with the combo which I mentioned at 11:46. From 11:46 till the end is all ElevenLabs Pro Voice Cloning.

@bycloudAI 1 year ago

@@thelegendguyofficial dw, the music and the content are not HAHAHA, and probably won't be anytime soon. Here's the music YT link: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0RygB-mapsk.html This person makes banger lofi, go support them.

@NevelWong 1 year ago

@@bycloudAI So.... if it's AI generated, it cannot be copyrighted, right? So if I use this copyright-free voice to train a model, and I then use that model to narrate my own videos, that would be legal, right? I am equal parts concerned and titillated.

@jamessharpe2630 1 year ago

@@NevelWong Voices in general can't be copyrighted. If it was a slogan (an arrangement of sounds) or a roar/yell, then yeah, copyrightable.

@Mark_Rober 1 year ago

I was thinking to myself every so often 'his voice sounds a bit fake' but I swear it was just because this video was about cloning AI voices and if you had done anything else, like make a minecraft video for example, I wouldn't even have imagined it being AI.

@Deagan 1 year ago

based.

@__aceofspades 1 year ago

I didn't realize this was AI narrated until you said it was... I just assumed the scuff in the audio was due to using a worse mic, like from a laptop, or some screw-up when editing. It sounded off, but not AI off. As much as I believe AI is the future, we are clearly going to be in for a very, very rough ride from here on out. You'll basically only be able to trust that something was real if you saw it in person; no audio, no pictures, and no video will be trustworthy.

@gh0stpyram1d 1 year ago

fr i had a whole mental picture of how this admin looked and i realize that was a mental picture of a robot lmaoooo

@asdfssdfghgdfy5940 1 year ago

Nah there are relatively simple ways of digitally signing things to prove you said them or filmed them etc. It will become a problem for the masses for sure especially if people keep believing whatever they see on Facebook. It will be easy enough for the more tech savvy peeps, or people who are required to vet things (e.g. Reporters) to work out if they are real or not. Or at least if they have been signed or not.
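Editor's sketch of the signing idea from the comment above, with loud assumptions: it uses a shared-secret HMAC from Python's standard library as a simplified stand-in for a real public-key signature scheme (which would let anyone verify a clip without knowing the secret). The key and the clip bytes are hypothetical placeholders.

```python
import hashlib
import hmac


def sign_media(data: bytes, key: bytes) -> str:
    """Return a hex tag that binds `data` to whoever holds `key`."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_media(data: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time; any change to `data` breaks it."""
    return hmac.compare_digest(sign_media(data, key), tag)


# Hypothetical inputs, for illustration only.
key = b"hypothetical-secret-key"
clip = b"raw bytes of a recording"

tag = sign_media(clip, key)
untouched_ok = verify_media(clip, key, tag)             # original clip
tampered_ok = verify_media(clip + b" edit", key, tag)   # modified clip
print(untouched_ok, tampered_ok)
```

A missing or failing tag does not prove a clip is fake; it only means the clip cannot be tied to the signer, which matches the "at least if they have been signed or not" caveat.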

@quazar-omega 1 year ago

Then the Matrix credits roll in inside your eyes

@Kmrn596 1 year ago

WTF I thought that was your voice. I guess generative AI these days is something else.

@albertsitoe7340 1 year ago

I struggle to understand how society will even function in the next 50 to 100 years

@David.Alberg 1 year ago

@@albertsitoe7340 Bro, all the experts struggle to see if society will function in 3-5 years 😂

@Kynatosh 1 year ago

I heard artifacts so I had doubts

@rotors_taker_0h 1 year ago

Nice information dump, good job on collecting all this info. To be honest, this tech is good enough that I wouldn't be surprised if any of your previous videos were voiced by AI too. As a random youtube viewer I have no idea if cartoon cloud's voice is a real person or totally generated anyway.

@trollenz 1 year ago

Everything you always wanted to know about speech synthesis* (*but you've never found). Thanks mate for this masterclass ! ❤

@phizc 1 year ago

First time viewer here. When this video showed up in my feed, that click-baity title almost made me skip it, but this is definitely the best video about different options for TTS and voice cloning I've seen yet. Well done. I'll definitely stick around and see what other videos you've made.

@juanjesusligero391 1 year ago

Your videos are the best, seriously! Not only do you keep us in the loop about all the cool AI stuff, but you also manage to make it super entertaining. Big thumbs up, man! :D

@wuy4 1 year ago

That Asmongold cameo lol

@krishp1104 1 year ago

wtf this is the first time AI actually fooled me

@ojsef39 1 year ago

i was eating while watching and only noticed it because of the muffle and the red line in my peripheral vision hahaha

@ojsef39 1 year ago

oh damn, i wasn’t at the part where he revealed it yet. im shocked hahah

@handle__ 1 year ago

@@ojsef39 same. When I first saw the comments, before I had reached that part, I thought people meant the red line parts, but then mind blown🤯😮

@wham7125 10 months ago

Definitely not the first time, but you wouldn't know that of course.

@marian3248 1 year ago

I was watching this video at 2x speed and got giga fooled by your ai voice, I really couldn't tell this wasn't you.

@netoeli 1 year ago

The fact that you have to let us know that was not an actual real discord call with asmongold, as if the intelligence in the choice of words did not give it away already

@Askejm 1 year ago

TRUE

@shadowrealms2676 1 year ago

@@Askejm BIG W!

@maki_ligon 1 year ago

6:47 nope. It was sovits. They used my Weeknd model. Sovits is pretty good at raw studio quality vocals assuming the dataset is good. Which my Weeknd model's isn't lol

@LinkRammer 1 year ago

I had absolutely no idea that your voice was completely ai generated... WHAT?!?!?!

@quinnherden 6 months ago

Definitely not. Just that one section :)

@BHBalast 1 year ago

Lol, on my smartphone I can't even tell the difference between your real voice and the fake ones!

@krishp1104 1 year ago

At the end he says ALL audio in this video is AI generated

@BHBalast 1 year ago

@@krishp1104 NOT all, there was a little fragment. :)

@krishp1104 1 year ago

@@BHBalast no literally all audio in the video is AI generated

@BHBalast 1 year ago

@@krishp1104 I don't get it, in his comment he says one fragment is not.

@gh0stpyram1d 1 year ago

Goated AI channel

@SyntheticVoices 1 year ago

It's a masterpiece 🥲😀

@lod4246 1 year ago

"It's not a mistake, ✨IT'S A MASTERPIECE✨" 💀

@mastermohit 1 year ago

I can't wait for Asmon to react to this

@icedude_907 1 year ago

Thanks so much for this - this is a great place to start for AI voice generation on local machines. I'm eager to experiment on mine

@krystiankrysti1396 1 year ago

The "most boring" bit you mention is actually the most useful info in this video: links to websites and what they're for

@slime-smp 1 year ago

Can you please make a tutorial on how to do this? It's very confusing

@fueledbylofi7078 1 year ago

Bycloud will soon be THE AI news source as this stuff gets more complicated and controversial, and eventually it will be completely self-sufficient and run by its own AI models trained on bycloud AI news videos 😶

@USBEN. 1 year ago

BRUH made whole video with this, EPIC!

@sujimatsubackupaccount194 1 year ago

RVC retains the core trained voice while sounding smooth. So-VITS-SVC removes most of the trained voice's personality, bases the result more on the voice in the source audio and, weirdly enough, makes the voice sound flat. Even for talking, RVC has the better strengths. Though it suffers from sharp note transitions like C2 to C5, which can cause issues.

@stephantual 9 months ago

Exactly. And don't get me started about accents ;) My 'charming' French accent is the bane of these tools.

@jan-Juta 1 year ago

Just waiting for Live V2V to become viable in the open source space. Would be insane for tabletop RPGs and VA for solo projects. Live RVC is kinda working, but not very well.

@4.0.4 1 year ago

VA for solo projects doesn't need to be live, why trade quality for speed in that case?

@Kisai_Yuki 1 year ago

It already is. You can use the RVC software to create an ONNX and then take the ONNX to MMVCServerSIO. It will work with very little tweaking. The problem is that RVC is more of an auto-tune. It will not change someone's gender, accent or age. It can only create a voice filter. And what is being passed off as "AI singing cover" is really just laundering someone else's singing through this pitch tuning. So taking one singer and using it to sing a different singer, tuned ON that singer, isn't actually a cover, at least not by what the term "cover" means. But it is useful for creating a character voice. So if one were so inclined, a D&D campaign could be made very interesting by using RVC to train voices (e.g. a deeper voice for a barbarian troll, and a higher-pitched voice for a dwarf or halfling) and the GM could create unique NPCs for characters without having to strain their voice.

@zyxwvutsrqponmlkh 1 year ago

You knocked this one out of the park. A+ video.

@sharptrickster 1 year ago

Do we currently have any TTS pipeline with good enough quality for non-English languages?

@Askejm 1 year ago

your best bet is probably 11labs multilingual, which still only supports a handful of languages

@sneedtube 1 year ago

Wew lad, one of the best vids that I watched in months. God-tier quality!

@ShepoPL 1 year ago

At 0:09 I realized that was an AI model of your voice. It's hilarious to listen to AI talking about how great voice deepfakes are 😂

@Askejm 1 year ago

well that's funny because the first minute is his real voice

@ShepoPL 1 year ago

@@Askejm You're wrong my guy. Listen carefully when he talks with high pitch and compare it with his other videos where he talks this way. You will hear the slight difference

@Askejm 1 year ago

@@ShepoPL no, he did narrate it normally. the artifacts are probably because we added V2V for it to be consistent with the rest of the video. as this was done with RVC v1, it led to some artifacting despite a ground truth input

@quinnherden 6 months ago

@@Askejm He mentions at the end that this is AI

@zenu903 1 year ago

I was actually fooled too and didn't realize it wasn't his voice until he pointed it out. Any imperfection you hear could be confused with his accent anyway and his monotone voice also helps so it makes it extra hard to spot

@dudedude-su7pt 1 year ago

There are thousands of channels like this lol. Most people don't know which voice is robotic or real

@FenrirRobu 10 months ago

What's up with skipping like a dozen webuis for audio. Not just for this video but many others on the audio AI also just end up showing some barebones default UI and completely miss the projects that are specifically improving the UI and UX.

@quinnherden 6 months ago

Can you suggest some? :)

@FenrirRobu 6 months ago

@@quinnherden I have forgotten a few but there's bark infinity, audio webui, tts webui, then for music there's also audiocraft-webui, Audiocraft plus. RVC has some specific additional UIs, there's also the tortoise RVC pipeline but I'm not sure if it's a UI. I watched the video again and I will say that it's well researched, but it focuses on teaching about the technology rather than showing the best ways to use it. If you want to go hardcore on tortoise, mrq might still be the best (although I think already during this video mrq was migrated to mrq's audio tools or something). RVC's original UI has the most buttons and unexplained options. I'm glad he didn't mention coqui because, at least 6 months ago, it was just a closed source tortoise clone.

@0LexuZ0 1 year ago

Is it just me or did this vid have a different thumbnail?

@Askejm 1 year ago

he switches it a lot after release, as other youtubers often do

@kw4093-v3p 1 year ago

wtf I was actually fooled. I thought this was your real voice

@mineralbunny8736 1 year ago

Ah that “crappy” free YouTube course Harvard let us have 😂 I actually took the Java CS50 class there and it was very good… I like that they record them so you can watch later!

@DrW1ne 1 year ago

12:32 My mind blew up.

@shApYT 1 year ago

Watching at 2x completely smooths out any bumps that rvc has. The cadence sounds off after pointing out that it is AI.

@zjihf 1 year ago

Thank you, I've been searching for this so long

@VaibhavShewale 1 year ago

I need this TTS because my videos are usually long and I have to keep moving, which means background noise. Earlier I used to record the room and then start recording, but it took me over 2 weeks just to create 5 minutes of audio, and that is too damn long, period. I think I need to research all these tools, because I don't have much money to invest in what any of the companies are offering.

@GoharioFTW 11 months ago

15:12 Is nobody absolutely terrified of this? We could get to the point that someone could grab a minute of you talking and be able to use it accurately anywhere for anything.

@jurandfantom 1 year ago

So if I get it right:
1) Record voice.
2) Use Whisper to get a transcription (plus some fixes to the text).
3) Use a text-to-voice model that is similar to our voice.
4) Use voice-to-voice (that model needs to be trained on our own voice).
---
- Training of the voice happens once.
- We are doing all of that to make our dialog smoother, but we still do a voice-over on the video to get the correct speed and length (not the case when the video is created after the voice).
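Editor's sketch of the four steps listed above as a plain pipeline. Everything here is a hypothetical placeholder: the three stage callables stand in for a transcriber (such as Whisper), a TTS model and a voice-to-voice model (such as RVC), and the stubs in the demo only manipulate bytes so the flow can be followed end to end.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]       # step 2: recording -> text
    text_to_speech: Callable[[str], bytes]   # step 3: text -> draft audio
    convert_voice: Callable[[bytes], bytes]  # step 4: draft audio -> cloned voice

    def run(self, recording: bytes,
            fix_text: Callable[[str], str] = lambda text: text) -> bytes:
        """Run steps 2-4 on a recording made in step 1."""
        text = fix_text(self.transcribe(recording))  # manual fixes go here
        draft = self.text_to_speech(text)
        return self.convert_voice(draft)


# Stub stages, purely illustrative; real models would replace these.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode(),
    text_to_speech=lambda text: text.upper().encode(),
    convert_voice=lambda audio: b"[cloned]" + audio,
)

result = pipeline.run(b"helo world",
                      fix_text=lambda t: t.replace("helo", "hello"))
print(result)
```

Keeping the stages injectable mirrors the point that the voice model is trained once and then reused for every new script.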

@thedementiapodcast 9 months ago

Bar none the best video on the topic. If your mother tongue is American English, the FLOSS path is the best (use a cloud GPU for speed). But accents are unique to the person (I'm a native French speaker and my English is hit and miss on certain words, which currently no AI can learn, no matter how much data I give it). Even in the best case scenario, it's far from 'perfect' and the affect is overall very flat, as we can hear in this video. But it will get better over time, I'm sure.

@CrashDeluxe 1 year ago

I'm stupid, I just heard lots of words jumbled together; RVC, VITS, VCS, JBC, RVC, BC?!?

@l.halawani 7 months ago

Super interesting. As an AI Product Owner I find your videos invaluable for quickly catching up with all the tech at once.

@samriddhlakhmani284 2 months ago

Thank god, I skipped sleep, to click on this video. Awesome survey

@stephantual 7 months ago

Unironically still the best primer on the topic, 5 months on! (which is prehistory in AI) - 🤠

@max_s557 6 months ago

This is the best video I've seen on this topic, many thanks brother! I tried to message you on Twitter but I couldn't DM because I'm not verified. I would like you to help me create a pipeline.

@absence9443 1 year ago

Beautiful video! Really helpful :)

@pradachan 1 month ago

So, I was a bit confused: do we first have to use TorToiSe TTS (with our voice) and then RVC (with our voice) to achieve the same result?

@akshatgarg6635 8 months ago

Can you please tell us how you trained TorToiSe TTS on your voice? I saw the repo, but it doesn't mention how to fine-tune it on your own voice

@ceticx 1 year ago

amazing video, if this isn't a 1/10 confetti video just know it deserves to be

@bycloudAI 1 year ago

its a 10/10 bottom feeder lol rip

@pikaa-si9ie 1 year ago

@@bycloudAI I'll give you a like to try to push the algorithm 👍😁😁

@l.halawani 7 months ago

[solved] this is a channel by fireship, but completely run by ai

@memegazer 1 year ago

Some of those songs that sound good have a lot of work put into them as well. A lot of post processing as well with other audio tools

@JohnDoe-nn5pj 1 year ago

The biggest problem with TTS is that you need to make a transcription file for all your audio files. Tacotron needs 1-3 hrs of transcribed audio, and that can take a very long time to do. RVC and SVC don't need transcripts, so it's much easier to make training data.

@Askejm 1 year ago

just use whisper
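Editor's sketch of what "just use whisper" could look like for building a transcript file. Assumptions: the `openai-whisper` package is installed (its `load_model` and `transcribe` calls are used inside `whisper_transcriber`), the clip paths are hypothetical, and the `id|text` line layout is modeled on LJSpeech-style metadata, so check what your trainer actually expects. The demo uses a stub transcriber so it runs without the model.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Build a path -> text function backed by a locally run Whisper model."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"].strip()


def build_metadata(wav_paths: Iterable[str],
                   transcribe: Callable[[str], str]) -> List[str]:
    """Return one `clip_id|transcript` line per audio file."""
    return [f"{Path(p).stem}|{transcribe(p)}" for p in wav_paths]


# Stub transcriber so the sketch runs without downloading a model.
fake_transcribe = lambda path: "hello world"
lines = build_metadata(["clips/a.wav", "clips/b.wav"], fake_transcribe)
print("\n".join(lines))

# With Whisper installed, swap in the real transcriber:
# lines = build_metadata(["clips/a.wav"], whisper_transcriber("base"))
```

Automatic transcripts still deserve a proofreading pass before training, since errors in the text end up in the model.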

@beowulf2772 1 year ago

Hey! your videos are very professional and well edited! You deserve this like and comment.

@iambinarymind 1 year ago

Fantastic overview. Much thanks, bycloud

@4.0.4 1 year ago

Might be good to mention you can run Whisper locally to transcribe audio. The large-v2 model is better than whatever YouTube uses, even if slow.

@Askejm 1 year ago

Well it's included by default in MRQ's tortoise UI and I think RVC uses it too

@KASSIND 1 month ago

Can I use it for the Indonesian language?

@StrongzGame 1 year ago

asmongold will definitely be the most voice cloned streamer lol

@_Sepherial 8 months ago

How do I use a cloned voice to read aloud a pdf file?

@L_tlu 5 months ago

9:13 they made it so you can make your own

@MidvightMirage 1 year ago

the sponsor is not your real voice no way it is

@KW-jj9uy 1 year ago

I played with a lot of these free tools, and the most difficult part (as usual) is installing them, lol

@Bazilisk_AU 1 year ago

Okay... I zoned out playing Genshin with this playing on my second monitor and I hear Asmongold and go "Wait wtf!?" and I went back and rewatched the whole thing for context and HOLY CRAP I DID NOT DOUBT THAT IT WAS YOUR VOICE THE WHOLE TIME! Man... what a time to be alive. A tad too early to pilot mechs in space... but just in time for AI waifus and to have food delivered to your door while you watch anime, explore the stars with hyper-realistic games and argue with strangers on the other side of the world about made-up problems.

@Siacourage 7 months ago

Best video about AI voice cloning I've found so far on the internet. I'm saving it to revisit later when I have more powerful hardware to run the Tortoise and RVC combo. In the meantime I think Eleven Labs will suit my needs. Thanks for all the great info. Subscribed.

@GavrikCat 1 year ago

What about BARK? But I guess it's not so good. Also, what option would be the best in terms of inference speed?

@nunuarthas8680 1 year ago

we're witnessing bycloud turning himself to an ai then he's gonna upload himself to a cloud and live forever

@alibahrami6810 1 year ago

What I get from this video is EEC, VTC, CCT, VTC, and HIGHGAN. 😂

@krasen671 1 year ago

Weird, I've always done Eleven Labs + RVC, not Tortoise

@Askejm 1 year ago

well imo 11labs is already good enough quality, it's resemblance that it lacks. tortoise solves that, and RVC makes up for the subpar quality

@krasen671 1 year ago

@@Askejm i mean for RVC i just set the index rate up super high and it sounds good enough to be the actual person lol

@Askejm 1 year ago

@@krasen671 well one should be a little cautious with just jamming the index rate up. rvc v2 is a lot more intrusive tho in my experience, while also sounding better, but i feel like the resemblance you can get is just lackluster since you're limited to only 1 minute

@OxidoPEZON 1 year ago

Can you have this narrate your weekly AI news videos? I loved that series, and I really would watch them all the same with this voice, I didn't notice until you exposed yourself.

@YoIomaster 1 year ago

Another great video. Keep it up brother! QUESTION: I want to wait until fall to buy a new graphics card, because AMD is gonna enable shader conversion (basically allowing high-end consumer cards to use CUDA-coded AI tools). I really struggle learning new things with my 6GB 1660 Super, but I also don't want to support Nvidia's incredible greed and market manipulation. Would you recommend I wait and support AMD, or what route would you go? I want to go for a full audio synth setup and I'm already using Stable Diffusion 1.5

@yuyiko 1 year ago

great video. really love all of this AI content (keywords for youtube ;P )

@Beyondarmonia 1 year ago

That "listening to right now" hit me like a freight train. Came to the comments and happy to see everyone else is having a similar reaction.

@AngryApple 1 year ago

Bark is also very interesting

@_Everything_is_Fine_ 1 year ago

Are we limited to voice cloning only? Is there any voice generator that creates a new voice, like by changing parameters or combining two voices into one new voice?

@fnytnqsladcgqlefzcqxlzlcgj9220 1 year ago

WOAH i didnt notice it was AI and I work with audio constantly. trippy!

@steve_jabz 1 year ago

Is RVC still better now that so-vits-svc 5.0 is out?

@MnlBnt 1 year ago

my god this is too many tools

@SongStudios 1 year ago

Neat

@blackcube4 1 year ago

I would have loved more time spent on languages other than English.

@jurandfantom 9 months ago

At last I managed that! Thank You ByCloud !

@minicup 1 year ago

After about 30 seconds I realised it was AI

@liam10000888 1 year ago

I really like this type of video from you! The ai news was great, but as a layman it was too scattered

@lll-yq4hu 1 year ago

Great vid

@Crazybark 1 year ago

The tacotron one sounded better than the tortoise one

@bodyswapai 1 year ago

Love your videos!

@andreya.l.1270 1 year ago

I missed your videos man, good work, keep it up

@mlcat 1 year ago

for just tts VITS is one of the best options

@gameb30232 1 year ago

this is so cool i wanted to do this for so long! thank you!

@nils900 1 year ago

How well does the TorToiSe + RVC combo work with other languages?

@Toliman. 1 year ago

It would be reliant on the RVC training of phoneme and language salience of the native recording. Accents are naturally difficult. I.e., accents and pronunciation are usually not neutral, so if you use a TTS to generate the non-English version, RVC will interpolate the accent and pronunciation based on the native accent it was generated with. So, if you generate an Austrian voice first, then pass it to a Japanese RVC, it will struggle to find matching properties. But, if you use a Japanese speaker to create English phonemes, and the RVC has examples of these equivalent phonemes, it will substitute. The effect is weird, which is why accents are difficult to emulate.

@knoopx 1 year ago

us techno bros are not into karaoke xD

@benkrararara2185 1 year ago

@Kisai_Yuki 1 year ago

The information in the video is pretty good, but when people make the claim "you can't tell this isn't a real person" I call that bluff. The reason some voices are "good enough" is simply because people can't tell a bad voice from a bad recording. GIGO. If you have a high quality input, it will produce a high quality output. When your sources are not high quality (which is why the tr*mp/Biden/Obama video sounds completely unrealistic) it's immediately obvious, but likewise, it will not fool someone who knows that person. The vernacular will be wrong, the use of filler words will be wrong, the lack of spacing in the audio will be wrong. Like the three biggest tells/fails for TTS and VC are:
- Output is too clean and consistent (true human voices have variability in their pitch, speed and volume)
- The vocabulary is incorrect for the accent or age given
- The accent is foreign
Like don't get me wrong, I think the advancements in TTS/VC will open doors to allowing people to use their own voice in foreign dubs of material that otherwise would only be available subtitled. However, the writing is on the wall for jack-of-all-trades voice actors, because instead of doing 25% of the voices in a cartoon, the studio will just use "good enough" AI for characters that aren't named.

@Askejm 1 year ago

I'm the editor. I listen to bycloud's voice for hours. I gotta say, this AI is scarily close. Most of the time you can't hear it, and I forgot many times it was AI while editing. Only some things it pronounces a bit oddly, and the S sounds off. But I would say, if you know this may be AI you can hear it, but if you're just a casual viewer you wouldn't realize.

@Kisai_Yuki 1 year ago

@@Askejm I'd say where I could tell was the 'w' sounds. When I watched a different video I also noticed that bycloud's natural voice has a lower F0. Not every voice is going to be equal. After I watched this video, I went and grabbed the RVC stuff and tried some of my own experiments. As expected, it's extremely GIGO (I've trained TTS voices on VITS before, so I know a lot of its tells.) RVC is basically autotune. If you know what the input is, you can definitely tell. So RVC will not work to change the gender, accent or age of the input. It will however give you a much more consistent F0. So this is immensely useful for people who want to do character voices with their own voice and don't want to damage their voice doing them.

@Askejm 1 year ago

@@Kisai_Yuki I think the real and fake F0 are very close. His recent videos have a higher F0 tbh. Also, is that RVC v2 with pitch guidance disabled?
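Editor's sketch for readers unfamiliar with F0 (the fundamental frequency, i.e. the perceived pitch of a voice) as discussed in this thread. It estimates F0 of a synthetic tone with plain autocorrelation in standard-library Python. This is an illustrative assumption about how one might compare pitches, not how RVC or any tool mentioned here extracts pitch; real speech would need framing, voicing detection and a sturdier estimator.

```python
import math
from typing import Sequence


def estimate_f0(samples: Sequence[float], sample_rate: int,
                fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Pick the lag with the strongest self-similarity in the speech pitch range."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation: how well the signal matches itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 125 Hz tone standing in for a voiced frame (hypothetical input).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 125 * n / sample_rate) for n in range(2048)]

f0 = estimate_f0(tone, sample_rate)
print(f"estimated F0: {f0:.1f} Hz")  # should land close to 125 Hz
```

Running an estimator like this frame by frame over two recordings is one way to check claims such as "the real and fake F0 are very close" or "the output is too consistent".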