A Tip on Training Better Voice Models in Tortoise TTS

31K subscribers

15K views

Links referenced in the video:
Tortoise Installation - • Local AI Voice Cloning...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Motherboard - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/JarodMica
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoffee.com/jarodsjo...

Science

Published: Oct 29, 2023

Comments: 60

@jimb0z93 9 months ago

Saw this update released today on Twitter, and I was expecting a video from the AI sound master. Thanks for the video, good work!

@Mistercapi0 9 months ago

I did exactly the same thing a few days ago and can confirm that re-whispering the samples plus smart padding at the end fixes the cutoff you are experiencing. Play around with it; I found that 0.2s was great on my data. (It all depends on how quickly the speaker transitions to the next sentence and takes a breath.)
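
A minimal sketch of the end-padding idea, assuming each clip's segments are available as sorted (start, end) times in seconds. `pad_segment_ends` is a hypothetical name and this is not the commenter's code. The 0.2 s default mirrors the value reported above and will likely need tuning per speaker. Each end time moves later by `pad`, but never past the next segment's start or the end of the recording, so the padding cannot swallow the following sentence.

```python
def pad_segment_ends(segments, pad=0.2, clip_duration=None):
    """Extend each (start, end) segment's end time by `pad` seconds.

    Hypothetical helper: assumes segments are sorted by start time.
    The padded end is clamped so it never runs into the next segment
    or past the end of the recording, and never shrinks a segment.
    """
    padded = []
    for i, (start, end) in enumerate(segments):
        new_end = end + pad
        if i + 1 < len(segments):
            # Stop at the next segment's start so clips do not overlap.
            new_end = min(new_end, segments[i + 1][0])
        if clip_duration is not None:
            # Never extend beyond the end of the source audio.
            new_end = min(new_end, clip_duration)
        padded.append((start, max(end, new_end)))
    return padded
```

For example, `pad_segment_ends([(0.0, 1.0), (1.1, 2.0)], pad=0.2)` extends the first end only to 1.1, where the next segment starts, and extends the last one to 2.2.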

@Dj-vt5gr 5 months ago

Any chance you could release your "smart padding" code to GitHub? THANK YOU!

@Mowgi 9 months ago

Oh, and shout-out for the experimentation on accents 🙏 Thanks for your input on the Discord.

@Jarods_Journey 9 months ago

Of course :) Thanks for your input as well!

@randomyoutuber1078 9 months ago

I have been looking into forced alignment and voice activity detection. I think it's the key to fixing this problem. I have been trying to use it to process the training data with some success, but I'm not good at coding and haven't been able to test many of the different methods that are out there.
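
Forced alignment needs a dedicated aligner, but the voice-activity-detection half can be shown with a simple energy threshold. The sketch below is a toy stand-in, not a production VAD; dedicated tools such as the webrtcvad package or Silero VAD are far more robust. It assumes mono float samples in [-1, 1]. It splits them into 20 ms frames and reports the time ranges whose RMS energy reaches a `threshold`; the 0.02 value is an assumption you would tune per recording.

```python
import math


def speech_regions(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_s, end_s) ranges where frame RMS energy >= threshold.

    Toy energy-based VAD: `samples` are mono floats in [-1, 1].
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    regions = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        t = i / sample_rate
        if rms >= threshold and start is None:
            start = t  # speech begins at this frame
        elif rms < threshold and start is not None:
            regions.append((start, t))  # speech ended before this frame
            start = None
    if start is not None:
        # Speech runs to the end of the signal.
        regions.append((start, len(samples) / sample_rate))
    return regions
```

The detected regions could then guide where clips are cut, instead of relying only on Whisper's segment boundaries.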

@bernardthongvanh5613 8 months ago

To do voice cloning, I add 3 voice clips and use them with the --voice argument when running do_tts, but each time it produces 3 slightly different voices. Is it possible to freeze the behavior so I always get the same voice? The problem is that if I want to read a text, I need to make several generations, and it's impossible to get exactly the same voice across them.
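
Tortoise samples randomly during generation, so the same reference clips can yield slightly different voices on each run. Fixing the random seed is the usual way to make runs repeatable. The tortoise-tts `do_tts.py` script appears to accept a `--seed` option, but confirm with `python do_tts.py --help` for your version. The toy example below shows only the principle, using Python's `random` module. `sample_voice_latent` is a hypothetical stand-in for a stochastic sampling step, and seeding it makes its output identical across calls. Even with a fixed seed, GPU kernels may not be fully deterministic, so tiny differences can remain.

```python
import random


def sample_voice_latent(seed=None, dim=4):
    """Hypothetical stand-in for one stochastic step of TTS generation.

    A dedicated Random instance seeded with `seed` makes the draws
    repeatable; seed=None seeds from system entropy, so every call differs.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

Calling `sample_voice_latent(42)` twice returns the same list, which is the property you want across the generations for one text.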

@joshuashepherd7189 9 months ago

Lmfao the guy sounds like he's spitting on us 9:01

@farizseptiananda7756 9 months ago

I'm interested in using Tortoise and have done basic generation with Tortoise TTS. But I have a question: how do I pause and resume training? Where I live, the power sometimes goes out for a few hours.
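
The video does not cover resuming. Most training scripts save checkpoints periodically, so check the Tortoise training settings for a resume option. The general pattern is sketched below under stated assumptions. It is not Tortoise's trainer: `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical, and the "state" is a plain dict standing in for model and optimizer weights. Writing to a temporary file and then calling `os.replace` means a power cut during a save cannot corrupt the previous checkpoint.

```python
import json
import os


def save_checkpoint(path, step, state):
    """Write the step counter and state to `path` via a temporary file.

    os.replace swaps the new file in a single operation, so a crash
    mid-write leaves the previous checkpoint intact.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (step, state) from `path`, or a fresh start if none exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(path, total_steps, save_every=10):
    """Toy training loop that resumes from the last saved checkpoint."""
    step, state = load_checkpoint(path)
    while step < total_steps:
        state["last_loss"] = 1.0 / (step + 1)  # placeholder for a real update
        step += 1
        if step % save_every == 0 or step == total_steps:
            save_checkpoint(path, step, state)
    return step, state
```

After an outage, calling `train` again with the same checkpoint path picks up from the last saved step rather than from zero. Progress since that save is lost, so a smaller `save_every` trades disk writes for less lost work.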

@JackpotFriends 3 months ago

I have about eight 2-hour live streams of myself that I want to use for training. Is that overkill? Can I just run them through WhisperX and train on the whole sample? Any suggestions?
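
The thread doesn't answer this, but long recordings are typically cut into short clips before training rather than fed in whole. One way to do that from WhisperX-style output is sketched below. It merges consecutive transcript segments into clips of at most about 10 seconds and drops fragments under 1 second. Both limits are assumed values, not figures from the video, and `group_segments` is a hypothetical name. The resulting (start, end, text) tuples could then drive an audio cutter and a transcript file.

```python
def group_segments(segments, max_len=10.0, min_len=1.0):
    """Merge consecutive (start, end, text) transcript segments into clips.

    Hypothetical helper: segments are assumed sorted and timed in seconds.
    Consecutive segments are merged while the clip stays within `max_len`
    seconds; clips shorter than `min_len` are dropped. A single segment
    longer than `max_len` is kept as-is rather than cut mid-word.
    """
    clips = []
    current = None
    for start, end, text in segments:
        if current is None:
            current = [start, end, text]
        elif end - current[0] <= max_len:
            # Still fits: extend the current clip and append its text.
            current[1] = end
            current[2] = current[2] + " " + text
        else:
            clips.append(tuple(current))
            current = [start, end, text]
    if current is not None:
        clips.append(tuple(current))
    return [clip for clip in clips if clip[1] - clip[0] >= min_len]
```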

@SiddharthTripathi365 9 months ago

Hi Jarod, I'm a big fan of yours! Could you please create a demo of a TTS + RVC pipeline for Hindi?

@Mowgi 9 months ago

Sorry if this comment is premature, as I've only partially finished the video, but there are several programs designed to automatically remove breaths from vocal recordings. Even a simple noise gate would help.

@Jarods_Journey 9 months ago

If you have any, I would love to take a look. I just don't want it removing breaths from the middle of sentences, only at the ends, which is perhaps the issue here (and is why a noise gate wouldn't work).
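
Below is a sketch of trimming only the tail of each clip, which is the behavior described here and something a noise gate does not do. It assumes the trailing breath is quieter than speech. It finds the last 20 ms frame whose RMS energy reaches a `threshold` (an assumed value), keeps a short tail after it, and discards the rest. Quiet gaps earlier in the clip are untouched. A loud breath would pass the threshold and survive, so this is a starting point rather than a breath detector. `trim_trailing_quiet` is a hypothetical name.

```python
import math


def trim_trailing_quiet(samples, sample_rate, threshold=0.02, frame_ms=20, keep_ms=200):
    """Cut low-energy audio (e.g. a quiet breath) from the end of a clip only.

    Hypothetical helper: `samples` are mono floats in [-1, 1]. The clip is cut
    `keep_ms` after the end of the last frame whose RMS reaches `threshold`;
    quiet stretches earlier in the clip are left untouched.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    last_loud_end = 0
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            last_loud_end = i + len(frame)  # remember where loud audio last ended
    cut = min(len(samples), last_loud_end + int(sample_rate * keep_ms / 1000))
    return samples[:cut]
```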

@matthewfuller9760 8 months ago

@@Jarods_Journey I'm not sure about this particular use case; however, both AMD and NVIDIA offer free background-filtering software that filters voices in real time using the GPU.

@Vladimirytt 9 months ago

Can you make a tutorial on how to use RVC disconnected for Colab?

@king-zu3ih 4 months ago

I'm new to Tortoise. Can I use a model trained with Tortoise in RVC, or is there any way to convert it to the RVC model format?

@kabirchawla2652 8 months ago

Is it better with Bark?

@ero1.097 9 months ago

⁉️ How can I continue training on a dataset whose training ended at 300 epochs? Do I need to put the new audio files in the dataset folder together with the old ones, and then go into the configs and set more epochs? E.g., if it stopped at 300 and I want to keep training with the new audio, do I now need to set 400 epochs to continue training on the new dataset? And whenever I want to expand my training dataset, do I just repeat this, increasing the number of epochs each time?

@mr-s23 2 months ago

Can you share the WhisperX you are using?

@stevewarby12 17 days ago

Hi. My Train tab doesn't show any training files. Where do I get them, please?

@M4rt1nX 9 months ago

We love the breathing. Joking aside, I was watching a movie and got irritated because the actors portraying robots were breathing.

@Jarods_Journey 9 months ago

Maybe they're breathing too 😅

@TheBibliographerSociety 11 days ago

6:08 I've run into the same issue with XTTS-Finetune-WebUI: Whisper cuts the ends too short.

@srisir481 9 months ago

Does it only work for English?

@zonas7915 7 months ago

I see that you have an audio combiner script, but it's not in the repo.

@euphemisticukulele67 5 months ago

Jeremy Clarkson?

@smackdown2479 1 month ago

Thank you for what you're doing, but when you make videos, think about noobs like me: try to make the camera movements slower and explain more for beginners. Thanks again for your hard work, I appreciate it.

@MrAlsBundy 9 months ago

I would advise not using the SRT; I would use the JSON from WhisperX instead. The alignments are much better, and you can add start and end offsets. And don't forget to check the numbers, because the alignment model can't handle numbers.
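
Here is a sketch of the JSON-based workflow. It assumes WhisperX's JSON output contains a top-level `segments` list whose entries have `start`, `end`, and `text` fields; check a file from your own WhisperX version before relying on this. It loads the segment timestamps and flags any segment whose text contains digits, since per the comment above those are the ones the alignment model may mis-time. Flagged lines could be reviewed by hand or have their numbers spelled out (e.g. "12" to "twelve"). `load_segments` and `segments_with_numbers` are hypothetical names.

```python
import json
import re


def load_segments(json_text):
    """Parse WhisperX-style JSON into (start, end, text) tuples.

    Assumes a top-level "segments" list with "start", "end", "text" keys.
    """
    data = json.loads(json_text)
    return [(seg["start"], seg["end"], seg["text"].strip()) for seg in data["segments"]]


def segments_with_numbers(segments):
    """Return segments whose text contains a digit and may need review."""
    return [seg for seg in segments if re.search(r"\d", seg[2])]
```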

@Jarods_Journey 9 months ago

I'll look into it, thanks for the advice!

@satyajitroutray282 9 months ago

A few months back, I trained some models using the mrq repo. The problem I faced was with generation: during testing, when I input a short paragraph, these models skipped some sentences in the middle, or sometimes some words at the end or the start of those sentences.

@Jarods_Journey 8 months ago

I've found Tortoise works best if you give it shorter sentences, or split paragraphs into sentences that are each on their own line. It also really depends on how well the model adapted to your training data.
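
Below is a minimal sketch of splitting a paragraph into one sentence per line before passing it to Tortoise. It assumes sentences end with ".", "!", or "?" followed by whitespace. This naive regex will also split after abbreviations such as "Dr.", and it is an illustration, not Tortoise's own text splitter. The function names are hypothetical.

```python
import re


def split_sentences(paragraph):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def one_sentence_per_line(paragraph):
    """Join the sentences with newlines, one sentence per line."""
    return "\n".join(split_sentences(paragraph))
```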

@shoaibvanu5194 5 months ago

I am trying to train it for an Indian English accent. Can you guide me on this, please?

@Mehdi0montahw 9 months ago

How do I change the language in Tortoise TTS and make it speak foreign languages? How do I add a special field to change the language I trained, Arabic, for example?

@Jarods_Journey 9 months ago

You would need a custom tokenizer for training in Arabic in this case.

@corbinangelo3359 3 months ago

I still have my old, pathetic 1070 with 8GB of VRAM. 😭 It took me 2 days of fiddling with the settings to get my machine to successfully train a voice model. 🤣 I'm really tempted to just go and get a 4060 Super with 16GB, but I think I'll wait until Computex in June.

@MrDanINSANE 9 months ago

Since you are the KING of audio-cloning AI (I've been a fan for a while), maybe you can help me find the most up-to-date LOCAL voice-cloning tool that also supports Hebrew? I tried So-Vits-Svc and RVC a long time ago. RVC can't run on my machine because my GPU has only 4GB, but So-Vits-Svc works... training takes a HELL of a lot of time, even cloud-based via Google Drive. Anyhow, is there a NEW/BETTER option you can point me to that supports Hebrew? I hear good things about RVC v2, but training locally is the real question... unless there are newer voice-cloning AIs that are better than RVC v2, of course. Thanks in advance, keep up the good work 💙

@Jarods_Journey 9 months ago

RVC is better in my experience and should actually work on 4GB of VRAM, though I'm not too sure in this case. So-VITS is good, but if you can get RVC running, it will probably be better. As for TTS, I'm not sure of any that support Hebrew at the moment; maybe Facebook's SeamlessM4T.

@MrDanINSANE 9 months ago

@@Jarods_Journey Thank you! I will give RVC v2 a chance; people are very happy with it compared to So-Vits :)

@gonzalodijoux5953 7 months ago

Thanks for your video. Is it possible to train a French voice? I tried to train a French voice using your other video, but the voice comes out in English. Thanks.

@janrappe 7 months ago

You can also train non-English (French, German, Spanish, etc.) voices. But make sure to set the Text LR Ratio to 1 in the "Generate Configurations" tab, as mentioned in the video. Otherwise, the model will try to pronounce French sentences as if they were English.

@jeffreysabino6176 9 months ago

Which techno song is your background music?

@Jarods_Journey 9 months ago

ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-slt_Bav8nsQ.html

@ahmetab06 7 months ago

How much VRAM is required to run it? And which file do we need to run for the first setup?

@Jarods_Journey 7 months ago

I've heard from people that VRAM can be as low as 4GB. You might want to check out my most recent video for an easy zero-code install: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-p31Ax_A5VKA.html

@ahmetab06 7 months ago

@@Jarods_Journey I couldn't run it because I have a 4GB graphics card. Can you compare it with ElevenLabs? I couldn't find a video that explains it clearly with concrete examples.

@kodoqmc 9 months ago

Does Tortoise support language translation?

@Jarods_Journey 9 months ago

It does not

@johnyoung4409 7 months ago

I'm facing the exact same issue. I carefully split my input in Audacity, but the problem still exists. Very confusing...

@johnyoung4409 6 months ago

OK, after some digging, I finally found out that sufficient audio length is very important, even for voice cloning. In my failed attempt, I only had 4 minutes of audio; now I've increased it to one hour, and I don't have that issue anymore. 10 minutes was also sufficient in some of my experiments.
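
To see where a dataset stands relative to these figures, total the clip durations. The figures are anecdotal and come from the commenter's own experiments: about 4 minutes failed, while 10 minutes to an hour worked. The sketch below assumes the clips are uncompressed .wav files in a single folder and uses only Python's standard `wave` module. `wav_duration` and `dataset_minutes` are hypothetical names.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Duration of one WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def dataset_minutes(folder):
    """Total duration, in minutes, of all .wav files directly in `folder`."""
    return sum(wav_duration(p) for p in Path(folder).glob("*.wav")) / 60.0
```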

@ASlaveToReason 6 months ago

@@johnyoung4409 When you use a 1-hour audio file, what length do you break it up into?

@DM-dy6vn 4 months ago

1:16 The "ch" sound is missing in "lunch".

@fjccommish 3 months ago

Why the bad background music?

@MrWaffleToes 9 months ago

What happened to the Japanese learning videos?

@rickygrenadier6303 2 days ago

LMAO, Jeremy Clarkson

@deathxrost 9 months ago

I'm off topic here, but "Jarods", can you do something with XXXTentacion's voice in RVC? 😅 I know there are many AIs that can do it easily, but I was amazed by the RVC "Kurt Cobain" song; it mostly feels like ❤😮 the real one. I want to do something with XXXTentacion too... 😊

@user-yc3hx3jx6n 4 months ago

Bruhh, voice changing feels weird on Mac.

@DOHANEWSUPDATES 3 months ago

Hi, dear friend, your videos are very useful. I have been trying to clone the voice used in lonewwolf motivation videos; it's a deep sound. Can you please guide me on how to turn my script into this type of voice? My reference audio is like this: ru-vid.com0U5PIiACwFI?si=TJmhQQV8r5SDaCFG... Please help me train this type of voice.

@jurandfantom 9 months ago

Man, I beg you, fix your voice sync ;_;

@benphillips2947 4 months ago

Every time I see the bad sync on these videos I get suspicious that it's just them trying to be cute. "Surprise you were listening to TTS all along!" Yeah, we know.