Local AI Voice Cloning with Tortoise TTS - 2024 Installation (Check LATEST update in description)

Links referenced in the video:
LATEST Update - • Updated AI Voice Cloni...
Github Repo - github.com/JarodMica/ai-voice...
Curate Dataset - • How to Make the PERFEC...
Training Better Models - • A Tip on Training Bett...
Timestamps:
Demo - 0:07
Installation - 0:40
Starting and Using - 2:27
Add Voices/Zero Shot Voice Cloning - 6:05
Training a Voice Model - 9:04
Generate Config - 13:33
Run training - 15:32
Using Trained Model - 17:12
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Motherboard - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/JarodMica
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoffee.com/jarodsjo...

Science

Published: Dec 17, 2023

Comments: 454

@Mowgi 6 months ago

We're all very lucky to have someone dedicated to not only teaching us how to use these awesome technologies, but making it as simple and up to date as possible. Keep up the great work, we don't deserve you 🙌

@Jarods_Journey 6 months ago

Thank you thank you 🙏🙏! Really appreciate it and you're too kind 🥹

@PlaystationEu 6 months ago

@Jarods_Journey thanks a lot for your work, it's really awesome 😊

@pc_boy5371 6 months ago

I agree with you 100%. Love the channel.

@brianlink391 5 months ago

Speak for yourself - I totally deserve him! 😉

@SirRubyRed 3 months ago

Is it not possible to download pretrained voices?

@BlueprintBro 6 months ago

Thank you so much for always making up to date and accessible guides for everyone!

@legend_of_ray 6 months ago

I managed to find the original repo a little while back. Glad you're keeping it alive... thanks for this!

@shawn4990 6 months ago

After getting into AI and programs like Stable Diffusion over the last year, I had to learn some code with all that's required to get them to run properly. However, since I'm not a programmer, what ended up happening is I created more issues for myself, which took way too much time to google and fix my mistakes. Yes, I've learned a ton, but I've pulled nearly all of my hair out in the process. So, thank you for making this a code-free install. Saves me time and more hair-pulling. Again, thank you Jarod... your efforts are appreciated.

@Jarods_Journey 6 months ago

Appreciate it! I know there are a lot of folks that are interested in AI, but all of the code revolving around it and the dependency management... is a hellscape. So, glad that my code-free install can help others out there, and it also makes sure the tutorial stays the same over time :)!

@2mShortFormCC 4 months ago

GPT can code if you know what to ask for

@33rdframe 1 month ago

i am the 24th person to REALLY feel this message, lol. i never wanted to learn python 😂

@supaplay3947 6 months ago

I'm so thankful for u making this video and for the community who makes these tools. I really want to change my videos from the silent type to more of an entertainment type, but my main problem is my voice. I was born with a bad voice, so I really need something like this for the voice in my videos.

@ShannonWare 2 months ago

This is an amazing video. Not only has it gotten me started with voice cloning, it is an excellent summary of quick and dirty model training.

@Nathanizer 6 months ago

Thanks a lot! I was trying stuff with Conda but it didn't work out as I expected. So I followed your video, with my own custom voices, and it all works perfectly. Thanks :)

@IOSALive 2 months ago

This made me so happy! I liked and subscribed!

@Samuel-wl4fw 6 months ago

Thanks a lot, have been struggling with dependencies, and have been following a few of your videos :)

@nodewizard 6 months ago

We have quantized LLMs and Turbo SDXL and LCM models. I think it's time for a turbo/quantized TTS in 2024. Thank you as always for your tutorials and updates.

@CozyChalet 6 months ago

The way AI TTS companies charge people is ridiculous. I am glad there are people like you. Thank you.

@compositeur8455 5 months ago

You need an Nvidia GPU to run this crap, so it's not much better

@lightning_dynamics 4 months ago

thank you so much for putting this all together, I'm making an audiobook and this helps a lot !!!

@jonnysmith9328 3 months ago

You're awesome! I love your videos. They make sense and are easy to follow.

@MatthewJettHall 14 days ago

OMG you rock!!! Thank you so much for putting this package together for us. It works amazing!!!! Thank you again!

@Random_person_07 6 months ago

Thanks so much for making this! it's awesome keep it up!

@user-nq7nd8yz4z 6 months ago

Thanks for the work! And the tutorial! I have subscribed to your channel! Hope you are well and have a good start to the New Year!

@Jarods_Journey 6 months ago

Thanks and you as well!

@bwowzah 6 months ago

Fantastic video! I greatly appreciate the hard work and dedication you put into what you do on this channel. You've helped me out immensely.

@MR.RECAPER 6 months ago

👌👌 Thanks, I have been trying to install Tortoise TTS since your first video about it, but I always got errors when installing packages. This time it was so easy and it actually worked. 😊😊😊😊😊😊

@tyc00n 6 months ago

super awesome, I tried doing that recently and gave up. Really good idea including all the dependencies so the process becomes 1. Download 2. Extract 3. Run like everything else people download 😊

@Jarods_Journey 6 months ago

Thanks! The key is using the python embeddable packages, though there are a lot of steps to getting a package up and running correctly😅

@black_dragon274 5 months ago

@Jarods_Journey Why isn't there a GUI interface for this? Does it have to be through a terminal or browser? It's so primitive!

@joshuadelacruz3907 5 months ago

Thanks, mate! This is such an awesome job!

@UmakantMishra 5 months ago

Great package. I will install and explore it. Thank you for sharing your valuable knowledge and experience. Big Like.

@csiguszfoxoup 6 months ago

Thank you! Amazingly explained!

@pogiman 3 months ago

it worked!! thanks man!!

@schakuun1995 5 months ago

Genius! Great tutorial, thanks :)

@Jarods_Journey 5 months ago

Appreciate it :)!

@Vulk7n 6 months ago

Thank you for making videos on rvc and tortoise tts , i hope that one click pipeline comes soon

@jurandfantom 6 months ago

Just noticed that you sync your voice with the video

@memesprophet 6 months ago

Massive respect to you my dude. Really needed this

@puntogcb 4 months ago

Hey Jarod! Just wanted to drop a quick note of appreciation for your content on AI. Your journey into the world of artificial intelligence is both fascinating and informative. Thanks for making complex topics so engaging and easy to understand. Keep rocking those AI insights! 🚀 By the way, any chance of training Spanish LATAM voices in the future? That would be fantastic! How would it work? Muchas muchas gracias! Abrazo de Argentina!

@vrtech473 6 months ago

nice one ❤ Thanks!

@gu9838 6 months ago

Will try it out. Had issues with the cloning part a while back, so we will see. Thanks!

@paul.j478 6 months ago

that's freaking awesome!!

@huyked 6 months ago

I wish all the github stuff (I'm a newbie/non-programmer) was this simple. Lol. Thank you!

@Jarods_Journey 6 months ago

And that's why I wanna try and make it as hands-off as possible :)! The learning curve sucks in the beginning, but GitHub does get easier the more you learn it!

@rettbull9100 5 months ago

My cloned voice came out sounding horrible. I used the same audio clips that I've used with RVC, which sounds really good. I used all the same settings and did as you said, though for some reason my long clip was broken up into 0 to 4 second clips. I made sure all my settings matched what you used. The original audio clip was 54 minutes long. It took over a day to train. Edit: the loss-mel graph (the green one) was almost at zero at the end of training. I trained it for 500 epochs.

@Cadaveri 5 months ago

Thank you so much for this release. Finally something that anyone can install and understand without problems! Btw are there any sort of pre-trained datasets or sound file databases available anywhere on the internet that you know of? (popular video game characters etc)?

@Jarods_Journey 5 months ago

Np! As for dataset, I'm not sure, but am pretty sure the audio exists somewhere out there on the web!

@HistoryIsAbsurd 5 months ago

Worth the sub, thanks a lot

@KurtStaInes 6 months ago

LMAO this program has now become the Stable Diffusion of voice generation. I admit that it won't take that long for this to improve. Thanks for the fork, looking forward to the documentation.

@nektrs 3 months ago

Thank you!

@syrcon 5 months ago

Your videos are Awesome Jarod! You do such a good job explaining how to install and setup these repositories (even going the extra mile to fork them yourself to make them easier to work with)! Is it possible to fuse two voices together, or is it viable to train a model by combining two datasets from two different speakers?

@Jarods_Journey 5 months ago

Appreciate it! For tortoise, I believe if you train on two voices, you get a mix or average between the two as this does occur when you use two different files as reference audio files. I actually haven't yet tried this for training so this may be a useful experiment to try.

@syrcon 5 months ago

@Jarods_Journey I'll have to try it out as well. I assumed that it would have negatively impacted the training of the model, but if it instead blends the two, then that would be really interesting.

@spiffylich3349 5 months ago

Awesome video! I'm a bit stuck, though. I have about a 45 minute clip of a character talking, and I've gone and processed it with UVR-5 and the audio-splitter project you linked, so I have a ton of smaller voice-line wav files. But when I try and train the model on them for ~200 epochs, the results I get from using the model are awful! It's like around 50% of the words spoken in the generated audio are just noise, or the AI struggling very hard to speak a word. Any tips for getting clearer audio? Like, should I put my 45 minute clip into the voice folder instead of the multiple clips?

@gregorymccollum9107 4 months ago

😁Saved me hours. Keep working!

@Jarods_Journey 4 months ago

Thank you, appreciate it!

@negociodenerd 5 months ago

Congratulations on the work, I've been following you for a few months now. I would like to know how I can create a model in other languages and make voice cloning at least acceptable.

@Starpluck 6 months ago

Thank you for this tutorial. I will ensure you will be greatly rewarded for it. --Tutankhamun

@rubenrodenburg4478 4 months ago

Thanks man

@JiangXina 4 months ago

thank you so much

@cuccurese 5 months ago

I did everything you said in the video, but the generated speech has an American accent even though my audio is in Italian. :D I spent so much time on training.

@prizegotti 4 months ago

It's not trained for Italian. Just American English and Japanese.

@cuccurese 4 months ago

@prizegotti Thanks!!!!

@datorresramos 5 months ago

Nice video, super easy to understand how to install Tortoise TTS. I have a question: how can I access the web GUI from another computer on the same network?

@LucidFirAI 3 months ago

I am in love with this install method! Your tutorials a year ago were usable but kinda hard to follow, this method however is f'ing perfect :) Is there a way to control tortoise through command line so I can run it with a batch file? What is the best way to run it for stable outputs at the expense of perfection?

@bobbyboe 6 months ago

Thank you thank you man... finally I have this thing running! Question: does DeepSpeed make a difference in quality as well?

@Jarods_Journey 6 months ago

DeepSpeed does not, as far as I've observed, since it's just parallelizing the process of the autoregressive model to make it faster. At least that's my understanding of it :)!

@DM-dy6vn 2 months ago

5:12 As far as "Samples" are concerned, I noted that "sample_batch_size" is implicitly set to 16 in the code. You can see it in the console when generating. Having "Samples" set to 16 means that there is one batch to process. If you set Samples=100, then 6 full batches will be processed + 4 samples in a 7th batch. The time needed is nearly proportional to the number of batches. Having said that, it is not "exponential". The iterations behave close to a square root: quadrupling "iterations" would approx. double the processing time. A batch of samples will be placed in VRAM, and depending on the length of a text chunk, it could push your GPU to the limit as far as VRAM is concerned. Setting "Samples" to something lower than 16 will free VRAM, but potentially lower the quality, since fewer samples will be used. Do not feed it overly long sentences. Use "Line delimiter" to separate your sentences during processing. You should avoid the GPU using "Shared GPU memory" (my RTX 3090 can do this), because falling back to PC RAM makes the processing even slower (slow data swapping).
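
The arithmetic in the comment above can be checked with a small sketch. It assumes the batch size of 16 that the commenter reports seeing in the console, and it takes the square-root scaling for iterations purely from the comment as a rough rule of thumb; the function names are made up for illustration and are not part of the tool.

```python
import math

# Batch size the commenter reports seeing in the console (an observation, not a documented setting).
SAMPLE_BATCH_SIZE = 16


def batch_plan(samples: int, batch_size: int = SAMPLE_BATCH_SIZE) -> list[int]:
    """Return the size of each batch needed to generate `samples` candidates."""
    if samples <= 0:
        return []
    full, rest = divmod(samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def relative_time(iterations: int, baseline_iterations: int) -> float:
    """Rough time multiplier if cost grows like sqrt(iterations), as the comment suggests."""
    return math.sqrt(iterations / baseline_iterations)


if __name__ == "__main__":
    plan = batch_plan(100)
    print(len(plan), "batches, last batch holds", plan[-1], "samples")
    print("4x iterations ->", relative_time(120, 30), "x time (rule of thumb)")
```

With Samples=100 this gives six batches of 16 plus a seventh of 4, matching the comment.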

@CptTurk81 4 months ago

This is amazing. I can see there's an api option, do you have any guides on how to use it programmatically? Say for automation?

@RobertSmith-kj6eb 6 months ago

Bro, I got this working real quick. It is amazing. I copied and pasted voices from a different tortoise-tts and it sounds great! Thanks for sharing!

@Samuel-wl4fw 6 months ago

Where do you find some available voices? I tried to look but couldn't find any

@leighenhenkelman8648 6 months ago

@Samuel-wl4fw I'm looking for voices too!

@thebigbigdaddy 5 months ago

Great video. Did you ever consider integrating this with Twilio for creating phone GPT agents?

@Elrevisor2k 4 months ago

How do you create a voice model? For other languages? Great video

@RobertJene 6 months ago

gettin ur 2024 video in early I see

@Jarods_Journey 6 months ago

If I put 2023 on it, it'd be outdated a month later 😂

@dezenzplay 5 months ago

Thank you for all your work to keep the project alive! :) I've already created some fun gifts for a few friends with Tortoise TTS in the last year, and without your help and videos I would never have thought of it! I can't wait to see how fast generating with DeepSpeed works now with the updated version!

I do have one question though; unfortunately I couldn't find it in the wiki of the original repo or anywhere else on the net. Is there a possibility or a command to save several sentences in the input prompt as separate audio files instead of a combined .wav file which includes all the sentences? I'm planning to create a kind of podcast with two speakers, for which I'll copy the entire dialogue of a single speaker into Tortoise and then repeat the whole thing for the second speaker. Then I'll put the individual snippets together in Audacity. It would therefore be easier for the project if the WebUI produced individual sound snippets directly instead of cutting the coherent .wav file by hand. :D

EDIT: Okay, that's cleared up. I finally figured out how to load the model at the beginning using a JSON command so that the appropriate autoregressive model can be loaded for each speaker. For people who also want to try this, the command is e.g.:

{"voice": "Peter", "autoregressive_model": "./training/Peter/finetune/models/5000_gpt.pth"} Text for the prompt

But it seems that if you use this method with changing models and DeepSpeed, a CUDA error occurs and it asks to recompute voice latents. But doing it with the involved models doesn't seem to work.
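
For a two-speaker script, prompt lines in the format this commenter describes could be generated rather than typed by hand. The sketch below is only an illustration: it assumes the JSON prefix sits on the same line as the spoken text, exactly as written in the comment, and the helper name, the second speaker "Anna", and her model path are made up for the example.

```python
import json


def build_prompt(dialogue, models):
    """Build one prompt line per utterance, each prefixed with a JSON settings object.

    `dialogue` is a list of (speaker, text) pairs; `models` maps speaker name to the
    path of that speaker's fine-tuned autoregressive model (hypothetical paths).
    """
    lines = []
    for speaker, text in dialogue:
        header = json.dumps({"voice": speaker, "autoregressive_model": models[speaker]})
        lines.append(f"{header} {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    models = {
        "Peter": "./training/Peter/finetune/models/5000_gpt.pth",
        "Anna": "./training/Anna/finetune/models/5000_gpt.pth",  # hypothetical second speaker
    }
    dialogue = [("Peter", "Welcome to the show."), ("Anna", "Glad to be here.")]
    print(build_prompt(dialogue, models))
```

As noted in the comment itself, switching models per line reportedly triggers a CUDA error when DeepSpeed is enabled, so this is likely only usable with DeepSpeed off.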

@Jarods_Journey 5 months ago

There should be an option to save uncombined sections individually, should be in the settings somewhere I think. As for the error, deepspeed for some reason doesn't like being unloaded and reloaded for new models which is why I show that you have to restart TTS each time you change models. Idk why this is the case and I spent quite a bit of time trying to find out why 😅

@dezenzplay 5 months ago

@Jarods_Journey Thank you very much for your feedback! :) I'll have a look, but luckily I've now found an even better way with the method mentioned above. Ah okay, so that's the reason behind the reload of TTS. But well, you can make sacrifices at the cost of speed. :D Thanks for all the videos on the subject so far, the new implementation with RVC on your part could also be very useful for the future!

@ash3844 4 months ago

Hi, thanks for the content. Does it work on Ubuntu, or only Windows? I'm facing a few issues while running it on Ubuntu 22.

@neros1277 3 months ago

Spent a few hours learning from your videos on Tortoise TTS, then tried to make my own model. I decided to go with Cloaker from Payday 3; the result was a stupidly high-pitched voice that sounded like shit. Trained it on 10 clips of 5 seconds each, set epochs to 200. Would you say I should use more samples and more epochs for training? Also, should samples of the character yelling and speaking softly be trained together, or should I make separate models?

@ErnestoPossiSpanishVoiceOver 2 months ago

Do you know how to set up the same software on a Mac? Great video, Jarod!

@KaruHart 6 months ago

Legend

@parmesanzero7678 5 months ago

Is there an ideal script for voice training? That is, is there an ideal series of things to have the speaker saying to get the best results for new speech from the voice model?

@shovonjamali7854 3 months ago

Another great one! But can you show us, how can I run this thing in google colab as I don't have sufficient hardware to run this?

@mohamedemam-5807 6 months ago

thank you for your useful content :), i have a question . is it possible to use the trained models in rvc in tortoise tts ?

@Jarods_Journey 6 months ago

Unfortunately not, they're different architectures so it won't work

@morningwood3457 3 months ago

When I tried to train a model, the training console said "ETA: 1 day 18 hours to complete." 🤯

@couldntbemebro022 7 days ago

What's ur GPU and CPU?

@midnitejesus 1 month ago

My model came out sounding nothing like it was trained on. I had 2300 super clean chopped samples for a character and realized my 3080 would take forever. I trained on 250 samples over 3 hours. The output was 7 models, from 60_gpt to 402_gpt. I tried them all and the voice is simply pitched too high and sounded nothing like the source files. I followed your instructions to the T. Any suggestions?

@rushic24 4 months ago

Hi, thanks for the video. Do you know if there is a way to retrain the model with new data?

@jeffketter9677 6 months ago

When I have hifigan enabled, I'm getting very "warbly" under-water sounding audio. I checked with both the random voice and one that I sampled, same effect. If I turn hifigan off, it sounds normal. Any ideas? I've been using this with your audiobook maker (thanks for all your work on these, by the way!) and was really hoping for the speed boost.

@Jarods_Journey 6 months ago

Hifigan with some voices is MUCH lower fidelity for that speed gain. Some voices/trained models do better I've observed, but that's the tradeoff unfortunately

@Skalekul 22 days ago

Do you have any idea why custom trained models don't work using hifigan? It produces the error: 'tuple' object has no attribute 'device'

@soundgif 3 months ago

Hey, thanks for this awesome video. Question - how is the autoregressive model tuned without the VQ-VAE? Since CLVP and CVVP operate on the VQ codes produced by the autoregressive output, wouldn't this harm selection of the samples generated by the autoregressor? I understand that the downstream diffusion model (and presumably the hifigan) operate on the final latents produced by the autoregressive model (and not the codes), so in theory this could be used to tune the autoregressive model weights, but wouldn't it result in poor sample selection performance -- since the autoregressive mel code head can't be trained without the VQ-VAE? Also, just curious - why choose to train the autoregressive model without training the diffusion model (possibly in tandem)? Has any experimenting been done in this area?

@Jarods_Journey 3 months ago

We do have the VQVAE, it's the dvae.pth model inside of the models folder. I'll give you the two blog posts about this: 152334h.github.io/blog/tortoise-fine-tuned/ and 152334h.github.io/blog/tortoise-fine-tuning/ which are better explanations than I can give at the moment. As for training the diffusion model, I don't have a strong enough understanding yet of what finetuning would do for it, but as far as my understanding goes with the AR model, we are training in new representations for the tokens in its vocabulary so that it can output appropriate mel tokens for whatever dataset you use.

@michaelmezher9635 3 months ago

Wow! Wish I knew the VQVAE was available before! I'd think tuning the diffusion model may be useful for voices dramatically different from what's found in LibriTTS, since theoretically the space of what can be represented in the diffused mels is limited to these voice characteristics. This is especially true because the diffusion model is trained (fine-tuned after autoregressive model convergence) on the autoregressive latents, not the mel codes.

@yaracorreia8209 6 months ago

Thank You so much for all your content! Really Awesome

@al3x__0 6 months ago

reinstall thats what I did and it worked

@yaracorreia8209 6 months ago

@al3x__0 were you able to add new voices using the .PTH voice model for TTS?

@Jarods_Journey 6 months ago

You have to use tortoise models, and they would need to be placed in training. It would look something like this: training/name of folder/finetune/models/put the tortoise tts models here.
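
The folder layout described in the reply above can be created with a few lines of Python. This is a minimal sketch, assuming the layout is exactly `training/<name of folder>/finetune/models/` relative to the install folder as the reply states; the function name and the example file name are hypothetical.

```python
from pathlib import Path
import shutil


def install_model(model_file, voice_name, root="training"):
    """Copy a Tortoise model file into <root>/<voice_name>/finetune/models/."""
    target_dir = Path(root) / voice_name / "finetune" / "models"
    target_dir.mkdir(parents=True, exist_ok=True)  # create missing folders, keep existing ones
    destination = target_dir / Path(model_file).name
    shutil.copy2(model_file, destination)
    return destination


if __name__ == "__main__":
    source = Path("downloads/5000_gpt.pth")  # hypothetical downloaded model file
    if source.is_file():
        print("Copied to", install_model(source, "Peter"))
    else:
        print("No model file found at", source)
```

After copying, the web UI would presumably need its model list refreshed or a restart before the file appears; that part is not covered by the reply.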

@SirChogyal 5 months ago

I love this. But unlike other applications, why is this AI voice cloning messed up with large files?

@DYLOGaming 3 months ago

Yo! Any reason why my vocals end up sounding super robotic? I'm using custom vocals, but idk why they sound filtered and very bad. Any assistance would be greatly appreciated!

@Chriscs7 4 months ago

11:56 - What model is better in the generate tab ? base, whisperX or something else? You need to explain what gives the most accurate cloning not only what is faster to train

@Parsitube_yt 2 months ago

i wish there was such a TTS for persian language as well

@carnacthemagnificent2498 6 months ago

I was really excited to try this because I have not been able to get deepspeed running on my machine, period. However when I run this I get the error about mismatched latents but it adds "The specified pointer resides on host memory and is not registered with any CUDA device" and recalculating latents doesn't make it go away, every time you generate it's back. I guess I'm stuck with old original slow Tortoise.

@Jarods_Journey 6 months ago

What's your Nvidia GPU? This error occurs on my machine if you don't wait for TTS to finish loading or you didn't (re)load TTS in the settings. It only happens when you have DeepSpeed enabled.

@bridicot 6 months ago

Great tutorial. I wanted to buy a desktop to do this. Will an RTX 4070 work? I am thinking of having 32GB DDR5, a 2TB SSD, and either an i7 or i9 CPU.

@cjmcneley8869 6 months ago

Is there anywhere to gather voice training audio files. Or anywhere to gather already completed training voices.

@DM-dy6vn 2 months ago

5:12 For the sake of speed (without decrease in quality), you should definitely use "Half precision" (see Experimental settings).

@kiranaric 4 months ago

Your channel is excellent, I'm able to get some AI voices up and running without having to learn much coding. I am encountering an error in this particular case though, with regard to training a new voice. When I click 'Validate Training Configuration', every option in the Generate Configuration page turns into a red Error message and I get the notification 'Empty Dataset'. How do I solve this? I did follow the Prepare Dataset step before I went to this page. EDIT: I was able to rectify the error. Looks like there was an error in the Prepare Dataset phase the first time, because the train.txt file was empty. I did a TTS reload from the Settings, deleted the voice training data entirely and created a dataset afresh, and it worked this time. I am currently on the Run Training page. Feeling positive! Thanks for your awesome work!
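
A quick way to check for the empty train.txt condition described above, before opening the configuration page, might look like the following. It is a sketch under one assumption that the comments do not confirm: that the file lives at `training/<voice name>/train.txt`. The function name is made up for illustration.

```python
from pathlib import Path


def dataset_is_ready(voice_name, root="training"):
    """Return True if <root>/<voice_name>/train.txt exists and has at least one non-blank line.

    The file location is an assumption for illustration; adjust it to where your
    install actually writes train.txt.
    """
    train_file = Path(root) / voice_name / "train.txt"
    if not train_file.is_file():
        return False
    lines = train_file.read_text(encoding="utf-8").splitlines()
    return any(line.strip() for line in lines)


if __name__ == "__main__":
    if dataset_is_ready("Peter"):
        print("train.txt has entries; validation should find a dataset.")
    else:
        print("train.txt is missing or empty; rerun the dataset preparation step.")
```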

@Soljarag5 5 months ago

Thanks so much for your tutorials! What does the temperature setting do?

@Jarods_Journey 5 months ago

Temperature is kinda like randomness. Higher means possibly more random and unstable, lower is more deterministic and stable.
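
The "randomness" description above matches how temperature is commonly applied when sampling from a model: scores are divided by the temperature before being turned into probabilities. The sketch below shows that generic technique with made-up scores; it is not taken from Tortoise's code and its internals may differ.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpening (T < 1) or flattening (T > 1) them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
    for t in (0.2, 0.8, 2.0):
        probs = softmax_with_temperature(scores, t)
        print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all probability lands on the top candidate, which is the "more deterministic and stable" end; at high temperature the candidates become closer to equally likely.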

@Soljarag5 5 months ago

@Jarods_Journey thanks man

@merridius2006 6 months ago

Awesome, I was wondering how Microsoft was able to do it with their Libby which is also pretty good, seems like that one is instant but you can’t generate files, just use it as a screen reader

@andrewvz1914 2 months ago

Is there somewhere we can import pre-trained models that have been downloaded elsewhere? I was trying to get it to work, but kept winding up with crashes, so I'm guessing I've got something wrong.

@AiFanArt 5 months ago

Thank you for the video, I have now created my own model and it works very well. I wanted to ask how it works with the line delimiter and emotions? I only ever get an error message when I select anything other than "None". Is there a tutorial for this?

@Jarods_Journey 5 months ago

Line delimiter was originally for how it passes and separates the text prompt. \n is for new line, but you can change it to a period if you prefer. Emotions give a suggestion to the model to produce a certain style of voice, but it's inconsistent or doesn't work all that well, so I generally don't do anything with it.
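
To see what a line delimiter does to a prompt, here is a minimal sketch of splitting text into chunks on a chosen delimiter. It illustrates the general idea only; the tool's own splitting logic is not shown in the reply and may differ in detail, and the function name is made up.

```python
def split_prompt(text, delimiter="\n"):
    """Split a prompt into non-empty, trimmed chunks on the given delimiter."""
    return [chunk.strip() for chunk in text.split(delimiter) if chunk.strip()]


if __name__ == "__main__":
    prompt = "Hello there.\nThis is the second line.\n"
    print(split_prompt(prompt))           # one chunk per line
    print(split_prompt("One. Two.", "."))  # splitting on periods instead
```

Each chunk would then be generated separately, which is why shorter lines keep per-chunk memory use down.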

@poszukujacprawdy 4 months ago

Hi Jarod, can I train my voice in Polish language as well or is it only for English?

@Puwunda 1 month ago

Is this able to fully utilize a multi-GPU system, or does it only utilize one card?

@LongevityLotusInn_ 6 months ago

Thank you so much Jarods. I'm wondering is it possible to set "speaking speed" when using tortoise?

@Jarods_Journey 6 months ago

Unfortunately not, that is a randomized feature

@LongevityLotusInn_ 6 months ago

@Jarods_Journey Thank you for your reply! ❤ I also find that "CUDA out of memory" pops up easily on my computer, so is there any chance to run it online?

@madokahomura929 3 months ago

Thanks, all worked great. UPD: I fixed it. If anyone encounters the same thing, just increase your paging file size or set it to automatic.

But suddenly I started running into a problem. With Tortoise TTS it simply doesn't load Whisper large (or a higher model), even though everything worked perfectly before. It just freezes with a connection error. In configuration I get: "Batch size exceeds validation dataset size, clamping validation batch size to 0" and when I finally press train, I get "UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`." and it freezes. Sometimes this error appears: "dll load failed while importing _iterative: the paging file is too small for this operation to complete." or "CUDA out of memory. Tried to allocate 12.00 MiB. GPU 0 has a total capacty of 15.99 GiB of which 13.53 GiB is free. Of the allocated memory 1.18 GiB is allocated by PyTorch, and 3.90 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

With RVC it's basically the same thing. It complains about not having enough memory even though everything worked fine two weeks ago, forcing me to reduce the dataset or batch size. Would really appreciate your help. Thanks.

UPD: It was all because the paging file size was too small (512 MB). I set it to an automatically managed size and that fixed it.
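
The commenter's actual fix was enlarging the Windows paging file. Separately, the quoted PyTorch message mentions `max_split_size_mb` and `PYTORCH_CUDA_ALLOC_CONF`; for anyone who wants to try that suggestion, a minimal sketch follows. The value 128 is an arbitrary illustration, not a recommendation, the helper name is made up, and the variable has to be set before PyTorch initializes CUDA for it to have any effect.

```python
import os


def configure_allocator(max_split_size_mb=128):
    """Set PYTORCH_CUDA_ALLOC_CONF unless the user has already set it.

    Call this before importing torch (or at least before any CUDA work starts).
    The default of 128 is only an example value.
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", value)
    return os.environ["PYTORCH_CUDA_ALLOC_CONF"]


if __name__ == "__main__":
    print("Allocator config:", configure_allocator())
```

This only addresses memory fragmentation; it does not help if the paging file itself is too small, which was the root cause in this comment.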

@HauntedVCR 1 month ago

Hey! I appreciate your work. I followed along and seemed to be successful all the way up to the point of generating a configuration. It created my dataset with a lot of files in my folder, but then it just won't let me select my dataset. I am not savvy in coding so I don't know what the cause could be, since the files are put into training > vinny > audio. Help would be appreciated.

@user-vv2oh8ni6o 28 days ago

so we can essentially take elevenlabs generated voices, use them as samples, and clone them?

@amlanlegend3092 4 months ago

We need a video on how to use the gradio gui for tortoise tts and how to get a good quality voice with those options in the gui.

@FrankGlencairn 5 months ago

After updating 7Zip, I was able to - at least - unpack it, but when running the bat file, the command window just shuts down after "loading autoregressive model" any ideas?

@SAnsAN091190 6 months ago

Hi! Thank you very much for your videos! I would like to know if you have tried to train models in languages other than English? What are the successes?

@Jarods_Journey 6 months ago

I still haven't trained other languages unfortunately 😅

@ForTheEraOfLove 6 months ago

@Jarods_Journey The docs are so convoluted when dealing with the language switching and training. I look forward to more tutorials from you brotha

@SAnsAN091190 5 months ago

@Jarods_Journey It's a pity. We look forward to more content like this from you in the future! 🤗

@Razor7557 4 days ago

Any suggestions how to make it clone a voice that had certain effects applied to it? Namely I mean Mr. House from Fallout New Vegas. I have the voice files from the game, but they have a slight "speaking through speaker" effect applied to them(Which is kinda important to keep too...), and the results are pretty bad, sounding nothing like they should and/or turning into completely another voice from one sentence to another. Should I try making entire model with them instead? If so what would be recommended settings?

@waimak7507 4 months ago

Thank you Jarod for your amazing teaching. It was easy to understand; however, when I clicked on "Transcribe and Process", an error popped up: "Something went wrong 'text'", and in the CLI the error was: File "C:\Users\Thanks God\Documents\deleteme\ai-voice-cloning\src\utils.py", line 2588, in prepare_dataset if len(result['text']) > MAX_TRAINING_CHAR_LENGTH: KeyError: 'text'. Am I missing something or did I do something wrong? Please help. (I did extract my vocals into one wav file in the "me" folder inside the voices folder, as you instructed.) Thank you very much!!
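
The traceback above shows a plain `result['text']` lookup failing, which means the transcription result had no "text" field at that point. The sketch below shows how such a lookup can be guarded to give a clearer message. It is a hypothetical illustration, not the repository's code: the function name is made up and the limit of 200 is a placeholder because the real value of `MAX_TRAINING_CHAR_LENGTH` is not given in the comment.

```python
# Placeholder value; the real constant's value is not shown in the traceback.
MAX_TRAINING_CHAR_LENGTH = 200


def text_too_long(result, limit=MAX_TRAINING_CHAR_LENGTH):
    """Check a transcription result's text length, failing clearly if the text is missing."""
    text = result.get("text")
    if text is None:
        raise ValueError(
            "Transcription result has no 'text' field; the transcription step may have failed."
        )
    return len(text) > limit


if __name__ == "__main__":
    print(text_too_long({"text": "A short transcribed line."}))
    try:
        text_too_long({"segments": []})  # shape of a result that lacks 'text'
    except ValueError as error:
        print("Caught:", error)
```

A missing "text" field usually points at the transcription step rather than the dataset files, so re-running transcription is a reasonable first thing to try.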

@AMMV24 4 months ago

"I was not in the mood" hahahah so identified.

@daryladhityahenry 6 months ago

New Question: I never get good quality. I already use the audio file that I use for recording, free from music etc. Pure vocals. But I'm getting a robot-like sound no matter the quality, with diffusion or hifigan. I already tried "High Quality" too. I trained for 500 epochs and tried each result (every 100 epochs); none are good. I already followed the tutorial on splitting the audio file for data too. Are there any missing steps? Thanks. Also, what is "Voice Chunks" when we want to generate a voice? Thank you so much... [nevermind this all] After I transcribe & process, all is done. On Generate Configuration, when I click "Validate Training Configuration", it says "Empty dataset". But I already checked the training folder, and my folder, audio and srt all exist. Why is that? Thank you. I checked the code, and it checks whether the "train.txt" file is empty. What should be inside the train.txt file? Hi! There's a problem with what you made here: both models, TTS & Whisper, are running TT__TT.... My GPU can only hold one of them. So I can't transcribe & process while the TTS server is running (not even running the process, just starting up). What manual code do I need to run to transcribe all of the files? I mean, where's the source code located so I can run it manually without running the TTS server? [/nevermind this]

@jeffisgett 4 months ago

Jarod, quick (and hopefully not too stupid) question: I am using a somewhat dated graphics card (GTX 1070), does this prevent me from doing local voice cloning? Sorry if this is already answered elsewhere, but I'm a little overwhelmed by all that's out there, and I'm hoping to be able to use voice cloning and voice changeover to do some very small, short, independent movies. Helping with minor audio edits without needing individual actors to return just to change a phrase or an inflection, etc.

@Jarods_Journey 4 months ago

I think you should be fine... 4gb of vram I think on that card? But it'll be very slow. If you try and run into out of memory errors, it might be too small

@MrUsamamubeen1 3 months ago

Once the model is trained, can we delete those audio files instead of just putting them in the "backup" folder? Or will they be needed for something else? Just asking because I like my folders clean without any unnecessary files. Thank you.

@Jarods_Journey 3 months ago

Yep, you can delete them. If you don't need them for additional training, it just takes up space!

@MrUsamamubeen1 3 months ago

@Jarods_Journey Thank you

@RedditSlop 3 months ago

Was the beginning a Re:Zero reference?

@narrativeninja 1 month ago

I just wish there was a description of the function when I hover over the buttons. I don't know what to click.

@user-el2jv2nn5c 15 days ago

I followed your data curation video and have ended up with loads of short recordings however in this video you use just a single large recording. When I add all my short recordings into the voices folder and train it does not work and does not create the dataset in the training folder.