Local Piper TTS Training (Using WSL)

Training and finetuning custom voice for Piper TTS using WSL.
Github: github.com/rhasspy/piper
Article: ssamjh.nz/create-custom-piper...
Batch File and Transcript Python script: github.com/natlamir/ProjectFi...
0:00 Intro
0:25 Custom Dataset
3:17 Install WSL
4:45 Update Ubuntu
5:25 Install Piper
7:15 Preprocessing Dataset
7:59 Training
9:29 Export
10:52 Testing

Published: 7 Jul 2024

Comments: 41

@guile3d 7 months ago

Great video @natlamir! You’re doing a great job with your tutorials 🎉 By the way, I love ducks too! I have around 70 in my yard which I treat like pets. Most people would be amazed at how intelligent, funny and friendly these birds are!

@Natlamir 7 months ago

oh wow, that is amazing! 70 ducks! 😲 I am jealous of your yard! :D Thank you! 🙏

@KentHambrock 7 months ago

Excellent and detailed as always. Thanks for these. :) Your channel introduced me to Piper and now I'm learning all about the onnx runtime. xD

@Natlamir 7 months ago

thank you! 🙏 that is awesome! same, I am also planning on learning more about the onnx runtime! 👍

@ssamjh 6 months ago

Awesome video! Glad my written tutorial helped :) I've popped a link to the PiperUI you've created in the guide too!

@xbruhmoment9484 6 months ago

I love this man keep up the good work

@cucciolo182 3 months ago

beast 💪

@jesusruiz6061 6 months ago

Thanks a lot for this great tutorial. In the last step, trying to export ckpt to onnx, I'm getting the error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select). Any idea? Thanks!

@gkhndnc 7 months ago

Thank you bro

@Natlamir 7 months ago

🙏

@gkhndnc 7 months ago

I'm encountering an issue while generating training data using the command:
    python3 -m piper_train.preprocess \
      --language tr \
      --input-dir ~/piper/my-dataset \
      --output-dir ~/piper/my-training \
      --dataset-format ljspeech \
      --single-speaker \
      --sample-rate 22050
I keep getting the following error:
    INFO:preprocess:Single speaker dataset
    INFO:preprocess:Wrote dataset config
    INFO:preprocess:Processing 11 utterance(s) with 16 worker(s)
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/dncgogo/piper/src/python/piper_train/preprocess.py", line 502, in <module>
        main()
      File "/home/dncgogo/piper/src/python/piper_train/preprocess.py", line 225, in main
        for utt_batch in batched(
      File "/home/dncgogo/piper/src/python/piper_train/preprocess.py", line 491, in batched
        raise ValueError("n must be at least one")
    ValueError: n must be at least one

@Natlamir 7 months ago

@gkhndnc thanks for the detailed input and stack trace log. looking at the code, i think you might need to add more audio samples, or pass --max-workers so that fewer workers are used. i would personally try adding more utterances / audio samples. in the code, the error is thrown when the batch size is less than 1. the batch size is calculated with this equation, and it comes out to 0 for the number of utterances and workers shown in the output of your error:
    batch size = utterances / (workers * 2)
    batch size = 11 / (16 * 2)
    batch size = 11 / 32
    batch size = 0 (rounded down)
with that many workers, i would aim for at least 32 audio samples, which gives a batch size of 1, and to be safe, 64 audio samples, so the equation produces 64 / 32 = 2 and the check (batch size < 1) won't throw the error.
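The arithmetic in this reply can be checked with a small sketch. It assumes the formula exactly as quoted in the comment (utterances divided by twice the worker count, truncated to an integer); the function names `preprocess_batch_size` and `min_utterances` are made up for illustration and are not part of piper_train.

```python
def preprocess_batch_size(num_utterances: int, max_workers: int) -> int:
    """Batch size as described in the comment: utterances / (workers * 2), truncated."""
    return int(num_utterances / (max_workers * 2))


def min_utterances(max_workers: int) -> int:
    """Smallest number of utterances that gives a batch size of at least 1."""
    return max_workers * 2


if __name__ == "__main__":
    workers = 16
    for count in (11, 32, 33, 64):
        size = preprocess_batch_size(count, workers)
        # A batch size below 1 is the case the comment says triggers the ValueError.
        status = "ok" if size >= 1 else "would raise 'n must be at least one'"
        print(f"{count} utterances, {workers} workers -> batch size {size} ({status})")
    print(f"minimum utterances for {workers} workers: {min_utterances(workers)}")
```

Under this assumption, either adding samples until the count reaches `min_utterances(workers)` or lowering the worker count gets the batch size to 1 or more.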

@gkhndnc 7 months ago

@Natlamir waoow
    INFO:preprocess:Single speaker dataset
    INFO:preprocess:Wrote dataset config
    INFO:preprocess:Processing 33 utterance(s) with 16 worker(s)
I didn't have many sound samples on hand, so I duplicated one to test it and did what you said. This worked. Really thank you so much. Videos you make like this are really valuable. I feel lucky to follow someone who understands the importance of TTS and delivers pinpoint projects. As someone waiting for your next video, I wish your work to be successful...

@Natlamir 7 months ago

@gkhndnc thank you for the kind words. 🙏

@EnixForce 7 months ago

Is VRAM size important when local training? I saw that colab uses 16 gb vram with their T4s which is obviously fast but can you get away with much much less and train slower locally without errors?

@Natlamir 7 months ago

I think so, my machine is 12 gb vram and it ran fine on that, not sure what the lower limit may be though, so i imagine it should work but will probably take a while, and may need shorter audio samples to avoid memory errors. there is also a setting that can be modified which i cant remember the name of in one of the input parameters that reduces the resources used during training for gpu's with less ram

@user-ms8ek1ju1g 6 months ago

The content of this channel is very wonderful. Thank you. Is it possible to bring pre-trained audio files and download them from the Internet? I think I won't be able to train voices like you. It's hard for me. I am looking for the Arabic language. Please help.. For your information, the Arabic language in the Piper program is weak and does not read texts well

@okeygoogle3188 1 month ago

Very useful video! But I had a little problem. Maybe you know what can cause "N must be at least one" to appear. At the stage with "piper_train.preprocess"

@AutomatizaTuTiempo 3 months ago

It doesn't work for me on AMD with integrated graphics. It tells me "Killed" during the training process. Help me.

@mohdgh7394 4 months ago

How long did you train for? 2 hours? Should I train more?

@GG-rs5wr 3 months ago

whats the max amount of text you can add to piper to read?

@LucidFirAI 3 months ago

The WSL command didn't work, instead it is: wsl --install -d Ubuntu-22.04

@tr1pod623 7 months ago

is there a chance you could build this into piperUI

@Natlamir 7 months ago

that would be nice. i think it may be easier to create a new python file within the WSL linux penguin folder where the python training code is, like a ui python script which would be just a gradio web ui that makes the preprocessing and training commands more convenient to run. i will add that to my todo list to implement at some point. thanks!

@tr1pod623 7 months ago

@natlamir please see this do you know what the issues can be

@Mavrik9000 7 months ago

Could you add more frequent and longer pauses in the dialog? Maybe double length at commas, triple at periods, and quadruple at paragraphs? The method I assume would be to search and replace , with , , and . with . . .
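The search-and-replace idea from this comment could be sketched as below. This is only an illustration of the text substitution: whether Piper actually produces longer pauses for the repeated punctuation is not confirmed anywhere in this thread, the function name `lengthen_pauses` is hypothetical, and paragraph breaks are not handled.

```python
import re

# Replacement strings taken from the comment: "," -> ", ," and "." -> ". . ."
_REPLACEMENTS = {",": ", ,", ".": ". . ."}


def lengthen_pauses(text: str) -> str:
    """Repeat commas and periods that end a clause or sentence.

    Only punctuation followed by whitespace or the end of the text is touched,
    so numbers such as 3.5 or 1,000 are left alone. A single regex pass is used
    so the inserted punctuation is never matched again.
    """
    return re.sub(r"[,.](?=\s|$)", lambda m: _REPLACEMENTS[m.group(0)], text)


if __name__ == "__main__":
    print(lengthen_pauses("Hi, there. It costs 3.5 dollars."))
```

The processed text would then be passed to Piper in place of the original input.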

@Natlamir 7 months ago

that is an interesting idea! i will try that out and see how it behaves when it comes to the pauses in between sentences / paragraphs. thanks!

@LucidFirAI 3 months ago

@Natlamir I love happy full on attenborough. People can slow down the youtube playback speed if they need to

@ChatbookSummary 7 months ago

Hi bro, you are not replying nowadays. I just want to ask: can I create a good Hindi language onnx file and model with the process you have shown in your reply? Neither a Hindi language selection nor any Hindi onnx file is available here. Please reply dear🎉

@Natlamir 7 months ago

you should be able to. use "hi" instead of "en" in the command, and should be able to create hindi with that.

@LucidFirAI 3 months ago

gawd dammit all the way to the last step and I can't do the long training and can't figure it out 😢

@teenudahiya01 7 months ago

Can i train hindi language voice

@KentHambrock 7 months ago

You should be able to. Piper seems to work for any voice, and the text files just require UTF-8 encoding which does seem to support the Hindi language characters. I wasn't able to find any currently existing voices supporting Hindi though, so if you do make a Hindi voice, please consider putting the onnx file under a permissive license so other Hindi speakers can also benefit. (You don't have to, it'd just be a nice thing to do for the Hindi speakers who either can't, or don't feel confident, training on their own voice.)

@teenudahiya01 7 months ago

@KentHambrock please don't doubt Indian confidence, we are leading from the front. Check our leaders in different companies and countries, and a new name has just been added: Mira Murati, new CEO of OpenAI

@Natlamir 7 months ago

@teenudahiya01 I don't think Kent meant anything hostile, just that there are people who may benefit from having a pre-trained voice so they don't need to go through the process themselves. As Kent mentioned, it should be possible with piper. You would need a Hindi dataset / audio samples to train with. There are open source datasets available here. Select Hindi from the language dropdown: commonvoice.mozilla.org/en/datasets
Then you can use the batch file if needed to convert to the appropriate wav format, and the transcript whisper script to transcribe the audio (whisper should support Hindi). During the training command(s), use "hi" for Hindi instead of "en". You could try a finetune of an existing voice that is in English, but it may be better to train from scratch. Also, I found a couple of other videos on RU-vid which use other projects for training in Hindi:
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zjt0d9fXFSM.html
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-jE-lKBKfxJw.html
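Once the audio is converted and transcribed, the transcripts need to end up in the layout that `--dataset-format ljspeech` reads. The sketch below assumes the usual single-speaker LJSpeech-style layout, a `metadata.csv` of `id|text` lines next to a `wav/` folder holding `id.wav` files; that layout and the helper name `write_ljspeech_metadata` are assumptions, not something stated in the video description or this thread, and the transcripts dictionary stands in for whatever the whisper script produces.

```python
from pathlib import Path


def write_ljspeech_metadata(dataset_dir: str, transcripts: dict[str, str]) -> list[str]:
    """Write metadata.csv as 'id|text' lines and return the ids that were skipped.

    An id is skipped when <dataset_dir>/wav/<id>.wav does not exist.
    """
    root = Path(dataset_dir)
    wav_dir = root / "wav"
    lines = []
    skipped = []
    for utt_id, text in sorted(transcripts.items()):
        if not (wav_dir / f"{utt_id}.wav").is_file():
            skipped.append(utt_id)
            continue
        # '|' is the column separator, so remove it from the text and
        # collapse any newlines or repeated spaces into single spaces.
        clean = " ".join(text.replace("|", " ").split())
        lines.append(f"{utt_id}|{clean}")
    # UTF-8 keeps Devanagari (or any other script) intact.
    (root / "metadata.csv").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return skipped
```

Checking the returned list of skipped ids before running preprocessing is a cheap way to catch transcripts that have no matching audio file.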

@teenudahiya01 7 months ago

@Natlamir no sir, yes he is right, but I am just enjoying our people's success, and I am also appreciating you, you are doing a good job. I just wanted him to know about it. It's our duty to make this earth safe, simple and prosperous enough for every human being

@Natlamir 7 months ago

@teenudahiya01 🙏 for sure, thanks.

@user-ub9fy8qs7k 7 months ago

I am getting many errors.....

@tr1pod623 7 months ago

Did bro die

@Natlamir 7 months ago

haha, nope :D

@tr1pod623 7 months ago

aye you're back! @Natlamir i managed to get past the preprocess error, which happened because my audio samples were in total under 5 minutes. With 5 minutes or more it works. But I cannot get it to train, it gives me an error:
    "RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
    nvrtc compilation failed:
    #define NAN __int_as_float(0x7fffffff)
    #define POS_INFINITY __int_as_float(0x7f800000)
    #define NEG_INFINITY __int_as_float(0xff800000)"
That's only 1 part of the message. I have a Windows 11 laptop, RTX 4050 Laptop GPU, 13th gen Intel Core i5 16 core. If you don't want to check what's up, that's fine, you don't have to, but i love your content