Yeah, that's what I did - I found it was still a bit of work to get all the dependencies happy, though. I've also found that for a lot of folks new to coding, Docker is more accessible than a conda environment.
What I would find amazing is learning how to build something like Tortoise myself. I'm in my 4th semester, but we didn't do much on machine learning and I have no clue how to make advanced stuff like Tortoise TTS.
You might want to reach out to the person who created Tortoise. I've seen them comment on GitHub that they were thinking of publishing their training methodology.
Hi there! Great video. I came across it while looking for a Linux script that uses Mimic 3 to pronounce highlighted text, similar to a built-in feature in macOS. I've been trying to make it work for a few weeks now, but so far without success. I would appreciate it if you could create a video on this. Thank you!
Copy that, I'll give it a think - which part of the project are you finding hardest? Getting the highlighted text, turning it into speech, or something else?
@LearnCodeWithJV Truth be told, I have a script, but it doesn't work. I've been trying to make it work with xbindkeys. I also found another script online, but for eSpeak, and I haven't managed to modify it, so I would appreciate your help with this. In my opinion, it's a very useful macOS feature which I'd like to have on Linux with a decent voice - eSpeak is too robotic and outdated.
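While I think about a video, here's a minimal sketch of the kind of script I'd bind to a hotkey with xbindkeys. Treat it as a sketch under assumptions: it assumes `xsel`, `mimic3`, and `aplay` are on your PATH, and the voice name is just an example placeholder (list the real ones with `mimic3 --voices`).

```python
import subprocess

def selection_cmd() -> list:
    # xsel -o prints the PRIMARY selection (whatever text is highlighted).
    return ["xsel", "-o"]

def mimic3_cmd(voice="en_US/vctk_low") -> list:
    # mimic3 reads text on stdin and writes a WAV to stdout; the voice
    # name here is only an example - list yours with `mimic3 --voices`.
    return ["mimic3", "--voice", voice]

def speak_selection():
    """Pipe the highlighted text through Mimic 3 and play it with aplay."""
    text = subprocess.run(selection_cmd(), capture_output=True,
                          text=True).stdout
    if not text.strip():
        return  # nothing highlighted, nothing to say
    mimic = subprocess.run(mimic3_cmd(), input=text.encode(),
                           capture_output=True)
    subprocess.run(["aplay", "-q"], input=mimic.stdout)

# Usage: save as speak-selection.py, then point an xbindkeys binding at
#   python3 /path/to/speak-selection.py
```

The nice part of going through a script rather than a shell one-liner is you can guard against an empty selection before Mimic 3 spins up.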
Tortoise is good but very slow. Is the reason for this that it starts over from the voice training set every time? You mentioned the ability to save an intermediate vector of the voice - could you cover that in a video, and whether it improves the speed? Thanks.
Yeah, its name is apt. I've seen a few derivative projects kicking around which are claiming significant speedups. I'm waiting until I find one with good voice cloning, multilingual abilities, and a license that allows commercial use before doing the follow-up.
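For anyone curious about those "intermediate vectors": tortoise-tts calls them conditioning latents, and caching them means repeat runs skip re-encoding the reference clips. Below is a rough sketch of what that could look like - the method names (`get_conditioning_latents`, `tts_with_preset`) are from my memory of the repo, so double-check them against the actual API before relying on this.

```python
import os

def latents_cache_path(voice_dir: str) -> str:
    # One cache file per voice, stored next to the reference clips.
    return os.path.join(voice_dir, "conditioning_latents.pth")

def get_latents(tts, voice_samples, voice_dir):
    """Reuse cached conditioning latents instead of re-encoding the
    reference clips on every run.

    `tts` is assumed to be a tortoise TextToSpeech instance;
    `get_conditioning_latents` is the method name as I remember it.
    """
    import torch  # deferred so the path helper stays importable anywhere
    path = latents_cache_path(voice_dir)
    if os.path.exists(path):
        return torch.load(path)
    latents = tts.get_conditioning_latents(voice_samples)
    torch.save(latents, path)
    return latents

# Synthesis can then pass the latents directly, something like:
#   gen = tts.tts_with_preset(text, conditioning_latents=latents,
#                             preset="fast")
```

To be clear on the speed question: this should only save the voice-encoding step at the start of each run; the slow diffusion sampling itself still happens per sentence.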
I wonder if there's any way to run the game voice thing through a CLI. I'm bulk-translating all the voice lines in an existing game to learn a new language, but there are thousands of lines, so I don't want to copy-paste each one by hand and then download and name them... also my GPU is from like 2013.
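There is a command-line entry point in the repo (`tortoise/do_tts.py`), so you could loop over a text file of lines and get one named output per line. A rough sketch - the flag names are from memory, so check `python tortoise/do_tts.py --help`, and `lines.txt` plus the voice name are hypothetical placeholders:

```python
import pathlib
import re
import subprocess

def safe_name(line: str, maxlen: int = 40) -> str:
    # Turn a dialogue line into a filesystem-safe file stem.
    stem = re.sub(r"[^a-zA-Z0-9]+", "_", line.strip()).strip("_")
    return stem[:maxlen].lower() or "line"

def bulk_tts(lines_file: str, voice: str, out_dir: str = "out"):
    """Shell out to tortoise's do_tts.py once per non-empty line."""
    pathlib.Path(out_dir).mkdir(exist_ok=True)
    lines = pathlib.Path(lines_file).read_text().splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        subprocess.run([
            "python", "tortoise/do_tts.py",
            "--text", line,
            "--voice", voice,
            "--preset", "ultra_fast",  # fastest preset, quality tradeoff
            "--output_path", f"{out_dir}/{i:04d}_{safe_name(line)}",
        ], check=True)

# e.g. bulk_tts("lines.txt", voice="melina")
```

Fair warning on the 2013 GPU, though: without CUDA support it will fall back to CPU, and at thousands of lines that could take days even on the fastest preset.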
So Tortoise TTS is great, but using a model trained by someone else (so I know from the output that it works), 2 out of 8 sentences were spoken in a male voice even though it was a Melina (female) voice. If you have ANY clue why that would happen, I would love to know - this is all way beyond me, I'm a mere end user and not a programmer.
Cool, cool video. I've been having a wild time trying to get tortoise-tts to work on the GPU. It works on the CPU, but very, very slowly. I was advised to look into PyTorch GPU tutorials - it turns out my conda environment wasn't allowing it, but now I'm able to run CUDA and use the GPU. So I'm trying to get back into Tortoise TTS, perhaps with a fork install, in hopes I can get some practicality from it. My GPU is tiny - a 1660 - but it's better than the CPU. Looks like quite a few folks have noticed outdated information. Any additional advice appreciated.
I was finding it slow going on my 3060, so I can imagine what it felt like on a 1660, and on CPU it would be really tough to get useful work done. When I build a Dockerfile for the project I'll play around a bit more to see if there are any obvious performance tweaks that might be useful.
I would love a Dockerfile if you have one! Thanks for the video. If you're into training text-to-image, text-to-speech, and voice cloning, and fine-tuning for specific use cases (tuning an LLM to write stories in a genre, or to review data and report in a specific way), I for one would love to see how you approach it. Thanks again.
I would like to second that request - I've been wanting to try Tortoise for a while, but the Python dependency issues have kept me from mustering the energy to do so. @LearnCodeWithJV
I just went to look at doing this, and it turns out someone added one last week: github.com/neonbjb/tortoise-tts#docker. Looks like the development pace has picked up on the repo in the last few months.
I remember it only showing up some of the time and being disabled while a voice was rendering. I have a half-memory of it only working for built-in voices, but I can't recall.