Does the colab notebook no longer work? I get this error in the first cell block: ERROR: pip's legacy dependency resolver does not consider dependency conflicts when selecting packages. This behaviour is the source of the following dependency conflicts. albucore 0.0.14 requires numpy>=1.24, but you'll have numpy 1.22.0 which is incompatible.
Not sure where or how to properly submit a ticket to the colab creators, but the colab has been broken for weeks now. I check every couple of days and it's only getting worse, stopping with errors sooner and sooner in the process.
I’m currently working on a fixed colab version; I’ll give you a heads-up if I get it working. It seems we need to force-downgrade all the packages that were changed in recent Google Colab updates.
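For what it's worth, the "force downgrade" approach usually means re-pinning packages after the notebook's install cell finishes. The exact versions depend on the current Colab image, so treat the pins below as guesses derived from the dependency errors quoted in this thread (albucore/librosa/tensorflow wanting newer numpy, gruut wanting networkx below 3.0), not as a confirmed fix:

```
# pins.txt -- candidate constraints to apply after the install cell,
# e.g. with: pip install -r pins.txt
# Adjust against whatever the resolver actually complains about.
numpy>=1.24,<2.0
networkx>=2.5.0,<3.0
```

After reinstalling, restart the Colab runtime so the already-imported old versions are dropped.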
I fine-tuned the model and saved it on my device, but every time I want to use the model I have to provide the speaker_wav (taken from the fine-tuning data), and this step (analysing the recording) takes a long time. How can I use the model with my own speaker ID so I don't have to provide the speaker wav every time???
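One way around the slow step: recent Coqui TTS releases expose the speaker analysis as its own method (`get_conditioning_latents`), so you can compute the latents once and reuse them. A minimal sketch, assuming an already-loaded `TTS.tts.models.xtts.Xtts` object; the file names are placeholders:

```python
import os
import pickle

def cached(path, compute):
    """Return the object pickled at `path`, computing and saving it on first use."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = compute()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj

def synthesize(model, text, speaker_wav="speaker.wav",
               latents_path="speaker_latents.pkl", language="en"):
    """Run XTTS inference, reusing cached speaker latents when available."""
    # The slow part is analysing speaker_wav; cache its result so later
    # runs skip it entirely. `model` is an already-loaded Xtts instance.
    gpt_cond_latent, speaker_embedding = cached(
        latents_path,
        lambda: model.get_conditioning_latents(audio_path=[speaker_wav]),
    )
    return model.inference(text, language, gpt_cond_latent, speaker_embedding)
```

On the first call this still analyses the wav once; every later call loads the pickled latents instead, which should cut the startup time to roughly just model loading.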
This might sound like a dumb question, but how would you load the fine-tuned model in other Python programs? I know we get config.json, vocab.json and model.pth files after the fine-tuning process, but would we use the TTS.api?
Also, it doesn't seem to run locally after downloading model.pth, vocab.json and config.json. Do you need to download the Whisper model for it to work locally, or is that just for training? Edit: No, the Whisper model didn't change anything. I was desperate and figured maybe it needed the training requirements installed in order to do inference, but that didn't do it. Removing the quotation marks from the paths made it look like it was loading for a moment, but then after 4 seconds it just says "error" when loading the fine-tuned model.
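For loading the fine-tuned checkpoint in another Python program, one route is Coqui's low-level XTTS classes rather than the high-level API. A sketch assuming a recent `TTS` release and a checkpoint directory containing the three files the fine-tuning step writes out:

```python
import os

def checkpoint_paths(checkpoint_dir):
    """Full paths of the three files the fine-tuning process produces."""
    return {name: os.path.join(checkpoint_dir, name)
            for name in ("model.pth", "config.json", "vocab.json")}

def load_finetuned(checkpoint_dir):
    """Load a fine-tuned XTTS checkpoint with Coqui's low-level API."""
    # Imported here so the path helper above works without TTS installed.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    paths = checkpoint_paths(checkpoint_dir)
    config = XttsConfig()
    config.load_json(paths["config.json"])
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config,
                          checkpoint_path=paths["model.pth"],
                          vocab_path=paths["vocab.json"],
                          eval=True)
    return model
```

The higher-level `TTS.api.TTS(model_path=..., config_path=...)` wrapper may also work for inference, but the low-level route makes the vocab path explicit, and a missing or wrong vocab.json path is a common reason local loading fails with a vague "error".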
I gave up on it. I tried to train it on a 2:30 audio clip that was cleaned properly, and it was still training after 20 minutes on the default settings.
@Gobolinn Disappointing. I've been looking for a good ja->en voice-clone TTS. The best I've found so far is MoeGoe, which ends up sounding a bit weird pacing-wise when doing inference in English (but the sound of the voice is spot on). Every other voice-clone tool I've tried doesn't seem to match the voice at all. I was hoping this would work, but it seems not?
When I ran cell 1, I got this error message: Building wheel for docopt (setup.py) ... done ERROR: pip's legacy dependency resolver does not consider dependency conflicts when selecting packages. This behaviour is the source of the following dependency conflicts. lida 0.0.10 requires fastapi, which is not installed. lida 0.0.10 requires kaleido, which is not installed. lida 0.0.10 requires python-multipart, which is not installed. lida 0.0.10 requires uvicorn, which is not installed. librosa 0.10.1 requires numpy!=1.22.0,!=1.22.1,!=1.22.2,>=1.20.3, but you'll have numpy 1.22.0 which is incompatible. plotnine 0.12.4 requires numpy>=1.23.0, but you'll have numpy 1.22.0 which is incompatible. pywavelets 1.5.0 requires numpy>=1.22.4, but you'll have numpy 1.22.0 which is incompatible. tensorflow 2.15.0 requires numpy>=1.23.5, but you'll have numpy 1.22.0 which is incompatible. gruut 2.2.3 requires networkx>=2.5.0,<3.0, but you'll have networkx 3.2.1 which is incompatible.
I love the Coqui performance, results and ease of use, but could it be even easier? Like button 1 to pick an input file for training (or microphone input), button 2 to train, and button 3 to type and speak. I'm not sure why in the year 2024 we still need to copy and paste text ...
This is currently the peak of the technology, the top of what's available to the public, and it's the very first iteration of the UI too. So in the future it might get easier as more people want to use it. Automatic1111's UI used to be barebones and hard to use, but now it's become a lot more user-friendly. Just a few months ago all of this was pure command line.
Maybe you just found out about it 😄 People are already making money from this same peak technology. I paid for this type of peak technology months ago, and months in the age of AI means a long time ago :)
@HyperUpscale Perhaps I should have clarified more: I meant the peak of the open-source versions that are accessible to everyone for free. Take the older Coqui models, for example: just a few months ago it would have taken you many hours to train a proper model that works well in any real way. A year ago even just a basic voice was considered a big step for open-source AI technology. I'm well aware that AI voices have been around for many, many years now; however, technology of this caliber was not yet accessible to the everyday user for free, only through paid alternatives.