You must have a monster desktop. I'm running a System76 laptop on Linux with an NVIDIA 4090 (16 GB) GPU, 64 GB of RAM, and a 4 TB Samsung Pro SSD. It's taken me almost ten minutes and I'm still waiting on step one to finish processing lol. Also, my audio file is MP3, not WAV... should I convert it so this thing processes faster? I'm stuck on XTTS Finetuner at 1680 seconds now... -__- Done... only took 29 mins!!!
Two questions: how do you "make" new characters? Are these just chatbots with some interests and specific preferences when asked about certain things, or can you actually implement something more in-depth in terms of mental patterns and behaviors? Also, can these models learn, or are they stuck with whatever the model you load includes? For example, could I train or teach a character about something like Audacity, so it becomes an Audacity expert and can help me when I'm working on audio stuff? Or is it just a funny thing to talk to?
I got the first step to work, but I'm having trouble with the finetune-webui. A lot of the time it says the audio file is too short, even though it's well over 2 minutes, and if I actually get the process to start, the command-line window just shuts down after a while and I get an error with no details in the UI. Any tips?
I get this error when I run the go-web.bat file: ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
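For context, that ValueError comes from Python 3.11's stricter dataclass rules: a field whose default is an unhashable class instance (like fairseq's CommonConfig) must be declared with default_factory instead of a direct instance. A minimal sketch of the failing pattern and the fix (the class names here are stand-ins, not fairseq's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class CommonConfig:  # stand-in for fairseq.dataclass.configs.CommonConfig
    seed: int = 1

# On Python 3.11+, declaring the default as a direct instance raises:
#   ValueError: mutable default <class 'CommonConfig'> for field common
#   is not allowed: use default_factory
#
# @dataclass
# class FairseqConfig:
#     common: CommonConfig = CommonConfig()

# The fix is to build the default lazily via default_factory:
@dataclass
class FairseqConfig:
    common: CommonConfig = field(default_factory=CommonConfig)

print(FairseqConfig().common.seed)  # prints 1
```

In practice this usually means the batch file is running under Python 3.11+ while that fairseq version only supports 3.10; running it under Python 3.10 (or updating fairseq) typically avoids the error.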
This aged badly. No finetunes are being made for SD3. CivitAI has banned SD3 from their website. Training on images is heavily censored. So NO, it's not the beginning of an amazing series. Your two cents can be thrown in the trash.
Doesn't work. At 2:21, when install.bat runs, I get this message: "'venv' is not recognized as an internal or external command, operable program or batch file. 'pip' is not recognized as an internal or external command, operable program or batch file. 'pip' is not recognized as an internal or external command, operable program or batch file. Install deepspeed for windows for python 3.10.x and CUDA 11.8 ... Install complete. Press any key to continue . . ."
Too complex; I stopped before the 3-minute mark. I'm so far beyond git/Python and all that mumbo-jumbo black magic, the ritualistic naming schemes of everything else that follows... there's no way I'm going on this journey to make the best TTS when the whole process is so unbelievably far from a non-programmer's common sense. Wake me up when you have a smartphone-like app that does the job in 5 seconds. This whole tutorial is bullshit to me.
OK, question: I assume it's a good idea to create a dedicated partition on your hard drive for the model to use during training, reserved for that model alone. How large should that partition be, ideally?
It's not working. When I run python -c "import torch; print(torch.__version__)" it says Python was not found, but I have it installed (3.10.11).
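On Windows, "Python was not found" usually means the `python` command isn't resolving to the interpreter you installed: often the Microsoft Store alias intercepts it, or the install directory isn't on PATH. Once any interpreter actually starts, a quick sanity check like this (nothing here is specific to the tutorial) shows which one you're really running:

```python
import sys

# Shows exactly which interpreter executed this script and its version,
# so you can tell whether PATH points at the 3.10.11 you installed.
print(sys.executable)
print("%d.%d.%d" % sys.version_info[:3])
```

If `python` fails outright, the Windows `py` launcher (e.g. `py -3.10`) is often still available to run the same check, and re-running the installer with "Add Python to PATH" checked usually fixes the alias problem.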
Shit-tier license, 2B dataset, nuking of the 4B in favour of a rushed product by corpos, loss of devs due to absolutely mental decisions by management. How can SD3 be the future of AI models when it has no future from the get-go?
I couldn't get past 2 minutes of the tutorial; there were so many errors. Some I solved, others were impossible to solve. Four hours of trying. Still, thank you very much for your efforts; unfortunately, I gave up.
Only seasoned animators recognize how laughable this tool is... It was explained to me, and to be honest, they're giving us baby food and calling it solids. I think we should wait a few more months for this to become truly groundbreaking; for now, not much time has been saved, because you still have to manually edit the arcs that don't make sense.
First video where I disagree with you... respectfully. The future of AI models is open-source. I do agree that the fee is cheap, but the censorship is an issue, not just for nudity but for realistic art; you sometimes need to push the envelope.
So basically... I came to the video in the first place to learn how to properly caption training images, after being intimidated by the thought of precisely describing everything for a style... only to be told to suck it up. UGH.
Finally, a video on style LoRA! Very thankful for it. My VRAM isn't good enough to train/generate on SDXL, but this guide is general enough that I can rely on it even for SD1.5-based work. I suppose the main difference is that I'll still be stuck at 512 x 512 resolution, plus some other minor details. Thank you very much!
There are thousands of images of people lying down on the internet. It's crazy to me that they wouldn't include that in their dataset, since it's a fairly common thing people will try to get SD to generate. DALL-E 3 does it quite well.