What an excellent way to show off the differences! That must have been a bunch of work for a 2-minute video result. :-) Thanks so much for all your tutorials!
So I stumbled upon a certain "Royal Skies" Stable Diffusion install in 3 min. Praised the guy! Two hours later, me: "what the heck are the differences between the sampling methods...?" ... and what do you know! That same Royal Skies explaining it to me in less than 3 minutes! OK, you won a sub, sir!
Those results will also depend on the sampling steps, since certain methods tend to require more. It would have been nice if that had been described (or the settings you used for the comparison), as well as the CFG scale. Personally, I have found that Euler Ancestral does well between 50 and 70 steps, whereas DDIM tends to require about 100 steps to get a good result. Also, certain methods work better with higher CFG scales (~10), and others with lower (~5).
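For anyone who wants to reproduce that kind of comparison outside the web UI, here's a minimal sketch with the diffusers library, using the step/CFG values from the comment above (the model id and prompt are just placeholders, not part of the original comment):

```python
import torch
from diffusers import (StableDiffusionPipeline,
                       EulerAncestralDiscreteScheduler, DDIMScheduler)

# Any SD checkpoint works the same way; v1.5 is just an example
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a portrait photo of a woman, highly detailed"  # placeholder prompt
seed = 1234  # fixed seed so only sampler, steps, and CFG differ

# Euler Ancestral: ~50-70 steps, higher CFG (~10)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
img_a = pipe(prompt, num_inference_steps=60, guidance_scale=10,
             generator=torch.Generator("cuda").manual_seed(seed)).images[0]

# DDIM: ~100 steps, lower CFG (~5)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
img_b = pipe(prompt, num_inference_steps=100, guidance_scale=5,
             generator=torch.Generator("cuda").manual_seed(seed)).images[0]

img_a.save("euler_a_60_cfg10.png")
img_b.save("ddim_100_cfg5.png")
```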
I've found the differences can vary even more strongly if you're working in different art styles or trying to get specific props. "Cybernetics" is a keyword that can strongly show off a difference between the methods. So, while this is great, it's only the tip of the iceberg on how the different methods can affect things. That said, DPM Fast is wild. It's usually the least similar to what I want, but not in a bad way. DPM Adaptive has also been the best with hands in my experience.
img2img also works well to get a sense of what's different between them; the Euler samplers, for example, appear to be better at composition than DDIM when objects are replaced or added in the image.
The effect of the different methods is much more pronounced with simpler prompts. Try with something like "woman wearing a turtleneck riding a bike in shorts". There are also big differences in some responses to the other configuration parameters, like sampling steps. In fact, with some, adding more steps can get you a completely different image - it's really easy to see when you watch the steps individually.
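If you want to watch that happen in code rather than the web UI, here's a minimal diffusers sketch (the model id is assumed, and the step counts are arbitrary). With an ancestral sampler, fixing the seed and changing only the step count can still produce noticeably different images, because fresh noise is injected at each step:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

prompt = "woman wearing a turtleneck riding a bike in shorts"

# Same seed every run; only the number of steps changes
for steps in (20, 40, 80):
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt, num_inference_steps=steps,
                 generator=generator).images[0]
    image.save(f"euler_a_{steps}_steps.png")
```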
I followed your guide to installing Stable Diffusion and it worked perfectly, but when I tried changing the method I got a certificate warning in the console.
Great video. BTW, Euler is pronounced "oiler", and he was probably one of the most brilliant mathematicians ever born, which is why someone born in 1707 is still relevant even today.
Can anyone give advice? Should I pay $10 for credits for Stable Diffusion (I used them all up doing learning tests)? Is there a difference between the online version and offline?
If you've got an Nvidia graphics card with more than 4 GB of VRAM, save your money and get the offline version. As long as your models are up to date, the results should all be the same.
@@DrFeho It doesn't even need to be a supercomputer. Any GPU with at least 4 GB of VRAM is capable of running it, which covers most mid-range GPUs from the last decade. It might take 30 s per generation, but that's not much slower than online, and it's free and uncensored, so The Man can't tell you what you can or can't generate.
@@choo_choo_ It seems you do need a certain number of CUDA cores; I tried it on my GTX 1060 and it refuses to run Stable Diffusion despite having enough VRAM.
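For anyone in this thread trying to squeeze it onto a ~4 GB card with the diffusers library, half precision plus attention slicing are the usual first tricks. A minimal sketch, assuming the standard SD v1.5 checkpoint (whether it actually fits still depends on the card):

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 weights use roughly half the VRAM of fp32
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compute attention in slices: a bit slower, but much lower peak memory
pipe.enable_attention_slicing()

image = pipe("a dog on a sidewalk", num_inference_steps=30).images[0]
image.save("dog.png")
```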
That 'borrowed' UmbrellaGuy voice... sigh. Otherwise I enjoy the vids. Ditched Midjourney because it censored too many words. E.g. 'censored' is censored in Midjourney... yes, 'censored', not 'uncensored' as one might assume, the literal opposite. Nuts and paranoid. In SD I can prompt freely; I may not get what I asked for, but at least it lets me ask for it.
You should delete this video, honestly. There's a thing called "dumbing down a topic" so it's easy to understand, but that is not what you did here. You demonstrated that you actually know nothing about the schedulers or the sampler types. You can't just "XY plot" the samplers and then pretend to understand what they are doing; they are mathematical operations and need to be looked at in that manner. SDE stands for Stochastic Differential Equation. If you don't talk about that, or the concept of ancestral sampling, then you're not qualified to talk about this subject.
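For anyone curious what that comment is gesturing at: in the score-based framing (Song et al., 2021), samplers are numerical solvers for a reverse-time SDE or its deterministic ODE counterpart. A rough sketch of the two equations, from memory, so treat the details with caution:

```latex
% Reverse-time SDE (stochastic samplers, e.g. the ancestral ones):
% fresh noise d\bar{w} is injected at every step
dx = \left[ f(x,t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\, d\bar{w}

% Probability-flow ODE (deterministic samplers, e.g. DDIM):
% same marginal distributions, but no injected noise
dx = \left[ f(x,t) - \tfrac{1}{2} g(t)^2 \nabla_x \log p_t(x) \right] dt
```

That injected-noise term is also why ancestral samplers keep changing the image as you add steps, while deterministic ones tend to converge.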
Everything I prompt comes out super abstract or very wonky regardless of CFG scale and sampling method. For example, I simply wrote "A dog on a sidewalk" and got some multi-headed abomination. Anyone have any suggestions?
The more detailed the prompt, the better the results, generally. The point is that the whole system works by finding patterns in random noise. You can also get much better results using img2img - hand draw the scene in simple colours, and let the network fill it in. But overall, figuring out the right settings, the right source data and the right prompts is definitely an art.
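A minimal img2img sketch along those lines with diffusers (the file name, prompt, and strength value are just illustrative):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "sketch.png" is a hypothetical hand-drawn blocking of the scene
init = Image.open("sketch.png").convert("RGB").resize((512, 512))

# strength controls how far the model may drift from your sketch:
# lower keeps the composition, higher gives the model more freedom
image = pipe(prompt="a dog on a sidewalk, photo, sharp focus",
             image=init, strength=0.6, guidance_scale=7.5).images[0]
image.save("dog_on_sidewalk.png")
```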