As others have mentioned, not running this LCM LoRA at full strength helps if you are having issues with messy/distorted images. I'm getting pretty good results setting the LCM at 0.5 with 16 steps. Still really fast, but with better-looking generations. I also recommend trying this if you are having issues with the LCM while using models and LoRAs that are trained on a particular subject.
One issue: it doesn't play well with AnimateDiff, since AnimateDiff usually needs more steps, like 25-30, to get good motion. Just wanted to put that out there; it does still work with AnimateDiff, though.
@@alsoeris In automatic1111, when LCM is used in the prompt, it would look something like this: <lora:lcm-lora-sdxl:0.5> for half strength, <lora:lcm-lora-sdxl:1> for full strength, <lora:lcm-lora-sdxl:0.2> for 20% strength, etc. (the name part matches whatever you named the LoRA file).
Sebastian, I really like your videos and your simple way of explaining things. Could you create a tutorial, or recommend a video, for Stable Diffusion or ComfyUI on how to insert a generated object into other scenes, i.e. generate the same element in different scenes? For example, I generated the design of a new bottle and the prompt gave me a perfect result; after that, I want to create images of this same bottle in a scene with different angles or different poses (like a new photo of someone holding the bottle of juice, for example). It would be very interesting to have this type of video.
Tip for experimentation: use it like a regular LoRA and play with the weight. Some custom models that give horrible colors at 1 will actually work better at 0.7.
I've discovered the same. Also try increasing the steps to home in on the right quality. Maybe not a 1000% increase, but 500% is still pretty good :) Even going all the way down to 0.1 will let some models work much better while still getting the speed increase.
Yes, I'd feel more comfortable using the standard LoRA syntax instead of this black-box method from the dropdown. Same with my saved styles. Anyone know how to see them again, and not just the tabs to add them? (Please don't mention styles.csv, that's where I edit them.)
I am also using an RTX 4090 setup and I gotta say that I don't see much of a speed difference. However, finding out about the comparison capabilities made it so much better to choose which model to use based on what I wanted to create. Thank you for the info.
@@joppemontezinos2092 You're supposed to use 4 to 10 sampling steps AND CFG 1 to 3. It's very fast and yields good results, and it's honestly a godsend for mass-producing images. You can make 100+ images SO FAST that you can just pick the best one and run hires fix on it with a better config to get the absolute best of the best results.
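If anyone wants to try that same regime outside the web UI, here's a rough sketch with Hugging Face's diffusers library (the model and LoRA repo IDs are the public ones; treat it as a starting point under those assumptions, not the exact setup from the video):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# SDXL base plus the LCM-LoRA, both pulled from the Hugging Face Hub.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)  # LCM wants its own scheduler
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# Mass-produce cheap drafts at 6 steps / CFG 1.5, then hires-fix only the keeper.
for i in range(20):
    image = pipe(
        "cinematic photo, techwear car",
        num_inference_steps=6,
        guidance_scale=1.5,
    ).images[0]
    image.save(f"draft_{i:03d}.png")
```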
I made the same grid as in the video with 8 sampling steps for 2 cases: 1) with this LoRA and 2) withOUT it / None. The time to generate is basically the same (actually, without this LoRA it is 10 seconds faster), so the speed depends on the sampling steps rather than the LoRA. Quality depends on the sampler, but there are some VERY good results without this LoRA at all for the same sampling steps. I can't see much difference in either speed or quality if the right sampler is used.
The point of using this LoRA & sampler is that you can achieve results in 8 steps that would otherwise need 25 or more steps with other samplers. For the best quality, I'd recommend the Comfy route, using the LCM sampler together with the LoRA, as a1111 with another sampler is more of a half-measure atm.
@@sebastiankamph Let's be honest, nobody uses LCM if they are looking for the best quality. The only people using LCM are the ones with old PCs who want to have some fun poking out a couple of still-unusable 512x512 images. On any high-end graphics card, 8 steps vs 25 steps is only 1 second of difference, no matter the model or sampler used, so something like LCM makes no sense for professional users.
My not-so-"potato PC" and my impatience thank you very much; I am your fan. I already passed the information on to my brother, I'm sure he will be happy too.
I've tried this with my ancient GPU, a GTX 970 😂. Generating a 512x768 image at CFG 7 and 30 steps usually takes 42 seconds. With LCM it takes only 7 seconds, and the result is comparatively good 👍
@@jibcot8541 which 100% looks like trash and is totally unusable. Not sure what's up with people wanting to brag about being able to generate some tiny (512x512 px) low-quality images in a second.
I am a big fan of yours. Thanks for sharing knowledge in easy-to-follow language, with everything explained in detail, not like other channels just repeating information that sometimes is not fully useful. Your stuff is good. Got my like and sub, and a long-time follower. I am one of you as an AI researcher. Thanks very much.
@@sebastiankamph It's been amazing honestly, an order of magnitude faster on my 1080, going from 20+ mins with hires fix to about 1.5-3 mins using lcm. I was trying it out with 1.5 yesterday and it's great too, went from about 3 mins to just 30 secs. It honestly makes the experience much more enjoyable for me, being able to see this kind of improvement.
Damn, this is the best SD video this year, I can't believe how fast you can work with it now! Nvidia can throw their TensorRT extension in the trash!
I found that the picture quality is worse ONLY when applied to custom SDXL models; when applied to vanilla SDXL or SSD-1B, it's roughly on par in quality yet SUPER FAST!!! (Tested on ComfyUI, LCM SSD-1B LoRA, LCM sampler, 8 steps.)
Useful info, thanks! Unfortunately, in my case, I'm often on custom checkpoints, but the methodology could be instrumental in making future iterations faster. 👏🤩
The SDXL LoRA does not seem to work for me. My RTX 3060 with 12 GB VRAM gets 100% loaded and freezes the whole system for several seconds on each iteration. The resulting images are usually a jumble of pixels. The SD 1.5 LoRA, however, does seem to somewhat accelerate things for SD 1.5-trained models.
Update: I wasn't able to get it to work, then found a post on Reddit which suggested deleting the "cache.json" file in the webui directory. I renamed mine to cache2.json (just in case), and sure enough the Lora tab was showing ssd-1b in it and I noticed speed improvements. Must be a bug of some sort, as the cache.json file showed up again and everything seems to be working.
Thanks for the mega grid comparison - most of the comparisons so far are probably using the DPM 2M Karras, long time best performer, and seemingly terrible with LCM. I'll let the community do a few more evaluations with sampler and CFG before switching over.
I'm using the DirectML version because I have an AMD card and I have to use my CPU, and it's PAINFULLY slow. Will this help with that? Or is it only for those using GPUs? I actually have a really decent GPU (RX 5700 XT), but I sadly can't use it since SD hardly supports AMD.
Did you try it? I have an RX 7800 XT and have the same problem. Looking for options to improve rendering performance. AMD released a video with a tutorial, but I haven't tried that yet.
@@LinkL337 I have not; I just sucked it up and am using the painfully slow CPU way lol. I spent 7+ hours trying all types of things though and nothing worked. It seems I literally have to use my CPU.
I'm confused about trying to get this working with SSD-1B. I downloaded it, put it in the correct folder, renamed it, and it shows in the "Add network to prompt" dropdown, but so far I notice no improvements and the quality seems poor. I keep seeing something about diffusers but I'm not sure what that is all about. Going back to the drawing board lol
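In case it helps: "diffusers" is Hugging Face's Python library, which is what the LCM-LoRA weights were originally published for; the web UI route doesn't need it. A rough sketch of the diffusers path for SSD-1B (repo IDs are the public Hugging Face ones, so this assumes the stock model rather than a renamed local file):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# SSD-1B base model plus the matching LCM-LoRA, straight from the Hub.
pipe = DiffusionPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-ssd-1b")

image = pipe(
    "a product photo of a glass juice bottle on a kitchen table",
    num_inference_steps=8,   # same 8-step regime people use in the UI
    guidance_scale=1.5,      # LCM wants a low CFG, roughly 1-2
).images[0]
image.save("ssd1b_lcm.png")
```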
Is there also a way to enhance performance for image2image generations? I selected the Lora, adjusted the steps and the CFG Scale but the render time is still the same if not worse. Please help :'D
HAHAHA 😅 I ::: honestly ::: look forward to the Dad jokes 🤣 Even if I don't have time to watch the entire video when I initially see it, I will watch until the joke and then come back later 😆👏🏾
Oh and @sebastiankamph... I almost always laugh at your jokes even if my wife hates when I tell her them. Said the facial hair one to her yesterday because I DON'T like facial hair and she knows that! :)
I have a 3060 12GB GPU and was getting VRAM errors with this workflow on XL; the process was rerouted to the CPU, 50-70 seconds. So I suspected my VRAM was being squatted on by orphan processes. Rebooted and it's now working the way you describe. Thanks.
Ok, on first watch of the video I'm very confused: what is the one step that makes it 1000% faster??? Download "1" file?? You started downloading several files, and now I'm so lost...
Hey :) Did the KSampler change with the last update? I get errors on all my AnimateDiff workflows since I updated all of ComfyUI. Error occurred when executing KSampler: local variable 'motion_module' referenced before assignment
Thank you! It works nicely, in both A1111 and Comfy. But I have a rookie question: I can't save the Comfy workflow explained in the video with the LoRA loader node installed. If I save it as a .JSON file or PNG image, it does not reload....
Tested LCM on Stable Diffusion - it seems that img2img LCM and vid2vid have an error - TypeError: slice indices must be integers or None or have an __index__ method
I'm kinda new, but isn't it a problem if I have to use this LoRA? I mean, I can only use 1 LoRA at a time, right? And if I'm using this one it means I can't use another, which sort of defeats the purpose...
You can use as many LoRAs at a time as you like. There could possibly be a limit that I'm not aware of, but I know for sure you can use at least 4 or 5 at a time.
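For what it's worth, the same stacking works outside the UI too. A hedged sketch with diffusers' multi-adapter API (needs the peft package installed; the second LoRA repo and the weights here are made up purely for illustration):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load two LoRAs side by side and weight them independently.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights("some-user/your-style-lora", adapter_name="style")  # hypothetical repo
pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 0.8])

image = pipe("portrait photo", num_inference_steps=8, guidance_scale=1.5).images[0]
image.save("stacked_loras.png")
```

In A1111 the equivalent is simply putting two <lora:...> tags in the prompt, each with its own weight.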
Thanks as always! I have an off-topic question: is there any way to make Stable Diffusion not show people but only clothes? I put no human, no girl, etc. in the negative prompt and it still shows people.
This is WILD! This ecosystem continues to boggle the mind. There's certainly some amount of "too good to be true" in here, such as the LoRA not playing nice with a lot of samplers, but cool nonetheless. Btw, a couple of things I would have liked discussed / to see are how this performs with common current settings (i.e. higher steps ~20 / CFG ~5), and on other models, even if just SD 1.5 / SDXL-based models. Even if it was just 15-30 seconds showing a good model vs a bad model that you've found. Ofc, there's always the whole "try it in your workflow to see how it is for you," it would just be nice to know if I can expect this to work outside of vanilla SD.
Not optimized for a1111 yet. I'm using a custom checkpoint, a1111, 1.5, same settings as in the video. I'm on a 1080 Ti; the generation speeds are faster, but the image quality is worse.
Strange, I did everything you said, but it took 7 seconds longer to generate.
Without the LoRA: cinematic, techwear car, Steps: 30, Sampler: DPM++ 3M SDE Exponential, CFG scale: 7, Seed: 4128880464, Size: 1024x1024, Model hash: 74dda471cc, Model: realvisxlV20_v20Bakedvae, Version: v1.6.0-400-gf0f100e6, Time taken: 17.2 sec.
With the LoRA: cinematic, techwear car, Steps: 30, Sampler: DPM++ 3M SDE Exponential, CFG scale: 7, Seed: 4128880464, Size: 1024x1024, Model hash: 74dda471cc, Model: realvisxlV20_v20Bakedvae, Lora hashes: "lcm-lora-sdxl: 2fa7e8e56b09", Version: v1.6.0-400-gf0f100e6, Time taken: 24.1 sec.
Tried it on another sampler and got a 2 second gain. Apparently it doesn't work well enough on all samplers.
Hmm... why is it working for you and not for a lot of us in automatic1111?
* Downloaded and renamed both LoRAs and put them into the Lora directory
* Enabled sd_lora in the User Interface options in the main UI
* Reloaded the UI
* Updated the complete automatic1111 install with all extensions
* Restarted automatic1111 (ORIGINAL)
* The lcm LoRAs do NOT appear in the Lora tab gallery, only in the unusable dropdown list if you have a lot of LoRAs
* Tried all my models AND samplers for 1.5 and XL, all with really bad results at 8 sampling steps
My options in the main UI (like the "Add network to prompt" dropdown) are shown in the left column under CFG scale, seed, etc. Are you using a different version of automatic1111, or is there something else that has to be enabled that a lot of us maybe don't have?
@@sebastiankamph I used Auto1111. I did put the 1.5 LoRA in the lora folder, loaded a 1.5 model, added the LoRA to the prompt, and set the steps to 8 with Euler. The result looks worse than without the LoRA.
The video title is wrong. 10 times faster is 900% faster. The percentage is always 100 percentage points lower than you would intuitively expect from the factor. Just like 50% more is 1.5 times as much and 100% more is 2 times as much.
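Stated as a formula (just restating the comment's point):
$$\text{percent faster} = (\text{speedup factor} - 1)\times 100\%,\qquad 10\times \Rightarrow (10-1)\times 100\% = 900\%.$$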
I am super confused: when I go to download the LCM model for SDXL, are we downloading the "pytorch_lora_weights.safetensors" file? I did that and used it as a LoRA, and it gets stuck! I am using an RTX 4090.
I would indeed say it's a trade-off. I wouldn't call it vastly subpar with the LCM sampler and some fine-tuned settings. This is a good step in the right direction. If we had bashed Stable Diffusion on day 1, we wouldn't be where we are today. This is a fantastic step forward, and these ideas can be developed further!
@@BabylonBaller The images you can get with the LCM LoRA and sampler are in no way garbage. Run it in Comfy today and you'll probably be amazed by the results at that speed.
@@sebastiankamph Yes, I activated sd_lora in a1111 because I use SDXL on a1111, and I tried it... and... it was a massacre. But I use 1.5 with Vlad (SD.Next), and there's a problem: sd_lora is not appearing :/
@@DerXavia Sssssht, don't mention the quality, it's all about speeeeeeeed now. To be serious: the community will find out eventually that to get the same quality, we will end up with the old settings again.
@@sebastiankamph Just over 20it/s :) It is really nice to be able to get a bunch of images, pick a nice one and then use img2img or controlnet to refine it.
How do I make the interface look like yours? At the top, where you select the model/checkpoint, you have two more dropdowns to the right called SD_VAE and Add Network to Prompt. If somebody other than the video creator has the answer, feel free to reply.
Tried this with SDXL with no good results. SD 1.5 worked great though. Any ideas? I was using sd_xl_base_1.0.safetensors [31e35c80fc] with the lcm-lora-sdxl on a Mac M1, if that makes any difference.
I am not seeing this effect on a MacBook Pro. Yes, I get a speed increase from doing 8 steps instead of 20, and yes, the image has better quality, but the low CFG scale means I get a high-quality image that isn't what I asked for. I am not seeing any improvement in it/s.
It's a very different philosophy. I would recommend automatic1111 for beginners and also for flexibility. ComfyUI, in my opinion, is more specialized, but you don't have as much creative power (inpainting, for instance, is quite annoying to set up). I tried ComfyUI and I'm back to automatic1111; it gives me the best results (also, I kinda lost my node setup for ComfyUI and it's a pain to redo).
@@jonathaningram8157 Thank you! I've also been using automatic1111 atm, but I saw so many videos for ComfyUI that I thought I'd ask. Thanks for the response!