Thanks for making this video. A few things:

(0:55) Lock the seed (manually set it rather than letting it randomize) so that "silver hair" gives you the same shot but with silver hair.

(5:38) The picture did completely change. You have two choices: a) never go above "1.2", or b) shift the guidance by one or two.

(7:50) Look carefully at the "0.7" version. You'll see she's inside a building. This shows you that the seed wants to go indoors. Change "fire" to "burning building" to get a great shot.

(8:18) A weight of one is useful for grouping multi-word terms. "(16th Century Sloop:1.0)" works significantly better than just "16th Century Sloop", where the adjectives can float.

(12:50) Notice that only "underwater portrait" is a different shot; the rest are just versions of the same. That's because descriptors appended to the end are naturally weighted less, and so just offer "noise" to the original. It just so happens the seed in question has "underwater" data it can draw upon that was ignored originally but is now revisited.

(12:59) Tell me why "concept art" would do anything. They're all "concepts" and they're all "art". That phrase only adds "noise", which is only helpful if your guidance value is too low.

(22:43) There is no such thing as "resolution" markers. As evidenced by...

(24:17) "Is there a huge difference? Not really." People throw all the "resolution markers" in and pray. When they get a good result, they claim victory, and when things get worse, they assume it's something else. "8K" does not exist (1024x1024 max), and "Unreal Engine" is a video game platform. Why would things "look better" in video game format, where we devs throw out detail on purpose to make the game run better?
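For anyone who wants to try the seed-locking point at (0:55) outside the WebUI, here's a minimal sketch using Hugging Face's diffusers library (my tooling choice, not the video's, which uses the AUTOMATIC1111 interface where you'd type the number into the Seed field instead). With a fixed generator, only the swapped token changes between runs:

```python
# Minimal seed-locking sketch, assuming the diffusers library and the
# standard public SD 1.5 checkpoint. Same seed + same settings = same
# latent noise, so only the changed word varies.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

SEED = 1234567890  # any fixed value; the point is that it never randomizes

for hair in ["silver hair", "black hair"]:
    generator = torch.Generator("cuda").manual_seed(SEED)  # reset noise each run
    image = pipe(f"portrait of a woman, {hair}", generator=generator).images[0]
    image.save(f"{hair.replace(' ', '_')}.png")
```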
Thank you for the detailed feedback. Getting to your questions: yes, I agree that "concept art" didn't/doesn't have much impact. I just wanted to get a cross-section of attributes I see getting used. Second, you are correct, 4K, 8K, and the other resolutions aren't actually putting the image in those resolutions. When the model was trained, 4K and 8K images were sometimes used. Those images tended to have more details and sharper features. Using, say, "4K" isn't generating a 4K image; rather, you are telling the AI to make a clean, crisp image similar to what you'd see in a high-resolution photo. Sometimes it has nice results, sometimes it does next to nothing. I just wanted to introduce another tool people could add to their toolbox.
@@NotThatComplicated Thank you for the reply. Here's the error the Internet has told you, and here's why "8K", as you say, "sometimes has nice results". I build my own datasets using DreamBooth. All incoming images must be a) 1:1 ratio, and b) 512x512 max. No 4K. Second, dataset makers don't add resolution to the prompt, because every image we use is 512x512, so that's just wasted typing. So why does it "sometimes" work? If your guidance is too low and you add more characters (even 2 letters) to your prompt for subsequent images, the results will get better. The same is true of guidance that's too high: if you remove words from your prompt, you're now at proper guidance and the result improves.
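To make that dataset constraint concrete, here's a rough Pillow sketch of forcing every training image to 1:1 and 512x512. The folder names and the center-crop policy are my assumptions for illustration, not the commenter's exact pipeline:

```python
# Sketch: normalize a raw image folder to 1:1, 512x512 inputs for DreamBooth.
from pathlib import Path
from PIL import Image

SRC, DST, SIZE = Path("raw_images"), Path("dataset"), 512  # assumed paths
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    side = min(img.size)                      # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))   # enforce 1:1
    img = img.resize((SIZE, SIZE), Image.LANCZOS)          # enforce 512x512
    img.save(DST / path.name)
```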
@WifeWantsAWizard hey, not sure if this is really the place to ask, but you both seem really smart, so I figure nothing ventured, nothing gained. I generated an image, got a good picture of a character, and I want to use that character in various ways: different pose, different background, full body, etc. Do you two have any tips or advice for going about that, aside from inpainting to change/adjust the background of the image? Would that be more of a Photoshop sort of job? Am I missing a useful tool or extension?
@@dylc3373 Two pieces of advice: 1) generating characters on a gray background and then removing the gray in Photoshop works great for me, because I've got 20+ years with PS, however... 2) in the same way that using "(Actor 1 Name|Actor 2 Name:0.5)" can prevent randomness in your subject, you can also use "(Location Name:1.1)" to attempt to lock down the background in Stable Diffusion. The more unique, the better. For instance, "(Hagia Sophia:1.1) hallway" is a very consistent background across all seeds, versus "(Hogwarts:1.1) hallway", which will have zero consistency.
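If you don't have Photoshop, the gray-background trick in (1) can be roughly approximated in code. A hypothetical sketch with Pillow and NumPy, where the target gray value and tolerance are assumptions you'd tune per image:

```python
# Sketch: key near-gray background pixels to transparent (a crude stand-in
# for the Photoshop step described above; values are illustrative guesses).
import numpy as np
from PIL import Image

def key_out_gray(path, gray=128, tol=18):
    rgba = np.array(Image.open(path).convert("RGBA"))
    rgb = rgba[..., :3].astype(int)
    near_gray = (np.abs(rgb - gray) < tol).all(axis=-1)           # close to gray
    low_saturation = (rgb.max(axis=-1) - rgb.min(axis=-1)) < tol  # not colorful
    rgba[near_gray & low_saturation, 3] = 0  # zero the alpha channel there
    return Image.fromarray(rgba)

key_out_gray("character_on_gray.png").save("character_cutout.png")
```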
🎯 Key Takeaways for quick navigation:
00:00 📝 Exploring how to organize prompts for Stable Diffusion.
01:01 📝 Adding details to prompts for more specific outputs.
02:41 📝 Adjusting attributes' weights to influence image generation.
04:14 📝 Exploring different mediums: portrait, digital painting, and ultra-realistic illustration.
05:58 📝 Fine-tuning attributes like fire weight to achieve desired effects.
07:03 📝 Examining the impact of different artistic styles on image generation.
09:05 📝 Balancing weights between attributes to prevent canceling effects.
11:08 📝 Exploring how resolution markers like "portrait" affect generated images.
13:21 📝 Comparing the influence of different resolutions on image style.
16:02 📝 Using artists' styles to create different artistic interpretations.
19:15 📝 Considering the impact of different styles on generated images.
21:13 📝 Exploring alternate resolutions like "unreal" in the prompt.
23:18 📝 Comparing effects of various resolutions on image style.
25:00 📝 Adding depth of field to a prompt for enhanced image quality.
26:19 🔥 Using alternate resolutions and styles to enhance image details.
27:01 💡 Exploring color, lighting, and effects to add more depth to images.
28:26 🌟 Comparing different lighting and effects options for image generation.
29:20 🎯 Selecting the best combination of settings for the final image.
29:35 🎬 Summarizing the tutorial and plans for a companion video.
Made with HARPA AI
Yes, the labels on the images were generated by the search-and-replace script. The grid of images with the words across the top was all produced by the program at the time of creation.
First thing I'd like you to check: go into Extensions, go to your "Installed" tab, and make sure you are on the most current version of both roop and ControlNet.
Most custom models can generate up to 768x768, 960x640, or 1024x640 (double heads may occur). Basically, doubling the resolution can break the image; in this case, 1024px. My rule for SD 1.5 is 768x1024 max.
Yeah, sometimes I generate straight to 1024 by 1024, but for the tutorial I keep it at the default 512 by 512 for the sake of time. I think I did 768 by 1024 in my first tutorial. Funny enough, SDXL was trained on 1024 by 1024 images, and you often get an undesirable result if you process at anything less than that. Thanks for the feedback!
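As a quick illustration of matching resolution to training size (diffusers assumed again; the model IDs are the standard public checkpoints, not necessarily what's used in the video):

```python
# Sketch: render at each model family's native training resolution,
# per the point above: 512x512 for SD 1.5, 1024x1024 for SDXL.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionXLPipeline

prompt = "portrait of a sailor on a 16th century sloop"  # example prompt

sd15 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
sd15(prompt, width=512, height=512).images[0].save("sd15_512.png")

sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
sdxl(prompt, width=1024, height=1024).images[0].save("sdxl_1024.png")
```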
Keep up the good work, bro, thank you. Perhaps you should continue the prompt-mastering series with how to prevent color bleeding, how to minimize token count, the secrets of capitalized words, and how to use the additional prompts in hires fix and ADetailer to our advantage. Thank you again for sharing your knowledge.