I think that showing the outcome at the beginning of the video, and only then explaining how it's done, works as a "hook" for viewers. Keep it up!
It wasn't about Jennifer Lawrence; using a prompt with a celebrity (one that's surely included in the StableDiffusion model) helps improve the consistency across a sequence of frames.
Looks like the best way to prepare this is to render the character against a green background and render the background separately. Then just key it out as usual.
Yes, green-screening the character in Blender and making two render passes in SD, one with the character and one with the background, would also be an option. I used this method in one of my previous tutorials about creating an audio-reactive music video with StableDiffusion. Makes sense!
Right, temporal coherence is still one of the big issues with this technology. Still, it's getting better over time, and I'm just trying to find some workarounds by green-screening the character from the background and rendering it separately, which seems to produce better results in most cases. I've also tried RunwayML Gen-1, but in my view it's not yet suitable for producing professional work, due to its severe time limits for rendering clips. Maybe I'll make a video about it, if I think I can add a valuable contribution to this topic.
This is incredible, but also insanely long and complicated. I hope someday they can make an AI that generates a video like this all in one app, something like ModelScope text-to-video synthesis 2, Stable WarpFusion, or Runway Gen-1. Also, I think you could just use a video game like Blade & Soul to create the avatar, the dance, and the background scene, record a video of the gameplay, and feed that video into Stable Diffusion. Using a video game could have saved you a hundred of those steps.
I know, it's still a long process, but that's where we stand now. I'm working on a video comparing the pros and cons of Runway Gen-1 and Automatic1111 for video creation, in a quick step-by-step guide, also giving some ideas on how to deal with the Auto1111 temporal coherence issues. You're right that using a video game as input could speed up the process, but since I'm not much into video gaming, I had to go the more tedious way and create the input footage myself. Still, that's good advice!
Such a program has already been created by Epic Games: the MetaHuman mobile application lets you record your movements as animation, along with voice acting, in about 5 minutes.
Great intro to the topic. But did you also manage to get a good video result where the costume stays the same, as well as the background? If so, any chance of a follow-up video?
I'm working on exactly that issue! There have been a lot of developments in StableDiffusion since I posted this video, and it has now become possible to produce stable, flicker-free animations of any person, just from a Blender animation or a video input plus a single facial photo of that person. I'm planning to post another tutorial in the coming week, this time using ComfyUI instead of Auto1111, as it's more versatile... I just need to solve a few minor issues before I'm ready.
Well, the Deforum extension still has some issues with temporal inconsistencies. You can get better results if you set the Strength and CFG Scale even lower, so StableDiffusion sticks closer to the original video. I also found that batch img2img and ControlNet, with low strength and scale, make the scene stay more consistent.
Yeah, Automatic1111 is quite capable of producing great images, but there are still major issues with temporal coherence in videos, which makes them look kind of trippy. But I think I'm about to crack that nut... I'm working on a new video where I'll be addressing this issue (and hopefully can provide some ideas on how to solve it). I'm also checking out some other tools in this regard, like RunwayML Gen-1, which is far from a perfect solution, but gave me some conceptual ideas on how to deal with the coherence issues in Automatic1111. Well, we shall see...
Yes, we're still at the beginning, but it's just amazing how fast the technology and the tools are developing. It's just great fun being part of it already at this early stage, watching it grow and adding some humble contributions to it.
Thanks a lot for this awesome tutorial and all your work. Wouldn't the EbSynth program be a good step in the process to make the video look more consistent?
Yes, it might, and I've already installed the app, but haven't had the time yet to get deeper into it. There's also an extension for Automatic1111, available under Extensions->Available->Load from, which helps you through the process. Maybe I'll make a video about it, if I think it can be helpful. I'm currently working on another approach for improving consistency: green-screening the character in Blender and rendering it separately from the background scene, then making two render passes with batch img2img + 2 ControlNets, one for the character and one for the background, and putting them back together in my video editor. It looks promising, and I'm going to make a quick tutorial about it soon, together with a short review of RunwayML Gen-1.
Dude, that was so helpful, but I have a question: I'm starting to learn Blender, but I want to see if my PC is suitable for it. The CPU is an Intel Core i5-10400F and the GPU is a 6900 XT with 16GB. Is that good enough for making animations, or should I upgrade?
Well, Blender requires some effort to get started, but once you're familiar with it, it's a wonderful tool for producing 3D renders of any kind. I also use Unreal Engine for cinematic 3D renders and, while it's a monster in terms of memory and GPU requirements, it's also at the top of my list. Both Blender and Unreal are completely free.

That said, you don't need any of these tools for StableDiffusion; you can create your input videos any other way you like, be it with a simple smartphone camera, or even by recording scenes from a game, if you're into computer gaming.

StableDiffusion / Automatic1111 works best with NVIDIA RTX graphics cards. Besides my MacBook Pro M1 Max, I also own a middle-class PC with 32GB of RAM and an NVIDIA RTX 3060 with 12GB VRAM, so nothing fancy, and that works pretty well, even beating my Mac in terms of performance. To my knowledge, AMD cards are not as well supported as NVIDIA, but should also be able to get along with Automatic1111. Here's an article I found on GitHub addressing this topic: github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

I would just give it a try with what you have, and consider an upgrade only if you're not satisfied with it.
Yes, that's possible, it just requires a little trick. In Blender, render the scene with only the character on a green-screen background, hiding the rest of the scene, and feed this video into StableDiffusion (if you don't know how to create a green-screen effect in Blender, take a short look at my tutorial about creating an audio-reactive music video, at timestamp 9:25). Next, unhide the scene in Blender, hide the character, and render again. Then import the background render from Blender and the SD-rendered video with the character into your video-editing software (FinalCut, DaVinci, Premiere), place the character in front, and remove the green screen from the character with a keyer - and it's done. I hope that explanation was understandable; if you have any further questions, just ask!
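For anyone curious what the keyer actually does in that last step, here's a minimal sketch of green-screen keying in Python with NumPy. The threshold value and the simple "green dominates red and blue" rule are my own assumptions for illustration; a real keyer in FinalCut or DaVinci also handles color spill and soft edges.

```python
import numpy as np

def key_out_green(frame: np.ndarray, background: np.ndarray,
                  threshold: float = 1.3) -> np.ndarray:
    """Replace green-screen pixels in `frame` with pixels from `background`.

    Both inputs are H x W x 3 uint8 RGB arrays of the same shape. A pixel
    counts as "green screen" when its green channel clearly dominates red
    and blue (the factor 1.3 is a rough guess, not a production value).
    """
    f = frame.astype(np.float32)
    r, g, b = f[..., 0], f[..., 1], f[..., 2]
    # Mask: green notably brighter than both red and blue, and not too dark
    mask = (g > threshold * r) & (g > threshold * b) & (g > 60)
    out = frame.copy()
    out[mask] = background[mask]
    return out

# Tiny demo: a 2x2 "frame" with two green-ish pixels and two kept pixels
frame = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[10, 200, 10], [100, 100, 100]]], dtype=np.uint8)
background = np.full((2, 2, 3), 7, dtype=np.uint8)
result = key_out_green(frame, background)
```

In a real pipeline you would run this per frame over the decoded image sequence, but in practice the keyer built into the video editor does a much better job.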
I'm sorry, but this is terrible. There are much better solutions for this than using Deforum. You should take a look at img2img batch processing. You can use it with multiple ControlNets, like depth, canny, pose and landmark, all at the same time. And there's no need to generate a video file first; img2img takes a folder of images as input. That should give you good consistency. Then look at a flicker-removal tutorial for the free version of DaVinci Resolve. You'll be amazed at how much better the result is.
Batch img2img can give you slightly better results with low strength and scale, but temporal consistency is still a big issue with all these tools. I still think Deforum is a great extension with many possibilities, like math functions and prompt shifting, but it's fairly complex to turn this great variety of functions into meaningful results. ControlNet has been a great improvement, whether you use it with batch img2img or with Deforum - I think it's a must-have for most use cases.

For reducing temporal inconsistency, it also seems to help to use less detailed background scenes, or even to separate the character from the background by green-screening it and using separate render passes with slightly different settings for the character and the background, before putting them back together in video editing software.

DaVinci Resolve is great for pre- and post-processing, though I tend to prefer FinalCut Pro as long as I'm on my Mac, as it has some pretty good tools for stabilizing and improving the optical flow of a clip. Still, my main focus right now is on tweaking the settings in StableDiffusion to improve temporal consistency. I'm also looking into some new scripts and extensions, and there are some promising concepts coming up that try to address these issues, but I still haven't found a convincing overall solution.

Well, it's a steep learning curve and all the available tools still have flaws, but I think it's worth dealing with them, as well as sharing your thoughts and ideas with others, no matter how imperfect they may still be.
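As a sketch of what the batch img2img route can look like outside the UI: the Automatic1111 webui, when started with the --api flag, exposes an img2img HTTP endpoint, and the payload fields below reflect my understanding of its schema. The URL, the exact field names, and the parameter values are assumptions based on my own installation, so please verify them against your webui's /docs page before relying on this.

```python
import base64

# Assumed endpoint of a locally running Automatic1111 webui started with --api;
# adjust host/port and verify the schema on your own /docs page.
API_URL = "http://127.0.0.1:7860/sdapi/v1/img2img"

def build_payload(frame_png: bytes, prompt: str) -> dict:
    """Build one low-strength img2img request for a single input frame.

    Low denoising strength and a moderate CFG scale keep StableDiffusion
    close to the original footage, which helps temporal consistency.
    """
    return {
        "prompt": prompt,
        "init_images": [base64.b64encode(frame_png).decode("ascii")],
        "denoising_strength": 0.35,  # low strength: stay close to the input frame
        "cfg_scale": 6,              # low-ish scale, same reasoning
        "steps": 20,
    }

payload = build_payload(b"<png bytes of one frame>", "a woman dancing, studio light")

# Usage sketch (hypothetical; requires the `requests` package and a running webui):
# import requests
# from pathlib import Path
# for path in sorted(Path("frames").glob("*.png")):
#     r = requests.post(API_URL, json=build_payload(path.read_bytes(), "..."))
#     ...decode r.json()["images"][0] from base64 and save it...
```

ControlNet can reportedly be attached to the same request via the extension's "alwayson_scripts" mechanism, but since its argument layout changes between versions, I've left it out of the sketch.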
Please take a look at the ControlNet settings in the Settings tab (Settings->ControlNet) and make sure that the "Do not append detectmap to output" box is checked. If not, check it and restart the webui. Also make sure that the latest ControlNet version is installed (Extensions->Check for updates). If nothing helps, try removing the whole ControlNet folder from your stable-diffusion-webui/extensions folder and reinstalling ControlNet (Extensions->Available->Load from->ControlNet->Install). Hope that helps; if not, leave me another note!
Sure! If only single frames are defective, either just delete them from your output sequence, or replace them with the frame before or after. If there are several defective frames in a sequence, import all frames into your video editing software as an image sequence, delete the defective frames, and interpolate the missing parts. It depends on which software you're using, but I think most video editing apps are capable of that.
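To illustrate the idea, here's a tiny sketch in Python (with NumPy arrays standing in for decoded frames) that replaces each defective frame with a blend of its nearest intact neighbours. The 50/50 average is a deliberate simplification; real editors interpolate along the optical flow, which looks much smoother.

```python
import numpy as np

def repair_frames(frames: list, bad_indices: set) -> list:
    """Replace each defective frame with a 50/50 blend of its nearest
    good neighbours, or a plain copy at the edges of the sequence."""
    repaired = list(frames)
    for i in sorted(bad_indices):
        # Search outwards for the nearest non-defective frames
        prev_i = next((j for j in range(i - 1, -1, -1) if j not in bad_indices), None)
        next_i = next((j for j in range(i + 1, len(frames)) if j not in bad_indices), None)
        if prev_i is not None and next_i is not None:
            # Average in a wider dtype to avoid uint8 overflow
            blend = (frames[prev_i].astype(np.uint16) +
                     frames[next_i].astype(np.uint16)) // 2
            repaired[i] = blend.astype(np.uint8)
        elif prev_i is not None:
            repaired[i] = frames[prev_i].copy()
        elif next_i is not None:
            repaired[i] = frames[next_i].copy()
    return repaired

# Demo: three single-pixel "frames", the middle one marked defective
frames = [np.full((1, 1, 3), v, dtype=np.uint8) for v in (100, 0, 200)]
fixed = repair_frames(frames, {1})
```

The middle frame ends up as the average of its neighbours, which for a single bad frame in the middle of motion is usually invisible at 24 fps or above.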
Thanks for the link! It looks like a professional tool for SD video creation, definitely worth a closer look, though I'm not sure about the real costs of using it. They seem to work a lot with green-screening the characters, which surely helps keep them more consistent. I've also done that in one of my previous videos about creating an audio-reactive music video; you then just need two render passes, one for the character and one for the background, and then merge them together in video-editing software, like FinalCut or Premiere. Again, thanks, I'm going to play around with it for a bit and see what I can do with it!
I still can't generate any asset in Blender using Stable Diffusion, even though I already followed the installation steps. After I input the prompt, it doesn't generate anything. Can you help?
Can you tell me a bit more about the problem, please? Is it that you can't create and export an image sequence in Blender, or that you can't render this image sequence in StableDiffusion/Automatic1111? I'd like to help you if I can!
No, I'm not using DMs, but if you want, you can simply send an email to my channel address (blndrrndr@gmail.com) and attach the Blender file, so I can take a look at it.
Would have been nice if you gave us the full clip, dude 😎 That's like waving Jenna Jameson in front of us and not telling us where to find the rest.
Yes, I used a celebrity name to improve the overall temporal consistency of the character. It's not about her as a person; the name just gives StableDiffusion stronger guidance than, for example, "a beautiful blonde woman".
No, surely no clickbait. I think the model I used produces very realistic images; the only issue with Deforum is the rather low temporal consistency, which is especially visible in the background scene. I'm trying to fix this by green-screening the character and rendering it separately from the background, with the background render at very low scale and strength, then putting them back together in my video editing software and removing the green screen from the character. I wish there were built-in tools in StableDiffusion for keeping a higher consistency across subsequent frames, but the tools seem to get better with each new version, so I'm pretty confident we're on the right path. Just a few months ago nothing like this would have been possible, and the technology is advancing rapidly.
Well, it could also be done by simply feeding a dancing video into StableDiffusion, instead of creating one in Mixamo and Blender, but this tutorial was also meant to show how to combine different technologies and tools to create something new. Yes, there's still a long way to go, but the StableDiffusion tools are advancing so rapidly that I'm confident it's going to get a lot easier as we move forward.
@-RenderRealm- Point me to a tutorial for that, if there is one, as I'm new to Stable Diffusion. This feels like when the iPhone first came out; there's excitement all over.
There are some good basic tutorials about StableDiffusion on RU-vid that I'd recommend watching as a starting point: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-DHaL56P6f5M.html ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-cVkMnskciHU.html ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3cvP7yJotUM.html I've also listed some good channels covering various Stable Diffusion topics at the end of my video description. If you have any specific questions, don't hesitate to ask!
@-RenderRealm- Thanks, I appreciate your reply. I'll go and check out the starter videos. I've already learned the basics, but of course it took me forever.
I see where we're going with this, but I have to say that until we have temporal cohesion it's kind of useless. What's the good of a video where the dress and face change 60 times per second?
Right, temporal consistency is still an issue with StableDiffusion, but the tools are improving rapidly, so I guess these issues will be only temporal, too ;-) I'm working on another video where I try to deal with temporal inconsistencies by interpolating the input frames from a video at low strength and scale, hoping it will be more consistent. The Deforum extension doesn't provide this feature yet, but maybe it can be done with the video-input mode and a frame-to-frame interpolation using mathematical functions in the prompts (just like the Deforum interpolation mode works in the background, but with a video input instead of just interpolating a series of prompts). Well, let's see how it works out... the whole technology is still a work in progress, but I believe we'll get there sooner rather than later.
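To sketch what I mean by keyframed values: Deforum-style schedules pair frame numbers with values, e.g. "0:(0.6), 60:(0.3)" to ramp strength down over 60 frames. Below is a toy evaluator with plain linear interpolation between keyframes. Note this is only my illustration of the concept; the real Deforum extension also evaluates math expressions like sin(t) inside the parentheses, which this sketch deliberately skips.

```python
import re

def eval_schedule(schedule: str, frame: int) -> float:
    """Evaluate a Deforum-style keyframe schedule such as "0:(0.6), 60:(0.3)"
    at a given frame, linearly interpolating between keyframes.

    Only numeric values are supported here; the real extension also
    accepts math expressions inside the parentheses.
    """
    pairs = [(int(k), float(v))
             for k, v in re.findall(r"(\d+)\s*:\s*\(([^)]+)\)", schedule)]
    pairs.sort()
    if frame <= pairs[0][0]:
        return pairs[0][1]          # before the first keyframe: hold its value
    for (f0, v0), (f1, v1) in zip(pairs, pairs[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)
    return pairs[-1][1]             # after the last keyframe: hold its value

strength_at_30 = eval_schedule("0:(0.6), 60:(0.3)", 30)  # halfway: 0.45
```

Driving the denoising strength per frame like this, instead of using one fixed value, is one way to soften visible jumps between shots.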
@-RenderRealm- Well, I think they're going to solve the temporal issue pretty quickly, and then 3D programs are going to become a thing of the past. Programs like Blender and Maya: we'll literally look at them and say "yeah, that's the way we used to do things...". They'll be nothing but relics of the past.
True :-) I still love my "old" 3D tools, like Blender and Unreal (I've never worked with Maya), and hope they'll integrate the new AI technologies in a meaningful form sometime in the future... but maybe they'll just become relics of the past, as you said. However it turns out, the way we create digital artworks is going to change dramatically. These are fascinating times we're living in!
That would be a great step forward in improving temporal consistency in Stable Diffusion videos. Until then, we need to figure out some creative workarounds. I'm currently trying to use frame interpolation for creating an SD animation... if it turns out to be a viable solution, I might post another tutorial describing the method.