This is hella cool. The only issue I see is that the neck collides and clips through the collar of her shirt in MH Animator. But aside from that it's really, really cool. Thank you for sharing this demo! I can't wait to try them out.
Actually, that clipping only happens because only the head is being animated here. If the whole character were being animated the shirt wouldn't clip; it would follow the head.
Nope, wrong guess. Nobody has gotten it yet. It basically shows that voice-to-voice AI gets the tone of the voice correct, but unless the input speaker can imitate the accent and speech mannerisms of the output speaker, it's not really a copy.
@DdawgRealFX If you use MHA you need to create a MetaHuman Identity for your face, but you can keep reusing that identity unless the actor being captured changes.
There is not much of a pipeline, really. When I recorded the animation, the iPhone also recorded video and audio. You can place that audio track into Sequencer and render the whole thing out with sound to get a video with your own voice.

Once you have installed RVC from the link in the video, you choose a voice model and an input audio file, then you just hit the big convert button and listen to the voice it produces. There is a field in the browser interface where you can shift the pitch of the voice up or down until it sounds right. For example, if you are a man with a deeper voice converting to a woman, you may have to shift up by 6 or 12 (semitones) to make it sound correct. Now you can run the audio track from your video through RVC to convert it.

Finally, you load your original movie (with sound) into your favorite video editor (I used DaVinci Resolve) and load the sound track from RVC as an extra track. You can then sync the RVC track to the original audio, turn off the original audio, turn on the RVC track, and output your video. This may be simpler than you thought, because RVC preserves the timing of the original audio, so the two will match exactly; only the tone and overall sound change.

For a voice to "sound right" you need to be able to imitate the way the original person talks and their accent. For example, if I use an Arnold Schwarzenegger voice I won't sound anything like him unless I try to imitate his accent and timing. I think this is why nobody recognizes the actress Amanda's voice is based on: that actress has a noticeable European accent that I didn't try to imitate.

I will try to do a tutorial later on; there are also a lot of other people on YouTube with RVC tutorials.
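If you'd rather script that last track swap instead of doing it in a video editor, here is a minimal sketch in Python calling ffmpeg (assuming ffmpeg is installed; all the file names are just placeholders). It leans on the point above: RVC keeps the original timing, so the converted track drops in with no re-syncing.

```python
# Minimal sketch: replace the audio track of the rendered movie with the RVC output.
# File names are hypothetical; adjust them to your own render and conversion.
import subprocess

ORIGINAL_VIDEO = "render_with_my_voice.mp4"   # the movie rendered from Sequencer
RVC_AUDIO = "rvc_converted_voice.wav"         # the file RVC produced
OUTPUT = "render_with_ai_voice.mp4"

# Because RVC preserves the timing of the input audio, the converted track
# lines up with the video 1:1, so we can simply swap the audio stream.
subprocess.run([
    "ffmpeg",
    "-i", ORIGINAL_VIDEO,   # input 0: video (with the original voice)
    "-i", RVC_AUDIO,        # input 1: the converted voice
    "-map", "0:v",          # keep the video stream from input 0
    "-map", "1:a",          # take the audio stream from input 1
    "-c:v", "copy",         # don't re-encode the video
    "-shortest",            # stop at the shorter of the two streams
    OUTPUT,
], check=True)
```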
In this video, the voice was recorded by Live Link Face (MetaHuman Animator mode) and exported through MetaHuman Animator. After all the animation was done I just added the audio from Live Link Face into the sequence and rendered out the movie. Audio usually gets exported as a separate file when you render, so I used DaVinci Resolve to add the original audio to the video, or to substitute the AI audio version. The only problem is that Live Link Face isn't all that good at recording audio; I ended up plugging in a microphone with a gain adjustment, which worked much better.
I've been trying to find a way to take the files used for Live Link Face and reuse them later in MetaHuman Animator. The goal is mocap that is real-time at first and then higher quality later. The best of both worlds.
I don't think this is possible. I believe that when you are using Live Link Face in ARKit mode it's not recording all the depth data that MetaHuman Animator requires. To do this I literally had two iPhones stacked up, one running the Animator mode and the other running ARKit. I'm not sure you can even run the iPhone's depth camera and the ARKit face tracker simultaneously. It might be possible, but since both use the same physical camera... not sure. Actually, unless you really need a realtime preview, I think you can just use MetaHuman Animator. On a fast PC it doesn't take long to process a take and produce an animation. The main thing that takes time is creating the "MetaHuman DNA" for the actor, but you only have to do that once per actor.
@GregCorson It was my understanding that Live Link was the only option that used the LiDAR, and that MetaHuman Animator instead only required the known camera/lens combo found in iPhone X and later. If that's true, then it means we might be able to use the recorded video from Live Link to later drive MetaHuman Animator. I would like it to be real-time for live demo purposes.
Honestly I'm not sure. I believe there is additional data besides just the video that is sent over for MetaHuman Animator, but I'd have to check. I'm pretty sure some of the data is depth-based, because if you want to use MetaHuman Animator with a normal video camera it needs to be a stereoscopic one.
Would so appreciate a tutorial on the voice changer please. I just downloaded it (I think I just downloaded it) and extracted the zip file; that was a scary moment. Thank you.
Both of the heads were rendered at the same time from the same model, so the only difference should be the animation. Because they are side by side there is a slight camera-angle difference, and a slight difference in the lighting angles because both are lit by the same lights. I can see a bit of a difference down near her neck; I'm not sure where that's coming from unless I somehow didn't set the level-of-detail override correctly, so I will have to check. The camera is so close that both really should be running max detail no matter what, though.
Thanks, I was searching for a comparison with Blender in mind. Now I see that it will not work the way I wanted... the ARKit version is just worse. Too symmetrical everywhere.
MetaHuman Animator can be very expressive. The ARKit face tracker is not bad, but it has a limited number of variables to work with (around 50 blendshapes, I believe). It can still be very useful as a starting point for animations: get the basic animation with ARKit and then use something like Blender to tweak it.
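To give an idea of the "tweak it in Blender" step: if the ARKit capture ends up as shape-key animation in Blender, a few lines of bpy can scale one channel up or down without redoing the capture. This is only a sketch under assumptions; the object name "FaceMesh" is hypothetical, and it assumes the imported curves are named after ARKit blendshapes like jawOpen.

```python
# Sketch: tone down one ARKit-driven shape key channel in Blender.
# Assumes the face mesh is named "FaceMesh" and the capture was imported
# as keyframed shape key values (names like "jawOpen" from ARKit).
import bpy

OBJECT_NAME = "FaceMesh"   # hypothetical mesh name
SHAPE_KEY = "jawOpen"      # one of ARKit's standard blendshape names
SCALE = 0.8                # e.g. reduce an over-driven jaw by 20%

obj = bpy.data.objects[OBJECT_NAME]
action = obj.data.shape_keys.animation_data.action

# Find the F-curve that animates this shape key's value
fcurve = action.fcurves.find(f'key_blocks["{SHAPE_KEY}"].value')

# Scale every keyframe (and its handles) so the timing stays the same,
# only the intensity changes
for kp in fcurve.keyframe_points:
    kp.co[1] *= SCALE
    kp.handle_left[1] *= SCALE
    kp.handle_right[1] *= SCALE

fcurve.update()
```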