
Layered Depth Light Field Volumetric Video 

Josh Gladstone
1.1K subscribers
3.7K views

Cake Player downloads: joshgladstone.com/cake_player....
Direct link to the Quest Beta: www.meta.com/s/1g8erbRAa
My Instagram: / joshgladstone
Previous video on Multiplane Volumetric Video: • Multiplane Video - Vol...
Looking Glass Portrait: lookingglassfactory.com/looki...
Lume Pad 2: www.leiainc.com/lume-pad-2
Local Light Field Fusion: bmild.github.io/llff/
Google's Light Field project: augmentedperception.github.io...
Music generated with Audiocraft MusicGen: github.com/facebookresearch/a...
----
0:00 - What is Volumetric Video?
0:45 - Recap Multiplane Video
2:12 - Sparse Light Field Camera Array
2:49 - Processing Layered Depth Images
6:15 - Layered Depth Video
11:25 - Cake Player
11:56 - Cake Player on the Looking Glass Portrait
14:03 - Cake Player on the Lume Pad 2
15:19 - Cake Player in VR
18:16 - Download info

Film

Published: Aug 4, 2024

Comments: 56
@rubi-w- · 3 months ago
DUDE THIS LOOKS AWESOME! Can't wait for it to become a consumer technology that's easy to use, to capture as much detail of your life as possible.
@JoshGladstone · 3 months ago
Thanks!! I hope you're right!
@rubi-w- · 3 months ago
@@JoshGladstone Maybe with AI it will be possible?
@JoshGladstone · 3 months ago
@@rubi-w- This technique already uses AI, but it's a fairly active area of research and there are advancements every year. The main issue isn't really the capture technology, it's demand. Volumetric content requires a different technology to view than a normal screen, so there's not much demand for it right now. Maybe as VR/MR headsets get smaller and more popular.
@rubi-w- · 3 months ago
@@JoshGladstone Thank you for the clarification ^^ Gonna force my friends to buy a VR/MR headset. Already own a Quest 3 but am planning on switching to the Quest Pro. (Maybe it’s better to wait or save up for something else?)
@JoshGladstone · 3 months ago
@@rubi-w- Unless you specifically want the face and eye tracking, I think Quest 3 is better than Quest Pro
@adammutchler4569 · 8 months ago
finally got cake player on my quest 2. so cool to walk inside your volumetric videos
@JoshGladstone · 8 months ago
Great, so glad it worked! Thanks for checking it out! :)
@weevilman · 8 months ago
Veeery cool stuff, well demonstrated, great project, good on you!
@JoshGladstone · 8 months ago
Thanks!!
@Mac_Daffy · 8 months ago
very inspiring work. your progress looks substantial. thanks for explaining it in such a detail.
@JoshGladstone · 8 months ago
I appreciate that! Thanks for watching! :)
@NimaZeighami · 8 months ago
Soooooo sick!
@JoshGladstone · 8 months ago
Thanks!!
@AndyGaskin · 8 months ago
The distortions and artifacts are cool, artistically. Looks like a time ripple effect. Could be put to good use in a movie where some detective has to "scan and enhance" surveillance footage.
@JoshGladstone · 8 months ago
Glad you like it!! :)
@KuyaQuatro · 6 months ago
finally watched this; really impressive work! would love to check it out sometime if possible! also, in the example footage I recognize that corner of Sunset & Wilcox; used to work close by, there were ALWAYS car accidents at that particular intersection haha
@importon · 8 months ago
Neat! Looks like you're doing something similar to Debevec's format of geometry layers with displacement from the depth map. What GitHub project are you using to get the depth maps and in-painting? You still keeping your cards close to your chest?
@importon · 8 months ago
I think you meant "Post a comment if you have any thoughts or questions ...... that I may or may not answer"
@JoshGladstone · 8 months ago
Yes! It is very similar to the Google Layered Mesh representation, although the implementation and playback part is pretty different. You can PM me on Instagram if you want more info.
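For readers wondering what "geometry layers with displacement from the depth map" looks like in practice, here is a minimal NumPy sketch of the general idea: each layer is a grid mesh whose vertices get pushed out along Z by the depth value at that pixel. This is an illustration of the concept only, not code from this project; the grid resolution, depth range, and unit-focal-length unprojection are all assumptions.

```python
import numpy as np

def displace_layer(depth, near=0.5, far=10.0):
    """Build a grid mesh for one layer and push each vertex out along Z
    by its depth value. Sketch of the general idea only; near/far and
    the unit-focal-length unprojection are illustrative assumptions."""
    h, w = depth.shape
    # One vertex per depth-map pixel, in normalized image coordinates.
    xs, ys = np.meshgrid(np.linspace(-1, 1, w), np.linspace(-1, 1, h))
    # Map the 0..1 depth values to metric Z between the near/far planes.
    z = near + depth * (far - near)
    # Unproject assuming a pinhole camera with unit focal length.
    return np.stack([xs * z, ys * z, z], axis=-1)

layer_depth = np.random.rand(4, 4).astype(np.float32)  # stand-in depth map
vertices = displace_layer(layer_depth)
print(vertices.shape)  # (4, 4, 3): one XYZ vertex per pixel
```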
@importon · 8 months ago
That wasn't the part of my comment with the question mark 🙂@@JoshGladstone
@erickgeisler · 7 months ago
This is really cool. I wonder if you could calibrate COLMAP to know your rig so essentially you do not need structure from motion. This way, scaling up to more than 5 cameras would cost less processing. Also you could try in-painting on each layer to help minimize artifacts. I suspect having a larger distance between each camera could yield better results. Just a thought. Really cool stuff. Great work. I'm going to download and start playing with Cake Player. You could sync all cameras with a timecode slate running on a phone or tablet.
@JoshGladstone · 7 months ago
Thanks! At the moment any movement whatsoever requires a completely new COLMAP solve to yield good results, so every shot does need its own solve. Still working on that though. My earlier rig did have a wider baseline between cameras, and that did seem to help with objects further away, since there was more parallax difference, but it then struggled with closer objects. This rig is my attempt to split the difference and also have something more portable. Ideally sync would be at the sensor level, but really it's only an issue with fast-moving subjects at the moment. If I'm able to somehow get moving camera shots to work, it could also be an issue there too, but for the moment sync is alright for what it is. I also need to experiment with shooting at higher frame rates to try and get the sync closer.
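For context, a per-shot COLMAP solve like the one described above is typically run as three CLI stages. A minimal sketch, assuming COLMAP is on the PATH and the frames from one shot sit in a single directory (all paths are placeholders):

```python
import os
import subprocess

def solve_shot(image_dir: str, work_dir: str) -> None:
    """Run a fresh structure-from-motion solve for one shot using the
    standard COLMAP CLI stages. Paths here are placeholders."""
    os.makedirs(f"{work_dir}/sparse", exist_ok=True)
    db = f"{work_dir}/database.db"
    # Detect and describe features in every input image.
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir],
                   check=True)
    # Match features between all image pairs (cheap for 5 cameras).
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", db], check=True)
    # Recover camera poses and a sparse point cloud.
    subprocess.run(["colmap", "mapper",
                    "--database_path", db, "--image_path", image_dir,
                    "--output_path", f"{work_dir}/sparse"], check=True)
```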
@Thats_Cool_Jack · 8 months ago
This is really cool. I've always wondered if it would be possible to use the depth pass to create background masks that could be content-aware filled with Stable Diffusion or something to minimize distortion. Might not be worth it though with how much SD can vary between frames.
@JoshGladstone · 8 months ago
Check out lifecastvr.com, I believe they're using RGBD + Stable Diffusion to inpaint a static background layer.
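As a rough sketch of that RGBD + Stable Diffusion idea: threshold the depth map to mask out the foreground, then ask an inpainting model to hallucinate the occluded background. This uses the Hugging Face diffusers inpainting pipeline; the file names, the near-is-bright depth convention, and the prompt are illustrative assumptions, not lifecastvr.com's actual pipeline.

```python
import numpy as np
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Placeholder inputs: a color frame plus its depth map (near = bright).
rgb = Image.open("frame.png").convert("RGB").resize((512, 512))
depth = np.array(Image.open("depth.png").convert("L").resize((512, 512)))

# Mask everything nearer than the threshold so the model repaints only
# the regions the foreground occludes.
mask = Image.fromarray(np.where(depth > 128, 255, 0).astype(np.uint8))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
background = pipe(prompt="empty street, photorealistic",
                  image=rgb, mask_image=mask).images[0]
background.save("background_layer.png")
```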
@DavidAddis · 8 months ago
Quite amazing work really, well done! It's impressive what you can achieve with just 5 camera angles. Would you get some benefit from moving the cameras further apart? Funnily enough I just watched the shell rendering video from Acerola a few days ago - seems like a (coincidentally?) similar technique. I think if you can clear up some of the artifacts, platform holders will start to get interested!
@JoshGladstone · 8 months ago
Thanks, I appreciate the kind words!! Actually, my first rig was a 1ft by 1ft square, so the cameras were much further apart, and it was able to get good results at some further distances, but closer captures were challenging because a lot of the image was out of frame and there was no overlap. This rig is an attempt to be able to capture some closer subjects. I even had one version that had the cameras spaced even closer, so this is sort of the middle ground between the two. I'm still optimizing, but yeah for distances it's definitely better to space the cameras out more. It would probably improve things to use more cameras, but without a better way to control all of them, it's sort of a balancing act between capture quality, usability, and portability. It would really help if an action camera company got back into the wired array game, but that hasn't been a thing for years. It's an extremely niche market. That's one of the reasons most camera arrays are studio only.
@roknovak9991 · 2 months ago
Really cool work! I have some more technical questions about parts of the process. You mentioned a neural network that generates the layers. Is it a publicly available network and model, or is it your own custom network that you trained yourself? Also, how did you decide on 8 layers? And about the compositing in Unity: do you deform each layer along the Z axis based on the depth map?
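On the "8 layers" part of the question: the simplest mental model is slicing the scene's normalized depth range into N bins and assigning each pixel to one slice, as in the sketch below. The learned network described in the video does something more sophisticated, so treat this purely as an illustration of what "N depth layers" means.

```python
import numpy as np

def assign_layers(depth: np.ndarray, num_layers: int = 8) -> np.ndarray:
    """Bucket each pixel of a [0, 1] depth map into one of num_layers
    slices. Uniform depth slicing for illustration only; the network in
    the video learns its layer decomposition instead."""
    bins = np.linspace(0.0, 1.0, num_layers + 1)
    # digitize returns 1..num_layers for values in (0, 1]; make it
    # 0-based and clamp the depth == 1.0 edge case into the last slice.
    return np.clip(np.digitize(depth, bins) - 1, 0, num_layers - 1)

depth = np.random.rand(480, 640).astype(np.float32)  # stand-in depth map
layer_ids = assign_layers(depth)
print(np.bincount(layer_ids.ravel(), minlength=8))  # pixel count per slice
```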
@EmanX140 · 8 months ago
Have you looked at Gaussian splatting? What are your thoughts on that?
@JoshGladstone · 8 months ago
Yes, Gaussian splatting is very cool! In fact, the footage of my GoPro rig is a Luma AI Gaussian splat! One of the issues with Gaussian splatting and NeRFs in general is the number of views needed to get good results, which is a challenge for video content. There has been some work on NeRF video (Neural 3D Video Synthesis, DyNeRF), but real-time playback is a challenge. It would need to be baked down into another format that can be streamed/downloaded and played back, and then you can lose a lot of the view-dependent effects, i.e. the nerfiness. It is an active area of research though!
@sexy_koala_juice · 8 months ago
Wow this is really cool, I'm really interested in both VR/MR and CV stuff so this video was right up my alley. I wonder how this would work with 360 video cameras? Granted, you'd lose some of the 360-ness because of the 3D-printed rig. I wonder what effect that would have on the close/far spacing issue that someone else was talking about.
@JoshGladstone · 8 months ago
This project is specifically for planar content. Google's light field project was able to do 180º content, but they had a much different setup and different neural network. Planar content is simpler.
@sergioyamasaki · 8 months ago
Have you tried to capture volumetric videos using the Oculus Quest 3's stereo cameras and depth sensor?
@JoshGladstone · 8 months ago
I haven't, but it wouldn't work with this technique. It probably would with my previous stereoscopic multiplane project, though. But the depth sensor wouldn't factor into that. I can't think of any projects that take stereoscopic images + depth as input.
@SoundGuy · 1 month ago
I'm wondering if super-sparse light fields like this always require more than 2 cameras, or can you use already-filmed stereo videography to make things like this. And then I'm wondering if you can use AI to clean up the scene, eliminating those ghosts and making more perfect layers, and also guessing or imagining the missing information. There are plenty of cool upscalers out there already available.
@JoshGladstone · 1 month ago
At the moment, it needs more than two cameras. But my previous project was based on stereo pairs, check out my older video "Multiplane Video - Volumetric Video with Machine Learning". I even converted some historical stereo photographs to volumetric, which is also available on the channel. It's certainly possible that future projects could produce similar results from stereo images. In fact, there are projects currently in the works to produce similar results from a single image. Hell, there are even current generative AI projects that aim to produce similar results from 0 images! Crazy!
@AndriiShramko · 8 months ago
Great! Did you test Gaussian splatting for volumetric video? Why did you choose this method and not splatting?
@JoshGladstone · 8 months ago
Splatting tends to need a lot more views to get good results. It's also an open question how to play back Gaussian splats in real time as a video. This can be wrapped into an mp4 and streamed on mobile hardware.
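The "wrapped into an mp4" point is worth unpacking: if every frame's layers are tiled into a single atlas image, the whole layered-depth stream compresses and plays back like ordinary video. A minimal sketch of such tiling, where the 4x2 grid and per-layer resolution are assumptions for illustration rather than the actual Cake format:

```python
import numpy as np

def pack_atlas(layers: list[np.ndarray]) -> np.ndarray:
    """Tile eight per-layer images into one 4x2 atlas frame so a whole
    layered-depth frame fits in a single video frame. The 4x2 layout is
    an illustrative assumption, not the actual Cake format."""
    top = np.concatenate(layers[0:4], axis=1)
    bottom = np.concatenate(layers[4:8], axis=1)
    return np.concatenate([top, bottom], axis=0)

# Stand-in layers: eight blank 256x256 RGB images.
layers = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(8)]
atlas = pack_atlas(layers)
print(atlas.shape)  # (512, 1024, 3): one atlas image per video frame
```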
@JoshGladstone · 8 months ago
I did use Gaussian splats for the video though! The shot of the camera rig is a Luma AI capture.
@eagleed99 · 4 months ago
Were you able to get the Lume Pad version up for download?
@JoshGladstone · 4 months ago
I submitted it, but it still needs some fixes before it's able to be published, unfortunately.
@JoshGladstone · 4 months ago
The Lume Pad 2 version is now live in the Leia App Store!
@dlawrence · 8 months ago
Nice work, Josh! Have you played with spatial video from the iPhone 15 Pro? It will be enabled in the next iOS release. When this happens it will be one of the most mainstream spatial video capture formats out there. It would be so cool to have a player that could display this video.
@JoshGladstone · 8 months ago
Thanks! I haven't seen any samples yet, but Apple's spatial video is just stereoscopic 3D wrapped into HEVC. There's no volumetric or 6DoF aspect to it, so once the format is parsed it should be relatively simple to view on any VR headset.
@Naundob · 8 months ago
@@JoshGladstone That was my guess as well. But rumor has it that it will incorporate depth from LiDAR and/or some AI trickery to give it some 6DoF flavor. In fact it would be rather disappointing if Apple ended up with only good old 3D video.
@JoshGladstone · 8 months ago
@@Naundob The frame it's displayed in has some 6DoFiness to it, but it's definitely just 3D wrapped into HEVC in such a way that it's backwards compatible with 2D playback. There's more info here: developer.apple.com/videos/play/wwdc2023/10071/