
NVIDIA’s New AI: Wow, Instant Neural Graphics! 🤖 

Two Minute Papers
1.6M subscribers
319K views

❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com/papers
📝 #NVIDIA's paper "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding" (i.e., instant-ngp) is available here:
nvlabs.github.io/instant-ngp/
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: / twominutepapers
Thumbnail background design: Felícia Zsolnai-Fehér - felicia.hu
Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: discordapp.com/invite/hbcTJu2
Károly Zsolnai-Fehér's links:
Instagram: / twominutepapers
Twitter: / twominutepapers
Web: cg.tuwien.ac.at/~zsolnai/
#instantnerf

Science

Published: 18 Feb 2022

Comments: 912
@dissonanceparadiddle 2 years ago
This is going to make photogrammetry so much easier
@ET_AYY_LMAO 2 years ago
I hope so, imagine being able to take a few images of a place and instantly be able to walk around virtually in this space. You could couple it to image search services and just type a location to find third party images, and be there.
@Qubot 2 years ago
Google Earth could be greatly enhanced with this method.
@dissonanceparadiddle 2 years ago
@ET_AYY_LMAO Imagine if VR headsets used this to finally make AR able to truly interact with the environment. It's so fast that just a quick walkthrough would make a 3D map. Plus you could have it run in the background if there's any new positional info that needs to be added as you move.
@WangleLine 2 years ago
I really hope so!! I hate how long it always takes to process my inputs
@dissonanceparadiddle 2 years ago
In fact with a few cameras something like this could make a 3d video phone call finally work
@draco6349 2 years ago
This is incredible. It's not real-time raytracing, it's an AI literally just eyeballing it. And it's MORE real-time than the regular methods. Can't wait for this kind of rendering to make it into games and simulations, it would outperform anything ever seen before.
@Dimension_eleven 2 years ago
Introducing ai vision to settings in fps games 😂
@draw4everyone 2 years ago
Imagine how much this will streamline workflows for 3D graphics designers! You’ll have updates to adjustments in SECONDS
@lorimillim9131 2 years ago
It also raises the question: if an AI can generate graphics on demand like this, how do you know that what's shown before your eyes is actually what's there, compared to what's shown to another viewer? Not taking any position, just wondering: what if?
@yellowblanka6058 2 years ago
@lorimillim9131 Yeah, that has pretty frightening implications for media/the legal world.
@FunnyVidsIllustrated 2 years ago
@lorimillim9131 I assume that, like we saw with deepfakes, a counter-AI trained specifically on the pitfalls of the technology in question gets rolled out at about the same time to counteract misinfo.
@sqworm5397 2 years ago
Great video! It would be amazing if you had a second channel titled "Twenty Minute Papers," where you go more in depth on topics that interest you.
@TwoMinutePapers 2 years ago
You are too kind. Thank you so much! 🙏
@DonC876 2 years ago
Yeah, I have been thinking the same lately, that I would love to dive deeper on some topics. @Two Minute Papers I think this is something you should seriously consider as a second channel - big fan of your work :)
@brightmatter 2 years ago
just saying, I'd subscribe to that.
@guillermojperea6355 2 years ago
@TwoMinutePapers Karoly, we expected you to tell us whether you'd do it!
@v-sig2389 2 years ago
@guillermojperea6355 A second channel with in-depth analysis of papers, which is a whole new huge project with hours of work for each episode... yeah, let's decide in a few seconds and announce it in a comment reply 😂 Btw, check the channel's playlists, he has courses!
@halko1 2 years ago
I used this about ten years ago and... it was labour intensive, took ages to complete, and the results were sketchy at best. Seeing where we are today blows my mind.
@brexitgreens 2 years ago
That is what I've just elected to name "classic photogrammetry". Back then, there was _no other,_ and AI was still barely more than an academic curiosity. Unless you count stuff like OCR as AI. I'm talking about the distant past of four years ago.
@AlvaroALorite 2 years ago
A lot of people are mentioning computer graphics as an interesting application, but are missing the bigger picture: This is using neural networks, and it's replicating a 3D environment from limited input, which is VERY similar to what our brain does (dreaming, for example)... This is amazing for neuroscience.
@jerchongkong5387 2 years ago
Neh, it's amazing for rule 34 artists, imagine the possibilities. ( ͡° ͜ʖ ͡°)
@unintentionallydramatic 2 years ago
This is an incredibly important point. On that note, let's also not forget that this means you can hyper-accelerate drug design from first principles based on receptor shape. There are already algorithms that can design molecules with a certain shape, and algorithms that can search for a synthesis pathway. So you could theoretically feed the machine a series of images of a receptor and get a recipe for a drug targeting it out the other end.
@chrisray1567 2 years ago
It’s not just in our dreams, our brain creates a 3D environment while we are conscious too. Our eyes are 2D sensors. It’s our brains that combine that information into a 3D experience.
@stevenrogersfineart4224 2 years ago
100%. fMRIs can already get enough data to discern if someone is thinking about a building/person/animal etc. Until now the resolution was bad. If AI fills in the extra data reliably, we are not far from mind reading/projection :P
@rxtr664 2 years ago
" which is VERY similar to what our brain does" - Not really. Our brains probably don't need to "reconstruct" a 3d environment. It's already perceived to be a 3d environment, no need for "reconstruction"
@Rezmason 2 years ago
This channel will live to see the day when the pace of progress in this field will exceed the speed of publishing and paper discovery. The format may have to switch to a statistical approach that samples results from multiple simultaneously published papers to depict the state of the art. aka "paper transport" Then Nvidia will publish work on a hardware accelerated paper transport resolver that produces "perf"s at a rate faster than this channel. The papers will move so fast we won't be able to hold onto them
@Supreme_Lobster 2 years ago
This is the paper singularity
@nowymail 2 years ago
There are already neural networks that can learn by reading papers. And other neural networks that can compose videos. Not much more is needed.
@NotASpyReally 2 years ago
This is funny but could also end up being true WHY NOT
@THEMATT222 2 years ago
Very Noice 👍
@Peter-ik9fz 2 years ago
Two Minute Papers: a better paper every 2 minutes 😲
@zynius 2 years ago
In a few years when this can run at 60hz+, all you need is a few cameras in a space and you'll be able to use VR to insert yourself in that location :O That will be completely bonkers!
@krajsyboys 2 years ago
As I understood it, it takes a couple of seconds to create the "render" of the scene but when it's done it is in fact running at 60fps. So I guess you can just have a loading screen or something before you get to see anything
@lorimillim9131 2 years ago
Imagine ditching the equipment too and having the AI embodied?
@XZYSquare 2 years ago
could even be next month lol
@michaelleue7594 2 years ago
This is NOT creating a 3D environment, or even a single 3D object. This is creating a smooth track of 2D images. It's a very cool technique, but if you're imagining a game where you can do anything more than ride a roller coaster or something like that (and not change the direction of your camera), then this won't be applicable to that game.
@Mike7Lof 2 years ago
Yes, with one remark: it will be done with ONE camera.
@koendos3 2 years ago
Imagine rendering only 10 frames of a 100 frame animation, and then feeding it into this new AI. You'll finish your render 10x faster. That's amazing!
@GierlangBhaktiPutra 2 years ago
I am eager to see the practical application become available. As an architectural historian, it would make documenting architectural heritage much easier, with simpler equipment!
@Adhil_parammel 2 years ago
But there are better laser scanners now for scanning whole buildings. I have seen an episode about that on Nat Geo.
@virutech32 2 years ago
@Adhil_parammel Lasers are cool. Getting a near-perfect render with a phone camera & 2 minutes is objectively better, especially if the site is hard to get to or not very secure. Even if the laser thing is higher quality, the lower cost & higher accessibility of this technique would still be mighty useful.
@stub42 2 years ago
Careful. Remember that what isn't in the source photos has to be made up. Great for many applications, but not so great for archival and historical research. Is that the actual gargoyle, or invented from training data from all periods of history?
@BHBalast 2 years ago
@virutech32 Also, one could use a small lightweight drone to take photos of places where laser scanning is impossible.
@BHBalast 2 years ago
@stub42 It's a valid argument, but the same is true for laser scanning, and it requires human clean-up. After photo scanning with this net there would also be a validation process.
@WikiSnapper 2 years ago
Can you imagine being able to apply this to tabletop gaming maps and have the AI fill in the game world as you zoom in!
@Robert_McGarry_Poems 2 years ago
And then turn it into a photo realistic image. What a time to be alive!
@Ginsu131 2 years ago
What exactly would be interesting about that?
@WikiSnapper 2 years ago
@Ginsu131 A lot of time can go into making the details of a map in tabletop RPGs. It would be awesome to have an AI fill in that detail, as it would speed up production and take a lot of mental energy off the GMs.
@rafqueraf 2 years ago
Imagine what cheap GPUs will be able to do in the future
@SnrubSource 2 years ago
nothing because new cheap GPUs won't exist anymore, they're all going to keep costing $700+
@ChuckSploder 2 years ago
@SnrubSource 3090s will be cheap
@victorius2975 2 years ago
@ChuckSploder Trust me, it'll stay the same, and newer GPUs only get more expensive.
@wojciechbem8661 2 years ago
Mining bitcoins I suppose.
@silly_lil_guy 2 years ago
My GT 1030 can run Tetris! At 10 FPS... I meant 10 seconds per frame.
@danczer1 2 years ago
Is it possible to apply this to video footage? It would be mind-blowing to have this in a VR video player. Current VR video players play a video projected inside a sphere, which is not real 3D, because once you start moving your head in 6DOF it breaks the immersion. But having a dynamic 3D mesh used for the video projection instead of a sphere would be mind-blowing!
@spyral00 2 years ago
There are so many applications for this, in VFX too.
@cbuchner1 2 years ago
If that's the case, we'll soon be able to walk around in movie sets and possibly assume the role of a character, just like the book Ready Player One predicted.
@Danuxsy 2 years ago
@cbuchner1 Bruh, imagine being able to see boobs from other directions like that 🤯
@lucaspedrajas5622 2 years ago
@Danuxsy and it's gonna be the most profitable application of it
@georgri 2 years ago
I'd argue that a machine being able to interpolate between photos is still far from understanding the actual geometry and synthesizing the scene from ANY point of view, as VR requires.
@sigmata0 2 years ago
Absolutely extraordinary. My first thought was: "Now we know what those old CSI shows were using to zoom into their videos to find important details." Haha. Yet also, does this mean that from a couple of photos we can now create 3D models which could plausibly be printed? Effectively advanced photogrammetry?
@EVPointMaster 2 years ago
As cool as this tech is, you'd have to be very careful if you wanted to use it like this. The AI doesn't know the truth, it's just guessing one possibility of what could be true, based on the limited amount of information it was given.
@sigmata0 2 years ago
@EVPointMaster I assume you mean the CSI idea. Yes, that is a joke. With regards to the 3D printing functionality, I assume you'd still need to do some work to get a useful mesh. However, it looks really close to something that would be useful.
@ge2719 2 years ago
@EVPointMaster You'd like to think that enhanced footage was never used in criminal cases, but just look at the Kyle Rittenhouse case: the prosecutor submitted a frame that was an interpolated and upscaled image from a video, in order to try to make a very specific claim of Kyle Rittenhouse doing a specific thing. For one frame. It was allowed as evidence, and even enhanced it was a blurry mess that showed nothing specific, but the prosecutor was allowed to claim it showed him aiming a gun at someone.
@fuchsiebabe 2 years ago
This has the potential to revolutionise filmmaking and visual effects. Awesome!
@Devoun 2 years ago
2 months from now it'll be finished training a week before we even start.
@ollllj 2 years ago
NVIDIA GPUs of 2021 are heavily optimized for matrix multiplication, with a mode for sparse matrices. It is mostly used for upscaling, but it can also be used as a great noise filter (for audio, and as a ray-tracing denoiser); ray tracing is also useful for more realistic audio. The more general applications of this are pretty much anything with linear algebra, wherever you multiply 2 or more matrices, most likely rootSolvingInverse*projection.
@danielng1765 2 years ago
Sorry, I haven't dived into the paper because I'm a total noob in AI. Any idea which graphics card they're using in this paper? I'm about to get a new rig with an RTX 3070, but I've put it on hold because the progress in AI for photogrammetry is too fast...
@technewseveryweek8332 2 years ago
@danielng1765 RTX 3090
@ollllj 2 years ago
@danielng1765 The RTX 30xx series cards are for private households; NVIDIA also makes graphics cards for datacenters (commonly used to train AI models, or for things such as sorting long priority lists for the internet). The server-rack architecture is similar, but the bandwidth and parallelization are much higher, and it scales the price to "millions per unit".
@ollllj 2 years ago
@danielng1765 For the common high-tier gamer (or indie gamedev and AI-code learner), the PlayStation 5 has a GPU that compares pretty well to the RTX 2070. This is 2020 technology, with significantly slower memory access and significantly worse motion blur than the 30xx cards of 2021. An RTX 2070 card is currently still relatively good value (damn all the cryptocurrency scammers/thieves) to put in a new PC that costs up to 1100 USD new. An RTX 3070 is significantly better (>2.5x a 2070), and commonly found in new PCs that cost over 1700 USD. The "Ti" suffix makes a significant difference and is not to be overlooked (it commonly means ~24% faster memory, +20% power consumption, +50% more CUDA cores). This in general seems to appeal more for higher-resolution displays (up to 4K).
@danielng1765 2 years ago
@ollllj Thanks for the advice. I initially decided to get an RTX 3070 due to the available performance comparisons based on Agisoft Metashape. The RTX 3070 has a good cost/performance balance compared to the others based on their standard samples. I shall check the 3070 Ti as well if results are available.
@101perspective 2 years ago
I wonder how long until we'll have interactive movies, where they just film from a few angles and then, at home, you feed that footage into VR and can move around within the scene as it unfolds?
@StolenPw 2 years ago
When I was about 17, in 2011, I tried making my own version of what is essentially NeRF, to turn buildings into 3D models really quickly using just photos.
@DonC876 2 years ago
How did the results look in the end? Would love to see that.
@URB4NR3CON 2 years ago
I remember Microsoft had a program that did something similar for popular tourist destinations, forgot the name
@StolenPw 2 years ago
@DonC876 It kind of ended up looking a lot like how Google Maps does their 3D models, just a lot more manual and a lot more buggy, but it did kind of work.
@ChuckSploder 2 years ago
@StolenPw Do you still have that program, and could you make a video of it?
@RainFox84 2 years ago
@URB4NR3CON Photosynth
@markwood1855 2 years ago
Thank you for making Two Minute Papers. Your excitement and enthusiasm are infectious! And I always find myself getting more and more enthused as I watch your videos and see the pace of progress. It's nice to see a channel that just... makes me feel that the future can be bright.
@jamesabell9494 2 years ago
Amazing! Thanks for your videos, they really keep me up to date with visual AI.
@jonnyhifi 2 years ago
Astounding - I can see 3D scanning apps/software pretty soon becoming "trivial" on phones etc. ... which itself is astounding, never mind all the other stuff!
@dilonardomultimediaproductions 2 years ago
There are a lot of amazing Two Minute Papers episodes, but only a few standalone programs are available for this (like the ones from Topaz Labs). When will this technique be easily available for everyone?
@B0A0A 2 years ago
It can take anywhere from several years to ten years. No matter how great the performance is, if it is a specialized single function, there is little motivation to offer it as an easy-to-use application. If this feature can be further developed and used for all kinds of surveying on construction sites, people's productivity will be visibly improved.
@brexitgreens 2 years ago
@B0A0A Why can't society simply Kickstarter some developer?
@GunwantBhambra 2 years ago
@brexitgreens Nvidia doesn't need Kickstarter; they will license the tech to game devs to earn royalties.
@mustardofdoom 2 years ago
The method is already available on GitHub. For commercial use, the authors say that Nvidia should be contacted. So it is a matter of going through a sales process to generate interest between stakeholders, agreeing on pricing, and going through legal. Only then would a commercial agreement begin. And if the goal was a graphical interface, you'd have to give ample time to develop it, bug-test it, and run perception tests to ensure that it is user-friendly. It altogether takes a while, which can explain the lag between the newest results and easy-to-use graphical programs like Topaz Labs. I work on a sales team for a commercial scientific image processing software. I suggest new ideas to our R&D regularly. Maybe 2%-5% of ideas are accepted and end up in the product. For these features (which are mostly low-effort high-reward due to cost considerations), it often takes 2-3 years. And that's when we already have an application and a team to build the new feature into.
@popcorny007 2 years ago
@mustardofdoom Thanks for the detailed explanation, it really puts things into perspective.
@J0R1AN 2 years ago
NVIDIA has been going crazy with these AI papers
@ep1cn3ss2 2 years ago
This is unbelievable. Phenomenal work, can't wait to see the applications! Especially in photogrammetry.
@heliusuniverse7460 2 years ago
What caught my attention is the neural representation thing. Can it be used for image compression? I imagine there's lots of room for improvement over current methods like JPEG, which doesn't really understand the image.
@frenzscivola3099 2 years ago
Great idea! It would also be easy to generate data. What is the state of the art on this?
@taktuscat4250 2 years ago
NVIDIA Maxine exists as neural video compression.
@alihms 2 years ago
I believe this is what they are aiming for: extreme compression for photos and video while preserving the important details. Say, a video of a footballer kicking a ball inside a stadium. The important details such as the facial expression and the actual movement are preserved. Non-important details such as the field, the spectators, and the roaring sound can be compressed. During playback, these non-important details are then procedurally generated. This is somewhat analogous to how we store information in our brains.
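On the compression idea in this thread: the paper's gigapixel task really does represent an image as a function from pixel coordinates to colors. Here is a toy Python sketch of that idea (a random-Fourier-feature least-squares fit stands in for the paper's hash-encoded MLP; the image, feature count, and frequencies are made-up values for illustration):

```python
import numpy as np

# Toy "image as a function": fit f(x, y) -> intensity and keep only the
# weights. Stand-in for the paper's hash-encoded MLP trained with SGD.
rng = np.random.default_rng(0)
H = W = 32
yy, xx = np.mgrid[0:H, 0:W] / np.array([H, W]).reshape(2, 1, 1)
img = np.sin(8 * xx) * np.cos(5 * yy)                 # stand-in "photo"

coords = np.stack([xx.ravel(), yy.ravel()], axis=1)   # (H*W, 2)
B = rng.normal(0.0, 4.0, (2, 64))                     # random frequencies
feats = np.concatenate([np.sin(coords @ B), np.cos(coords @ B)], axis=1)

w, *_ = np.linalg.lstsq(feats, img.ravel(), rcond=None)
recon = (feats @ w).reshape(H, W)
print("stored weights:", w.size, "vs pixels:", img.size)   # 128 vs 1024
print("reconstruction MSE:", np.mean((recon - img) ** 2))
```

Whether this beats JPEG depends entirely on the image and the network size; the sketch just shows how "the model is the file" can work.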
@tiefensucht 2 years ago
Making a game in the future: "Hello Siri, create me a game that plays like Doom 12, but with Disney characters that look like musicians currently in the top 20 charts, and this should all happen in Tokyo at daytime, raining. The end boss should be a giant paper."
@magen6233 2 years ago
That's what OpenAI Codex and GitHub Copilot are doing (not at this level, but they are quite good).
@hondajacka2 2 years ago
Crazy. Can’t wait to see applications of this come out.
@theencore398 2 years ago
Papers so fast, they won't even let ya hold them, damn. As always really informative and fun video sir.
@Khether0001 2 years ago
It would be interesting if there was a follow-up showing us where these technologies eventually become available in actual commercial products. But I know this isn't the scope of the channel.
@themore-you-know 2 years ago
I can already tell you:
- Video game production will see a wtf-level workflow improvement;
- Google Earth, which already has a bazillion pictures, will either be banned or transformed into a life-like simulation;
- Military analysis and transport will make use of the above (Google Earth);
- Urbanism, real-estate construction, and real-estate sales;
- Back to Google Earth, the applications are just crazy: imagine a traffic application where you get a life-like visualization of the traffic, making it seem so much more reliable to the viewer than a red line on the road;
- I'm going nuts thinking about all the crazy ways this will affect our daily lives.
... aaaaah, just the VR Google Earth life-like simulation is utopian/dystopian enough.
@Instant_Nerf 2 years ago
@themore-you-know I'm working with Google Earth Studio. Combine that with Blender and video editing and you can create a realistic 3D scene. The problem is close up... the textures/models are so broken. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-UrvKsuDSaNE.html
@Hexcede 2 years ago
This is astounding! It's so fast you could train it on the user's hardware, no need to include a trained model ;)
@eragon_argetlam 2 years ago
This is insane. I've watched many of your videos, but this is the only one so far that *really* seems like straight up magic.
@Zoza15 2 years ago
It's always exciting to see new videos on innovations in AI and software, Karoly ✌🏽👍🏽. Asset creation is made a lot easier and faster, it seems.
@nemonomen3340 2 years ago
For the program that showed more detail when zooming in: was it an AI that filled in missing detail or was it an AI that made the high def image less “costly” by simplifying the image when zoomed out?
@Robert_McGarry_Poems 2 years ago
The AI created the detail from that one single grainy, almost black and white image. It is so good at creating the next layer that it can procedurally generate ever increasing depth of image. Pretty neat stuff.
@jessiejanson1528 2 years ago
From my understanding (and I don't think in this case he explained it very clearly), an AI seems to take a large image and train another AI with it in order to reduce the size of the image, since the new AI will then be able to fill in the details based on its training. So each image would contain its own AI or be paired with one specific to it. As long as the end result is smaller, it's a win. Though, like I said, the explanation is lacking, and it would have been great to see the original file size vs the end result's file size.
@outlander234 2 years ago
@jessiejanson1528 Well, that's disappointing in a way. But it could be used for compressing data immensely.
@anthonyrepetto3474 2 years ago
I remember hearing how, in 2019, AI was going to be in another "winter"... same thing said in 2020, 2021... but this really is just the dawning of it! :p That gigapixel compression, in particular, will be adapted to "memorizing" the input-output map of software, so that you can just use a look-up table, instead of computing values in code. It'll let us fix bugs by flipping a bit on the look-up table, without needing to find a way to "correct the software"!
@AdityaTripathi 2 years ago
I can't wait for research like this to end up in game engines, truly incredible!
@TeamJackassTV 2 years ago
Man do I love everything you put out! Thanks for the time and effort you put into these videos!
@silentbob1236 2 years ago
I'm curious about how accurate the models are. Photogrammetry has not had good accuracy in the past; I wonder if this changes that.
@GD15555 2 years ago
If it can also convert it to clean quad poly it will be amazing
@J3R3MI6 2 years ago
Yes, retopology and UV mapping are the most annoying parts of 3D design. We are close though 🔮
@Turruc 2 years ago
Your passion is absolutely contagious. This is amazing!
@eamonia 2 years ago
What a time to be alive, indeed! And your documentation of these accomplishments will be preserved forever. Thanks Doc 😊
@sharky278 2 years ago
Impressive time scaling in just one year ("O"). I'm expecting a generalized model with a kind of 3D segmentation to change material parameters or add physics in the future... in any case, this is the first step toward "synthetic" rendering. Amazing ♥️
@DarkSwordsman 2 years ago
This has me excited for the future of video games and other simulations. I am obviously enthralled by the idea of a "Full Dive" video game like Sword Art Online. Seeing things like this, Unreal Engine 5 with Lumen and Nanite, AI in general, and what Gaben and Elon have been doing with neural interfaces has me excited for a future where we can be fully immersed in whatever scenario we want. It's definitely a pipe dream of sorts. But I can imagine a future where we have insanely detailed, low-cost simulations, as well as the ability to dive into these worlds with all of our senses. It is a driving factor for me to learn more about ML, AI, and video games.
@jimj2683 2 years ago
Same here. Imagine how good GTA Earth could be!
@Settiis 2 years ago
Out of all the AI shown in this channel that has blown my mind, this is probably one of the most impressive.
@Connor3G 2 years ago
Incredible stuff to say the least. Imagine having a few pictures of your old house that turn into a full VR space in a few seconds...
@eelcohoogendoorn8044 2 years ago
It's interesting how you have completely moved away from any attempt at explaining the papers you present. I suppose that makes sense for the 2-minute format; I always scroll to the results section first anyway. But a tiny bit more depth and context wouldn't hurt. I was assuming this paper's content must be all about how to leverage a whole datacenter full of GPUs in parallel, but it's even more mind-blowing to see their abstract mentions a single GPU... now that's a bit of detail that would really add to the presentation of this work.
@mynameisal7 2 years ago
So could we use something like this to create a real-time driving simulator? The AI could use the input data from something like Google Maps' Street View and edit it into an interactive 3D environment.
@DanielHJeffery 2 years ago
Already have it downloaded! Going to use this!!!
@sabrango 2 years ago
Wow, this shows deep learning is great for large data if we teach it properly!
@tobiascornille 2 years ago
Seems super cool! Don't fully understand what the technique is doing exactly (in terms of which inputs and outputs), though.
@skarfie123 2 years ago
I find this with most of his videos
@georgri 2 years ago
It interpolates between given photos of a scene.
@Pixelarter 2 years ago
Basically, they developed a multiresolution input encoding that simplifies the task and allows it to be highly parallelized, taking full advantage of the GPU. They apply it to different techniques (NeRF, neural gigapixel image, neural SDF, and neural volume). From the paper website:

*_Instant Neural Graphics Primitives with a Multiresolution Hash Encoding_*

_We demonstrate near-instant training of neural graphics primitives on a single GPU for multiple tasks. In gigapixel image we represent an image by a neural network. SDF learns a signed distance function in 3D space whose zero level-set represents a 2D surface. NeRF [Mildenhall et al. 2020] uses 2D images and their camera poses to reconstruct a volumetric radiance-and-density field that is visualized using ray marching. Lastly, neural volume learns a denoised radiance and density field directly from a volumetric path tracer. In all tasks, our encoding and its efficient implementation provide clear benefits: instant training, high quality, and simplicity. Our encoding is task-agnostic: we use the same implementation and hyperparameters across all tasks and only vary the hash table size, which trades off quality and performance._

_Abstract:_
_Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations. A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920x1080._
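For intuition on the encoding described above, here is a minimal NumPy sketch of a 2D multiresolution hash lookup. It is illustrative only: the real system handles 3D inputs, trains the tables with SGD inside fused CUDA kernels, and uses different table sizes and level counts; the table size, feature width, and resolutions below are assumptions, and the hash constants are the spatial-hash primes the paper builds on.

```python
import numpy as np

# Sketch of a 2D multiresolution hash encoding: each level hashes the grid
# corners around a point, gathers (trainable) feature vectors, and
# bilinearly interpolates them; the concatenated result feeds a small MLP.
PRIMES = np.array([1, 2654435761], dtype=np.uint64)  # spatial-hash primes
T = 2 ** 14                                          # entries per level (assumed)
F = 2                                                # features per entry
RESOLUTIONS = [16, 32, 64, 128]                      # per-level grid sizes (assumed)

rng = np.random.default_rng(0)
tables = [rng.normal(0, 1e-4, (T, F)) for _ in RESOLUTIONS]  # trained via SGD in practice

def encode(xy):
    """Encode a point in [0,1]^2 as concatenated per-level features."""
    out = []
    for res, table in zip(RESOLUTIONS, tables):
        scaled = xy * res
        base = np.floor(scaled).astype(np.uint64)
        frac = scaled - base                         # bilinear weights
        acc = np.zeros(F)
        for dx in (0, 1):
            for dy in (0, 1):
                corner = base + np.array([dx, dy], dtype=np.uint64)
                h = int(np.bitwise_xor.reduce(corner * PRIMES) % T)
                w = (frac[0] if dx else 1 - frac[0]) * (frac[1] if dy else 1 - frac[1])
                acc += w * table[h]
        out.append(acc)
    return np.concatenate(out)

print(encode(np.array([0.3, 0.7])).shape)  # (len(RESOLUTIONS) * F,) = (8,)
```

The point of the design is that the expensive "memory" lives in the lookup tables rather than in network weights, so the MLP can stay tiny and training becomes seconds instead of hours.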
@tobiascornille 2 years ago
@Pixelarter Thanks! Are the inputs to all of these tasks the same (2D images)? Because the tasks sound quite different, so it's cool that their encoding works for all of them.
@Pixelarter 2 years ago
@tobiascornille No. Some are 3D, some are 2D. From what I can glean, the hash they developed just encodes the position of the input at different resolutions, concatenates the results, applies some transform, and feeds that in alongside the regular input.
@hellfiresiayan 2 years ago
We're at the point where it has all become magic to me. One program does all this?? How???
@laurent-minimalisme 2 years ago
Deep learning abstraction... first layers do the stuff, just plug your application on the top.
@Wertsir 2 years ago
The same way your brain does, evolution.
@Andytlp 2 years ago
Yeah, they give descriptions of a scene or draw primitive zones, like: this blob is water, this is land, this is trees, this zone is sky, etc., and the AI paints a picture. That is magic indeed. One-upping that would be telling an AI to do a task or write a program, and it does it in a matter of minutes or hours.
@gurglenurgle6539 2 years ago
Great video as always!
@viperlimo 2 years ago
We'll soon be able to walk around inside our favorite movie scene sets. The implications of this are insane.
@brightmatter 2 years ago
So I am curious: can the AI transition from past context knowledge to current context knowledge, and retrain on the fly? For instance, if you see a painting of a woman, you contextualize it as 'old painting of woman with no background'. Then, as you zoom in, you switch the context to the current frame and recontextualize: 'old painting of woman's head'. So portrait -> head -> face -> nose and eye -> eye -> pupil -> retina -> optic nerve -> light-sensing cells -> cell membrane -> DNA. At some point you are only providing that last context to the AI to create the new image from pre-trained understanding, i.e. you reach a point where you are not caring about the reference material anymore.
@Robert_McGarry_Poems 2 years ago
I got that from this explanation. The layer that it produces is good enough to use as input. So, yeah at some level it is creating images based on pixels that it generated.
@ge2719 2 years ago
I imagine it's trained based on images that exist, and learns what detail should exist in such an image, and then fills it in. So you can zoom in on a city because it knows what buildings, cars, roads, and people look like. But it won't understand that zooming in on a person's face would reveal skin cells, unless it was given high-resolution photos of skin that transition to the microscopic level to reveal cells. Also, if it's given context that it's a painting, at some point you would want it to assume it's zooming in on oil, not assume that the artist was able to paint individual cells into the painting, because it's a painting of a face.
@MarkEichin 2 years ago
I'm not completely clear on what the gigapixel-image one is actually doing - taking a gigapixel image, building a model, and then keeping only the model (which is smaller? how much smaller?)
@phmfthacim 2 years ago
Mind blowing results
@dprezzz1561 2 years ago
Amazing work. I cannot wait for a smartphone implementation.
@wagnercs 2 years ago
Great video! Thanks for all the work! However... is it possible for people outside the research team to test it? Is all of this available to us mortals? Can we test this? Or will this only go to Nvidia? Sorry for the dumb questions...
@thegeekclub8810 2 years ago
The code is on GitHub! Dunno how hard it would be to actually get it running, and the GitHub page says you need an Nvidia graphics card, but it is available to the public if you know what you're doing!
@daanhoek1818 2 years ago
@thegeekclub8810 I got it running just now. I'm on a GTX 1080 and it runs quite slow, but I can train the fox one and look around in low res. Works pretty well. Setting it up is quite impossible if you don't know what you're doing, but you can always try. I tried training one of my own datasets, but it throws an error. Still working on that. Edit: The GTX 1080 is probably much slower because it is not RTX. The GeForce RTX line of cards has tensor cores, which are more optimized for this kind of job, and my 1080 has none.
@Piyush10129 2 years ago
@daanhoek1818 Can you please share the link?
@wagnercs 2 years ago
@thegeekclub8810 Thanks for the information!
@brexitgreens 2 years ago
Sir, your hype is both awesome and fully appropriate in equal amounts. I learned about NeRFs only a few days ago. You don't exaggerate when you say that this was science fiction only four years ago. Back then, I intuited the possibility of AI photogrammetry - in the same way Star Trek intuits warp drive and the holodeck. And now it is here. The tech straight out of my dream. What a time to be alive!
@jmendezsj 2 years ago
Another great paper. Amazing!
@noobcaekk 2 years ago
Wow this is INCREDIBLE! I mean, there's just no comparison between the 2-month old and 1-month old papers. Unbelievable how crisp and smooth everything comes out to be.
@Sven_vh 2 years ago
Hey, so I'm kinda new to this whole AI environment, but this looks amazing! Is this public? Like, can I upload some pictures and the program makes a 3D object out of them?
@maythesciencebewithyou 2 years ago
There is a link in the description to it.
@prestow 2 years ago
It's becoming easier to accept we live in a simulation.
@cks2020693 2 years ago
Imagine importing all the Google Maps drive-by photos into this AI.
@SlinkySlonkyWaffle 2 years ago
I wonder if this could be used to make 3D scanning through photogrammetry incredibly fast, since it can detect the geometry from so few pictures SO fast.
@imveryangryitsnotbutter 2 years ago
If it only takes 5 seconds to render a still object, and that same rendering speed is applied to each frame of an actor's performance filmed in 60 FPS from multiple angles, then each minute of footage should take about 5 hours to render. If you're a small game dev studio, that means that you can basically feed your dedicated workstation a few minutes of footage and leave it running overnight, and you'll get the final animated asset rendered in a day or two. What a boon this would be for Myst-like adventure games! EDIT: Actually, not even a day or two! If it takes 2 seconds per frame, then even five minutes of footage would be rendered in just 10 hours! You could leave it running overnight, and it would be done the very next morning!
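The arithmetic in the comment above checks out; here is a quick sanity check in Python (the per-frame reconstruction times are the commenter's assumptions, not figures from the paper):

```python
FPS = 60  # capture rate assumed by the commenter

# 5 s of reconstruction per captured frame -> hours per minute of footage
print(60 * FPS * 5 / 3600)      # 5.0 hours

# EDIT scenario: 2 s per frame, five minutes of footage
print(5 * 60 * FPS * 2 / 3600)  # 10.0 hours
```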
@Lord2225 2 years ago
It takes 5 seconds to train the NN, and evaluation (rendering) is real-time. According to the paper, this animation (5:02) can run at 133 fps at 1080p on an RTX 3090.
@magen6233 2 years ago
It would be quite heavy to replace your model on every frame of the animation, though.
@Lord2225 2 years ago
@magen6233 Idk, I haven't read the whole code yet. From what I read, I understand that synchronization is done with m_training_stream (for training) and m_inference_stream (for rendering) (these are CUDA streams and are used for running kernels asynchronously). The whole magic happens in the Testbed::NerfTracer::trace() function and train_nerf. I think that they are copying something, but for sure not every frame (the update_nerf_transforms function copies something every training step).
@SYBIOTE 2 years ago
This is just amazing but, other than image compression, I fail to imagine how this technique can be integrated into existing software
@TheChenchen 2 years ago
Dude, you can create 3D models from photos alone.
@skunko1871 2 years ago
I'd love to use this on Google Earth. Imagine you can slowly walk down the street instead of zipping down a few meters at a time
@Hexcede 2 years ago
Photogrammetry, self-driving vehicles, and, an interesting and maybe not so feasible one: pre-rendering a complex scene and simplifying it to be displayed on lower-end hardware using this technique. From what I read, it produces an SDF, which is super awesome because SDFs are cheap to render and can offer a lot of mathematical meaning, e.g. for self-driving vehicles.
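A note on why SDFs being cheap to render matters: sphere tracing can step along a ray by exactly the distance the SDF returns, so each step is one function query. A minimal sketch (a hand-written sphere SDF stands in here for the learned network):

```python
import numpy as np

def sdf(p):
    """Signed distance to a unit sphere at the origin (stand-in for a learned SDF)."""
    return np.linalg.norm(p) - 1.0

def sphere_trace(origin, direction, max_steps=64, eps=1e-4):
    """March along the ray; each SDF query says how far we can safely step."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t      # reached the zero level-set (the surface)
        t += d
    return None           # missed

hit = sphere_trace(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
print(hit)  # ≈ 2.0: the ray starts 3 units from the center of a radius-1 sphere
```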
@SYBIOTE 2 years ago
@Hexcede Yes, it creates 3D models, but how you would integrate it into existing software is the bigger challenge. 3D models to be used in applications need to be highly optimised as well (topology, different maps and stuff). I get that this has amazing and varied applications, but I fail to see how it can be seamlessly integrated into existing software, say Blender or RealityCapture, Unity, etc.
@ge2719 2 years ago
@SYBIOTE Well, obviously it would need other processes added to it. Say you wanted to use this to create CGI characters: you'd probably start with a person in minimal clothes, then add the clothes on top after. If you want it for map making, then you'd remove everything you want to be interactive and model those objects separately, so the level, the walls, floor, etc. is created using this technique and you don't have to cut things out of the model. Combine this with techniques for removing the specific lighting conditions and being able to use in-engine lighting, which we've already seen in other papers. These things are all literally just research papers. How they get applied to software that is end-user friendly, even for professionals using complex software, is years down the line, and will likely require companies like Weta to create them.
@RamenPoweredShitFactory 2 years ago
Amazing, this is so close to being real time.
@simian.friends 2 years ago
I love it whenever he says "yes".
@Vini-BR 2 years ago
Can't wait to see that reconstruction applied to the Google Street View or similar
@tartarosnemesis6227 2 years ago
Yes, I also immediately thought of that when I saw the video. Like GTA in the real world.
@mikiqex 2 years ago
I'm wondering about the storage difference. Images are huge (there are a LOT of them), but this is probably way bigger. Or is it...?
@Vini-BR 2 years ago
@mikiqex Maybe the neural network could generate the intermediates in real time someday?
@MrImperativeoz 2 years ago
This channel lately seems more about black magic than science.
@Zoza15 2 years ago
We are entering Dr. Strange realms here 😂.
@AykutKlc 2 years ago
Google Street View with this would be mindblowing.
@AgentParsec 2 years ago
Musicians would call what you keep doing with your voice a "mordent". But seriously, I've been impressed by the pace of AI predictive algorithms in recent years, and it keeps getting better.
@colox97 2 years ago
With that scene in Paris, I had an epiphany about this being used in Google Maps 🤯 Imagine how much better it could become.
@lelsewherelelsewhere9435 2 years ago
USE THIS ON THE PATTERSON BIGFOOT VIDEO! (The classic one from 1967)
@oceanbreeze3172 2 years ago
This video was an absolute showstopper! I feel just like how I did when I first saw what AI was capable of!
@sirbackenbart 2 years ago
The 3D world this method creates really reminds me of the GTA V world. You could take stuff from that AI and just put it in the game, probably without much further editing. That's impressive.
@jorgemfgoncalves 2 years ago
I am at a loss for words with these results. It's just astounding.
@emperorsascharoni9577 2 years ago
Wow this is so cool. What a time to be alive.
@stoef 2 years ago
This is truly incredible. Unbelievable! What a time to be alive!
@DreckbobBratpfanne 2 years ago
I wonder what the next big thing in AI after deep learning will be. The pace is already brutally insane. How fast will it be with even better methods?
@epiczeven6378 2 years ago
2 seconds??!! It's here, it's now, let's digitize the entire world! :D
@sc0rpi0n0 2 years ago
This is a massive breakthrough in 3D photogrammetry for sure!
@sieyk 2 years ago
It's actually mindblowing how useful this will be for static environments.
@somerandomautisticguy2181 2 years ago
Holy cow. This will improve virtual graphics like 100-fold.
@LucGendrot 2 years ago
I couldn't hold on to my papers for this one. Incredible!
@calibaba2739 2 years ago
Wow, this technology is amazing. I'm a boxing fan; I hope this can replay some classic fights, viewed from new angles. Thank you 🙏👍
@attitudego 2 years ago
So realistic avatars for VR gaming using your phone? Damn, the meta universe will be salivating over this paper.
@1MarkKeller 2 years ago
This is incredible!!! The potential in VR and AR alone...
@Aliketie25 2 years ago
WHAT!!!!! This is amazing. What a time to be alive
@uirwi9142 2 years ago
Utterly astonishing!
@jenkem4464 2 years ago
Imagine movie making with this, or stage performances replayed after the fact in VR. It sounds like if you have enough cameras, like the original Matrix setup, you'd be able to process possibly 1 second of film in about 1-2 minutes... that's amazing! Having that kind of viewer-angle-independent data sounds like the dream of a VR holodeck-style experience is closer than we think!
@Chillingworth 2 years ago
I remember maybe 15 years ago when Microsoft had the feature to view landmarks from tons of different angles from user uploaded photos. But this is just insane
@iwanchandra3295 2 years ago
I want this mounted on video calls, so the other side could move around virtually without me having to point the camera at them.
@PunmasterSTP 1 year ago
It’s incredible things have come this far. I’ve had fun playing around with Stable Diffusion, but I know that’s only a prelude to what we’ll see in the coming months and years.
@Qubot 2 years ago
Ripping Sketchfab models will be so EZ in the future!
@magen6233 2 years ago
That's currently easy.
@pulkitgera8509 2 years ago
This work really blew my mind. Still can't believe that we went from 2 days to 2 seconds. Although I don't know if it's fair to compare results on CUDA vs PyTorch.
@shabazzy 2 years ago
Wow. Absolutely amazing. I could barely hold on to my papers!
@Toyversestore 2 years ago
What a time to be alive.
@chrisalmighty 2 years ago
What a time to be alive!
@EVPointMaster 2 years ago
It will be amazing when this becomes available as easy-to-use software for use at home. It will also be terrifying.