I’m starting to like the sound of Dev Log Mondays… hope y’all enjoyed the slightly-more-in-depth-than-usual dev log! Let me know if you want to see more in this style ❤️ P.S. I got a little carried away with the thumbnail
I actually encountered the same issue with the irradiance shaders. One way I solved this was to tile the renderings: basically, render a subset of the irradiance map at a time and split the vkQueueSubmit calls so the driver never kills the task. I considered changing the driver timeout threshold, but not every PC will have that modification applied.
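For what it's worth, a minimal sketch of that kind of tiled dispatch in Vulkan could look like the following. The pipeline, layout, descriptor set and the TilePush struct are placeholders for whatever the engine already has, and the shader is assumed to use an 8x8 local workgroup and read the tile offset from push constants:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical sketch: split the irradiance convolution into tiles so no single
// submission runs long enough to trip the driver watchdog.
struct TilePush { uint32_t tileX, tileY, tileSize; };

void DispatchIrradianceTiled(VkDevice device, VkQueue queue, VkCommandPool pool,
                             VkPipeline pipeline, VkPipelineLayout layout,
                             VkDescriptorSet set, uint32_t faceSize, uint32_t tileSize)
{
    for (uint32_t y = 0; y < faceSize; y += tileSize)
    {
        for (uint32_t x = 0; x < faceSize; x += tileSize)
        {
            VkCommandBufferAllocateInfo alloc{ VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO };
            alloc.commandPool = pool;
            alloc.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
            alloc.commandBufferCount = 1;
            VkCommandBuffer cmd;
            vkAllocateCommandBuffers(device, &alloc, &cmd);

            VkCommandBufferBeginInfo begin{ VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO };
            begin.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
            vkBeginCommandBuffer(cmd, &begin);

            vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
            vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, layout,
                                    0, 1, &set, 0, nullptr);
            TilePush push{ x, y, tileSize };
            vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_COMPUTE_BIT, 0, sizeof(push), &push);
            vkCmdDispatch(cmd, tileSize / 8, tileSize / 8, 6); // z = 6 cubemap faces

            vkEndCommandBuffer(cmd);

            VkSubmitInfo submit{ VK_STRUCTURE_TYPE_SUBMIT_INFO };
            submit.commandBufferCount = 1;
            submit.pCommandBuffers = &cmd;
            vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

            // Waiting per tile keeps every chunk of GPU work short, so the watchdog never fires.
            vkQueueWaitIdle(queue);
            vkFreeCommandBuffers(device, pool, 1, &cmd);
        }
    }
}
```

Each dispatch stays short and the driver sees regular progress; the total time is slightly worse than one big submit, but it no longer depends on the slowest PC in the wild.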
What you've encountered with the second issue is most likely Timeout Detection and Recovery (TDR). As a 3D artist working with Substance Painter: on first boot it actually recommends changing the TdrDelay and TdrDdiDelay registry keys to a higher value. Those two keys specify how long the OS should wait before killing the driver, so the driver doesn't get killed in the middle of a long computation.
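For reference, those keys live under the GraphicsDrivers node and take values in seconds; the 60-second value below is just an example, it applies system-wide, and a reboot is needed for it to take effect:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrDdiDelay"=dword:0000003c
```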
If, by changing the registry, you mean the Timeout Detection and Recovery registry keys, to my knowledge people just got used to always breaking up long computations into several short ones because of the different issues it was causing. I sometimes think it has become a habit and most people don’t really think about it much anymore... they just avoid it by force of habit... apparently, sometimes, it’s good to be reminded...
This. If possible, can you split the compute shader or the data and do multiple compute passes and then combine the output?
@ As long as the task has independent parts, it can be split. In the case of irradiance calculations, each pixel of the cubemap is independent, but splitting that far down is bad for performance.
I've seen that happen in CUDA code. As soon as you started describing when it happened, it rang bells in my head. The solution for that was to split the work into multiple dispatches.
@@kvatikoss1730 the multiple dispatches altogether won't take less time, but each of those dispatches will take less time, which is what the driver is concerned about.
I thought you had mentioned the Nvidia Nsight tool in the previous video lol. I'm happy that you debugged and resolved the issue, it must be so satisfying. Thanks for explaining how you figured it out!
Came across the compute shader crash a few months ago. We run two compute shaders on startup for our app: one to create a normal map from a height map, and one to set a mask in the alpha channel from a 2D polygon that's represented by a bunch of 2D lines. The second compute shader was crashing on some PCs; we finally figured out the driver was dying from taking too long, so we had to break up the compute shader into a couple of passes. Really makes me question using compute for large tasks if the driver can just crash the application, especially when we don't know the full range of cards our app runs on. A GL/VK call to disable the timeout would be really handy, since setting client PC registry settings isn't ideal. Until then, the only fix is to be really, really conservative with your compute calls!
Finally getting a chance to watch this and I have to say (in devops land), troubleshooting timeout issues in multi-tier apps is one of the more difficult issues that can happen. Here, the timeout is non-obvious and that's just crazy to try to figure out. All timeouts should be well documented, whether in an API or as part of a larger system!
I feel you. From time to time our build server used to crash from GPU timeouts caused by environment map filtering as part of a test, because it had a slow GPU. Forced us to implement a faster filtering shader in the end.
@@thetrickster42 I'm actually getting an error about std::marker::Sized not being defined on Vec (Animal just being a trait I created) when I pass a vector to a function. Could this possibly be solved by passing a slice?
Man, the only thing I can say is: hats off to you man! All that code! I know a little bit of coding, I'm a graphic (web) designer, but I don't come anywhere CLOSE to your skills. I'm so curious as to when it will be done. Of course, you are never truly done, but I mean when it's finished up to a point where one could use it, what it will look like, etc... 😎 This must be so much work man!
We faced the same TDR crash while working on a point cloud rendering system. It was super annoying because we set the points buffer to scale with the GPU VRAM (so the "better" the GPU, the more points get loaded and rendered in a frame), so the crash appeared mostly on high-end GPUs. This is the thing with graphics programming... more often than not you feel like you are working in the X-Files, dealing with supernatural forces.
Changing the watchdog timer will only mask a similar issue in the future. Breaking the work into smaller chunks is your best bet, since the registry change is only local to your PC.
From a user perspective this was also very informative. I have a GTX 770 and I get Device Lost crashes in Unreal Engine 4 games quite often, and I didn't know what could cause this (apparently it wasn't happening that often for everyone). Good to know what's going on behind the scenes; now I kinda know which settings I might try and tune down in the game settings/config file.
@@roiiam Actually tried that. Didn't help, it would only freeze for longer (as long as the TDR value), so the only thing that changes is the driver's time to recovery. Leaving the default value there's no freeze, just a straight crash/hang and recover. As stupid as it might seem, the issue was a weirdly unstable overclock (modded vBIOS) on my end. The weird part is that it crashed only under certain workloads but was very dependent on context transitions (sometimes spotted after 1h of gaming, other times after 5h). Quite hard to test for stability there.
I had a similar problem while running some heavy polymer simulations in CUDA last year. It turns out that depending on the length of my polymer and the number of forces to calculate (which were random by design), the CUDA kernels would drop out after 1 or 2 seconds. The solution? I needed to restructure the memory layout and break down the kernel calls. That was actually nice, because the code ran much faster after that. I heard that this is a safety feature: that way, the work (and the heat) is distributed more evenly over all the processing units.
An Nsight tutorial would be awesome! I've seen very few up-to-date tutorials related to it, and they don't cover much. CUDA and GPU occupancy would also be nice topics.
I've actually run into this issue before while using some program, and the solution was to change the waiting time to a higher value somewhere in the Windows paging files, if I remember correctly.
If you want to be able to do the same amount of compute work while avoiding a timeout, break your compute task into smaller passes and use a command buffer per pass (add a barrier between each pass to preserve your read and write order), so each level of your radiance cube map runs in its own command buffer.
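A minimal sketch of such a barrier between two compute passes, assuming Vulkan (the helper name is made up; barriers also order against work submitted earlier on the same queue, so this works even when each pass lives in its own command buffer):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: record a compute-to-compute memory barrier so the next
// pass only reads the texels the previous pass has finished writing.
void InsertComputePassBarrier(VkCommandBuffer cmd)
{
    VkMemoryBarrier barrier{ VK_STRUCTURE_TYPE_MEMORY_BARRIER };
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   // all compute writes from earlier passes...
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   // ...must finish before the next pass's reads
        0, 1, &barrier, 0, nullptr, 0, nullptr);
}
```

Call it at the start of each pass after the first, then bind the pipeline/descriptors for the next mip level and dispatch as usual.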
Question: Why have mipmaps for a skybox texture ? Are not mipmaps only for minification (and not magnification), and at 2048x2048, and being infinitely distant, are they not always going to be used at level 0, and therefore not require minification ?
This does sound familiar, I've had this happen before during point cloud generation. It only happened on Nvidia cards for us, though, and we were using OpenCL, which is pretty unstable on Nvidia. We had to split up the workload into multiple compute dispatch calls. On Nvidia, OpenCL calls longer than 5 seconds cause a timeout: it throws a CL_OUT_OF_RESOURCES and then crashes the driver, and possibly the entire PC... On AMD, OpenCL doesn't do anything until 15 seconds and then throws a CL_ERROR_KERNEL_TIMEOUT_TI. This is configurable on AMD cards, though; not sure if you can configure it on Nvidia as well. So I recommend doing something similar to what we did: do a little GPU performance analysis at startup to get a general guesstimate, then split up the compute calls into multiple smaller dispatches. We start with a size we guesstimated we could handle and slowly increase it until we reach a sweet spot. Yes, your compute dispatches will be slightly slower, but they'll be stable.
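A rough sketch of that calibration idea in plain C++, assuming a hypothetical dispatchChunk callback that submits one chunk of points and blocks until the GPU has finished it (the budget and starting size are arbitrary placeholders):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical sketch of the "start small and grow" approach: time each chunk
// and adjust its size so every dispatch stays well under the driver timeout.
void ProcessPointsChunked(uint64_t totalPoints,
                          const std::function<void(uint64_t first, uint64_t count)>& dispatchChunk)
{
    using namespace std::chrono;
    const double budgetMs = 500.0;   // stay well under the ~2s default TDR limit
    uint64_t chunk = 1'000'000;      // conservative starting point

    for (uint64_t first = 0; first < totalPoints; )
    {
        uint64_t count = std::min(chunk, totalPoints - first);

        auto start = steady_clock::now();
        dispatchChunk(first, count);
        double elapsedMs = duration<double, std::milli>(steady_clock::now() - start).count();

        first += count;

        // Grow the chunk while we're comfortably inside the budget, shrink if we overshoot.
        if (elapsedMs < budgetMs * 0.5)
            chunk *= 2;
        else if (elapsedMs > budgetMs && chunk > 1)
            chunk /= 2;
    }
}
```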
There's a (somewhat) big red text whenever you open Substance Painter that says "can you edit this registry so the GPU won't freak out if there's a 10s operation".
19:50 this is fairly common. It's a feature of Windows called TDR (Timeout Detection and Recovery) to reduce the impact of GPU operations that hang for whatever reason. You CAN change it in the registry, but that's only really useful for debugging.
As you found out, reducing the time various operations take is the real solution. But if you can't scale down the workload, you can sometimes split it over multiple frames or draw calls. It's somewhat similar to making sure Windows doesn't think an application has crashed by pumping the message loop, except for GPU work rather than CPU work.
I had the exact same thing happen 2 days ago when implementing a compute shader for marching cubes. Stack Exchange helped me out a lot and I almost downloaded Nvidia Nsight, small world lol
Since compute shaders are used to run heavy, long-running computation on the GPU (like mining), I guess there should be some way to write a long-running compute shader by providing some kind of notification to tell the driver it hasn't hung?
Hey Cherno. What do you think about Vulkan, its roadmap, and supported graphics cards? I have an Nvidia 560 and this card doesn't really get a boost from the Vulkan API. Could you explain Vulkan in a new video, please?
When you load Substance Painter, it mentions that the program can crash if your Nvidia driver has a certain setting enabled or disabled, something about how long operations take. Sounds like the same thing. I'll try loading it up and see; other programs probably set this automatically to get around it.
Yeah, driver timeouts are horrible... we had this problem in Blender too with GPU ray tracing in Cycles... on complex scenes it will trigger a crash if a compute cycle takes too long in CUDA/OpenCL.
A similar problem happens when using GPU-heavy programs like Substance Designer/Painter. They recommend changing the registry entries "TdrDelay" and "TdrDdiDelay" to a higher value. I always found it strange.
Substance Painter needs you to change something in the registry so the driver waits a bit longer before deciding to crash, and so does Blender in some cases with the CUDA-based Cycles renderer.
Oh, that second crash problem sounds like the same thing early versions of Substance Painter had. At least it sounds very similar, and there was a registry edit that increased the time the GPU would wait as a workaround.
I've faced the "compute shader taking too long" crash quite frequently when working with CUDA. I implemented an accelerated offline renderer about 3 years ago and that driver crash came up quite often. To get rid of the issue, you just open the Nsight Monitor options (if you have it installed) and change the "Microsoft Display Driver" settings for that timeout. However, I would not recommend it. I haven't touched anything compute related in my "infant" Vulkan engine just yet, but with CUDA I was literally unable to move windows around when the compute kernels took too much time, and the system was terribly unresponsive (I too have a GTX 1080 Ti, but once again, it was an offline renderer and the case was far worse for me because of the constant load that came with it). Just split the job into multiple dispatches and it should be both smoother and more stable. Kernel launch overhead is a real thing, but it's not that hard to strike a balance and still maintain 99% of the performance, at least from my limited experience.
I can't get those NVDBG shader dump files. I tried using the SDK and the standalone Aftermath monitor that comes with Nsight Graphics, with no luck. I have compiled the shaders with debugging symbols. Does anyone know how to get Aftermath to map the shaders to crash reports? Is it possible that it doesn't do the mapping if the crash did not occur in a shader?
Blender3D used to have "Hung GPU" problems all the time. The Nvidia timeout settings were capped at 2.0 sec, and the Windows GPU timeout was set at 2.0 sec. Renders with high memory usage per zone would take upwards of 4-5 sec, which would crash the driver. You would have to restart the driver, and in some cases restart the computer, because the driver would hard-crash. Nvidia was the biggest culprit, since you had to edit the timeout settings in a text editor because they weren't available in their shell application. Took me two weeks to track down a forum discussion pointing to the settings that needed to be changed.
Device lost is not a driver crash, it's a TDR. A TDR occurs when DMA buffer execution takes too long (more than the TdrDelay value, which is 2 sec by default). A page fault is not a driver crash either; it's a GPU hang, also visible as a TDR, because in that case the GPU can't read the memory properly, so it can't proceed to the next commands and goes into an 'idle' state, and the currently executing DMA buffer time exceeds TdrDelay. A driver crash occurs when you pass invalid data to the API.
Okay, so Cherno is reading my mind! I had the same problem in my engine a few days ago. Just implemented irradiance filtering, launched the sandbox, and after a few seconds my whole system froze dead. I am on Arch btw, running the open source AMD drivers (amdgpu). Then I lowered the sample rate and it worked fine. Seems like a common issue. I'll use some crash reporting tool and investigate this further.
GPUs need to run multiple jobs at the same time (especially on Windows), so a job (like a VkCommandBuffer) is scheduled in smallish chunks. The GPU needs to limit the time taken for a context to drain/complete its chunk so it can switch to another context (for instance a Windows OS job). This is similar to thread scheduling on a CPU core. Without this scheduling approach, GPU operations would execute sequentially, so the OS wouldn't be able to update the desktop while a game was rendering a frame. The main causes of a device timeout/hang are: overly complex compute shaders, certain memory faults, or rendering super large triangles.
After some experimentation: the GPU divides the work by vkQueueSubmit, which was kinda annoying since I like to send a whole bunch of cmd buffers all at once.
@@jamesmnguyen Sort of. It is actually recommended (on both VK and DX12) to submit multiple cmd buffers in a single vkQueueSubmit to minimise the driver scheduling overhead (and allow some driver optimizations). The chunks of work only contain a few shader executions (compute or graphics) that have the same pipeline/GPU state set. A PSO change can cause the context to "roll", as the GPU hardware needs to change configuration (for example, enable the blending hardware unit), causing a context switch at the same time.
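A minimal sketch of that recommendation, assuming the command buffers are already recorded (SubmitBatch is a made-up helper name):

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Hypothetical sketch: hand the driver several pre-recorded command buffers in
// one vkQueueSubmit instead of one call per buffer.
void SubmitBatch(VkQueue queue, const std::vector<VkCommandBuffer>& cmdBuffers, VkFence fence)
{
    VkSubmitInfo submit{ VK_STRUCTURE_TYPE_SUBMIT_INFO };
    submit.commandBufferCount = static_cast<uint32_t>(cmdBuffers.size());
    submit.pCommandBuffers = cmdBuffers.data();

    // One call, N command buffers: less per-submit overhead for the driver,
    // which can still break the work into its own schedulable chunks.
    vkQueueSubmit(queue, 1, &submit, fence);
}
```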
I personally had quite a lot of driver TDRs on the proprietary AMD Vulkan driver, and strangely enough, the same shaders work on RADV (the open source AMD Vulkan driver). Idk, things are strange lol.
Wow, I had this EXACT same problem but with OpenGL. I wrote a little program that would bake my IBL data into a file and it works fine on my main machine, but when I try to run it on an older PC it completely crashes the driver (unless I reduce the sample count in the irradiance shader). I haven't found the fix, but it's good to know that I'm not the only one :)
Can't agree more with your opinion on sharing experience with other developers around the world. By the way, it seems like Nvidia Nsight has lots of problems, because my VS 2019 just became unusable after installing the CUDA toolkit. It turns out the VSE part of Nsight is to blame for greatly slowing down Visual Studio 2019.
I just ran into a problem with this as well: a Vulkan instance-6 error keeps popping up whenever I try to start Breath of the Wild for PC in Cemu, and I have no idea how to fix it. If you have any insight on this, that would be awesome right now.
This is in regards to the sound pipeline: what backend are you going to use? Hopefully something like Steam Audio or something similar as a modern replacement for EAX; before we had graphics, sound carried the game. It even supports raycasting hardware support, but for sound. Sound is sooooo underdeveloped, or an afterthought, yet it's such a big part of games/engines even when not-so-great graphics are used; games with over-the-top attention to detail in sound/score/story carry you or suck you in....
I haven't developed graphics engines for more than 15 years, but I always knew that you can't lock the primary/only graphics card of a computer for more than one second. Even now, few new GPUs have the ability to execute different applications (i.e. processes, in OS terms) simultaneously, which means that while a (shader/compute) program is running, no other process can access the GPU. In the case of the primary GPU on a PC, for the user that means their PC hangs (doesn't react to any input in a reasonable amount of time). So: don't run a (single) shader/compute program for more than 1 second, give other processes a chance to use the video card, or use a separate (non-primary) GPU and acquire it for exclusive use from the OS. This one was pretty obvious to me.
Every time I try to launch R6 with Vulkan I get an error message saying "no compatible hardware/driver found". Does anyone know how to fix this issue? My computer has all the needed specs.