This same method of fuzzing x86 chips for backdoors was performed by Chris Domas years ago. By measuring how many cycles certain instructions take to execute with all interrupts disabled, you can gain insight into how much "work" is being done behind the scenes when an instruction is run. Passing random input to instructions and then watching how long they take to execute is a very clever way of detecting hidden behavior, or even finding hidden instructions, in a given architecture.
There's also SandSifter, which fuzzes "by systematically generating machine code to search through a processor's instruction set, and monitoring execution for anomalies."
Kind of, but not exactly the same. Chris Domas has done two similar things: 1) Sandsifter, which looks for undocumented opcodes, though it primarily used access violations for detection rather than a timing side channel; 2) his research on rdmsr, where he used the TSC (time stamp counter) to detect whether reading a certain MSR did something covertly in microcode. Tavis's research in this project is similar but distinct: it seems to focus only on valid opcodes, and it uses performance counters beyond just timing.
I just want to leave a compliment on the very good animations/visuals in this video. They are well done and intuitive, and they kept me engaged to the point where I actually noticed their positive impact, without being too flashy or distracting in any way.
This was *way* more interesting than I assumed. Tavis managed to bring a batch of new angles to CPU fuzzing by not being a CPU fuzzer! I'm glad both you and Tavis do what you do!
This reminds me of an old DEF CON talk (I think) where a guy fuzzed CPUs to discover undocumented instructions. The way he did it was by exploiting a quirk in the memory controller: I believe he put the instructions across a page boundary in such a way that a valid instruction would go through, but an invalid one generated a page fault. Through this he could generate instructions and compare them to the known instruction set to find the undocumented ones.
@@5555Jacker I looked him up and I didn't realise he also did the MOVfuscator, one of my all-time favourite tech talks! I also highly recommend that one to anyone that hasn't seen it.
That's how he discovered the length of an instruction. By putting the instruction at the end of a page, you can adjust it byte by byte and discover whether or not the CPU wants to read the next page as part of the instruction.
In these kinds of videos, it would also be really nice to learn how much time it takes to find these kinds of bugs. Sometimes it could be just a couple of hours or a lucky idea, but in most cases it takes several months. In any case, I love your videos :)
Sounds like security-critical software could be compiled in oracle serialization mode to prevent these side-channel attacks (at the expense of execution speed/efficiency).
This technique is already used to an extent in some modern compiler hardening flags (introduced after the Spectre/Meltdown shitstorm). However, blindly disabling speculation is unacceptable from a performance standpoint. You can of course do this if, e.g., a 10x loss of performance is not an issue for you, but it is better to resolve this in the CPU microcode if possible, for the sake of binary backward compatibility.
Does it? AFAIU the issue is not victim programs using speculation, it's the attacking programs. It's all well and good if *your* program runs absolutely 100% correctly, but if *I* can still abuse the CPU to get a side channel, you're just throwing away performance for no gain.
@@jotch_7627 In some cases, the attacking program is a victim program, too. The prime example is a web browser. The browser executes JavaScript code, which is not trustworthy in itself, but which runs in a sandbox that tries to prevent it from accessing anything worth protecting. If the JavaScript code manages to trigger the browser (like Chrome or Firefox) into running Spectre attack code (which was possible), the JavaScript code can read parts of browser memory it must not access (like stored passwords). So compiling Chrome or Firefox in a way that it won't speculate "too much" prevents these programs from being abused as Spectre attackers by JavaScript code. Actually, this idea is more general: any program that processes complex untrustworthy input can be turned into a Spectre attack utility by feeding it malicious input, as long as the processor does enough speculation while processing that input. Spectre mitigation is meant to prevent that by reducing speculation when processing untrustworthy data.
Really looking forward to a new part of this!! :) Brilliant video! 5 a.m. here and I am more awake due to this video than the last 5 hours trying to sleep. 😂
I'd be very interested in following the development of automated systems for identifying interesting performance counters. Human review can easily overlook innocuous-looking counters, and I feel like this is JUST right up the alley of machine-learning classification.
HW vulnerabilities, I would argue, are generally vulnerabilities at the HW/SW interface, and very few exploits of HW vulnerabilities come at it from the HW side. In general, the most critical HW exploits take advantage of inconsistencies between the HW implementation of the ISA and the SW formalization of it.
I wonder if there is a mathematical theorem (or a quasi-theorem) that shows, for a cpu and a set of instructions, that beyond a certain well-defined complexity in hardware and software, security can never be guaranteed, even in principle?
The light from above is strong, and the wall behind is quite glossy. Light color changes when it reflects off an object, especially on the first bounce. How to fix it? Get a more matte wall, change the angle of the lights, get a more matte shirt, etc.
Phew. When I saw this video I got scared and began researching a bit. Luckily it seems to only affect Zen 2. My laptop CPU is too old, so it's Zen (two generations older), and my desktop CPU is too new, so it's Zen 3 (one generation newer). I lucked out with this one.
Also, such vulnerabilities are typically possible to fix in microcode / CPU firmware, and chances are that by the time you hear about one, your system is already patched, provided you update your OS regularly. For this very vulnerability AMD has already released a fix, but only for server platforms (Epyc, where it's a big issue); maybe the overhead is a tad too high, and they hope to find a fix with less performance impact for consumer platforms.
Very interesting topic! I am always more fascinated by the process. Wouldn't the serialization instructions affect the performance counters used as coverage information?
Yes, they would. But the serialized reference code isn't executed to detect "whether something interesting happens with these instructions" (which is what you need the performance counters for); it is executed to provide the expected reference result of the non-instrumented code. You only use the counters to measure the non-serialized code. You also compare the CPU state (e.g. register contents) after executing the non-serialized code and the serialized code. If there is any difference, the CPU does unexpected things without serialization.
Am I the only one who thinks these CPU vulnerabilities are pretty scary? I think there should be a safe mode where, for example, all instructions are run serialized.
I think performance is overrated, at least when you are talking about critical systems like webservers. These bugs may be discovered by blackhats long before whitehats find them.
@@helgesupernova788 Then you would trade the minuscule chance of somebody else on your server accidentally stumbling over your information in the cache for making DoS attacks far easier. You underestimate how much performance we gain from running instructions simultaneously and out of order; it has rendered the CPU clock irrelevant for comparisons. For example, the i7-6950X, a pretty old 10-core CPU, retires on the order of ten instructions per cycle across the whole chip. You could lose roughly that factor in performance with such a "safe mode" enabled.
Regarding Oracle Serialization, I don't quite understand how it's supposed to work. For example, if some information is leaked based on timing, how do you detect it with serialized code? You run the unserialized code and measure that it takes 10-100 cycles, and then you measure the serialized code and it always takes 1000 cycles. Did the unserialized code take a variable amount of time because it was leaking a bit of information about some internal state, or was it just random chance?
Yeah, it's a bit tricky and I'm also not 100% sure. But check out my RIDL video to learn how cache timing leaks are measured. You could add such a measurement after the fuzzed code (this measurement code would obviously not be serialized).
Disclaimer: I didn't read the writeup yet, so I might be wrong. If I understand correctly, the fuzzer had to generate code that somehow outputs the leaked information. They were checking whether force-serialized code gave the same output as code without forced serialization. In the abstract model this is guaranteed, so any discrepancy is a bug. I am assuming they relied on just the outputs, since the check has to be something unaffected by serialization and run-to-run variance, so they couldn't check timings directly.
@@chedatomasz That's how I understood it too. You either output the data your instructions retrieve, or you otherwise use that data in further execution in some way. Any discrepancy between what the speculative execution variant does and what the serialised oracle variant does means there is, by definition, a speculative execution bug. You might even be able to use some of the same performance counters to measure the degree of discrepancy.
@@chedatomasz Yeah, this is also my impression of the technique, but the video makes it seem like Oracle Serialization is supposed to catch side-channel attacks, not just regular data leaks (for example, at 11:31 and 12:11). This is probably a mistake in the video, unless I am missing something.
@@ruroruro I read the writeup. I think they caught it by the architectural state (registers, performance counters, etc., the things guaranteed not to change under optimization) not agreeing between the raw and oracle versions. This pointed them to vzeroupper not being handled correctly. The fuzzer's contribution ended there; the escalation to a side-channel data-leak attack was manual work on top of that. I guess the weeks of tuning they refer to were spent choosing the elements of architectural state that are guaranteed by the spec not to change, until they finally came across one that did change when it shouldn't.
I was thinking about running the code on a simulator and comparing the result to the real thing, without realizing the code can simply be run without any optimization on the CPU itself 😂
Can someone explain to me how performance counters work? Do they compare two CPUs running the same processes, and if the counts differ a lot, they know where the error is coming from?
Running validation and fuzzing on the simulation could catch logic bugs like this. It would also be more efficient, as you could apply feedback more directly. The problem would be accurately modeling timing and other side channels in the simulator.
Of course he works for Google. I still can't believe how Google can be so big in the industry yet still couldn't maintain such a basic service as Google Domains, and sold it to another company. I'm so f*cking mad now I have to transfer ~100 domains to another registrar.
Maybe that has something to do with bubble sorting, or a bubble-sorted list. Is it possible that people start talking about an attack that doesn't exist yet because they found it, and thereby cause a vulnerability? That is to say, are they announcing a vulnerability before the update is released?
NEED HELP: how do I create a no-clip hack for Need for Speed Undercover (ULES 01145) on PPSSPP and turn it into a CWCheat? Same question for Obscure: The Aftermath (ULES 01340), also a PPSSPP game, plus a free camera (the camera also has no clip). Thanks.
I don't think it should be considered medium severity at all. The PoC can catch nearly all important strings put into XMM registers by a browser, simply because of memcpy, strcpy, and strcmp.
Awesome way to explain the discovery process of Zenbleed. The whole bug-discovery process became very intuitive as you comparatively explained the fuzzing components for a software bug and a CPU bug. A great way to explain it!
Hardware has so many security holes. It would be interesting to hack DMA instead. DMA chips may be programmable and can access anything, but what if you can't touch them? Maybe your video card or hard-disk controller can read directly from memory. Can you read a hard-disk sector from the controller cache before it is actually read, so you can read data from another process?
Thanks for covering this, very useful. AMD patched the EPYC CPUs but not the desktop CPUs (the ETA is December). So is the TL;DR for now to run potentially hostile code in a VM without network access? Publicly this is an info-disclosure bug, so it should be mitigatable this way. But if there is a risk of writes as well, disconnecting the network alone wouldn't help.