
AMD's 3D V-Cache Problem 

High Yield
Views: 22K

Published: 28 Sep 2024

Comments: 138
@ProceuTech · a year ago
My dad's roommate in college back in the late '90s did some of the fundamental research on TSVs and how to mass-manufacture them. According to him, it's really cool seeing all this stuff come to the consumer market now; imagine what will be available in 20-30 years!
@RM-el3gw · a year ago
"hahaha remember when our electro brain implants used non-quantum chips? hahaha what a joke!"
@RobBCactive · a year ago
Interesting :) When I was designing and implementing CAD software for chip design, adding extra metal layers and linking the planes with vias was an issue, but that was in production... about a decade before your dad's roommate, so the fundamental research would be a decade or two further back. :)
@ChinchillaBONK · a year ago
We can all be cyborgs with NANO MACHINES BABY
@christophermullins7163 · a year ago
Certainly we will have super efficient stacks of logic and cache transistors. Moore's law is coming to an end, but this doesn't mean performance won't continue to improve. Some people think Moore's law is still in action, but I will remind you that the cost of manufacturing is also part of the equation. If a node is 50% better but 50% more expensive, it's a null gain. The future is all about 3D.
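To illustrate the cost argument above, a rough sketch with made-up numbers (not real foundry pricing): if density and wafer cost rise by the same factor, the cost per transistor doesn't move.

```python
# Illustrative only: invented numbers, not actual foundry pricing.
# Point: if density gains are offset by wafer-cost increases,
# cost per transistor stays flat and the economic side of Moore's law stalls.

def cost_per_transistor(wafer_cost_usd, transistors_per_wafer):
    return wafer_cost_usd / transistors_per_wafer

old_node = cost_per_transistor(wafer_cost_usd=10_000, transistors_per_wafer=1e12)
# New node: 50% more transistors per wafer, but 50% higher wafer cost.
new_node = cost_per_transistor(wafer_cost_usd=15_000, transistors_per_wafer=1.5e12)

print(old_node, new_node)  # identical: the shrink bought no cost advantage
```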
@RobBCactive · a year ago
@@christophermullins7163 But Moore's Law is continuing because of stacking and other tech; it's about transistor counts in ICs. Yes, the cost of process shrinks increased, but it's Dennard scaling that ceased: shrinks used to bring faster, denser and more efficient transistors, while now leakage is highly significant and thermal density is a real constraint.
@kirby0louise · a year ago
On-die water cooling with microscopic pipes sounds promising, but I have no idea how realistic it is. I will say that shortly after AMD showed off the new IHS for Zen 4 and said the unusual design had an engineering reason behind it, I wondered if there were going to be inlets and outlets for such on-die cooling. That was not the case, but I certainly thought it was an interesting idea.
@dan2800 · a year ago
Then erosion will be a huge problem, because you only have a few microns, or at best maybe a millimeter, of space to run water through, and with decent pressure that could be gone kinda quickly.
@oMega-sm1eg · a year ago
That's still quite far away from us. The nature of microscopic pipes through the silicon means it must use ultra-clean closed-loop coolant, otherwise it will be clogged in days. This means it would require a two-stage loop. A more realistic solution is to use a vapor chamber to replace the solid metal IHS, so it would have direct-die contact. I would expect a similar level of performance from it as direct-die liquid cooling with liquid metal TIM.
@musaran2 · a year ago
I am expecting it too at some point. IMO liquid transport with nanoparticle phase change is the way.
@procedupixel213 · a year ago
Chances are that V-Cache dies are designed with a bit of redundancy, i.e. with physically more SRAM blocks than the nominal cache capacity. It might not even be necessary to have a dedicated mechanism for mapping out bad blocks; the cache controller can be tricked into never hitting a bad block simply by setting tags appropriately. If that hypothesis were correct, then one could further assume that the amount of redundancy could have been reduced from the 1st to the 2nd generation of V-Cache, as the 7nm process has matured and yield has improved. This would be another way to save die space.
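A minimal sketch of the idea in the comment above (hypothetical, not AMD's actual mechanism): a set-associative cache controller that simply never allocates into ways marked defective at test time, so spare capacity absorbs bad SRAM blocks without a dedicated remapping table.

```python
# Hypothetical illustration only: a toy set-associative cache whose controller
# skips defective ways, so bad SRAM blocks are never hit.

from collections import OrderedDict

class SetAssocCache:
    def __init__(self, num_sets, ways, defective):
        # defective: set of (set_index, way) pairs found bad during manufacturing test
        self.num_sets, self.ways, self.defective = num_sets, ways, defective
        # per-set LRU-ordered mapping: tag -> way
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def usable_ways(self, s):
        return [w for w in range(self.ways) if (s, w) not in self.defective]

    def access(self, addr):
        s, tag = addr % self.num_sets, addr // self.num_sets
        ways = self.sets[s]
        if tag in ways:                 # hit: refresh LRU position
            ways.move_to_end(tag)
            return "hit"
        usable = self.usable_ways(s)
        if len(ways) >= len(usable):    # evict the LRU victim among usable ways only
            ways.popitem(last=False)
        free = next(w for w in usable if w not in ways.values())
        ways[tag] = free
        return "miss"

cache = SetAssocCache(num_sets=4, ways=8, defective={(0, 3), (2, 7)})
print([cache.access(a) for a in [0, 16, 0, 32]])  # ['miss', 'miss', 'hit', 'miss']
```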
@awdrifter3394 · a year ago
AMD has said that SRAM scaling has pretty much stopped, so there's no point in using a smaller node for it.
@alexmills1329 · a year ago
It isn't just AMD saying it; TSMC is saying they aren't getting benefits from node shrinks for certain applications like memory, and even logic scaling is less than perfect.
@solenskinerable · a year ago
I believe heterogeneous crystals can be part of the solution. For example, silicon carbide has a heat conductivity about 3 times higher than silicon; synthetic diamond is about 60 times higher. Silicon carbide is already used in HEMT transistors. I can imagine growing flat crystal heat spreaders on top of the logic, between the layers, and crystal "through-silicon vias" to transfer heat up through the stack.
@needles_balloon · 10 months ago
Likely the biggest problem when choosing materials for this idea is differing thermal expansion rates. If the thermal expansion rates are too different, it could cause the dies to pull apart from each other when the CPU gets hot or cold.
@crazyelf1 · a year ago
I think that both solutions may have to be used. So, in the future, the CPU will be above the cache chiplets and, in turn, multiple layers of cache chiplets will be used. Having the CPU on top should address the heat issues, although there will be engineering challenges, as you've noted, with routing the data and power through the cache. So in the future, for higher-end CPUs:
- CPU on top, with cache underneath in many layers
- This then links to an active interposer with the IO die
- Then there is an HBM L4 for higher-end SKUs
Which is the best compromise.
@skywalker1991 · a year ago
Can AMD stack L3 cache, like 4 layers or more? A CPU with 1GB of L3 cache could change how game engines are designed to take advantage of a large cache.
@BaBaNaNaBa · a year ago
If it's so cheap, why didn't AMD make the 7950X3D with 3D V-Cache on both CCDs...
@greebj · a year ago
I think we'll just end up with logic dies more separate from cache. Data will be moved over links between a tiny logic die on cutting edge node, and vcache dies on cheap nodes stacked stories high. It's all about cost and profit and datacenter wants wide and efficient and doesn't care about hot single thread throughput, so heat transfer through stacked silicon will always be an afterthought with a patch job to allow consumer parts to clock a bit higher. I think that's more likely than pie in the sky concepts like through chip watercooling. The maths on moving enough water at sane pressures through the heat dense core logic just doesn't check out at all, and I can't imagine the feat of design and engineering it will be to route all those tiny water channels so close to electrical wires with perfect reliability.
@Tainted-Soul · a year ago
If delidding drops the temp by 20°C, why not sell the chips without a lid and get EK to make a mounting water block that they pre-fit to the chip? They could still have AIO stuff, as in quick connectors and a pump that's either inline or on the rad, but without the thick lid. Or they could build in micro heat pipes that run to the lid. I also thought about putting the cache on the bottom and the CPU chips on top. The future looks good.
@NootNoot. · a year ago
Hey, hopefully you get better soon! As I think we've discussed before, on figuring out what node 2nd-gen V-Cache was on, whether it was a 5nm-on-5nm or 6nm-on-5nm process, I was quite surprised they were still using the same node. Although, this is on par with AMD's design/manufacturing philosophy. Once again, AMD has decided to cleverly think of new ways to scale the cache with TSVs as you mentioned, and I think Zen 3's and Zen 4's similar designs have greatly contributed to that fact. Their 'min/max' strategy has benefited them greatly thus far. Now onto the future with 3rd-gen 3D cache: I'd like to think once again AMD will take a conservative approach, whether that's another clever design change, or something else entirely. With Zen 5 (and correct me if I'm wrong), the design and architecture will be different from all iterations of Zen thus far. They have probably streamlined cache stacking, accounting for TSV layout and other things. With this change to the architecture and thus the layout (CCDs/CCXs), engineering changes to the 3D$ itself won't be radical (or at least won't be expensive as a new design/manufacture), maybe a change to another process node (7nm on 3nm just sounds too impressive to pull off). These are just tin-foil-hat thoughts and may not even work, as I mentioned, lol, but I think AMD may surprise us once again with another clever design that will, first and foremost, maximize efficiency and profits.
@lunamiya1689 · a year ago
It would be interesting if Intel released consumer-grade stacked cache using EMIB.
@DanielLopez-cl8sv · a year ago
I was wondering why they don't stack the 3D V-Cache underneath the chiplet.
@jack504 · a year ago
How about moving the cache chip next to the core chip, i.e. no stacking, similar to the 7900 XT(X) design? It might result in a slower connection, L4 cache maybe? It would help the heat problem but need a more complex interposer.
@hansolo8225 · 11 months ago
That would increase the latency, reducing the performance.
@winebartender6653 · a year ago
I do not think we will see the compute die size itself shrinking all that much regardless of node shrinks. The CCD is already quite small in relative terms. With the current unified L3 design, I imagine they could/would add more L2 and L3 on the compute die itself. Then there is the option to add more cores per CCD, add more complex units (instruction accelerators as an example) or widen other areas of the chip (a larger Infinity Fabric interconnect for increased bandwidth as an example). There is also the option of moving towards an HBM/MCD or Intel-style interposer/chiplet type package. There are a vast amount of options out there. Sticking strictly to your points based on CCD size, I don't think it holds much water. Take a look over the past two decades of consumer die sizes and node shrinks and you'll find that things just become more complex and utilize more die area, rather than just shrinking the die more and more.
@shanent5793 · a year ago
3D V-Cache on the compute die is a dead end. It's more appropriate to put it on the IO die, or keep it separate altogether. AMD had per-die SDRAM controllers on Zen "Zeppelin," causing extreme NUMA effects; "Rome" rolled this back into the IO die, reducing the NUMA levels. The current architecture can't scale synchronization-bottlenecked applications past 8 cores, especially when inter-die latency is slower than DRAM access. This has become apparent even on their client products like Ryzen 9. Physical proximity only indirectly constrains latency: at 8 inches (20cm) per nanosecond, distance is a minor contribution to latencies in the range of 40-200ns. Most of the delay takes place in the encoding, decoding and clock recovery at each end of the link, while distance also impacts the energy required to charge and discharge parasitics. Photonics is one way to remove the distance-dependent energy cost, and it is already ubiquitous in networking. Take a look at what Xilinx is doing with separate dies for 28/56G transceivers, or Intel Si photonics, to see where this is all headed.
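To put the 20cm-per-nanosecond figure above in perspective, a quick back-of-the-envelope calculation (the routing distance and total latency below are assumed, illustrative values):

```python
# Rough numbers only: signal propagation at ~0.2 m/ns (about 2/3 the speed of light,
# typical for package/board interconnect) versus total access latencies of 40-200 ns.

propagation_m_per_ns = 0.20          # ~20 cm per nanosecond
distance_m = 0.05                    # assume ~5 cm of routing across a large package
flight_time_ns = distance_m / propagation_m_per_ns

total_latency_ns = 80                # e.g. a cross-CCD or DRAM-class access (assumed)
print(f"time of flight: {flight_time_ns:.2f} ns "
      f"({100 * flight_time_ns / total_latency_ns:.1f}% of an {total_latency_ns} ns access)")
# -> 0.25 ns, well under 1% of the total: encoding/decoding and clock recovery dominate.
```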
@coladict · a year ago
On-die water cooling sounds like a disaster in the making.
@chriskaradimos9394 · a year ago
awesome video
@HighYield · a year ago
Thanks!
@SevenMilliFrog · a year ago
Great content. It would be even nicer if you added English subs.
@arthurgentz2 · a year ago
Excellent video. Though, would you consider removing the very distracting 'rap beat' background music in the next video please?
@maynardburger · a year ago
Nah it's fine. A bit of background ambience helps a lot. Or if your issue is that it's a 'rap beat' specifically, then well, we all know what you really want to say.
@arthurgentz2 · a year ago
@@maynardburger Debased thinking on your part, and I believe you can do better than that, considering we're here on this channel. Anyway, I am not so young as to keep up to date with the current rap 'scene', thus I have no idea what one would call that 'rap beat', hence my referring to it as a 'rap beat'. The way it rattles in the ear is not conducive to 'ambience', as you've put it. There is plenty of better background noise out there to choose from; I merely suggest that he consider it next time.
@dr.python · a year ago
The trend will be more cache taking up die space in the future, but 3D cache isn't the solution, as it has inherent heat dissipation flaws; best is 2.5D, like FinFET. We need to find a better and more sustainable way to design chips; shifting to a RISC architecture like ARM would be a good first step.
@bernddasbrot-offiziell7040 · a year ago
Hi, I'm from Germany too
@markvietti · a year ago
I hate these new chips... they run way too hot... even on water.
@agrikantus9422 · a year ago
You mean the 7950x3d and 7900x3d?
@greebj · a year ago
Because heat killing CPUs is a thing? (It isn't; the MTBF of CPUs at Tjmax is still orders of magnitude longer than the period for which they offer relevant performance at stock voltages. Intel and AMD know this, and that's why they have exploited the headroom that was once used for an easy overclock as "turbo".)
@maynardburger · a year ago
Stacking the cache chip underneath the core chip is THE solution to this. It doesn't matter that you have to route power through it, since you can make it much less dense and have a ton of freedom in the layout. You could also cut down the cache on the core chiplet. The downside is that you basically have to make V-Cache standard, and AMD seems reluctant to do that, at least through Zen 5, though they are seemingly going this route with CDNA3, so we know it works. But they can gouge consumers more the way they're doing it now.
@craighutchinson1087 · a year ago
Great video
@TheBackyardChemist · a year ago
I think the most likely way out is the stack inversion you have mentioned.
@pedro.alcatra · a year ago
It's quite hard to pass 1500+ connections through the cache die. But not impossible. Let's see how they solve this.
@johnclondike3896 · a year ago
One "positive" that you left out is that the cache on the main level of the chip continues to take up a larger and larger percentage of the overall chip. So more and more of the chip will be cache going forward, meaning more and more of the chip will be suitable for putting V-Cache on top. In the end, the way to "solve" this is to simply design the main-level L3 cache size so that it is big enough to fit whatever amount of V-Cache you desire. If you don't have enough room for the V-Cache by 5%... just add 5% more L3 die size. Because memory isn't scaling anymore, the chip sizes won't decrease much anyway, so the problem isn't that big of a deal IMO.
@RobBCactive · a year ago
Zen 4c, aka Siena, might object to the "cache bloat" being inevitable; in hyperscaling they're better off cutting L3 cache and giving more area to lower-clocked core logic that's laid out more densely too, because it doesn't have to reach high boost clocks.
@mathyoooo2 · a year ago
@@RobBCactive True but I doubt it'd make sense to put a cache chiplet on zen4c
@dan2800 · a year ago
They could slap some HBM memory onto the IO die too, to act like a big L4 cache. I think that flipping it could have potential: make a big main L3 die, put the cores on top of it, and stack more L3 on that L3, to use a minimal amount of structural silicon.
@user-fs9mv8px1y · a year ago
I wish they did that for APUs
@adamw.7242 · a year ago
Great analysis, glad to see your channel growing
@dazza1970 · a year ago
I'm no tech designer or expert, but if the 3D cache becomes bigger than the CCD chiplet, then why not just increase the CCD size by making it a single 16-core instead of the current 8-core, and let smaller nodes make it smaller from there? Doubling the cores would give you a larger overall area to put the cache on, and I know we don't need a 16-core chiplet yet, but it would eliminate the Infinity Fabric latency... unless AMD goes for broke and does a 32-core CPU for desktop, which would be mad, but amazing too.
@Centrioless · a year ago
Latency is a problem
@wawaweewa9159 · a year ago
@@Centrioless Less than what it is now.
@hansolo8225 · 11 months ago
Increasing the chip area drastically reduces the wafer yield. Translation: smaller chips are much cheaper to produce than larger chips.
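A rough sketch of why, using a simple Poisson defect-yield model (the defect density and die areas below are illustrative assumptions, not foundry figures):

```python
# Simple Poisson yield model: yield = exp(-defect_density * die_area).
# Defect density and die sizes are illustrative, not TSMC data.

import math

def die_yield(area_mm2, defects_per_mm2=0.001):   # ~0.1 defects/cm^2, assumed
    return math.exp(-defects_per_mm2 * area_mm2)

for area in (70, 140, 280):   # e.g. a CCD-sized die vs. hypothetical larger dies
    print(f"{area:>3} mm^2 -> {die_yield(area):.1%} of dies defect-free")
# Larger dies lose yield non-linearly, and fewer of them fit on a wafer,
# which is why small chiplets are so much cheaper per good die.
```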
@builtofire1 · a year ago
I don't care about a 0.5GHz boost; I care more that the L3 cache is per CCD. If the thread scheduler sends a thread to another CCD, it has a huge performance impact. So all these heat dissipation problems are not that important compared to threads constantly changing CCDs.
@Psychx_ · a year ago
If shit hits the fan, the CPU and cache chiplets can always be put onto an interposer and the cache used as L4 instead of L3 in order to reflect the added latency. As long as it provides sufficiently lower latency than DDR5, there'll still be performance benefits.
@AdrianMuslim · a year ago
Will the 7800X3D age poorly or not be future-proof because of its lower clock speed/single-core performance? (For gaming)
@maynardburger · a year ago
No. Loss of 400Mhz or so isn't nothing, but it's a very limited reduction in performance that will remain relatively constant no matter what going forward. Older processors usually become outdated through lack of features, instructions, general IPC deficits and raw core/thread scaling, not through a minor disadvantage in clock speeds like this.
@AdrianMuslim · a year ago
@@maynardburger X3D or Intel, which will be more future-proof in gaming and which will be faster in the long run?
@Gindi4711 · a year ago
If SRAM does not scale anymore with N3/N2 etc., the 32MB L3 on the CCD does not get smaller any more, so the 64MB of V-Cache on top does not need to get smaller either. If AMD decides they need more L3 (for example to support 16 cores per CCD), they will probably need to increase L3 on both their V-Cache and non-V-Cache lineups. With 48MB L3 on the CCD and 96MB on top it will still work. But as the price gap between leading edge and N7 increases further, AMD will want to move to a design with no L3 on the CCD and everything stacked on top, and this is where things get complicated. What I am wondering: in the long term I see AMD using a Meteor Lake-like approach:
.) Having an active base tile (N7) for fast and energy-efficient die-to-die communication, and putting all the L3 there.
.) Compensating for the additional L3 latency by increasing L2 per core.
@wawaweewa9159 · a year ago
Even though 5nm V-Cache would provide little performance benefit, wouldn't it allow the V-Cache to be made thinner and thus less thermally constraining?
@kognak6640 · a year ago
There's very little heat produced in the L3 cache in the first place; it doesn't matter how much the V-Cache chip reduces thermal conductivity. The bulk of the heat is produced in the cores, and there are just blank silicon pieces on top of them. Unless the material of the blanks is changed, there's not much AMD can do.
@BatteryAz1z · a year ago
5:23 CCD = core complex die.
@HighYield · a year ago
AFAIK, CCX = "core complex" & CCD = "CPU Compute Die", at least according to AMD's own ISSCC slides.
@magottyk · 29 days ago
Moar on-die cache. Scale the on-die L3 (+L2 +L1) cache up so that the V-Cache is only covering cache; 48MB+64MB would also help non-V-Cache processors. The question is not if but when. Per CCD, Zen 5 has 640KB of L1 + 8MB of L2 (48KB + 32KB L1 and 1MB L2 per core) vs Zen 4's 512KB + 8MB (32KB + 32KB and 1MB) vs Zen 3's 512KB + 4MB (32KB + 32KB and 512KB). Perhaps Zen 6 will increase L2 to 1.5MB per core. Also consider that there is some SRAM scaling to 5nm, so Zen 5/6 V-Cache could well be made on 5nm.
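As a quick check of the per-CCD totals quoted above (core count and per-core sizes taken from that comment, assuming 8 cores per CCD):

```python
# Verifying the per-CCD cache totals quoted above: 8 cores per CCD,
# per-core L1 (data + instruction) and L2 sizes given in KB.

def ccd_totals(l1d_kb, l1i_kb, l2_kb, cores=8):
    return cores * (l1d_kb + l1i_kb), cores * l2_kb   # (total L1 KB, total L2 KB)

for gen, sizes in {"Zen 3": (32, 32, 512), "Zen 4": (32, 32, 1024), "Zen 5": (48, 32, 1024)}.items():
    l1, l2 = ccd_totals(*sizes)
    print(f"{gen}: {l1} KB L1, {l2 // 1024} MB L2 per CCD")
# Zen 3: 512 KB L1, 4 MB L2; Zen 4: 512 KB L1, 8 MB L2; Zen 5: 640 KB L1, 8 MB L2
```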
@theminer49erz · a year ago
AH!!! HOW DID I MISS THIS?!! YouTube has been recommending almost exclusively crap I would never watch or stuff that I have already watched multiple times. Fantastic algorithm you got there, Google! I had to check when I thought "why haven't I seen any videos from [you] in a while, I hope he is ok!" and searched for you. It didn't even come up in the autofill. That's lame!! Great video! I don't know enough about such engineering to suggest a way to do it. However, perhaps down the road, maybe not even that long with AI helping out, they could find a way to use photons/optical data transmission to link chips/chiplets that are horizontally adjacent. I could see that at least being faster than, or at least as fast as, physical connections. Although I will admit I wouldn't be surprised if the process to do so would slow it down and I am just describing a type of quantum computer. I am a little bummed that RDNA3 didn't have as much of a jump as I was hoping for, but that is on me, I set those expectations. I am going to get one still though. I have come to terms with the reality of the situation. In fact I think it is more important to support them now than ever. It's the first chiplet-based GPU. There was absolutely no real-world data on how it would perform because of that, and plenty of potential for unforeseen issues to arise. If I (we) want to have a card perform well the way we use them, then we need to get one and use it that way so that they can get the data needed to improve the next one. If I recall correctly, there was at least one physical hardware issue with the launch cards too. Maybe if we are lucky, we will see a refresh like the 6050 series with fresh chips that are redesigned to fix that problem. There also seemed to be some performance left on the table according to the specs, so maybe some driver updates after they have enough real-world usage data will help use more of the potential? Idk, I am optimistic though that AMD will prevail. Not vs Intel/Nvidia, I couldn't care less if they "win", I just like them as a company and like to see them create awesome stuff! I even really like their "failed" products like the old AM2+ APUs and especially their FX-9590!! The 9590 catches a lot of crap, but I'm sure 90% of that is based off of standard benchmarks or YT reviewer regurgitation. I admit it is a touchy chip, but with the right configuration and use case, it is actually a really nice chip! I have my old one running my "offline home assistant"/server and it is fantastic! I can run AI for my video surveillance, automated greenhouse environmental controls/hydroponics systems, and chicken coop!! Granted, it is leveraging my two 8GB MSI RX 480s to do a lot of that, but even when I'm running a game server, streaming Plex to multiple devices, and downloading a file, there is absolutely no sign of lag or any other issues. I'm sure many can do that, but I appreciate its ability to do that after like 6 years of gaming and general use. Sorry, I digress as per usual. I'm glad you didn't disappear and hope your annoying cough goes away asap! Looking forward to your next video, and YT better let me know this time!!! Yours is one of the very few channels I watch as soon as possible after I see it uploaded! I'm glad others are picking up on the quality of the content as well! Be well! And good job on the Leppards Deutschland!!!
@junofirst01 · a year ago
Just put some dumb silicon beside the future, smaller cores, which the first-floor cache can sit on. This will increase the distance to the cache but should solve heat dissipation.
@christophermullins7163 · a year ago
Seems like the average person sleeps on AMD. Their CPU team is very clever and has been hitting it out of the park regularly for a while now.
@IncapableLP · 10 months ago
0:36 - No, this shows that memory cells scale really badly with newer process nodes, which is a huge issue at the moment.
@BAJF93 · a year ago
Silicon on both sides of the PCB? That's one more unreasonable workaround.
@aeropb · a year ago
Nice video, but CCD is "core complex die" and CCX is "core complex".
@lovelessclips433 · a year ago
I read the title "is it too big?" out loud. My girl replied from the other room: "Not even close."
@alb.1911 · a year ago
Thank you. 🙏
@ChristopherBurtraw · a year ago
I'm so confused now. Is the L3 on the base die, or on the chiplet? Or is it on both, essentially doubling it? I can't find the info clearly online; it's either too general an overview of 3D V-Cache, or too detailed for someone with my limited expertise on the subject. Also, why is it called "V" Cache?
@HighYield · a year ago
The base die has the CPU cores with 1MB L2$ each, and all eight cores share 32MB of L3$. Then the 3D V-Cache chiplet adds another 64MB of L3$ on top. So in total it's 32MB (base die) + 64MB (cache chiplet) = 96MB of L3$.
@ChristopherBurtraw · a year ago
@@HighYield thank you! I recall learning about it from a previous video, I must have just forgotten! Do you know why it is called "V" cache?
@kirby0louise · a year ago
@@ChristopherBurtraw V is short for Vertical, because the V-Cache is literally directly above the conventional 2D cache. They are building vertically instead of horizontally
@ChristopherBurtraw · a year ago
@@kirby0louise thank you so much, it all makes way more sense
@VoldoronGaming · a year ago
Seems the fix for this is more cores in the CCX, so the SRAM doesn't outgrow the CCX chiplet.
@maynardburger · a year ago
You don't really need more cores; you can keep up the die size by making each core much wider/more powerful (at a given process node).
@heinzbongwasser2715 · a year ago
@mrdali67 · a year ago
Definitely think they will NEED on-die water cooling if they are going to go further in stacking cache and CCDs on top of each other, to solve the heat dissipation problems. Stacking is probably still the solution for the foreseeable future to make CPUs and GPUs more powerful. They have been talking about bio chips for about 4 decades, and nothing makes me believe they are any closer to achieving a breakthrough in this department. It has been about 40 years since I read an article in a science magazine in the early 80's, during my teens, about bio chips that ran on sugar water and were based on a biological neural network, and it still seems more fiction than science 40 years later. I don't really get why a company like Intel still refuses to see they are banging their head into a brick wall to keep up the brute-force tactics of forcing higher clocks onto a monolithic die, using huge amounts of power and creating more problems for themselves with each chip generation, trying to compete.
@koni_ey · a year ago
Just also wanted to drop a comment. Great channel and thanks for the motivation to study for my computer architecture exam tomorrow ;)
@oskansavli · a year ago
Why are they putting the extra cache on top in the first place? This is a desktop CPU so why not just put them side by side and make it slightly bigger?
@HighYield · a year ago
Latency. If you put it to the side, it wouldn't retain the L3 cache speed.
@justinmacneil623 · a year ago
Thanks for an interesting review. I suspect that future iterations might end up with the whole L3 cache separated out into a chiplet, rather than the current mixed situation with 1/3 in the CCD and 2/3 on a separate cache chiplet. Presumably with a larger L2 to compensate for the slightly increased L3 latency.
@GustavoNoronha · a year ago
Or maybe they'll reduce the size of the v-cache and add another level of cache, L4. I remember when chips did not have even L2, maybe it's time for a new level.
@NoToeLong · a year ago
@@GustavoNoronha - Intel had L4 cache on some of their Broadwell CPUs back in the day, with an extra 128MB die separate from the main die.
@greebj · a year ago
Each hop to the next level of cache adds latency, as does growing the structures needed to support a larger cache size. The Broadwell L4 eDRAM was only about half the latency of a trip out to DRAM (DDR3 at the time), which is pretty slow for cache. It's always a tradeoff. Navi 31 would have had dual stacked V-Cache on the MCDs for 192MB of Infinity Cache, but AMD decided the minimal performance gains weren't worth the added cost.
@MaxIronsThird · a year ago
It will be like the 7900 XTX GPU: the CCD in the center and MCDs surrounding it (separate chips), and there will only be 3D cache on top of the L3 dies.
@klaudialustig3259 · a year ago
Get well soon!
@5poolcatrush · a year ago
I wonder why they've made Frankenstein's monsters of the 7900 and 7950 X3Ds, putting cache on only one die. I believe people who aim for such top-end processors expect to get the full package, not some crippled thing. And gamers would be perfectly happy with the 5800X3D anyway. Being an AMD fan myself, I absolutely hate such Intel-like behavior from AMD.
@GustavoNoronha · a year ago
Some would argue that this is actually uncrippling it, as you get one set of 8 cores that can clock higher, so you get the best of both worlds in such a system if you have the proper scheduling (which is early days, but will evolve). I think the use cases that would benefit more from having the cache on all CCDs lean more towards workstation, so we will likely see a ThreadRipper3D at some point.
@5poolcatrush · a year ago
@@GustavoNoronha Thing is, "proper scheduling" is a crutch and a workaround needed for those crippled things to work. Why not just deliver proper hardware that can handle the tasks on its own? I suppose that's nothing more than cursed modern marketing trends affecting products in weird ways, rather than selling reasonable products made as they're meant to be from a purely technical standpoint.
@GustavoNoronha · a year ago
@@5poolcatrush it's not a crutch if you can really benefit from both high cache and high frequency. If you think like that you could say that SMT is a crutch as well for not being able to add more proper cores. But it's in fact a design decision, weighing a lot of different trade-offs.
@blkspade23 · a year ago
@@5poolcatrush Most of the things you'd genuinely need more than 8 cores for would benefit from the higher frequencies and not the additional cache. Not even all games will use the additional cache. It makes sense to also care about games while having many other uses for a 16-core. You could already see from how the 5800X3D is worse in so many common applications compared to the 5800X that the hit from dual V-Cache wouldn't make sense. Having a die that can clock higher is a good thing.
@6SoulHunter9 · a year ago
I was traveling and I missed this video. Great analysis, great info; those are some things I had wondered about. I like this channel a lot because it talks about aspects of engineering that we mortals can understand. Also, I think you have greatly improved your pronunciation. Not that it was bad before, but now you have less of an accent, which is more pleasant for the ears, and lots of viewers put great emphasis on that. Keep improving!
@RobBCactive · a year ago
Cool! I couldn't see why they would use 5nm in that channel poll/answers, but that they're STILL using 7nm for the cache is hilarious! Imagine if you're involved in the "fastest gaming GPU" and you're overtaken by someone using such a mature and unfashionable process; even the IOD has moved to 6nm! Overall I'm pretty relaxed about this. Already in Zen 4 the heat density of the cores is causing enthusiast comments ("the chiplets are close together, bad for thermals"), despite the IOD often being the truly hottest part of the die under many workloads. Then there are der8auer's claims about the thicker IHS being "a mistake", and that if you use his delidding tool and direct-die cooling your thermals improve. Yet OTOH, review journalists tried out the Wraith box coolers with Zen 4 and were amazed at how little the performance impact was, though the chips were running hotter than spec. Recently der8auer had an Intel engineer on, who actually works on sensor placement and power management for boosting, who explained why thermal targets have become a thing: if you're not aggressively boosting, you're leaving performance on the table. Now for me, the future involves power constraints; I just don't see an i9-13900K as superior even if it's slightly faster than a Ryzen 9 5900X3D that's using half the power.
@Stopinvadingmyhardware · a year ago
GaN is going to be awesome
@ejkk9513 · a year ago
The problem is that the x86 instruction set is at the limit of what we can do with it. AMD tripled the L3 cache of the Zen 4 chips, and all we got was at best 15% better performance. Look at what Apple did with their M1 ARM chips. It's incredible how efficient and powerful the RISC-based instruction set is. I hate Apple... but the M1 is brilliant. Imagine that scaled up to PC! They're still making regular, large performance increases while retaining the amazing efficiency it is known for. I know Intel and AMD are aware of and nervous about an ARM future. The problem will be compatibility. All this x86 code will have to be rewritten, or they will have to use a compatibility layer like Rosetta. The obvious problem with that is the performance degradation rendering the performance increases inert. If they can introduce a compatibility layer that won't degrade performance... Intel and AMD will be in big trouble. x86 is bloated and far too inefficient for use going forward. It's a dead end. Intel and AMD know that. All they're doing is blasting the power draw and keeping it on life support.
@mirkomeschini80 · a year ago
Why not put the L3 on the IO die, and 3D V-Cache on top of it, instead of on the compute chiplets? Another solution could be stacking all of them on top of the IO die: the IO die (with L3) on the bottom, V-Cache and CCDs on top. Is that possible?
@davidgunther8428 · a year ago
They could put the cache chiplet under the logic chiplet, then the cores would be closest to the heatsink.
@Maxxilopez92 · a year ago
You are the new AdoredTV for me. Keep going!
@HighYield · a year ago
Interestingly, AdoredTV also touched on this in his last video :)
@gameoverman9610 · a year ago
@@HighYield It is also the delivery, a bit more grounded. When I am in a more relaxed mood I can follow your subjects, but for high-energy delivery I prefer AdoredTV. Both have their place.
@markvietti · a year ago
Jim shows more of his feelings towards the product manufacturers.
@maynardburger · a year ago
Way better than AdoredTV. Honest, not manipulative, not defensive, not trying to pretend he's some major insider with advanced knowledge of products.
@VideogamesAsArt · a year ago
Just found your channel, very interesting analysis. Intel Meteor Lake will have cache at the bottom, so we will see how that compares!
@wolfiexii · a year ago
Wake me up when I can get the cache on both CCDs. It's a waste right now because it parks the second CCD to ensure games run on the CCD with the extra cache. They need to fix this silly nonsense.
@maynardburger · a year ago
I agree they should have just put Vcache on both and taken the clock hits. Yes, there would be a few workloads that might have like 5% less performance than the normal Zen 4 products, but so what? People clearly are buying the Vcache option because they think the benefits will be much bigger in higher priority workloads(for themselves).
@Centrioless · a year ago
@@maynardburger That would make the product a zero-sum game (with a higher price tag), since the 3D cache also only gives you a ~5% performance increase.
@mrlk665 · a year ago
I think if they can make a 3D silicon base it could solve the problem, or they could put the 3D cache on a separate die, like the IO die.
@TGC1775 · a year ago
I struggle to keep my 5800X3D cool. I can't imagine an 8800X3D if they don't fix the heat.
@stebo5562 · a year ago
Got mine on a basic hyper 212, no problems with cooling. What cooler are you using?
@hansolo8225 · 11 months ago
Undervolt your cpu and use a liquid cooler, mine never goes above 65 degrees under full load.
@BaBaNaNaBa · a year ago
Also, why not stack the CPU CCX above the 3D V-Cache?
@Vaxovillion · a year ago
Learning so much thank you!
@MasterBot98 · a year ago
What do you think about a hypothetical 7600x3d?
@HighYield · a year ago
Would be a great gaming CPU, but as you can see from the 7900X3D, it doesn't quite match the 8-core parts. Since AMD already has 6-core X3D chiplets for the 7900X3D, I think the only reason they don't offer a 7600X3D is market segmentation.
@MasterBot98 · a year ago
@@HighYield I'd love a 7600X3D if it had a higher clock speed than the 7800X3D.
@jelipebands1700 · a year ago
Come on AMD, put this in a laptop.
@RM-el3gw · a year ago
thanks for the insight
@pf100andahalf · a year ago
Excellent video.
@MaxIronsThird · a year ago
Inverting the CPU configuration is a really good idea.
@winebartender6653 · a year ago
The largest issues I see with that are noise, capacitance and voltage stability. AMD already bins the CCDs quite heavily for the V-Cache chips, to be able to maintain decent clock speeds at lower voltage thresholds. Flipping the stack would make this even more important, or lose more clock speed/core performance.
@anonymouscommentator · a year ago
While I am very impressed by the gains that 3D V-Cache has made, I have to say I find it to be a rather "lazy" approach. AMD's "just slap more cache on it" has a similar vibe to Intel's "just turn up the power". In fact, both are running into their thermal limit, though I have to admit that, even though it is not easy, the chiplet approach is way more sustainable in the long run and will most likely be the standard going forward. Personally, I see V-Cache much more as a built-in accelerator, as it only helps in games and even regresses the multicore performance a bit (while drastically reducing the power needed). It doesn't really feel the same as IPC/clock-speed improvements through architectural changes to the core design.
@maynardburger · a year ago
The idea that it only helps in games is patently false. In fact, a large majority of the V-Cache stacks getting made will be going to Epyc processors, not Ryzen. There are plenty of different workloads that benefit from more L3; it's just not *quite* as universal as something like higher clocks. I'd even go as far as to say that AMD is being stingy by NOT going with V-Cache as standard. The number of workloads that benefit more from an extra 400MHz than from the L3 is not actually that big, and certainly the performance disadvantage is pretty small even in such situations. But AMD is desperate to retain every drop of performance potential possible in order to keep up with Intel.
@anonymouscommentator · a year ago
@@maynardburger We don't have to kid ourselves: finding a productivity benchmark where the 7950X3D is notably faster than the 7950X is very, very rare. Generally, the X3D variant is a few percent slower than the normal one. Sure, there are exceptions, but it's faaar from the norm. As to why AMD is putting them on Epyc:
1. Large servers often run programs custom-built for them. This means that such a program could very well use the V-Cache even though I cannot.
2. V-Cache drastically reduces the power needed, to the point where it close to halves it. This is huge for datacenters.
3. Maybe on a $10,000+ CPU AMD can afford to put in more cache, which only costs them a couple of bucks, as explained in this video.
@asmongoldsmouth9839 · a year ago
I just did my research on the 3D v-cache for the 7000 series procs. There is no problem with the v-cache. It performs as well as everyone expected it to.
@asdf_asdf948 · a year ago
This is completely incorrect. SRAM cells do not shrink by the same amount as overall logic going from 7nm to 5nm. Therefore the L2/L3 areas of the compute chiplet will not shrink beyond the 3D cache chiplet.
@HighYield · a year ago
I don't know what you are talking about; the fact that SRAM scales worse than logic is literally the entire basis for this video. There is still a lot of logic on the CPU chiplet, which continues to scale down in size, and as a result the entire CCD will get smaller, while the L3D will stay a similar size. Especially for future versions, where you want more cache on the L3D chiplet.
@asdf_asdf948 · a year ago
@@HighYield Your main contention in the video is that the L2/L3 area of the compute chiplet will shrink... hence the 3D chiplet of the X3D is too big to fit over it. That is completely incorrect, as you yourself acknowledge that SRAM does not shrink along with logic.
@longdang2681 · a year ago
@@asdf_asdf948 Currently the 64MB of V-Cache is put over the 32MB L3 cache plus other logic. While 32MB (of the 64MB) of V-Cache will still sit nicely over the 32MB of L3 cache underneath, the remaining 32MB of V-Cache will likely have an overhang problem in future models, as the logic underneath it shrinks faster in area. I don't think it will be a big problem, as AMD can simply use the area for additional logic, or more L2/L3 cache.
@maynardburger · a year ago
@@asdf_asdf948 There's still room to shrink the current on-die SRAM cells for the compute chiplets. These V-Cache chiplets are testament to that with their custom, higher-density memory design, which is more likely at the limits of what can be achieved (on 7nm). It is definitely becoming much harder, though.
@ChadKenova · a year ago
Love my 7950X3D so far. I was on a 5950X and 3080 Ti, then got a 4090, so I picked up a 5800X3D when it hit $320, and now picked up a 7950X3D when it launched at Micro Center. Running 32GB of 6200MHz CL30 RAM and a B650E Aorus Master; I see no point for 95% of people in getting an X670 or X670E this time.
@pinotarallino · a year ago
Hey stepCPU, your V-cache is soo big...