@@longrove5710 I wonder, if AMD officially stated they were going to do so, as in "we're releasing it next quarter" coming from Lisa Su, what would the reaction from Intel and Nvidia be? Maybe the former would turn to their Arc department, or rather the Ponte Vecchio team with Xe-HPC, for tiled integration into the CPU design, an in-house updated i7-8809G descendant, and the latter would release to the general (as in average Joe) public something like a souped-up child of the Jetson industrial "board" and the Grace Hopper HPC module...
Instinct is GCN evolved minus all the graphics bits... so there's no way it's going to be any use for everyday computing (and CDNA also evolved from the observation that graphics workloads don't fill up the pipeline)
@@marcin_karwinski They already are, but Intel's graphics capability is stretched rather thin (Meteor Lake will run proper Arc Alchemist) by graphics ambitions that aren't exactly turning a profit
6:10 Another reason for the -S SKU is for some customers like my last company. Because of processor data leakage vulns, we refused to do sensitive computing (and 60% of what we did was sensitive) in the cloud on shared hardware with SMT enabled. That meant we ran on expensive dedicated hardware instances until the CSP gave us non-SMT instance types.
1. Why did you remove your Cerebras+G42 video? Did you accidentally say something you shouldn't have? 2. Please ask Cerebras about the shape of their chips. Have they investigated the possibility of making them full-wafer (circular) rather than square? Unusual shapes don't make sense for most chips because of dicing difficulties, but in the case of a full-wafer chip, the form factor doesn't seem to be a limiting factor.
So back then, did AMD run lower clock speeds as a strategy to pack in more cores, or was performance actually limited by their architecture? Very curious about that
AMD is clearly going BIG in HPC with the MI300 variants by the end of this year. It's quite obvious that healthy competition with Nvidia's, as well as Intel's, enterprise accelerators is much needed. My amateur "noob" prediction is that within 10 years we will see chips with enough cache to replace main memory, starting with enterprise and followed by the consumer market. The race to high-speed, high-capacity cache-like memory is just as important as chip architecture improvements, especially now that it's been confirmed the current cache design can't be shrunk much further, which is why it makes sense to stack it vertically like AMD's V-Cache. Future chips running with memory that has no, or only tiny, bottlenecks are the dream of every enterprise and consumer computer designer. Chips with more than enough cache, with or without HBM alongside, are in my opinion the right way to fully utilize processor capabilities and performance, unless a disruptive new technology like memristors is broadly adopted. We'll see what comes first, but memory improvements are very much needed as well.
You know, when they teased Genoa-X I said something along the lines of "Remember the Xeon X5698, that -HPC- HFT-focused Westmere-EP that had 4 of its 6 cores disabled so the remaining 2 cores could have all that L3 cache and run at 4.4 GHz instead of 3.6 GHz? I would love to see a spiritual successor to that in Genoa-X." And well, thank you AMD. I know not that many customers need that cache, but I sure love that you did it anyway
With the next shrink and Zen 5c we may be back to 12 CPU chiplets (with 16 cores each) = 192 cores / 384 threads, maybe even with 2 or 3 layers of V-Cache on top, if the latency penalty is dwarfed by performance gains in some apps. That's the beauty of AMD's modular system: AMD can react very quickly to customer needs. AI accelerators will be the next thing AMD integrates; AMD is currently behind Intel in this regard, but I guess we will see it with Zen 5
ROCm totally sucks right now; I went and bought an Nvidia GPU for that reason. Communication with customers is also lacking. I really hope they put the required investment into the software side of things for AI, because great hardware goes to waste without software support
Will they be bringing the shrunken Xilinx media engine and AI to consumer GPUs next gen? It's very much needed for AMD to compete on equal footing with Nvidia in the consumer GPU space
Now if only they released a Bergamo-based dual-chiplet Ryzen processor with 32 highly efficient cores and 64 threads on a desktop platform... and a mixed processor with one Genoa-X-based chiplet, so X3D, and one with an efficient multitude of cores... 24 mixed cores, all supporting SMT and with exactly the same instruction support: 8 favoring games and memory-intensive loads, and 16 for more efficient multicore, workstation-y loads. Since both of these chiplets are more power-efficiency oriented than the regular high-clocked Ryzen ones, maybe the power draw would be the same as the current X3D SKUs... and the higher-clocked RAM in a desktop system would help somewhat negate, or at least alleviate, the Bergamo-based chiplet's L3 deficit.
AMD has given themselves a level of design flexibility never seen before... At this point they can actually use Zen 4c/5c for desktop and laptop machines, or a mix of 4/4c or 5/5c... I think they left a lot out of the Instinct presentation to keep Nvidia in the dark... The biggest change needed is a mixed-precision engine that can do 4/8-bit INT for inference... But it seems realistic that they will look to Xilinx for inference and push Instinct for training and AI... Having AVX-512 is a huge boost for EPYC, and for the MI300A I think it will be great for local execution: rather than loading instructions through the root CPU, the APU can load the instructions locally for an even greater speedup... They will need aware software and applicable drivers, but it should speed up execution by 2x at least...
For heavy HPC workloads (stochastic dynamic programming for hydropower plants) that are floating-point limited, the SMT threads are worse than useless; they slow down the computation. So we typically turn them off in cloud environments that give us access to such machine configurations. Machines with no HT will be very welcome!
You don't need to turn off SMT (HT is the Intel term); you can use cgroups to limit which CPUs your workload runs on. This also applies to vCPUs (you can control through cgroups which CPUs the guest runs on). So really it's a trade-off between a ~9% reduction in CPU cost (which is only part of the cost of a blade) and the flexibility to either supply or not supply SMT to your guests.
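The idea of confining a workload to specific CPUs without disabling SMT can be sketched without a full cgroup setup. This minimal Python example (Linux only) uses `os.sched_setaffinity` as a stand-in for a cpuset cgroup, and reads the kernel's sysfs topology files to find a core's SMT siblings; the fallback when sysfs isn't available is an assumption for portability, not part of the cgroups mechanism itself:

```python
import os

def smt_siblings(cpu):
    """Return the set of hardware threads sharing a core with `cpu` (Linux sysfs)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as f:
            text = f.read().strip()
    except OSError:
        return {cpu}  # topology not exposed; assume no SMT siblings
    cpus = set()
    # sysfs format is e.g. "0,64" or "0-1"
    for part in text.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus

def pin_to_cpus(cpus):
    """Restrict the current process to the given CPUs, like a cpuset cgroup would."""
    os.sched_setaffinity(0, set(cpus))
    return os.sched_getaffinity(0)

# Example: keep the workload on CPU 0 only, leaving its SMT sibling(s) idle.
allowed = pin_to_cpus({0})
```

A real multi-tenant host would do the equivalent by writing to `cpuset.cpus` in the guest's cgroup, but the effect on the scheduler is the same kind of restriction shown here.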
Multiple threads per core (hyper-threading) are also a potential vector for side-channel attacks. Remember how Intel recommended disabling HTT during the Spectre/Meltdown fiasco? Having HTT disabled in hardware already is a definite security benefit for a number of applications run on dedicated cloud hardware. And these folks care _much_ more about security than CPT.
AMD designed for SMT from the start, which meant threads were tagged so that every thread had its own execution space and couldn't get access to data from any other thread, real or virtual...
@@ChristianHowell While they dodged the OG Spectre/Meltdown, there have been many speculative-execution vulnerabilities found in AMD CPUs since. It doesn't inspire confidence, even if SMT is (or rather currently appears to be) technically safe.
Imagine a 4x4 Zen 4c setup with a 1 GB "Adamantine"/"Telum"-like virtual L2 cache layer per CPU, and 1 MB of L1 per core, in a ~50 mm² silicon package. I really want 32 cores + 64 RDNA3 CUs in a single APU package, probably 120-170 W, and 16 cores + 40 CUs in a 15-28 W package, but most importantly a 6-core/12-thread package + 24 CUs at 7-12 W or so for laptops (these would be dies with defective cores, run at low frequency)
Saying Intel's HBM parts are exclusively for HPC and not for "technical computing" so you shouldn't compare HBM to X3D is certainly very clever from AMD's marketing. Kinda horseshit, but clever.
L3 cache vs HBM would mean even more niche wins for AMD vs more general-utility wins for Intel. It's a question of size vs speed: L3 is the smallest but fastest, HBM is next, and then RAM. So it's just a matter of picking whichever one suits your workload.
50 mm² dies? Rock on! Where do we go once we get smaller than 1 nm? It's getting hot and stuffy in here. Can't we press an AI Easy Button? I need to take a nap.
For ROCm's best chance of success, it would help if they embraced the smaller-scale creative market with their GPUs. CUDA is the main reason I don't buy AMD GPUs, since I need it for 3D work and other creative apps; there's also loads of local AI stuff Nvidia GPUs are good at. It's a difficult choice, I'd imagine, but that extra energy and inflow would assure ROCm gets regular attention and broader adoption.
Yeah, I've decided not to go AMD on my next upgrade for the same reason. It's taken them too long to do anything to catch up. I still think that for games they have the edge in cost effectiveness, but at this point I'm very unwillingly going to pay the Nvidia tax to get CUDA on my desktop, so I don't have to keep using my much slower laptop for that.
They've been starting to (there is official support for the RDNA workstation GPUs and latest gen consumer ones, alongside unofficial support for the last couple generations of consumer ones), but progress has been painfully slow... I wonder if this is one of those areas where AMD needs to be willing to bite the bullet and invest a lot of $$ into their software/library dev/packaging side in order to gain a much larger reward/not lose out on future market share. Presently that doesn't appear to be the case, but I'd be real happy to be proven wrong.
AMD's current problem is that they have 2 different architectures: CDNA for massive compute and RDNA for graphics. Improvements for rock stable and highly optimized ROCm on CDNA won't do much for home and small scale compute on RDNA.
At the driver and lowest level software side maybe, but the gap being discussed here is mostly at higher levels (e.g. think PyTorch and its direct dependencies). In that context differences between programming against CDNA and RDNA are not as significant, certainly not much more so than working with consumer vs data center cards on the Nvidia side. That's why AMD is able to maintain ROCm support on RDNA for a lot of the AI and HPC stack with a relatively tiny software team.
Will you cover Intel's recent announcements regarding APX and AVX10, too? It would also be great to know what AMD thinks of these extensions and if it will take half a decade for them to support these or if adoption will be faster than with AVX-512.
The good thing about AVX10.1 is that it is feature-equivalent to Sapphire Rapids’ implementation of AVX-512. So a piece of software that targets AVX10.1/512 or AVX10.1/256 can also check the relevant AVX-512 feature bits and can use that codepath if either one is supported.
@@Anton1699 AVX10.2 is the more interesting of the two, IMHO, as it brings advanced vector capabilities across the whole range of products, not just server SKUs. It also probably means no proper AVX-512 support with desktop products for quite some time still.
@@seylaw My point was that if you write a piece of software that targets AVX10.1/256 now, and you don't use any of the instructions not supported by Zen 4, then your code will run on AMD CPUs from Zen 4 onwards, Intel Server CPUs and Intel Client CPUs once they introduce AVX10 support.
@@Anton1699 Sorry, I was talking in more general terms about both ISAs from a user's perspective. And while I get the value you describe for developers who want to write forward-looking code now, I cannot see this being the optimal way, as the 512-bit vector length was a major feature before that just got downgraded (and integrating that feature into the E-cores would have checked that box). So Zen 4 users won't be served optimally with that approach everywhere. And as I understand it (I am not an engineer), AVX10.1 and 10.2 are still not as flexible as Arm's SVE2, which means that code would not automagically make use of 512-bit vector lengths when found on the P-cores, or have I missed something?
@@seylaw You've got that right. A developer has to check the AVX10 CPUID leaf to determine whether they can execute 512-bit wide AVX10 instructions. So you do need two separate code paths if you wish to target both AVX10/256 & AVX10/512. I personally think the 512-bit vectors were the least exciting feature of AVX-512 and I honestly don't understand why Intel chose to make that the eponymous feature. AVX-512 introduced so many basic instructions that were missing from SSE/AVX for no good reason. One of the more noteworthy applications where AVX-512 provides large speedups over AVX2 is the PS3 emulator RPCS3, and it lets you choose between 256-bit wide AVX-512 and "full-fat" AVX-512. I just hope that Intel and AMD manage to introduce AVX10 quickly, and I really hope they do not produce SKUs that do not support it. Intel made SSE4.2-only Celeron and Pentium CPUs until very recently.
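The "two code paths" pattern described above can be sketched abstractly. This is a hedged illustration, not real CPUID code: `max_vector_bits()` is a hypothetical stub standing in for the detection step (early AVX10 documentation describes enumeration via a CPUID feature bit plus a new leaf reporting the supported vector length), and the loop merely mimics how a dispatcher would choose a lane count per detected width:

```python
def max_vector_bits():
    """Hypothetical stub for CPUID-based detection.

    On real hardware you would query the AVX10 CPUID leaf for the
    supported vector length; here we just pretend the CPU caps out
    at 256-bit vectors (the AVX10/256 case).
    """
    return 256

def sum_simd_style(data):
    """Accumulate in chunks sized to the detected vector width.

    512-bit vectors hold 16 floats, 256-bit vectors hold 8; a real
    dispatcher would jump to a separately compiled kernel per width.
    """
    lanes = max_vector_bits() // 32  # 32-bit floats per vector register
    total = 0.0
    for i in range(0, len(data), lanes):
        total += sum(data[i:i + lanes])  # stands in for one vector accumulate
    return total
```

The point of the sketch is that the width is a runtime property, so the 256-bit and 512-bit kernels must both exist in the binary and be selected after detection, unlike SVE-style code that adapts to the hardware's vector length automatically.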
Of course you're going to pronounce non-English words wrong, you're British. If I ever hear a Brit pronounce a whole sentence of French, Spanish or Italian correctly, I just may die of shock.
I'm curious about the MI300A, and specifically why AMD landed on 24 CPU cores as the correct amount. Why not use Zen 4c chiplets to go to 48? This isn't a criticism; I just want to understand what objectives were driving this specific design. On a separate note: hi Ian, would you be up for doing an explainer, or even a deep dive, on Intel's recent AVX10 & APX announcements? I'm curious about your thoughts on their long-term benefits, or whether these changes will only really benefit compilers.
I plan to build a workstation around October this year, and I feel very unsure about every decision I've made so far. I got an A750 from an Intel giveaway, but that's currently pretty useless for model inference (it's great for Stable Diffusion, but language-model inference is really broken). Intel's own numbers have their Gaudi2 300% ahead of GPU Max, and I can't buy either. It seems like getting an Nvidia L40 or RTX 6000 Ada is the simple solution because the software will just work, but I haven't looked at AMD. Or just CPU: Intel has done a lot of marketing for SPR inference, but that technology isn't available on their client CPUs, which I want for gaming. Also, ROCm PyTorch support is Linux only. It's native, but Linux only, making it not very useful for end-user inference. Intel said that IPEX Windows native is happening near the "end of the year", but they weren't able to give me any timeline directly. Intel is publishing more and more stories about selling tiny GPU clusters with PVC, but I feel like they are missing GPU model inference for end users on Windows.
Linux is now pretty user friendly if you don't need advanced features. A simple dual boot, if you don't want to deal with the Wine stuff, with a shared drive for everything you need on both OSes.
@@tominmoreau8546 That seems to be a likely reality as a developer. And WSL still requires dual boot, since anti-cheats don't allow Hyper-V. But this will not help the end-user experience. Right now the only viable experience for end users is to own an Nvidia GPU, as CUDA will run by default without installing any toolkit, setting up WSL2 or getting Docker (maybe the OpenVINO runtime). It seems like Intel is focusing on CPU, and perhaps even VPU, for client inference. No confidence in GPU inference means their iGPU still struggles.
If you're looking to do any serious ML, especially if you're looking at data center GPUs, I'd *highly* recommend running it on a dedicated (headless) Linux box. You should run the numbers, though: even for inference, once you account for power costs and depreciate your hardware at 2-3 years, you'll probably be better off from a cost perspective using cloud GPUs vs local (but everyone should run their own utilization numbers).
@@lhl This is a purely developer machine; I don't plan to run inference all day or do week-long training runs. Perhaps a few hours of fine-tuning, but that's all. Yes, it's much more cost effective to simply run on a cloud instance, and in fact I could do that right now for free. But it's very far removed from efficient local development, and I can't run a model server on my cloud instances since they are really limited in terms of internet access, so sending all of that via SSH seems like a stretch. Might look into it, but the one node I can easily access is just a 24 GB RTX 5000, which won't run 15B models, and you have to keep caching off. For large evaluation runs I can request the largest instances, but I need to develop all my tasks first and make sure they come up with results that mean something. Using a tiny model on CPU locally for development doesn't do the trick anymore, because the tasks are difficult enough that GPT-2 fails 100% of the time.
@@Veptis For your use cases, a 24GB card sounds fine. If you need a local card, your best bang for buck will probably be a used 3090 (~$700) right now. It will run SDXL easily, and you can fit 4-bit quants of 33B models with no problem (exllama is most memory efficient; llama.cpp will let you offload layers if you are extending context or something). You should also have no problem running QLoRA or other 4-bit fine-tuning. I have a 24GB card in my workstation and have no problem running StarCoder 15B (@q8, basically no perplexity loss, but also check out CodeGen2.5-7B, which performs pretty close at half the size) or any of the 30B-class models at 2K+ context; however, you *will* run out of VRAM if you are running Windows and driving a display/running apps like browsers from your card. You can always get a second 3090, get good scaling, and be able to run a 70B at 16K context. Remember that the cards you've mentioned are $7,000-10,000, a lot to pay if you're just dabbling (e.g., it sounds like you could pay about 10% of that and do what you want locally). Also, for those following along at home, $7,000 is ~10 h/day of A100 80GB cloud compute for 2 years at Runpod's spot prices right now. Plug your home kWh power costs into a spreadsheet if you're looking at things from a cost/perf perspective, but suffice to say, for most people it'd be a lot cheaper to rent GPUs when needed. (I understand how having local hardware can be convenient, though. I'm building a bunch of latency-sensitive apps myself and it's nice to have something under the desk. Toasty and pricey, though.)
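The rent-vs-buy back-of-envelope above is easy to put in a spreadsheet, or a few lines of Python. This sketch uses the comment's illustrative figures (a ~$7,000 card, a spot rate near $1/h implied by "10 h/day for 2 years"); the power-draw and electricity-price defaults are my own assumptions, not quotes:

```python
def breakeven_hours(card_cost_usd, cloud_rate_usd_per_hour,
                    local_power_kw=0.4, power_cost_per_kwh=0.30):
    """Hours of use at which buying the card beats renting.

    The purchase is treated as a fixed cost and local running cost as
    electricity only; resale value and cooling are ignored for simplicity.
    """
    # Each local hour saves the cloud rate but costs some electricity.
    hourly_saving = cloud_rate_usd_per_hour - local_power_kw * power_cost_per_kwh
    return card_cost_usd / hourly_saving

# ~$7,000 card vs a ~$1/h cloud rate: break-even lands near 8,000 hours,
# i.e. past the "10 h/day for 2 years" (7,300 h) mark in the comment.
hours = breakeven_hours(7000, 1.00)
```

Which is exactly the point being made: at low or bursty utilization the cloud wins on cost, and the local box is paid for in convenience, not dollars.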
Is AMD using different chiplets for Consumer and Enterprise a good thing for consumers? Or do you think we'll see consumer products using the more power efficient chiplets too?
@@shepardpolska They are going to prioritise profit, so if they can make a wafer of enterprise-specific chiplets or a wafer of consumer-specific chiplets, they will prioritise enterprise. But I've done a bit of googling, and Zen 4's integrated graphics is part of the I/O die, so if Zen 5 is the same, the chiplets will be the same for both, and consumers can get the cast-off chiplets that don't make the spec for enterprise. Still interested to see if any of the "c" variants make it to consumer products
@@Pegaroo_ It's like this currently. Ryzen is using the same CCDs as EPYC but with a different IO die, and probably ones that are binned worse for energy consumption. It's like this with each chiplet based Zen.
About the Zen 4 vs Zen 4c chiplet designs, and regarding the limits of the I/O, memory and compute parts, where each scales differently with transistor shrinks: for example, I/O stops scaling around 12 nm and cache (SRAM) around 7-10 nm, but compute logic still scales well at even smaller sizes. With that said, the question that pops into my head is *"Wouldn't it be better to have each CCD with one CCX that is performance- and memory-dense while the other CCX is core-dense like Zen 4c?"* In a sense that would make the design of the two different CCXs (core complexes) in a single CCD (chiplet die) an "internally" heterogeneous design, a mix of different CCX types in one die, where half the CPU is like Zen 4 and the other half Zen 4c. I believe Zen 5 (or perhaps later) could have two different CCX designs in a single CCD: the first CCX with something around 4-8 high-performance cores and a lot of L2 and L3 cache, using an architecture optimized for performance instead of die area, while the other CCX has very little cache, uses logic transistor libraries aimed at density (lower frequency and energy), and spends the die space on more cores instead, cores that actually *do* shrink with smaller nodes. With further transistor shrinks, this gap between a "performance CCX" and an "efficiency CCX" will only increase. I am guessing this second CCX could have up to 32 cores by itself, or something around that number. Remember, SRAM cache doesn't shrink well below about 7 nm, but logic transistors (the compute) do, and that is the key to this idea; the gap will widen over time. Going further, you could perhaps remove L3 cache altogether in the "efficiency CCX" and rely only on 3D V-Cache, so that one CCX spends essentially no die area on cache and fills it with a dense, efficient, high-compute-per-area design.
This would open up additional levels of heterogeneous design, where AMD could build different chiplets (CCDs) from different CCX designs. One CCD could have 2 performance CCXs (e.g. 8 cores), another 2 efficiency CCXs (e.g. 32-64 cores), and one could have a mix of one of each (e.g. 24+ cores). The next heterogeneous level after the CCX would be the CCD level, where AMD could mix these different types of CCDs in the same package, along with HBM chiplets, AI chiplets, etc. Maybe a Zen 5c part would have only the efficiency chiplets, EPYC Zen 5 a mix, and Threadripper a bias toward the performance-oriented, power-hungry chiplets? I know I've been rambling :) but I do think this is an interesting idea at its core, working around the different die-shrink limitations of I/O, cache and logic.
The issue with that is I don't think there's any benefit to it. The whole idea of Zen 4c is to be used where you need high density. If you split the CCX like you suggest, you might not even fit 4 Zen 4 cores, since that would make the CCD bigger than it already is; the Zen 4c CCD is already bigger than the Zen 4 one. It would be more realistic to have one Zen 4 and one Zen 4c CCD in consumer CPUs. That can be done already and would probably benefit the platform more overall.
AI acceleration is half hardware and half software. Nvidia has CUDA, and Intel is working on its oneAPI. I wonder what AMD has to offer in this space? If nothing, then (in my opinion) throwing "AI" around is nothing but poor marketing that might make some fanboys happy, but experts are unlikely to be fooled.
@@shepardpolska Interesting. Something is certainly better than nothing. It seems very basic; perhaps that's why OpenCL is still very popular among AMD users.
It's nice to see AMD push into AI workloads, but they really need to beef up their software support for ML. George Hotz tried to get RDNA working with tinygrad a few months back and it sucked; the driver kernel-panicked, but it looks like Lisa responded to George. You really need a full software stack and good support for the ML frameworks. Without that, adoption isn't going to compete against CUDA. I find it interesting that AMD's AI chip has the same memory as the M2 Ultra. My money is on Apple getting their ML software stack into good shape before AMD.
Did you see the latest video on ROCm from Wendell, the one from like 3 days ago? He showed that AMD's implementation has really moved a lot in the past 6 months. If it keeps going at this rate, it should be a very good competitor to CUDA in 12 months or so. For now it "only" runs Stable Diffusion at better quality than Nvidia. SD is not as important to AI as inference or training, but it is a good sign. Yes, they are still behind. But maybe if you're buying a computer privately in a year, it will be a decent competitor? For HPC it's not a problem; you just write your own software. But for "normal" users? You need a hell of a lot of good support, and you also need to get people from other companies to support it well. CUDA mostly runs so well because it's supported by like a million 3rd-party applications. If AMD can get to the level where they have even 10% of that support, just in the important places, they can start taking market share from Nvidia. Mindshare will be more difficult. I really hope AMD creates something like the Jetson Nano: affordable, cheap, small units that can be used for learning purposes but aren't completely useless in and of themselves.
4 vs. 4c: Looking at those floorplans, if they're to scale, there's no way that's just a change of node corner, there's going to be some re-pipelining and re-balancing in there at least.
We also run our vCPUs with either HT deactivated, or we only sell the real cores and use the HT threads for overhead. The second option becomes useless, however, when you have 128 HT threads as "overhead". You can also save some power by not running HT. In our tests, HT threads only perform at around 20-25% of a real core anyway.
I think it's a mistake to design for reduced frequency; it should be 4.5 GHz all-core, which would give ~50% more cycles and add value and performance, and by today's standards that isn't even a high frequency, so it shouldn't be a big deal. I don't see a design bottleneck at all. IMHO SMT is ~1.3x core performance across 2 threads of execution, so just present that consideration to customers. I understand the requirement, but it's irrelevant if the software is tailored properly to the client's needs. For Zen 4 EPYC Genoa, I think the core counts should go 24/48/96; that makes more sense and is more beneficial for users.
Phoronix benchmarks do include the accelerators, and there were only a couple AI inference workloads that SPR was even competitive in. Across the hundreds of benchmarks in the Phoronix test suite, that was less than 10 workloads, and on average the Zen4C system was almost 2x what SPR was able to do. Unless your workload specifically benefits from those accelerators 90% of the time, SPR is dead on arrival.
@@benjaminlynch9958 There is more to acceleration than AI inference. The majority of the internet still runs on regular ol' crypto and databases. I'm talking about QAT/IAA/DSA.
Why didn't Intel figure this out: shrink Golden Cove (and leave AVX-512 intact) to run at a lower frequency and thus consume less power, rather than introduce Gracemont with no AVX-512 capability? AMD clearly understands what levers they can pull to optimize a design for a given workload. AMD is solutions-oriented and seems to have a desire to build customized silicon for a given workload rather than remaining "general purpose". Intel needs to adopt a similar mentality. For instance, AMD figured out that gaming workloads could benefit significantly from additional L3 cache, hence Zen 3D. Data centers could benefit from additional density, hence Bergamo. Where are the Intel equivalents for gaming and data center? It will be interesting to see what Intel does with chiplets/tiles in Meteor Lake and beyond.
Probably coz Intel was a mess internally and scrambling for any idea that would work, and some engineers working on the P-core/E-core approach had more success than the shrinking approach. E-cores also suited laptops and NUCs, so that got greenlit over other projects.
As long as they keep it off my rig and my privately owned network, I will not have to sue them. I only use the basics for AMD graphics, as I do not need more fucking PC companies trying to take my data and use my bandwidth when they do not have permission or any legal right in Australia. I pay for my bandwidth. I also paid for $7,500 worth of AMD hardware, which does not give them legal access to any of my equipment or anything past the socket. If I catch them I will SUE them, then sell off the company's assets. There will be no more AMD.
The problem I personally have with these Cloud cores from AMD is the fact that, if they were able to fit everything into the 4C footprint, then why didn’t they ship all Zen 4 chips using the C cores? Because they perform worse (less cache) and are clocked lower…
Uh, you answered your own question. They have completely different target markets, Zen4 is performance oriented while 4C is efficiency oriented. When you need to do a lot of compute, Zen4 will be better. When you're running node and pushing out web stuff, 4C will be good enough while being more efficient and lots cheaper.
From benchmarks I've seen, I think Zen 4c is like the console APUs and cuts the vector execution width in half. Also interesting that they don't list the vector latency, just the FPU.