This Server CPU is so FAST it Boots without DDR5 

ServeTheHome
747K subscribers · 142K views

Published: 27 Oct 2024

Comments: 374
@DigitalJedi
@DigitalJedi Год назад
I worked on this CPU! Specifically the bridge dies between the CPU tiles. I figured I'd share some fun facts about those CPU tiles here for you guys: Each CPU tile has 15 cores. Yes, 15. The room the 16th would occupy is instead taken up by the combined memory controllers and HBM PHYs. There is no single continuous interposer. Instead, each CPU tile sits on top of EMIB "bridge" dies, as I call them. This strategy is more similar to Apple's than AMD's, or even Meteor Lake's. This is because Sapphire Rapids is so enormous that it exceeds the reticle limit of the machines that make normal interposers. There are 4 CPU tiles and 10 bridges. The tiles each have 5 connections: 3 on one edge and 2 on the neighboring edge. 2 of the tiles are mirror images of the other 2. You can get a diagonal pair by rotating one about the center axis 180 degrees, but the other 2 have to be mirrored to keep the connections in the right place.
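The 4-tile/10-bridge count above can be sanity-checked with a toy model. The per-tile connection count (3 + 2) is from the comment; the only added step is noting that each EMIB bridge is shared by exactly two adjacent tiles:

```python
# Toy check of the Sapphire Rapids tile/bridge counts described above.
TILES = 4
CONNECTIONS_PER_TILE = 5  # 3 on one edge + 2 on the neighboring edge

# Each bridge die spans exactly two tiles, so summing per-tile
# connections counts every bridge twice.
bridges = TILES * CONNECTIONS_PER_TILE // 2
print(bridges)  # 10, matching the 10 bridge dies mentioned above
```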
@ummerfarooq5383
@ummerfarooq5383 Год назад
Can it play starfield
@marcogenovesi8570
@marcogenovesi8570 Год назад
@@ummerfarooq5383 can Starfield play?
@DigitalJedi
@DigitalJedi Год назад
@@ummerfarooq5383 There is enough PCIE and RAM for 7 players to each have the P-cores of a 12900K and their own full bandwidth 4090.
@johnmijo
@johnmijo Год назад
@@DigitalJedi Thanks for your insight, always nice to see engineers talk about the work they do ;) I'm busy playing Starfield and porting it to my C128. Why? Because I think that Z-80 will work as a nice co-processor to the 8510 CPU, ha :p
@GeekProdigyGuy
@GeekProdigyGuy Год назад
any special reason why there's an asymmetric 3+2 bridges instead of having 3 on both sides?
@stefannilsson2406
@stefannilsson2406 Год назад
I hope they evolve this and bring it to the workstation Xeons. I would love to have an unlocked Xeon with built-in memory.
@jondadon3741
@jondadon3741 Год назад
Yo same
@stefannilsson2406
@stefannilsson2406 Год назад
@@startrekkerll5635 What do you mean? You still have memory slots that you can put memory in...
@L0S7N01S3Deus
@L0S7N01S3Deus Год назад
Considering the new AMX instructions and all that bandwidth afforded by HBM, it would be very interesting to see benchmarks for AI tasks, like running Stable Diffusion or LLaMA models. How would they stack up against GPUs performance-wise, or power- and cost-efficiency-wise? It would be very relevant in the current datacenter GPU shortage!
@Mr76Pontiac
@Mr76Pontiac Год назад
One of the nice things about "Serve the HOME" (emphasis on HOME) is that we get a glimpse of what we'll be running in our homes as low-end servers in 30 years... I'm 5 minutes in and I can't imagine the cost of these things when they come to market, not to mention the cost of the REST of the hardware.
@maxhammick948
@maxhammick948 Год назад
Without the RAM slots taking up width, you could pack an HBM-only server incredibly densely - maybe 3 dual-socket modules across a 19" rack? Not many data centres could handle that power density, but it would be pretty neat to see.
@RENO_K
@RENO_K Год назад
💀💀 the cooling on that bad boy is gonna be insane
@sanskar9679
@sanskar9679 7 месяцев назад
@@RENO_K With 3M's liquid that boils at almost 50 °C, you could maybe pack almost a thousand per rack.
@shammyh
@shammyh Год назад
Great content Patrick!! Been waiting to hear about these for a while... And you always get the cool stuff first. 😉
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
This one took a long time. Partly due to the complexity but also moving STH to Scottsdale and doing like 40 flights over the summer. I was hoping to get this live before Taiwan last week.
@Gastell0
@Gastell0 Год назад
Damn, that localized memory is incredible for SQL instances/shards, web server caches and so much more. HBM runs at lower wattage than DDR memory, with a significantly wider bus and a lower frequency required to achieve high bandwidth (AFAIK). P.S. Didn't show the bottom of it even once =\
@aarrondias9950
@aarrondias9950 Год назад
Bottom of what?
@Gastell0
@Gastell0 Год назад
@@aarrondias9950 the cpu module/pcb
@aarrondias9950
@aarrondias9950 Год назад
@@Gastell0 1:01
@Gastell0
@Gastell0 Год назад
@@aarrondias9950 Ooh, that was in the introduction. I looked over everything again except that, thanks!
@gsuberland
@gsuberland Год назад
On the topic of 1000W power draw, I believe these use the same CPU power delivery topology that Intel showed a while back during some of the lab tours (e.g. I believe one of der8auer's videos in the extreme OC labs showed this off). You have a relatively small number of VRM phases on the motherboard providing an intermediate package voltage, followed by a massive number of on-die power stages (100+) parallelised into a huge segmented polyphase buck converter. This helps reduce ohmic losses and PDN impedance by moving the regulation closer to the point of load on the die. The combined continuous output current of the on-package converters appears to be 1023A, logically limited by the number of bits in the relevant power management control register. This kind of current delivery would be unworkable with a traditional VRM, but since the phases are physically distributed around the package, the average current density is heavily reduced.
@chaosfenix
@chaosfenix Год назад
I hope this is something that filters down to consumer parts. Especially for APUs with integrated graphics, we are pretty clearly getting to the point where they are limited by memory bandwidth. The Z1 Extreme with 8 CPU cores and 12 GPU cores is only about 5-30% faster than the Z1 with only 6 CPU cores and 4 GPU cores. These two chips are meant to operate within the same power limits and run the same architectures. Given all that, you would think something with 3x as many GPU cores would be much faster, but that just isn't the case, and my guess is that it's probably due to memory bandwidth. GPUs are bandwidth-hungry, and there is a reason GPUs pack their own specialized memory. I wonder if combining this with an APU couldn't let that iGPU stretch its legs to its full potential. Here's hoping.
@ummerfarooq5383
@ummerfarooq5383 Год назад
I want someone to run Starfield on it just for show. Of course, let the CPU be overclocked to 5GHz.
@chriswright8074
@chriswright8074 Год назад
Amd instinct
@DigitalJedi
@DigitalJedi Год назад
The issue is that HBM is very expensive, and doing HBM right means a pretty much ground-up design for your chip, not only to fit in the PHYs for the kilobit+ bus, but also for the differences in controllers, and possibly dual controllers if you still want DDR5 options. I've worked with HBM, and when you get to the class of connection density it requires, you need to spend the big bucks on a silicon interposer. Radeon Fiji did this; Vega, the Radeon VII, and the Titan V also come to mind. That is a whole massive die you need to make and then stack at least 2 other dies on top of. An HBM APU sounds awesome, I agree; we even saw a glimmer of it with the i7-8809G, which had a 24-CU Vega M GH GPU and 4GB of HBM. The more practical approach for right now, though, would be something with a dedicated GDDR controller. Even just 128-bit 8GB would be plenty, as that is already around 288GB/s of bandwidth you aren't fighting the CPU over.
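The 288 GB/s figure checks out for a 128-bit bus at GDDR6-class per-pin speeds. The 18 Gbps rate below is my assumption to make the arithmetic line up; the comment only gives the bus width and the total:

```python
# Peak memory bandwidth = bus width (in bytes) * per-pin data rate.
bus_width_bits = 128
pin_rate_gbps = 18  # assumed GDDR6 speed grade; not stated in the comment

bandwidth_gbs = bus_width_bits / 8 * pin_rate_gbps
print(bandwidth_gbs)  # 288.0 GB/s, matching the figure above
```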
@-szega
@-szega Год назад
Meteor Lake has hundreds of megs of L4 cache in the interposer, presumably mostly for the iGPU and as a low-power framebuffer (somewhat like the M1).
@chaosfenix
@chaosfenix Год назад
@@DigitalJedi Yeah, I know there are definite issues. HBM has a 4096-bit bus, which is gigantic compared to anything else and is why you need the complex interposer. Intel's EMIB looks interesting and may help in that respect, but we will have to see. Personally I would not offer the option of additional DDR5; this would be replacing it. Many systems already use soldered memory, so this would simply be an extension of that. I would dare say 90% of consumers don't bother upgrading the RAM on their computers anyway, so if it is balanced properly it wouldn't be much of an issue.
@BlackEpyon
@BlackEpyon Год назад
Some of us remember when CPUs had L2 cache external to the CPU. Then Slot 1 had the cache integrated onto the same card as the CPU, and when the Pentium III came out, L2 cache moved completely onto the CPU die. I don't see external RAM going away any time soon, just because of how useful it is to be able to add more RAM, but this seems to be following the same evolution, with the performance gains it brought. Perhaps one day we'll see internal RAM on consumer CPUs as well!
@RENO_K
@RENO_K Год назад
That's seriously cool
@fangzhou3235
@fangzhou3235 11 месяцев назад
No, the original Pentium III (0.25 µm Katmai) did not have on-die L2. That only came with the 0.18 µm Coppermine version, which was super cool. The 500 MHz Coppermine could OC to 666 MHz without breaking a sweat.
@maxniederman9411
@maxniederman9411 8 месяцев назад
Ever heard of M-series macs?
@Strykenine
@Strykenine Год назад
Love a good datacenter CPU discussion!
@edplat2367
@edplat2367 Год назад
I can't wait for 5-10 years from now, when we'll see this come to high-end gaming machines.
@Alex-wg1mb
@Alex-wg1mb 2 месяца назад
Or to buy used Xeon Max chips from AliExpress, with specially crafted motherboards for midi-tower cases.
@sehichanders7020
@sehichanders7020 Год назад
8:53 I always figured HBM was the endgame for the entire Optane thing. Too bad it never really panned out, since it had mad potential and could have changed how we think about, for example, database servers altogether. Intel is sometimes so far ahead of itself that even they can't catch up (and then something like Arc happens 🤦‍♀)
@TheExard3k
@TheExard3k 11 месяцев назад
HBM gets wiped on power loss like any other volatile memory. It has nothing to do with Optane and persistent memory.
@sehichanders7020
@sehichanders7020 11 месяцев назад
@@TheExard3k It's not about persistence. But when your persistent storage is as fast and low-latency as Optane was supposed to be, you can get away with much, much smaller memory pools, hence you can use faster HBM. The entire promise behind Optane was that it is so fast (especially IOPS-wise) that you don't need to keep your entire application's data in memory.
@noth606
@noth606 3 месяца назад
@@sehichanders7020 Well, it is a bit higher level in a sense. Tiered pipelining is not _either_ HBM _or_ Optane; correctly used, both give a boost, just at different levels. Optane has been used as a lower tier of RAM, and HBM here as a higher tier. So a CPU looking for data would go L1, L2, HBM, RAM, Optane, then mass storage if it needs to go that far, with each tier progressively slower and larger. It would most likely work very well, just be hella expensive and a bit of a bear to manage on the backend. But it would boost performance a lot for almost all data-intensive applications, since you'd never lose more clocks than absolutely necessary to get the needed data, so the CPU would spend far less time waiting than it does now. As it stands, if the data is not in L2 or L3 you take an immediate hit of 10+ clocks to check RAM, and if you draw the short straw you have to go fishing in mass storage, SOL.
@cy5911
@cy5911 Год назад
Can't wait to buy these 5 years from now and use it for my homelab 🤣
@SchoolforHackers
@SchoolforHackers Год назад
Exactly my thought.
@hermanwooster8944
@hermanwooster8944 Год назад
I remember you telling me this episode was coming a few weeks ago! The idea of memory-on-a-chip would be sweet for the consumer audience. It was worth the wait. :)
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
Took a little longer than expected because of a trip to Taiwan. I hope you have a great week
@BlackEpyon
@BlackEpyon Год назад
Similar to how L2 cache used to be external to the CPU, then moved adjacent to the CPU with the Slot 1 and Slot A, and then moved completely internal to the CPU die, gaining performance with each evolution.
@Superkuh2
@Superkuh2 Год назад
64GB is kind of small for any AI workload that would take advantage of the memory bandwidth.
@GeekProdigyGuy
@GeekProdigyGuy Год назад
Compare it to GPU VRAM: sure, top-of-the-line GPUs have slightly more, but the H100 is pretty much the industry standard and has 80GB. Considering CPUs are definitely going to have way lower throughput than GPUs, it doesn't seem like capacity would be the issue.
@ThelemaHQ
@ThelemaHQ Год назад
It's HBM2e, which also works like VRAM; it's super fast. BTW my Tesla P40 24GB with GDDR5 gets 2.50 s in Stable Diffusion, while the P100 16GB with HBM gets 0.8-1.5. Now imagine I use dual P100s.
@Superkuh2
@Superkuh2 Год назад
@@ThelemaHQ Stable Diffusion isn't really memory-bandwidth limited. Things like transformer-based large language models are.
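For a bandwidth-bound LLM, a rough ceiling on single-stream decode speed is memory bandwidth divided by the bytes read per token (all weights streamed once per token). A sketch with ballpark numbers of my own; the ~1 TB/s aggregate HBM2e figure and the model sizes are assumptions, not from the video:

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound LLM:
# every generated token must stream all model weights once.
def max_tokens_per_sec(params_billion: float, bytes_per_param: int,
                       bandwidth_gbs: float) -> float:
    model_bytes_gb = params_billion * bytes_per_param
    return bandwidth_gbs / model_bytes_gb

HBM_BW = 1000.0  # GB/s, assumed aggregate for the on-package HBM2e

# A 7B model in fp16 (~14 GB) fits in 64 GB of HBM with room to spare;
# a 70B fp16 model (~140 GB) does not, which is the capacity concern
# raised earlier in this thread.
print(round(max_tokens_per_sec(7, 2, HBM_BW), 1))  # ~71.4 tokens/s ceiling
print(7 * 2 <= 64, 70 * 2 <= 64)                   # True False
```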
@BusAlexey
@BusAlexey Год назад
Yes! Waited long time for this monster cpu
@shiba7651
@shiba7651 Год назад
Pfff the cpu in my server is so fast it boots with ddr3
@stefannilsson2406
@stefannilsson2406 Год назад
Same! And it only takes like 10 minutes!
@CobsTech
@CobsTech Год назад
While I work mostly with virtualisation rather than specific high-performance workloads, this has always raised a question for me, even when playing around with a legacy Xeon Phi 5110P coprocessor: how would a chip like this handle memory failure? Nowadays whenever we have memory failure, ECC kicks in as a first resort, and then you have options such as memory mirroring so your workloads can continue with a reduced amount of available memory. How would a chip like this handle it if, say, one of the HBM packages were defective or outright didn't work? Does the BIOS have any form of mirroring? Considering this is four separate packages working as one, would this prevent the chip from booting at all? Great coverage though; always fun to see what new products in the HPC sector bring to the table.
@skunch
@skunch Год назад
if the memory fails, throw it out. This is the way now, integration of core components at the sacrifice of modularity and repairability
@autohmae
@autohmae Год назад
I don't know if this system supports it, but CPU hotplugging exists. Maybe the least useful way to do it, but that would be 1 way
@thatLion01
@thatLion01 Год назад
Amazing content. Thank you intel for sponsoring this.
@MrHav1k
@MrHav1k Год назад
Good call out of the Intel Developer Cloud there at the end. It's so important to try these kinds of systems out to see if you'll even benefit from these features before you go out and drop a massive bag of $$$ on procuring one.
@magfal
@magfal Год назад
Does AMD have a similar service? I've been wondering about the benefits of buckets of L3 cache.
@MrHav1k
@MrHav1k Год назад
@@magfal AMD doesn’t offer anything like the IDC to my knowledge. Just another edge Intel’s size and resources can deliver.
@shanent5793
@shanent5793 Год назад
​@@magfal Supermicro has their Jumpstart remote access, they can lend you an AMD server. Bergamo was even available pre-release
@OVERKILL_PINBALL
@OVERKILL_PINBALL Год назад
Interesting CPU for sure. All about finding the best use case. I was thinking this CPU might also be used to drive faster networking if it is using the HBM memory. Not sure if that was tested.
@__--JY-Moe--__
@__--JY-Moe--__ Год назад
thanks 4 the tech vid Patrick!! wowee 4 Intel Xeon Max!! gotta get a few!! giddy up!!
@waldmensch2010
@waldmensch2010 Год назад
I tested Xeon Max a few months ago for KVM/VMware and it did not perform well. This is only useful for HPC. Nice video.
@EyesOfByes
@EyesOfByes Год назад
So, GDDR6X has higher latency than standard DDR5. How is HBM2e in this sense?
@ytmadpoo
@ytmadpoo Год назад
I'm wondering how it would do running Prime95. With multiple cores per worker, it can hammer the memory pretty hard so the throughput of HBM should significantly boost the per-iteration speed, assuming the clock rates of the cores are decent. Tuning the worker threads to stick with the NUMA nodes would give the ideal performance (4 worker threads, each using all 14 cores on the same NUMA node). We did some similar tests way back when on a Xeon Phi and it was pretty decent although the HBM on there was much smaller so it still had to go out to "regular" memory quite often which slows things down. I've found that going over regular DDR4, it only takes a couple of cores in a worker to saturate the memory bus, although you do still get marginal improvements as you add cores. By the time I got above 10-12 cores per worker though, you can actually see a degradation as the individual cores are just sitting there waiting for RAM so the overhead can make iteration times drop.
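The NUMA pinning described above (one worker per HBM node) can be sketched by generating per-node affinity commands. The 4-node/14-core split follows the comment; `numactl --cpunodebind/--membind` is standard, but `./worker` and its `--threads` flag are placeholders, not real Prime95 syntax:

```python
# Sketch: pin one benchmark worker per NUMA node, as suggested above.
# Node/core counts follow the comment (4 HBM NUMA nodes, 14 cores each).
NODES = 4
CORES_PER_NODE = 14

commands = [
    # Bind both CPU scheduling and memory allocation to the same node,
    # so each worker only touches its local HBM stack.
    f"numactl --cpunodebind={n} --membind={n} ./worker --threads={CORES_PER_NODE}"
    for n in range(NODES)
]
for cmd in commands:
    print(cmd)
```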
@gheffz
@gheffz Год назад
Thanks!! Subscribed, All.
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
Thank you
@jmd1743
@jmd1743 Год назад
Honestly, it feels like once AMD did their monster-sized CPUs, everyone stopped caring about keeping things conventional, like how it only took a couple of people to get everyone dancing at the school dance.
@stevesloan6775
@stevesloan6775 Год назад
I’m keen to see full high performance computers on die utilising a derivative of this tech.
@jlficken
@jlficken Год назад
I love enterprise hardware! I'm still rocking E5-26XX V4 CPU's at home though 😞
@степанстепаненко-б1э
You are saying words faster than this processor can handle. I wanted to see traditional tests of this processor in AIDA64, Cinebench, 3DMark.
@nobodhilikeshu4092
@nobodhilikeshu4092 Год назад
My computer boots without DDR5 too. Nice to see they're starting to catch up. ;)
@EyesOfByes
@EyesOfByes Год назад
8:52 My thought is: why didn't Apple try to acquire the Optane IP and patents? Then we wouldn't have to worry about write endurance, and we'd also get an even lower-latency SoC in combination with the massive amount of L2 cache Apple has.
@uncrunch398
@uncrunch398 Год назад
Optane drives have failed due to write endurance being exceeded, when used as DRAM extensions, IIRC. Its best placement is as a large swap space or as a cache for tiered storage, to preserve the endurance and power-on time of the other tiers. Intel stopped production and development and sold it off due to it not selling well enough; the purchaser, IIRC, was a company primarily focused on memory. Enterprise and high-end prosumer SSDs serve sufficiently where it fits best, for a tiny fraction of the cost per capacity.
@Teluric2
@Teluric2 Год назад
Because Apple knows they have no chance in the HPC biz , Apple rules where the looks matter.
@georgeindestructible
@georgeindestructible Год назад
The ventilation in these looks great.
@davelowinger7056
@davelowinger7056 Год назад
You know, I imagine the CPU of the future: it would be a CPU sandwich, with 4 to 64 FireWire ports. First the northbridge, now system memory.
@thomaslechner1622
@thomaslechner1622 Год назад
What are the Cinebench results, single and multi? That is all that counts at the end of the day...
@matsv201
@matsv201 Год назад
I used to work on developing ultra-efficient telecom servers that just ran on normal Intel Core i-series CPUs. The one we had went down to 10W for the whole board with a full Intel Xeon CPU if the memory was removed. With the memory they drew like 40 watts. (This was quite a while back, the Sandy Bridge era.)
@benedicteich8697
@benedicteich8697 4 месяца назад
Just got my hands on an ES version. Can't wait to run it..
@RR_360
@RR_360 Год назад
I would love to have one of those old servers in your studio.
@Veptis
@Veptis Год назад
Isn't this also the kind of Xeon where you pay to "unlock" some of the accelerators and the frequency curve? Also, it's sadly not really a workstation part. Intel is marketing their Xeons for workstations, while I want a GPU Max 1100 (PVC-56) as a workstation card. I have hopes for announcements next week. Intel is demoing it on the Intel Developer Cloud and I had a chance to try it. I believe my workstation will still get an i9-14900K with custom loop cooling (slight chance of a TEC).
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
This is not unlockable from what I understand
@ted_van_loon
@ted_van_loon Год назад
RAM in an APU would eventually also greatly reduce cost. HBM of course is expensive, but it might become normal to see APUs become the new general-purpose CPUs and make APUs more like SoCs. Essentially it allows adding many more features, and RAM in the CPU allows for much simpler and cheaper motherboards, meaning that RAM integration in low-end chips (using more conventional memory) makes for super cheap and power-efficient parts. That said, despite HBM being much more expensive, on these high-end systems it is great. Many years ago, when HBM and HBM2 were still cheap to make (cheap enough to be used in mid-tier gaming GPUs), I also recommended doing essentially the same thing: something like HBM directly on a CPU.
@gl.72637
@gl.72637 Год назад
Is this comparable to the NVIDIA Grace ARM-based CPU with 144 cores that Linus Tech Tips showed 3 months back? Or is this just Intel trying to catch up? Would like to see a video comparing them server against server.
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
This has been in production and is being installed into the Aurora supercomputer which will likely be the #1 in the world in November. Grace Superchip you cannot buy yet (we covered it on the STH main site) despite the hype.
@El.Duder-ino
@El.Duder-ino Год назад
Enterprise and personal chips will continue to be ever more tightly integrated, and they'll mimic a motherboard more than the chips we see today (also in size). Just look at the Cerebras chip... the memory system is still way behind the compute.
@whyjay9959
@whyjay9959 Год назад
Do you think DIMMs could disappear in favor of embedded DRAM and CXL memory?
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
I think CXL memory in the PCIe Gen6 generation will have more bandwidth and be more interesting, but some applications will still like locally attached. More interesting is if there is optically attached memory.
@ted_van_loon
@ted_van_loon Год назад
The sleep-state issues are probably an early-version problem, since they likely come from the memory needing constant power. In the future, with a motherboard that supports 2 separate CPU voltages at the same time (based on pin groups), or if the CPUs get some added logic, it should probably work. Of course they might not have prioritized it, since honestly a CPU like this currently makes the most sense in a server. And while it is also great for video editing, 3D modeling, rendering, and simulation, most such software likely doesn't support it well enough yet. Good, well-maintained FOSS software like Blender might support it quite rapidly and quite well, but many companies have shown themselves very slow to adopt new tech, like Adobe (even though they seem to embrace AI now), and things like SolidWorks, which still doesn't understand that modern computers have more than 1 CPU core.
@SilverKnightPCs
@SilverKnightPCs Год назад
I just don't understand where in the current marketplace it makes sense to buy Xeons. You can buy an AMD Epyc with double the core count, half the power consumption, and usually 3/4 the price.
@billymania11
@billymania11 10 месяцев назад
Goes to show you there is more to these decisions than a PC benchmark. I can imagine it gets quite complex comparing all the features.
@gusatvoschiavon
@gusatvoschiavon Год назад
I would love to have an arm CPU with hbm memory
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
That is powering the former #1 supercomputer: www.servethehome.com/supercomputer-fugaku-by-fujitsu-and-riken-revealed-at-no-1/
@shadowarez1337
@shadowarez1337 Год назад
Hmmm, NVIDIA should take a stack of that HBM2e for a new Shield console. And they are sort of hybridizing the next consumer CPU with on-die RAM like Apple did with the M1/M2 SoCs. Interesting times ahead. I can get a frequency-tuned Epyc with enough cores and cache to build out a nice fast NAS.
@richfiles
@richfiles Год назад
I wish Apple would adopt this memory style for their Apple Silicon SoCs. No current Mac has upgradable memory; you buy the SoC configured with a memory capacity from the factory, and that's it... It sure would be nice to have fast RAM off the factory floor _and_ user-expandable memory slots for future upgrades. I really like the direction Intel is going with these!
@billymania11
@billymania11 10 месяцев назад
Everybody thinks Apple is being stingy or playing games with RAM, but memory of that type can't be slotted. Because of timing and signal propagation, LPDDR memory has to sit close to the CPU and be soldered. Which in a way leads to HBM memory. I think that will happen, and Apple might do it in the consumer or Pro space at some point.
@richfiles
@richfiles 10 месяцев назад
@@billymania11 What are you even talking about? Numerous laptops and desktops have slotted RAM. Your high-speed RAM remains factory-determined as part of the SoC, and "slow" RAM can be slotted in at a later date by the user. Many computers have used fast/slow RAM configurations. Every modern computer already does this to a degree with cache; this is merely adding one more layer in between: SoC fast RAM, and slower socketed RAM.
@billymania11
@billymania11 10 месяцев назад
@@richfiles Sure Rich, whatever you say.
@richfiles
@richfiles 10 месяцев назад
@@billymania11 I am literally describing what is inside laptops _today..._ I work in a PC repair shop. I have been building and repairing computers most of my life; my first computer repair was in 1989. Look up how cache memory works. Computers have had different amounts of different-speed memory on and off die for decades. Most CPUs have at least 2 or 3 levels of cache memory, plus the external RAM accessed through the memory controller (also on die with modern CPUs). Some computers (mostly long ago) had both fast and slow RAM, accessed directly by the CPU for the fast RAM and through a memory controller for the slow RAM. The Amiga did this. Even many modern PCs can do this: if you have a matched pair of faster RAM modules in a pair of DIMM sockets on one channel, and a slower matched pair in the DIMM sockets of a separate memory channel, many CPUs will run each channel at its best speed. There is no reason you can't have a high-speed memory controller with some channels directed to on-SoC chiplet RAM (HBM or HBM-like), while _ALSO_ having some memory channels reserved for slower slotted RAM (either SODIMM or the newly developed CAMM socket). There is literally no reason a computer manufacturer can't do this, particularly in lower factory memory configurations, where less high-speed factory-installed chiplet SoC RAM is installed. You say "sure" like it's something unbelievable... I work on laptops every weekday. More have slotted RAM than don't, and some already solder some RAM on board and have a secondary slot for expansion. There's no reason you can't have some higher-speed RAM on the SoC, as configured from the factory, and use other memory channels for slower socketed RAM. I'd LOVE to have sockets in my Mac Studio, so I could add to the already present 32GB of high-speed RAM...
But YES, Apple is being stingy, because they profit from people buying the RAM they _expect to use someday_ right now, while it's still expensive, rather than just buying the RAM they know needs to be high speed and adding slower RAM in the future to take over miscellaneous tasks, freeing up the high-speed RAM for more intensive work.
@ravnodinson
@ravnodinson 5 месяцев назад
What kind of place would be using something like this and what would they be running on it? This kind of tech is fascinating to me and I don't even know what it's used for.
@ServeTheHomeVideo
@ServeTheHomeVideo 5 месяцев назад
Often supercomputer clusters. See the new Intel Aurora supercomputer as an example.
@ravnodinson
@ravnodinson 5 месяцев назад
@@ServeTheHomeVideo It is amazing!! 2 billion billion calculations per second. One thing that interests me that was mentioned being done on Aurora was doctors studying neurology and mapping out the brain's neural pathways. What does the program running that even look like, and why does it need such mind-bending computational power? I know I'm in way over my head, but to me it's awe-inspiring work.
@matthiaslange392
@matthiaslange392 Год назад
With the tiles it looks a little like the chip that's pulled out of Schwarzenegger's head in Terminator 2. 😎
@kenzieduckmoo
@kenzieduckmoo Год назад
So what I'm seeing here is that when Apple said they couldn't add DDR5 slots to the Mac Pro because of unified memory, it was just their engineers not being allowed to do it.
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
Yes, but they also would need DDR5 controllers on chip
@tomstech4390
@tomstech4390 Год назад
Imagine if AMD started adding HBM2E or HBM3 (with that Samsung connection they have) onto their Epycs, as well as the 1152MB of L3 cache and the 96 fast cores.
@IamBananas007
@IamBananas007 Год назад
Mi300 APU
@tomstech4390
@tomstech4390 Год назад
@@IamBananas007 24 cores, but yeah fair point. :D
@post-leftluddite
@post-leftluddite Год назад
Well, Phoronix published reviews of the newest AMD Epycs, including Bergamo, and they literally destroyed even the HBM version of the Sapphire Rapids chips... so apparently AMD doesn't need HBM.
@VideogamesAsArt
@VideogamesAsArt Год назад
@@tomstech4390 Their MI300C has no GPU cores at all and is 96 Zen 4 cores with HBM, but it's unclear whether they will release it, since there might not be enough demand given that V-Cache already gives them a lot of on-die memory.
@GeoffSeeley
@GeoffSeeley Год назад
@2:23 Ah, so Intel isn't above "gluing" together chips like AMD eh? Ya Intel, we remember.
@ServeTheHomeVideo
@ServeTheHomeVideo Год назад
You know I was sitting in the front row when that presentation was given in Oregon back in 2017
@billymania11
@billymania11 10 месяцев назад
@@ServeTheHomeVideo Kind of a long time ago. Things can change in that length of time, right Patrick?
@Jerrec
@Jerrec Год назад
HBM is the future. I wonder how long it will take until it reaches consumer CPUs. Though upgrading RAM wouldn't be possible anymore then.
@whyjay9959
@whyjay9959 Год назад
CXL could allow upgrading RAM then.
@Jerrec
@Jerrec Год назад
@@whyjay9959 HBM2 has a bandwidth of 420 GB/s. There is quite some way to go for PCIe to allow CXL RAM expansion at that speed: PCIe 7 x16 only manages 240 GB/s, PCIe 7 isn't even out yet, and HBM3 is already beginning rollout in 2024 with a whopping 512 GB/s of bandwidth. Even the latency on the bus would be way too high, even if the bandwidth were reached. With HBM, memory expansions die out. CXL only helps for "slow" DDR5 and DDR6. The HBM standard even states that the RAM must be on the processing logic die.
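The per-generation PCIe numbers being argued over here can be roughed out from the raw per-lane transfer rates (encoding and protocol overhead ignored, so these are optimistic upper bounds, which is why they differ slightly from the figures in the comments):

```python
# Approximate unidirectional PCIe x16 bandwidth per generation,
# from raw per-lane transfer rates (GT/s), ignoring overhead.
per_lane_gtps = {"gen5": 32, "gen6": 64, "gen7": 128}
lanes = 16

for gen, rate in per_lane_gtps.items():
    gb_per_s = rate * lanes / 8  # bits -> bytes
    print(gen, gb_per_s, "GB/s")
# gen5 64.0, gen6 128.0, gen7 256.0 -- versus roughly 400-500 GB/s
# for a single HBM2e/HBM3 stack, which is the point being made above.
```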
@whyjay9959
@whyjay9959 Год назад
​@@Jerrec I think you mean bytes? Found a chart showing 128 gigabytes per second for PCIe gen6 x16. But sure, it's all a tradeoff. CPU-integrated chiplets get inherent performance advantages from having the shortest simplest connections but cannot be changed, so they will probably continue to be combined with slower, more flexible types of memory as preferred.
@Jerrec
@Jerrec Год назад
@@whyjay9959 I don't get your point. You are right about PCIe 6 x16 and 128 gigabytes; I wrote about PCIe 7, which comes out in 2025. And you are right, I meant bytes, sorry about that; if I could correct it, I would. Anyway, even ignoring HBM3 in 2025, it means HBM2 would run at maybe 25% or 50% speed. That's not a tradeoff, that is... unusable for such a memory.
@LaserFur
@LaserFur Год назад
I wonder how long it will be before systems boot up with just the cache, and then an ACPI message tells the OS when main memory is online. This would help with the long DDR5 training time.
@bradley3549
@bradley3549 1 year ago
Something like that would be valuable in the consumer market I reckon. Servers are already notorious for long boot times so I don't think there is a lot of incentive at the moment to enable a fast boot.
@PingPong-em5pg
@PingPong-em5pg 1 year ago
"HBM memory" resolves to "High Bandwidth Memory memory" ;)
@realpainediaz7473
@realpainediaz7473 1 year ago
good catch 😆
@shanent5793
@shanent5793 1 year ago
What is so difficult about the integration that Intel does but AMD does not? Why is this harder to do than AMD Instinct HBM or Versal HBM? If HBM is used as cache how many sets does it support and how long does it take to search 16GB of cache for a hit?
@lukas_ls
@lukas_ls 1 year ago
It's "3D" stacking, which makes it much more expensive. It's similar to HBM packaging (but still different), not just a couple of dies glued together on the same package. AMD could do it, but they want lower costs. AMD uses these packaging techniques, just not in Ryzen/EPYC CPUs.
@ThelemaHQ
@ThelemaHQ 1 year ago
I'm still waiting for a Xeon comeback; I stuck with Xeon until the Gold 6140, before switching to red team's dual EPYC 7742.
@TheAnoniemo
@TheAnoniemo 1 year ago
Can't wait for ASRock to create a mini-ITX board for this and just have no DDR5 slots.
@matthiaslange392
@matthiaslange392 1 year ago
These Xeons will Serve The Home, all homes at once 😎 But who needs this power? Usually the storage is the slowest part of a system, and you're better off investing in faster storage than in faster CPUs. Most of the time several cores are idling. But I'm sure there are some strange physics simulations as a use case... simulating earthquakes, weather, or nuclear fusion... or simply having the fastest Minecraft server of all 😉
@simonhazel1636
@simonhazel1636 1 year ago
Question on video quality: everything looks fine except Patrick's face is super red. Everything else looks fine, and the pictures of Patrick in the video look fine.
@simonhazel1636
@simonhazel1636 1 year ago
Just to note, it's only on the 4K RU-vid setting; if I bump it down to 1440p or 1080p, the issue disappears.
@noth606
@noth606 3 months ago
We are getting more bandwidth using on-carrier, direct-attached HBM than with off-carrier DDR5 somewhere on the board - crazy! *** Ehm, no: if you DIDN'T, it would be crazy. Not just that, it would be a failed design that should be binned and never released, since anyone buying it would be certifiable and should be put in a padded room for their own and others' safety... It is a cool design, following the roadmap/wishlist/path to glory that was laid out many years ago, of which Optane is also part. Intel has been at it quite some time; they have explained it and reiterated it many times for those who stop and listen. They are building systems that allow for multiple tiers of storage, where you'd have L1 - L2 - HBM2 - RAM - Optane - SSD cache - SSD - HDD, sort of, all in one system where data is promoted or demoted between the tiers depending on how *close* to the actual CPU cores it needs to be, since closer = faster. It's like a guru and disciples in a sense, with the disciples being the memory tiers sitting in ever-larger circles around the guru dispensing the wisdom and word of god. The closer one sits, the easier and clearer it is to hear and understand what the guru says.
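The promote/demote behavior described above can be sketched as a toy model. This is purely illustrative: the tier names, the promote-on-second-access policy, and the `TieredStore` class are invented for the example and do not reflect how any real memory controller or tiering software works.

```python
# Toy sketch of tiered memory: hot data is promoted one tier closer to
# the cores on repeated access; cold data simply stays where it is.
from collections import defaultdict

TIERS = ["HBM", "DRAM", "Optane", "SSD", "HDD"]  # closest -> farthest

class TieredStore:
    def __init__(self):
        self.tier_of = {}              # key -> index into TIERS
        self.hits = defaultdict(int)   # key -> accesses since last move

    def put(self, key, tier="HDD"):
        """New data lands in the given (default: farthest) tier."""
        self.tier_of[key] = TIERS.index(tier)

    def access(self, key):
        """Touch a key; promote it one tier inward once it looks hot."""
        self.hits[key] += 1
        t = self.tier_of[key]
        if self.hits[key] >= 2 and t > 0:
            self.tier_of[key] = t - 1  # promote toward the cores
            self.hits[key] = 0
        return TIERS[self.tier_of[key]]

store = TieredStore()
store.put("page42")
print(store.access("page42"))  # cold, first touch -> HDD
print(store.access("page42"))  # hot, promoted -> SSD
```

Real tiering policies track recency and frequency far more carefully, and also demote cold data outward; the point here is only the promoted/demoted-by-distance structure.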
@KiraSlith
@KiraSlith 9 months ago
I'm usually an Intel hater, but man, Threadripper and the GPU mining boom destroying Phi completely messed up superscalar development, and AMD just never filled that market niche back out again with EPYC (they had it nicely locked down with Opteron), so there was just a pit there that ARM was slowly trickling into like groundwater. Database hosting apps really needed these bulk-core chips ages ago, but it's good we're at least getting something comparable now in the form of Xeon Max.
@majstealth
@majstealth 1 year ago
Damn, these 2 CPUs alone have half the RAM each of my ESX hosts has. Wow.
@maou5025
@maou5025 9 months ago
Can you do some gaming benchmark with HBM only? To see infinite money performance lol.
@CyberdriveAutomotive
@CyberdriveAutomotive 1 year ago
I like how Intel made fun of AMD for using chiplets, saying they were "glued together", and now they're doing it themselves lol
@ServeTheHomeVideo
@ServeTheHomeVideo 1 year ago
You have to remember I was one of the people in the room when that presentation was made (and we had EPYC 7601 already in the lab at the time)
@exorsuschreudenschadenfreude
sick bro
@ZanderSwart
@ZanderSwart 1 year ago
As a Xeon 2650 v2 daddy, this makes me proud
@Clobercow1
@Clobercow1 1 year ago
I'm curious how well this thing can run Factorio. It might set records. That game needs hella cache and memory bandwidth / latency.
@berndeckenfels
@berndeckenfels 1 year ago
Not running DDR5 to save on cooling doesn't sound very realistic. Who would want to run 100 cores with no additional memory?
@velo1337
@velo1337 11 months ago
are you doing a follow up with this cpu?
@ServeTheHomeVideo
@ServeTheHomeVideo 11 months ago
We will have a video with Xeon Max in it later this week.
@velo1337
@velo1337 11 months ago
@@ServeTheHomeVideo Would be nice to get some SuperPi, CPU-Z, 7-Zip, and Geekbench benchmarks for the 9480
@aarcaneorg
@aarcaneorg 1 year ago
I called it! Less than a year after I asked when we would be able to boot servers without even needing to add RAM right here on one of your videos, and here we are! Somebody saw my comment and made it happen!
@bradley3549
@bradley3549 1 year ago
Hate to burst your bubble, but the CPU design timeline is such that they would have been actively working on this CPU design for *years* prior to review samples being available.
@aarcaneorg
@aarcaneorg 1 year ago
@@bradley3549 Be that as it may, a lot of things, like the ability to use the onboard cache as system RAM, are minor revisions that can be made in firmware or opcodes, the kind of tweaks that can happen at the end. The extra cache was planned for years. Booting from it was my idea.
@bradley3549
@bradley3549 1 year ago
@@aarcaneorg You're definitely not the first to think of integrating ram and CPU and then booting from it. That's been a feature of CPUs for a LONG time. Just not x86 CPUs. Sorry.
@MNGermann
@MNGermann 1 year ago
"I will use this photo that I took at an Intel event, and I look awesome" :) :P
@alastor2010
@alastor2010 1 year ago
Isn’t using HBM to cache DDR5 just like using DRAM to cache DRAM?
@ServeTheHomeVideo
@ServeTheHomeVideo 1 year ago
In a way, yes. But think of it more as caching slower/ higher latency/ higher trace power far DRAM to faster/ lower latency/ lower trace power close HBM. There is a big difference between access over a few mm on package and going out of the package, through the socket, through the motherboard, through the DDR5 socket, onto the DDR5 module and so forth.
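The near/far tradeoff described in this reply is essentially an average-memory-access-time calculation. The sketch below is a hedged back-of-envelope model: the latency numbers are made-up placeholders to show the shape of the tradeoff, not measured Xeon Max or DDR5 figures, and it assumes a miss pays the near-memory lookup plus the far-memory access.

```python
# Average access time when near memory (HBM) caches far memory (DDR5).
# All latencies in nanoseconds; the specific values are assumptions.

def amat(hit_rate: float, near_ns: float, far_ns: float) -> float:
    """Hit: pay near latency. Miss: pay near lookup + far access."""
    return hit_rate * near_ns + (1 - hit_rate) * (near_ns + far_ns)

print(amat(0.9, 100, 120))  # cache-friendly workload: close to HBM speed
print(amat(0.5, 100, 120))  # cache-unfriendly: drifting toward DDR5 speed
```

This is why caching mode shines when the hot working set fits in the 64 GB of HBM and can even regress below a plain DDR5 baseline when it doesn't: every miss pays for both hops.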
@lamhkak47
@lamhkak47 1 year ago
Is it possible to apply such a design to a GPU? A bit like HBCC for AMD, but where you can install DIMM modules on the GPU to provide extra RAM for various purposes, such as running large AI models, running heavily modded KSP, or trying novice programs that leak memory for no reason.
@IBM29
@IBM29 9 months ago
I wonder how long it takes to amortize engineering / development / fab setup at $13,000 each...
@ServeTheHomeVideo
@ServeTheHomeVideo 9 months ago
Also, how much is shared with the standard SPR parts, since a lot of the difference is in the packaging?
@pete3897
@pete3897 1 year ago
115 pounds?! Wow, that's really cheap ;-)
@scarecrow5848
@scarecrow5848 1 year ago
3:14 "That's why Intel does it but AMD doesn't." Wrong: AMD started doing chiplets back in 2015 with the... uh... I forget the name of it. I'll edit the comment lol. It was one of their GPUs. And starting in 2023 with their 7000-series GPUs they've gone back to doing chiplets. Still no HBM to replace VRAM entirely yet, but it's still a chiplet design for the core.
@billymania11
@billymania11 10 months ago
We have to give AMD credit on this one regarding chiplets. The danger, though, lies in the approach AMD chose to implement chiplets. Later designs like Intel's might be superior in a range of functions not initially considered. I do expect a pendulum swing in Intel's favor as their approach gets validated.
@certifiedbruhmomento
@certifiedbruhmomento 1 year ago
Can it run Crysis tho?
@nikolausluhrs
@nikolausluhrs 1 year ago
Many people don't do any tweaking because corporate IT doesn't allow us to. Also, support agreements.
@Tyranix97
@Tyranix97 1 year ago
Yeah, but how well does it game?
@lordbacon4972
@lordbacon4972 1 year ago
Actually, I was wondering if the Intel Xeon Max would be a good gaming CPU?
@uncrunch398
@uncrunch398 1 year ago
Sleep states aren't needed on any platform, though they're preferred when running on battery. A workstation or gaming PC benefits from disabling them, except for power-choking unused cores to boost the heavily used ones, or when cooling is insufficient and sleep states are needed to help with that. Lacking them is not a reason to avoid using the same CPU as in this video for those workloads. What is always relevant is performance per cost, or just performance if cost doesn't matter.
@czolus
@czolus 1 year ago
So, like the now-defunct Xeon Phi?
@SP-ny1fk
@SP-ny1fk 1 year ago
Yeah yeah yeah but when can I expect this in my homelab? lol
@shoobidyboop8634
@shoobidyboop8634 1 year ago
When will this be available for desktop PCs?
@jacquesb5248
@jacquesb5248 1 year ago
Can it run Crysis?
@uncrunch398
@uncrunch398 1 year ago
I foresee people trying this with everything they'd ever do, at least anything that fits within 64 GB of DRAM, without DRAM.
@hgbugalou
@hgbugalou 1 year ago
This is the future. It's inevitable that all RAM will be on the CPU.
@stephenkamenar
@stephenkamenar 1 year ago
You keep talking about memory bandwidth, but what about latency? I think this would help applications that need low-latency memory a lot more. DDR RAM latency is sooooooo slowwwwwwwww; bandwidth is fine.
@m5a1stuart83
@m5a1stuart83 1 year ago
But how long does it take to compile a C++ project?
@sykoteddy
@sykoteddy 1 year ago
I just find it hilarious that the dies look like the Windows logo. No, I'm not a Windows or Microsoft fanboy; rather the opposite.
@AlexandruVoda
@AlexandruVoda 1 year ago
Well, that is certainly a chip that will not serve the home, but it is very cool nonetheless.
@gh975223
@gh975223 1 year ago
Why would I care about sleep states on a workstation? The CPU should never go to sleep!
@SB-qm5wg
@SB-qm5wg 1 year ago
115lbs in a 2U. That's a thick boi 💪
@concinnus
@concinnus 1 year ago
Seriously. And it's not even water cooled! IME, 115# would be ~5U. 2U was ~60#.
@miigon9117
@miigon9117 1 year ago
I think "without RAM sticks" is a better title than "without DDR5"
@JohnKhalil
@JohnKhalil 1 year ago
First official Windows cpu!
@lifefromscratch2818
@lifefromscratch2818 1 year ago
Why would you want to use it without RAM sticks?
@UnreasonableSteve
@UnreasonableSteve 1 year ago
If your workload fits in the on-chip memory, why bother with additional ram?
@ewenchan1239
@ewenchan1239 1 year ago
It is AMAZING to me that for CFD, the AMD Genoa-X still TROUNCES this Xeon Platinum Max, despite the HBM2e memory, at nearly DOUBLE the Xeon Platinum 8490H baseline. The Max, running in caching mode, actually performs WORSE than the baseline, and with HBM2e only it performs slightly better, but nowhere CLOSE to what the Genoa-X is able to do. That, for me, is a much better marketing slide for AMD than for the Xeon Platinum Max.
@billymania11
@billymania11 10 months ago
LOL! If you say so.
@ewenchan1239
@ewenchan1239 10 months ago
@@billymania11 The data shows that. Watch the video.