At some point people will stop caring about CPU cores, and they'll build out a 5,000 core chip where there's a core that accelerates every single application that runs on the platform ('Minecraft core', 'ffmpeg core', 'MS Word core', 'Torrent core'). There will just be one CPU core to coordinate all the app cores. Heck, Apple's kind of down that path with how they use like 6 or 8 different specific cores for neural, ISP, prores, etc.
100%. Accelerators will continue to be a big deal. Desktop Meteor Lake next year will start taking folks down that path since, in theory, you could have a different Core chip for HP and for Dell.
@@ServeTheHomeVideo I just hope we don't end up with a thousand proprietary cores that can't really be used unless you're locked into one specific OS/driver layout. I don't think we'll have enough people willing to do the insane feats of reverse engineering the Asahi Linux group is doing on the M-series processors.
@@JeffGeerling probably not - developing a new core - and getting the silicon made in quantity, running w/o too many 'errata' - is much more expensive than just replicating an existing design where you increase the # of cores
I wonder what this would look like for some games I play :P Could I get game-specific cores for Flight Simulator, which really likes that one thread... Maybe someone has figured out a way to offload it to some FPGA already :D
Interesting, thank you! Ultimately it all boils down to code - many devs are happy as long as their creation runs w/o problems. Even if you give them the compiler updates for QAT, they might not use them; validating the code and keeping it interchangeable with other platforms will hold them back.
Awesome to see this video! Very excited to see SPR ship! Also, very minor nit: your TLS throughput graph for perf per thread says QAT hardware at 18T rather than 16T. I just started working as a firmware engineer at Intel on OpenBMC (feels like I just started, but it's been over a year, sheesh), and it's definitely exciting to see the platform we worked on getting into the hands of reviewers and customers haha.
AMD should be building a response to QAT which is a chiplet that they can place next to the IO Die, even on embedded chipsets. Possibly even a CCD which replaces a normal CPU CCD, perhaps creating an asymmetric design with one CCD focused on performance and the other on acceleration.
The 2% performance deficit is what I saw with my Ryzen 1700X back in the day when I was testing with Cinebench R15 multithread, setting the 1700X in the BIOS as 4+0 or 2+2. A 2-3% difference, to measure the fabric impact. Edit: I never really posted about it, cuz I thought it was too close to the margin of error at the time, and I was too lazy to do more than 3 runs for each of the 2 config types.
@@ServeTheHomeVideo that's quite a performance downgrade, but then again, that's a use case that taxes the fabric more, so it introduces a bottleneck there. It's interesting to test this kind of stuff nonetheless! :D
One thing that could be a catch: for any new instructions, virtualisation support is crucial. If the hypervisor doesn't support the new instructions, or the overhead is too great, that would defeat the purpose. Granted, this isn't an issue for physical appliances, but many appliances these days are virtual.
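A quick way to check from inside a guest whether the hypervisor actually exposes the new instructions is to read the feature flags. A minimal sketch, assuming a Linux x86 guest; the amx_* names are the /proc/cpuinfo flags for the Sapphire Rapids AMX bits, and 'aes' is AES-NI:

```python
# Minimal sketch: verify inside a Linux guest which instruction set
# extensions the hypervisor actually exposes. Flag names follow the
# /proc/cpuinfo conventions on x86 (amx_tile/amx_int8/amx_bf16 are the
# Sapphire Rapids AMX flags, 'aes' is AES-NI).

def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

wanted = {"aes", "avx512f", "amx_tile", "amx_int8", "amx_bf16"}
flags = cpu_flags()
for feature in sorted(wanted):
    print(f"{feature}: {'present' if feature in flags else 'MISSING'}")
```

If the flags show up in the guest, the remaining question is just the virtualisation overhead, not availability.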
Sounds like these new Intel SPR chips with on-die QAT will be great for high-density front-end web or proxy servers. It will be interesting to see how general performance stacks up against current and next-gen EPYC, and whether AMD decides to add a similar hardware function to the IO die.
Hi John. Thank you for being a member! That is the latest. The 8970 is the faster x16 adapter. The 8960 is the slower x8 adapter that I sometimes flash on screen in these videos. I am not sure if there are going to be new adapters since Xeon D has it built in, and Sapphire Rapids will have a much faster version built in. Top-end SKUs are >2x what the 8970 can do in some of our tests. A few reference pieces for you:
- Intel QAT Cards by Generation: www.servethehome.com/intel-quickassist-parts-and-cards-by-qat-generation/
- This week's Sapphire Rapids AMX and QAT performance: www.servethehome.com/hands-on-with-intel-sapphire-rapids-xeon-accelerators-qct/
The EPYC 3451 is essentially just a low-clocked Threadripper 1950X. There is a Zen 2 variant of the low-power EPYC series (the 7D12; there are retail versions of it), so I'm curious why they didn't suggest using it. Does the motherboard support setting the memory into "channel" or "die" interleave modes? This forces NUMA on the Zen 1/1.5 platform, which can offer better performance in some applications/operating systems that have trouble controlling NUMA associations by themselves (Zen 1/1.5 also has a bit of an issue here on its own, so the application/OS is not always at fault). Both modes need to be tested, as which one works best depends on the CPU and memory loadout. For a 1950X, for example, "channel" is the correct mode, while a 2990WX needs to use "die" mode.
The 7D12 is not a soldered chip and is physically *much* larger. In many embedded applications, those are completely different segments. On the NUMA side, the reason this took days to do was partially the process of going through different thread placements and core configurations. This was not just a "run once and see how it goes" type of exercise. One number may be presented, but a lot more work goes on behind the scenes.
@@ServeTheHomeVideo Setting thread association at the OS level isn't good enough for Zen 1/1.5 CPUs; they will often send data to the wrong memory channel if the correct NUMA mode is not forced at the BIOS level. Not knocking anything you're doing; it's all good work, and I know how much testing you're doing, as I've spent several months playing with my 1700X, 1950X, and 2990WX learning what makes them perform best for what I use them for, and the NUMA bug sits deeper than most software I've tested seems to be able to handle. All that said, the EPYC Zen 1/1.5 parts might behave a little differently, and the software you're testing might not be affected by the problem at all; I'm just concerned that it is. Don't take my comments too hard. Zen 1/1.5 is old, as we both know, so the soldered-to-the-board segment really does need a replacement from a newer architecture.
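For anyone who wants to poke at this kind of thread-placement testing themselves, here's a minimal sketch; it assumes a Linux box with the usual sysfs NUMA layout, discovers the node-to-CPU mapping, and pins the process to one node, similar to the 4+0 / 2+2 style sweeps described above:

```python
# Minimal sketch (Linux-only): discover the NUMA node -> CPU mapping
# from sysfs and pin the current process to a single node, so a
# benchmark can be run per-node. Note: memory page placement still
# follows the BIOS interleave setting; this only controls where the
# threads run.
import glob
import os

def parse_cpulist(s):
    """Expand a sysfs cpulist like '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

nodes = {}
for path in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node_id = int(path.rsplit("node", 1)[1])
    with open(os.path.join(path, "cpulist")) as f:
        nodes[node_id] = parse_cpulist(f.read())

print("NUMA topology:", nodes)

# Pin this process (pid 0 = self) to node 0's CPUs before benchmarking.
os.sched_setaffinity(0, nodes[0])
```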
Unless the QAT software ecosystem has improved drastically since I last looked at it a few years ago, I don't see QAT really happening outside of niches like SAN vendors with big dev teams. If they don't integrate QAT with, say, the mainline Linux kernel network stack (for kTLS or IPsec), it will see very limited uptake. No normies are going to run IPsec under some weird DPDK setup.
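For what it's worth, the mainline hook already exists: the in-tree intel_qat driver registers its algorithms with the Linux kernel crypto API, which userspace can reach through AF_ALG. A minimal sketch, Linux-only; whether the transform below is actually backed by QAT depends on which driver registered the highest-priority "cbc(aes)" implementation on your box:

```python
# Minimal sketch: drive the Linux kernel crypto API from userspace via
# AF_ALG. If a hardware driver (e.g. the in-tree intel_qat module)
# registers a higher-priority implementation of the transform, the
# kernel uses it transparently -- no DPDK userland needed.
import socket

alg = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET)
alg.bind(("skcipher", "cbc(aes)"))          # (type, algorithm name)
alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, b"\x00" * 16)

op, _ = alg.accept()
plaintext = b"sixteen byte msg"              # AES block-aligned
op.sendmsg_afalg([plaintext], op=socket.ALG_OP_ENCRYPT,
                 iv=b"\x00" * 16)
ciphertext = op.recv(len(plaintext))
print(ciphertext.hex())
```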
Later in this video, we showed this is going into mainstream Xeons starting with Sapphire Rapids. That will fix the chicken-and-egg problem with software adoption.
It's been 9 months since Ice Lake Xeon D was released, and I still can't buy one. Supermicro lists them as "Coming soon". Asrock lists them as "Preliminary". I don't know of a retailer that's selling them. Seems like vaporware.
Not just the Ice Lake D's. I've had a X11SDV-8C-TP8F on backorder for the last 3 months, after waiting from around November last year for them to become backorderable 😕
The point of this was built-in integration where you do not need add-in cards. That is a big deal for the embedded market that these parts are targeted at. We already added a 100GbE NIC for the EPYC because the onboard networking was too slow. If we did an additional QAT card for the AMD EPYC 3451, then we have two extra cards for AMD. Do we then need an add-in QAT card for the Xeon D since we added one for AMD?
@@ServeTheHomeVideo It would be an interesting performance comparison. There was another video about QAT on the server platform which did not show benchmarks for a QAT external card on the AMD platform. It seems like someone is trying hard not to show the full picture. With that said, I generally like your videos/articles/insights. It just seemed odd that the external card was shown but not used for benchmarking.
If you're doing 30+Gbps of IPsec traffic, then you're not going to be a home user in the first place :) You'll be able to pick and choose your platform. None of this really matters at home...
100%. STH was the first site to review the EPYC 3000 series, and I was one of the few folks at the launch in London. We have reviewed several platforms based on them, too. That is why I asked AMD if there was a replacement before doing this.
"it's like with aes-ni [...] today everybody just assumes it's there" Cries in cheap arm NAS CPUs that are basically useless for encryption, because the don't have AES-NI. Seriously though, Synology still maintains a feature table that features AES-NI. You might not think about it, but if you buy one of those without, you have use everything unencrypted... Which is of course, really bad, even for a consumer deployment.
The point is more that this is integrated, not an add-in PCIe card. In the embedded market, there are very limited PCIe slots and often smaller form factors.
Which new chip? Hopper has not been announced beyond the H100. Ada Lovelace parts will start with the L40 in the data center. These are more for compression/crypto, so they are aimed at a different segment.
@@ServeTheHomeVideo I have not been staying up to date on the chip names, but it's the same chip as in the 4090. PNY revealed what the new Quadros will be: the exact same as the Ampere cards, but with the Ada Lovelace chip.
Yes, the RTX 6000 Ada and the L40 have the same architecture as the 4090. That is very common for the old "Quadro" line and the newer professional cards.
I do hope that AMD makes something with Pensando, because it's ridiculous that you have to add an Intel NIC just to even the odds a little. AFAIK Genoa should be coming this quarter (though to which markets is hard to say), and it does have some accelerators, but more are coming from AMD only with Turin. Whether something equivalent to QAT will be available, I don't know; I leave that to experts like you.

That is one of the reasons why AMD, despite their performance advantages, has something like 10-15% of the server market share (I think; I honestly don't know which source is credible): Intel has most of the ecosystem already in place. Even though SPR is delayed, people are still waiting for it. A really missed opportunity by AMD, though I don't know how fast they could have built their own ecosystem, given that their financial problems only went away maybe 18 months ago, so they were careful with spending. But it really hurts their market share.

I hope that you will soon be able to review Genoa. I'd like both companies to be competitive and have approximately even market share; that would give them money for R&D and they could innovate.
@@ServeTheHomeVideo That is at least some good news. One would hope that they will also have something for this particular sector of the market, because being forced to use first-gen Zen EPYCs without anything to replace them is a lost opportunity. I do realize that AMD has to be careful with how much capacity they order so they don't lose money, but I think they could be a little more flexible now, given that they will use N6 and N5 nodes for all of their stuff.
At work we run 100Gbit/s of AES on EPYC Rome CPUs with plenty of capacity to spare. I really don't see the point of this crypto accelerator; AES-NI is plenty fast for almost any workload. If you need to go to 400GbE, it makes more sense to run crypto on the network card, as with Mellanox ConnectX-6, to avoid pushing extra data copies through the CPU's L3 cache.
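If anyone wants to sanity-check the software-only side of that claim, here's a rough single-core AES throughput probe; a sketch assuming the third-party Python "cryptography" package (pip install cryptography), which dispatches to OpenSSL and thus to AES-NI where available. Scale the per-core number by the core count for a back-of-the-envelope platform figure:

```python
# Rough single-core AES throughput check. The 'cryptography' package
# calls into OpenSSL, which uses AES-NI when the CPU supports it.
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
nonce = os.urandom(16)
buf = os.urandom(16 * 1024 * 1024)  # 16 MiB per pass

enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
passes = 64
start = time.perf_counter()
for _ in range(passes):
    enc.update(buf)
elapsed = time.perf_counter() - start

gbits = passes * len(buf) * 8 / elapsed / 1e9
print(f"~{gbits:.1f} Gbit/s AES-256-CTR on one core")
```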
@@Mr.Leeroy CDN nodes. Probably around 250W to push 100Gbit/s of HTTPS traffic, but that's previous-gen Zen. With an offloading NIC you can get much better figures. Check Netflix's IBC talk about how they build their 400G and 800G nodes.
@@ServeTheHomeVideo Agreed, but one downside of putting workload-specific accelerators into the CPU itself is the burden it adds to chips that aren't used for those workloads. That's particularly true for features added to every core: you had better be darn sure those transistors will be enabled and used broadly, since the area takes away from the number of cores, which are widely useful. AMD cores generally use less silicon area than Intel's for a given level of performance. That area/performance advantage, along with the chiplet approach, is what has enabled AMD to deliver much higher core-count server CPUs.
Which vendors will be providing the least expensive way to get into a two-socket Sapphire Rapids server running Windows Server 2019/2022? I was hoping to get one from Dell, but my Dell server technical sales guy doesn't think anything new will be happening until 2024! I need AI server acceleration now: PCIe Gen5 and DDR5 memory at 4400MHz 😢
This is the problem with Intel: they keep adding more and more facilities, instructions, etc. to try to speed up their chips, instead of stripping them down and making the core instructions and architecture more efficient.
It's the game Intel has been playing for years, and it's where the industry is heading. People want to keep seeing performance increase exponentially, but as we hit the limits of transistor and power scaling, the only way to keep improving performance and efficiency is through dedicated hardware. By that logic, we would be encoding videos in software and rendering games on our CPUs, because adding more accelerators and instructions is bad. Clearly, that's a flawed argument. As we move forward, accelerators will only become more important to improving performance and efficiency.
"Not strip them down and make the core instructions and architecture more efficient." Did you forget IA64? Intel tried that. They realized x86 had been extended way too far and it was time to make a break from it and start with a fresh architecture. But the public rejected it. Then AMD comes along and extends x86 yet again, even further adding 64 bit extensions to already overloaded architecture. So, Intel had no choice but follow along. You can't blame Intel for this. You have to blame the public in general for demanding backward compatibility and to a lesser extent AMD for adding the 64 bit extensions and thus continuing to keep it alive. Intel's plan was to kill off x86 and move us to IA64 instead. The P4 was going to be the end of the line.