Intel QuickAssist is a "Cheat Code" for Server Performance

ServeTheHome

Подписаться 757 тыс.

Просмотров 34 тыс.

50% 1

Видео Поделиться Скачать Добавить в

Опубликовано:

30 окт 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 133

@christopherjackson2157 2 года назад

I always assumed something like this would be put on the nic instead of as a separate card. Neat video!

@ServeTheHomeVideo 2 года назад

That is a wise observation.

@geekinasuit8333 2 года назад

You are correct, so-called smart nics will include different accelerators, but the best will be re-programmable ones such as FPGA based designs where you can load on different accelerators that you actually require, including bug fixes and upgrades.

@MatthewGP 2 года назад

Patrick, I love it when you do such awesome stories on niche products. The reviews of a $75 device one day and $150,000 device two days later really make me enjoy your website and RU-vid channel. I wonder if this would be a dream job for me...

@ServeTheHomeVideo 2 года назад

We will probably be adding an 18th team member in Q4. Shoot me a note if you are serious and what you are looking to do and if you are thinking part time or more full time. We usually have folks start part-time to see how they actually like reviewing hardware.

@xgeko2 2 года назад

Thank you so much for this video!! I started diving into qat a few month’s ago and learned the hard way about the support for the different generation of cards lol!

@ServeTheHomeVideo 2 года назад

There was even a difference in the like v1.5 and v1.6 that mattered when we did it in 2016-2017.

@asoto1516 2 года назад

Loved this, fantastic presentation of the power of accelerators in hardware. Great work!

@ServeTheHomeVideo 2 года назад

Thank you. Glad you liked it.

@BrianG61UK 2 года назад

@@ServeTheHomeVideo Too much waffle at the beginning for me.

@jolness1 2 года назад

I love when Patrick welcomes me. It never fails to make me smile for some reason. His enthusiasm is infectious.

@AdrianSandu_d 2 года назад

The intro always makes me think "QUICK, Somebody call a doctor. I think he's gonna have a stroke..."

@ServeTheHomeVideo 2 года назад

Ha! I have to record before I have coffee in the morning or very late at night just to tone it down to where it is today.

@babugowda1683 2 года назад

The perfect tech channel i ever needed. Loved your way of presentation 😍

@ServeTheHomeVideo 2 года назад

Thanks a ton. Have a great day.

@prashanthb6521 2 года назад

Thanks for being honest about sponsorship. It increases your credibility.

@AlpineTheHusky 2 года назад

People asking why intel is so popular in the server space when AMD is "Just better". Well I think this is a good point

@BenKistner 2 года назад

Very informative video! I had no idea that Intel made a QAT PCIe card. Thanks!!

@ServeTheHomeVideo 2 года назад

Super fun is that these are PCHs on the PCIe card

@MasonzeroDigitalWorks 2 года назад

Glad you enjoyed your time in Hillsboro. I live here with my wife and she works on the Jones Farm campus :)

@ServeTheHomeVideo 2 года назад

Awesome! I was there during the heat wave at the end of June for this.

@jamescrichton6163 2 года назад

Fantastic job! Top info - beats all the documentation and marketing blurb. Looking forward to the next video.

@1idd0kun 2 года назад

But, can't you use a QuickAssist add-in card on a AMD system? How does that compare to the native QAT from Xeon CPUs?

@ServeTheHomeVideo 2 года назад

AMD's solution is Xilinx/ Pensando.

@VTOLfreak 2 года назад

You can. I just tested a QAT add-in card on an AMD system using SQL Server 2022 RC0 which can offload backup compression with QAT. Works like a charm. No idea how much faster or slower the CPU version of QAT is since I don't have access to that kind of hardware. I also have allot of questions on how this is going to work with a hypervisor sitting in between. I don't want to pass the QAT device to just one VM, I want all VM's to be able to use QAT acceleration. Does the CPU version show up as a discrete PCIe device or is it more like a instruction set extension?

@susanparr1006 2 года назад

@@ServeTheHomeVideo the question was not about AMD hardware, but rather about using the Intel QAT hardware on AMD hardware, which is a supported modality. It is a shame that STH didn't think of that...

@ahinson 2 года назад

Let's see the AMD Xilinx/Pensando version of this next then!

@ServeTheHomeVideo 2 года назад

Yes, Soni said we would do the Pensando one soon when I spoke with her last week. It has been hard to get cards but it is on the plan.

@ytdlgandalf 2 года назад

what does this mean for latency, interrupt budget, DMA, etc? These are valid benchmarks, but what happens in total system testing, do you really free up resources which can be used without immediately hitting a next bottleneck, for example interrupts bogging down some subsystems of the platform?

@jfkastner 2 года назад

well researched, nice video, thank you!

@ServeTheHomeVideo 2 года назад

Thanks Jay

@pkt1213 2 года назад

Can you imagine if Patrick drank coffee... 🤣

@jannegrey 2 года назад

That actually got laugh out of me.

@anthonykenneally9256 2 года назад

Great presentation,excellently informed

@ServeTheHomeVideo 2 года назад

Glad it was helpful!

@ajhieb 2 года назад

Watched video. Immediately checked for TrueNAS support. Looks like I won't have much use for this until IX Systems enables support for it.

@ServeTheHomeVideo 2 года назад

It is more likely to happen with TrueNAS Scale since that is Linux based.

@ajhieb 2 года назад

@@ServeTheHomeVideo I would be okay with that. (Especially if it could get Scale up to the same performance as Core)

@alfred4683 Год назад

Really apreciate the effort, Sir! .. One question, could 8960/8970 card be used for many VMs? Or it could only be used for 1 instance?

@InIMoeK 2 года назад

Hey Patrick, what about latency? How much does it add, cause it has to travel multiple times over the PCIE bus

@AfroJewelz 2 года назад

how does qat help with my homelab that behind firewall ,idon't actually need SSL?

@samuelschwager 2 года назад

After seeing that rat's nest I do not feel so bad about my cabling anymore. If it works it works!

@deimosian 2 года назад

cable management for a bench test that's just going to get immediately torn back down is just a waste of time

@RolandoMartins 2 года назад

Cool video! Any chance of the Intel QAT 8970 Card 3 working with pfsense?

@ServeTheHomeVideo 2 года назад

Check out our Netgate 4100 review. pfSense Plus can use QAT. Opnsense also supports QAT.

@jiechenzhao2034 2 года назад

Thanks for your great video! Just to make sure --- this card is attached on PCIe right? Can instruction trigger accelerator ops on this card? How to use it? As a I/O call, or through instructions?

@Darkk6969 2 года назад

I've been wondering about the QAT feature in pfsense as my status screen is showing it as "NO". Thank you for explaining it in detail.

@amrrahmy123 2 года назад

is the acceleration used when using Java or C# or nginx with their main libraries out of the box, or did you implement specific intel dependencies/libraries to take advantage of quickassist?

@jaffarbh Год назад

Interestingly, Intel has a more "mainstream" acceleration called Quick Sync for video encoding and decoding. When it works (i.e. on supported video codec), it makes a huge difference. AMD seems to completely neglect this market for some reason.

@aliancemd Год назад

It doesn’t neglect it. They have AES-NI but he chose to compare Intel with hardware acceleration against AMD without AES-NI(hardware acceleration).

@jaffarbh Год назад

@@aliancemd Interesting. I assumed that "Quick Assist" was NOT part of the AES-NI instructions set. Worthing researching

@aliancemd Год назад

@@jaffarbh It is not part of AES-NI but it is a competing technology in the workflow that was presented in the video. Also, the video uses CPU + Dedicated QAT PCI hardware to compare against AMD without using AES-NI.

@jaffarbh Год назад

@@aliancemd Fair point. Intel has already embedded QAT into the latest Xeons. The real question is whether equivalent QAT accelertion exists in AMD processors. In any case, this is a specialist market and not something eveyone needs. May be AMD doesn't see the need to dedicate silicon for it.

@Summanis 2 года назад

I am very much not an expert in these things. With QAT's compression acceleration, could such hardware be used to accelerate disk access in a desktop environment? I'm not necessarily asking from a practical standpoint, merely wondering if we might see something like it in a future chipset/CPU so they can advertise that your SSD will be XX% larger or faster since it would be less data going across a bottleneck.

@ServeTheHomeVideo 2 года назад

Probably less common in the very near term for the desktop, but this is what storage vendors use QAT for.

@Fast_studyIQ 2 года назад

Intel will be a step ahead in competition 😎

@_droid 2 года назад

Cool. I had not heard of these but have been looking for a long time for a way to cheaply accelerate my older servers. My servers run a lot of modern hardware like high speed NVMe and they can't keep up. I think this is what I need. Will these basically work on any machine? I mean it's just a PCIe card, right?

@kenzieduckmoo 2 года назад

It’s weird seeing QAT talked about in servers when I’ve only seen it used in graphics rendering using the intel igpu in desktop processors

@ServeTheHomeVideo 2 года назад

That is Quick Sync. Intel went on a "Quick...." binge for a bit.

@jr8656 2 года назад

Interesting. How much and what kind of work is needed to use it ? Do software need to be recompiled ?

@_Steven_S 2 года назад

If Intel made NICs with QAT that can be used in embedded Epyc systems... 😁

@ServeTheHomeVideo 2 года назад

That would be a DPU/ IPU at this point.

@deimosian 2 года назад

@@ServeTheHomeVideo Well, no not really, there'd still be no CPU core complex on the card so it wouldn't pass your own 8 point 'is this a DPU?' checklist.

@ServeTheHomeVideo 2 года назад

Ah I meant more like Mt. Evans would be the closest product, or the FPGAs. But if you went FPGA + EPYC embedded, then you would get Xilinx most likely.

@galen__ 2 года назад

5:15 - I remember having this Antec ITX case 👍

@JSLEnterprises 2 года назад

but will it allow pfsense to allow multiple vpn connections without grinding to kb/s?

@deimosian 2 года назад

that should already be possible w/o QAT unless you're running on something really terrible

@kimbentsen 2 года назад

Calm down Patrick. You are drinking too much cofee.

@fnjordy Год назад

Rather important note that the QAT acceleration for NGINX only works with HTTP/1.1. So, enable HTTP/2 or HTTP/3 and AMD wins again.

@captgrant 8 месяцев назад

Can you please do an update for this? Not your full blown speed tests, but more of the proper mix and match what's out there to have it function and somewhat futureproof.

@CalConrad Год назад

Does the 8970 support chacha20/poly?

@AlexSchendel 2 года назад

No way! I work in JF5, wish I could've seen you haha.

@ServeTheHomeVideo 2 года назад

Bummer! I was in the cafeteria and people were saying hi quite a bit.

@RylandBingham 2 года назад

I like to watch many of my RU-vid videos at 1.25x or 1.5x speed… Not Patrick’s videos!

@Lishtenbird 2 года назад

You should try that with Overly Sarcastic Productions ;)

@dorinxtg 2 года назад

Loved the video, but IMHO one thing is missing - if a customer already has EPYC servers or he's planning to buy the upcoming Genoa CPU's, is there something for him related to acceleration? I'm sure there's some serious NDA that has been signed by you and AMD but still, some hints ... ;)

@ServeTheHomeVideo 2 года назад

Pensando and Xilinx.

@bartios 2 года назад

@@ServeTheHomeVideo That's exactly what I thought while you were talking about the accelerator card. The functionality on there will probably be folded into the DPU which also means we will have more vendors putting out products which can do this.

@microcolonel Год назад

As for ciphers, ChaCha20-Poly1305 is not going anywhere really, it is seriously overpowered (20 rounds of ChaCha is silly overkill, 12 is sensible overkill, 8 is probably fine).

@ADB-zf5zr 2 года назад

Duuuuuuude, if you are going to do it, take half or less of the amount of Coke you took before this video :o

@GanryuMVP 2 года назад

It's disingenuous to use performance per thread for accelerators when the accelerators don't scale linearly (or at all) with more threads.

@bluedeath996 2 года назад

What is the AMD equivalent to this?

@ServeTheHomeVideo 2 года назад

Pensando and Xilinx will be

@aliancemd Год назад

17:35 That is so misleading. To remind everyone that AMD(and Intel) supports AES-NI, which has significantly better support, including from hypervisors - there is really no reason to compare hardware acceleration vs no hardware acceleration at all, on hardware that has it.

@nster3 2 года назад

Kind of rubs me the wrong way that you didn't mention AMD's solution, Xilinx/ Pensando (is this available now or soon?) and the Intel QAT card can be used in an AMD system. Looking at the video, one could easily think Intel has a huge advantage over AMD. Hard to believe Intel didn't have a say in this or maybe you were influenced by their sponsorship. I'm not saying do something that sours the relationship, but just mentioning it would have been much better already. Honestly I feel like the AMD results should have been removed as you are comparing apples to oranges, only thing it does is make AMD look bad. Hope you can keep that in mind in the future.

@ServeTheHomeVideo 2 года назад

I asked AMD for its Pensando/ Xilinx solutions but they still have not sent cards. Only so long we can wait. We cover Pensando a lot on the main site

@zachariah380 2 года назад

@@ServeTheHomeVideo so will xilinx and bluefield solutions be able to do this same sort of thing? I'm curious what the implementations will be like on the software stack in order to utilize the offload, and if there is direct hardware support in those cards to accelerate these functions (specific cyphers, etc), and if so or if not, will that affect bandwidth, latency, Max # of connections, and power efficiency. I'd be VERY curious to have these same tests and cases with the same basic base hardware paired with nvidia and amd accel/dpu cards benchmarked the same way (if possible) so that these numbers could be put side by side with them, and show how intel compares to other vendor solutions (and how much work it would be on the software side to implement - like is there native support for each solution in popular products like pfsense, web server stacks, etc)

@georgeashmore9420 2 года назад

Hillsborough 😭 well an attempt was made

@shadowmist1246 2 года назад

How much coffee have you had?

@ServeTheHomeVideo 2 года назад

No coffee. I record these usually between 4:30AM and 7:30AM before any coffee so I do not get too excited

@gg-gn3re 2 года назад

5:15 lmao, using windows to run a bunch of linux software and then... of course.. microsoft word. (edge counts as linux cuz it's chromium)

@brynyard 2 года назад

Wow, that is super weird! A HW accelerator that actually accelerates something! But seriously guys, why the f.. are you trying to make this a comparison against AMD with no acceleration, this is just plain silly.

@ServeTheHomeVideo 2 года назад

We used a faster AMD CPU so when we did things like acceleration via ISA-L AMD was faster due to the clock speed and extra TDP it had. The TDP difference between lower power Intel parts and the higher power AMD SKUs we used is about the same as QAT card TDP. AMD has promised the Pensando solution for example, but has yet to deliver cards and we cannot eBay them. When Pensando cards arrive, we will look at those.

@brynyard 2 года назад

@@ServeTheHomeVideo showing that a specialized HW unit + CPU can draw the same power as another CPU isn't any less silly, and excusing this with AMD not yet delivering similar HW isn't a very good excuse for making a silly and utterly useless "comparison".

@ServeTheHomeVideo 2 года назад

What is the alternative though? No major server vendors support QAT cards in EPYC systems. You can put them in, but that is a one-off unsupported configuration that would be a lab project, not something that people would really deploy. That is why we need AMD's accelerators so we can have real solutions not lab experiments.

@brynyard 2 года назад

@@ServeTheHomeVideo the alternative is to not do silly stuff and only report on what the QAT can do. The fact that QAT doesn't work on EPYC is partially (if not fully) Intel's fault, so trying to put this on AMD is just even more silly. BTW, an accelerator from AMD will perform very differently, and comparing the two would also make little sense, these are very specific SW accelerators, but you probably already know that?

@zachariah380 2 года назад

@@brynyard it's not silly at all. If there is no industry supported alternative for amd systems, this could mean the difference between choosing an intel or an amd platform for a specific application based purely on the amount of resources we've just been shown get used. This may be a huge realization for a lot of people, and may affect purchasing decisions for variously sized projects. In larger data centers, optimizing for a specific use case can potentially mean the difference of a ton of power, latency, number of connections a server can make while still performing work with those connections, so users per server, so then number of total servers, so data center sizing, etc. This may have huge implications for our very Internet-oriented data centers, with all kinds of encryption and very little inter-data center machine-to-machine trust.

@Lishtenbird 2 года назад

0:58 CLI ASMR when?

@Phil-D83 2 года назад

The quickassist cards are $$$ on ebay. Wanted one for my opnsense box

@ServeTheHomeVideo 2 года назад

The 2nd gen cards are not crazy expensive.

@boguslawbochynski 2 года назад

Perf per Wat ?

@ServeTheHomeVideo 2 года назад

The QAT cards are basically server PCHs on a card so they are like 23W TDP parts. That is why I wanted to use EPYC CPUs with 15W each more (30W total) to at least bridge some of the gap.

@antoinereese4295 2 года назад

This is a 25 minute ad

@nanite10 2 года назад

If you're having to spend the dev time to implement QAT within your application, why marry yourself to a hardware specific component when there are fast real-time algorithms like LZ4 and ZSTD when you can get 1 GB/s+ per core? I don't get the feel that the forward looking storage vendors are continuing down the hardware accelerated path here as they get locked into a specific technology and then they're unable to port elsewhere, i.e. cloud.