28:17 [slides 31-32] I think “old school C developers” would define Pixel as a union of a single uint32_t and a struct of four uint8_t, and use that union to simplify the reading/writing code. Such approaches are undefined behavior in C++ (they break the strict aliasing rules, I believe), even though C allows this kind of type punning. I’m not sure that C-style state of mind should guide how C++ does it. Perhaps we should allow some std::simd for T’s that are aggregates of same-type “vectorizable” member variables? Perhaps this is a generalization that can implicitly allow SIMD, as mentioned at 48:22. Great talk, thanks Matthias!
Great talk! It seems that exploiting ILP alongside SIMD can be very beneficial. Will library/compiler vendors be allowed to “do it for us”? For example, is the default size() of std::simd strictly mandated by the hardware, or may a compiler/library vendor choose a larger size() (perhaps based on compiler flags) to exploit ILP? Perhaps the ABI tag that was mentioned is able to support such desires.
Intel's left hand: pushes SIMD into every language it can, including many mask-defined operations. Intel's right hand: withholds AVX-512 from us simple folk for ten years.
Because the compiler might simply remove the loop if you don’t use its result later. And the “modify first” part, I think, is there to keep the compiler from precomputing the result at compile time.
If you actually care about the performance of your data-parallel code, your PC has a special, massively powerful hardware component that's specifically designed to maximize throughput on exactly this kind of task. It's called a GPU.
Only a few systems that have SIMD also have a graphics processor, and if they do have one, it's only as powerful as the graphics workload requires. Think of servers, industrial machines, cars, home and kitchen appliances, and so on.
Sending data to the GPU and reading back the result is also pretty slow, so for algorithms that rely on recursion or dynamic programming the GPU doesn't make for a great resource.
Amdahl's law: GPU processing is only ever worth it when the compute time greatly outweighs the serial time (in this case, the atrocious PCIe transfer times).