This video is part of the HIP workshop playlist. View full playlist here: • Advanced HIP Workshop
Memory that is shared is available for reading and writing from every thread in the block, even if a block consists of more than one thread team. However since threads and thread teams in a block don't always finish work at the same time, there must be block-level synchronisation before threads can read what others have written.
Shared memory is often located in high bandwidth, low latency cache on the compute device and might provide a speed boost by virtue of being in close proximity (in terms of latency and bandwidth) to the hardware threads that execute a kernel.
3 июл 2024