
George Hotz | Programming | multiGPU with HIP (or maybe without HIP) | HSA_DISABLE_CACHE=1 | Part 2 

Views: 20K

Date of the stream 20 Jan 2024.
from $1250 buy comma.ai/shop/comma-3x & best ADAS system in the world openpilot.comma.ai
Live-stream chat added as Subtitles/CC - English (Twitch Chat) - at the bottom - Show Transcript
Sources:
- rocm.docs.amd.com/projects/HIP/en/latest/user_guide/hip_porting_guide.html#threadfence-system
Follow for notifications:
- twitch.tv/georgehotz
Support George:
- twitch.tv/subs/georgehotz
Pre-order tinybox:
- buy.stripe.com/5kAaGL6lk9uX9nW144 (tinygrad.org/)
Chapters:
00:00:00 intro
00:02:40 HIP P2P copy slow
00:15:20 hipDeviceEnablePeerAccess fast
00:17:30 synchronization mechanism
00:21:50 dumb questions = banned
00:23:20 no idea if this is working
00:25:40 hip c++ 11 atomics
00:29:10 cuda atomic sync
00:31:20 global cache flush hip
00:34:20 glc dlc
00:42:00 no idea if access is correct
00:43:00 hip atomic and system example
00:48:10 deep dive in spinlocks
00:49:00 hip cache flush
00:52:10 global_atomic_and_b32
00:56:30 hipDeviceSetCacheConfig
00:58:10 threadfence system
01:03:00 block
01:04:40 D0, D1
01:07:00 events are slow because syscalls
01:10:40 answering smart questions
01:11:25 bad atomics understanding
01:11:45 x86 vs arm difference
01:12:20 HIP event API slow
01:12:50 tinybox vs 8xA100
01:13:40 radeon pro benefit 48GB RAM
01:14:20 H100 price, MI300
01:15:00 buy gaming PC in $1000 range
01:16:25 6x7900 XTX
01:16:45 nvidia software support
01:17:25 tinybox power efficiency
01:18:20 llama2 70b unquantized on tinybox
01:19:00 tinybox V2 same case, GPU refresh
01:20:00 6 tinyboxes a day factory
01:20:35 nvidia blackwell release timeline
01:26:10 never shipping anything in your life
01:32:30 removing lines
01:40:30 test_multitensor
01:48:20 deeper problems, good abstractions
01:50:55 does not work and it's slow
01:55:00 Hello nobody is talking
02:01:30 event 4
02:03:15 Lana_Lux raid!
02:03:50 introduction to new people, tinygrad
02:04:15 beautiful_mnist.py
02:05:30 tinygrad operations
02:06:40 matrix multiplication in tinygrad
02:07:10 convolution
02:07:55 pytorch difference, dtypes code
02:08:45 the bitter lesson, search machine
02:09:30 DEBUG=2
02:10:20 convolution custom op
02:10:50 tinygrad long term vision
02:11:10 tiny corp business model
02:11:40 tinybox
02:12:00 getting AMD on MLPerf
02:12:15 synchronization between GPUs
02:12:30 international forwarding
02:13:20 tinybox HIP support, full 16 lane pcie 4 GPU multi GPU connectivity
02:14:40 ocp 3.0 slot
02:15:20 tinybox infiniband
02:16:25 direct AMD support
02:16:50 training over internet
02:17:05 synchronous gradient descent
02:17:50 best chip nvidia H100
02:18:50 tinybox launch apps, tinygrad bounties
02:20:10 no secret software
02:20:45 tinygrad master under 5k lines of code
02:21:30 9x 140mm noctua fans
02:22:30 custom cables, demos on pc
02:24:30 line count
02:25:15 4090 price tracker
02:25:40 decision transformers
02:28:25 small font
02:31:15 python no C
02:36:50 zyn
02:40:30 semaphores for 6 GPUs, thread_sync_global
02:47:10 writing GPU program
02:51:20 DEBUG=2 synchronization points
02:55:00 tiny9, hip cache flush
03:00:40 hip coherent memory
03:03:30 amd_cocclr_copyBuffer
03:06:00 amdgcn sleep
03:11:40 hipEventRecord fix
03:23:05 we do things because we can
03:26:30 hipSyncAndResolveStream
03:28:15 HSA_DISABLE_CACHE=1
03:31:25 sending GPUs back to AMD
03:31:54 it compiles but it does not do anything piano musical
03:32:20 no boost, no cache
03:33:10 whiteboard idea
03:36:00 memory-fence instructions, no HIP rebuild
03:38:50 cache vs no cache
03:42:30 whiteboard
03:44:25 hsa_disable_cache definition
03:48:30 HSA amd flush l2 cache
03:52:00 tinybox not responding
03:53:20 tinybox piano comeback music, HSA driver, fences
03:56:25 tinybox back
03:59:40 need to write GPU driver, threadfence_system
Official George Hotz communication channels:
- geohot.com
- realGeorgeHotz
- georgehotz
- tinygrad.org
- geohot.github.io/blog
- github.com/geohot
We archive George Hotz and comma.ai videos for fun.
Follow for notifications:
- geohotarchive
Thank you for reading and using the SHOW MORE button.
We hope you enjoy watching George's videos as much as we do.
See you at the next video.

Category: Science

Published: 22 Jan 2024

Comments: 29
@geohotarchive 8 months ago
02:03:50 tinygrad intro | 03:31:25 sending GPUs back to AMD | 03:31:54 it compiles but it does not do anything musical | 03:52:00 tinybox not responding

rocm.docs.amd.com/projects/HIP/en/latest/user_guide/hip_porting_guide.html#threadfence-system
Threadfence_system makes all device memory writes, all writes to mapped host memory, and all writes to peer memory visible to CPU and other GPU devices. Some implementations can provide this behavior by flushing the GPU L2 cache. HIP/HIP-Clang does not provide this functionality. As a workaround, users can set the environment variable *HSA_DISABLE_CACHE=1* to disable the GPU L2 cache. This will affect all accesses and for all kernels and so *may* have a performance impact.

Bounties for tiny corp / tinygrad -> docs.google.com/spreadsheets/d/1WKHbT-7KOgjEawq5h5Ic1qUWzpfAzuD_J06N1JwOCGs/
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-lnVQsJJFcdg.html Hiring entire stack for tiny corp join if you are interested | ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-lnVQsJJFcdg.html work major source of value in your life
Pre-order tinybox buy.stripe.com/5kAaGL6lk9uX9nW144 more info on -> tinygrad.org | github.com/tinygrad/tinygrad
comma 3X comma.ai/shop/comma-3x | best ADAS system in the world openpilot.comma.ai | from $999 comma.ai/shop/body the future of people
Support George by subscribing twitch.tv/subs/georgehotz | Follow George on twitter.com/realGeorgeHotz to be up to date | Read George's geohot.github.io/blog/
Chapters:
00:00:00 intro
00:02:40 HIP P2P copy slow
00:15:20 hipDeviceEnablePeerAccess fast
00:17:30 synchronization mechanism
00:21:50 dumb questions = banned
00:23:20 no idea if this is working
00:25:40 hip c++ 11 atomics
00:29:10 cuda atomic sync
00:31:20 global cache flush hip
00:34:20 glc dlc
00:42:00 no idea if access is correct
00:43:00 hip atomic and system example
00:48:10 deep dive in spinlocks
00:49:00 hip cache flush
00:52:10 global_atomic_and_b32
00:56:30 hipDeviceSetCacheConfig
00:58:10 threadfence system
01:03:00 block
01:04:40 D0, D1
01:07:00 events are slow because syscalls
01:10:40 answering smart questions
01:11:35 bad atomics understanding
01:11:45 x86 vs arm difference
01:12:20 HIP event API slow
01:12:50 tinybox vs 8xA100
01:13:40 radeon pro benefit 48GB RAM
01:14:20 H100 price, MI300
01:15:00 buy gaming PC in $1000 range
01:16:30 6x7900 XTX
01:16:45 nvidia software support
01:17:25 tinybox power efficiency
01:18:20 llama2 70b unquantized on tinybox
01:19:00 tinybox V2 same case, GPU refresh
01:20:10 6 tinyboxes a day factory
01:20:35 nvidia blackwell release timeline
01:26:10 never shipping anything in your life
01:32:30 removing lines
01:40:30 test_multitensor
01:48:20 deeper problems, good abstractions
01:50:55 does not work and it's slow
01:55:00 Hello nobody is talking
02:01:30 event 4
02:03:15 Lana_Lux raid!
02:03:50 introduction to new people, tinygrad
02:04:15 beautiful_mnist.py
02:05:30 tinygrad operations
02:06:40 matrix multiplication in tinygrad
02:07:10 convolution
02:07:55 pytorch difference
02:08:25 dtypes code
02:08:45 the bitter lesson, search machine
02:09:30 DEBUG=2
02:10:30 convolution custom op
02:10:50 tinygrad long term vision
02:11:10 tiny corp business model
02:11:40 tinybox
02:12:00 getting AMD on MLPerf
02:12:15 synchronization between GPUs
02:12:30 international forwarding
02:13:20 tinybox HIP support, full 16 lane pcie 4 GPU multi GPU connectivity
02:14:40 ocp 3.0 slot
02:15:20 tinybox infiniband
02:16:25 direct AMD support
02:16:50 training over internet
02:17:05 synchronous gradient descent
02:17:50 best chip nvidia H100
02:18:50 tinybox launch apps, tinygrad bounties
02:20:10 no secret software
02:20:45 tinygrad master under 5k lines of code
02:21:30 9x 140mm noctua fans
02:22:30 custom cables
02:22:50 demos on pc
02:24:30 line count
02:25:15 4090 price tracker
02:25:40 decision transformers
02:28:25 small font
02:31:15 python no C
02:36:50 zyn
02:40:30 semaphores for 6 GPUs
02:41:20 thread_sync_global
02:47:10 writing GPU program
02:51:20 DEBUG=2 synchronization points
02:55:00 tiny9
02:57:00 hip cache flush
03:00:40 hip coherent memory
03:03:30 amd_cocclr_copyBuffer
03:06:00 amdgcn sleep
03:11:40 hipEventRecord fix
03:22:55 recurring expenses and non recurring expenses
03:23:05 we do things because we can
03:26:30 hipSyncAndResolveStream
03:28:15 HSA_DISABLE_CACHE=1
03:31:25 sending GPUs back to AMD
03:31:54 it compiles but it does not do anything piano musical
03:32:20 no boost, no cache
03:33:10 whiteboard idea
03:36:00 memory-fence instructions, no HIP rebuild
03:38:50 cache vs no cache
03:42:30 whiteboard
03:44:25 hsa_disable_cache definition
03:48:30 HSA amd flush l2 cache
03:52:00 tinybox not responding
03:53:20 tinybox piano comeback music, HSA driver, fences
03:56:25 tinybox back
03:59:40 need to write GPU driver, threadfence_system
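The HIP porting guide quoted in this comment says HSA_DISABLE_CACHE=1 affects all accesses and all kernels. A minimal Python sketch of setting it for one child process only, rather than for the whole shell session, might look like the following. The helper name `run_with_l2_disabled` is hypothetical, the child command is a stand-in that only echoes the variable (no ROCm is involved), and the sketch assumes the variable has to be in the environment when the process starts.

```python
import os
import subprocess
import sys


def run_with_l2_disabled(cmd):
    """Run cmd with HSA_DISABLE_CACHE=1 set only in that child's environment.

    Hypothetical helper for illustration; the parent environment is not modified.
    """
    env = dict(os.environ, HSA_DISABLE_CACHE="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child process: it just reports what it sees in its environment.
# A real run would launch the HIP or tinygrad program here instead.
probe = [sys.executable, "-c", "import os; print(os.environ.get('HSA_DISABLE_CACHE'))"]
result = run_with_l2_disabled(probe)
seen_by_child = result.stdout.strip()
print("child sees HSA_DISABLE_CACHE =", seen_by_child)
```

Scoping the variable to one process keeps the possible performance impact mentioned in the guide away from unrelated GPU jobs on the same machine.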
@서승우-f4q 8 months ago
everytime i see this guy i am amazed. I love the way he focuses on what he loves. I really really love this field but when i do my project or something, I can't focus on programming only. I really really really respect geo hot. When I started watching him live, I also started eliminating distractions.
@sudhamjayanthi 8 months ago
watched a lot of george's streams but finally in this stream starting from 2:04:00 is the best introduction to noobs!
@alexzan1858 8 months ago
holy shit thats some ugly python code :D i wish he didnt write it like C
@seriouce4832 8 months ago
George you can also disable L2 by turning off the tinybox. Easy fix #AMD
@kompila 8 months ago
Man is doing a lot of really hard things. It is fine, I am confused.
@naesone2653 8 months ago
its asmr for me i understand like 20% of it but its better than music
@BrianRoby-s2s 8 months ago
Did you make driving chill yet?😂😂😂😂😂😂😂😂😂😂😂😂😂
@ethereumnode5641 8 months ago
Bro says no viewers when he goes live but all my sub money is in shitcoins so I'll never be a viewer
@xmorse 8 months ago
AMD software is ass as always
@evokanivo 8 months ago
very ass
@the_tanktb977 8 months ago
the piano at the end was crazy hahahhaa. edit** this stream ending was nuts hahahah
@mikestaub 7 months ago
"If everyone did things just to accomplish things, the world would suck ass." This is actually a profound statement.
@tdoc666___ 8 months ago
you know what i was thinking about, the "evolution" of life, did anyone in history ever create something like that from scratch with a program? i mean simulating atoms, particles, molecules, big bang, evolution from a particle to a human being, has anyone ever tried to do something like that? that will be absolutely the MOST interesting stuff to program, i think that's too complicated even for Geo hahaha😃
@tython3 8 months ago
"Lossless Compression is Intelligence." Disagree. Intelligence is upscaling. Intelligence is being able to go from lossy compression back to the source.
@beefburger888 8 months ago
next project, a audio hz map with all detail about volume high to low and base and mid and also a evenly spaced tones and you have button like speak loud and slow and quiet and then a 3d model of vocal chord and recherche how sound will be and show a girl pick and it can model the audio based on her face shape neck thickness, and maybe weight.
@dmembrane_ 8 months ago
Hack The Planet!
@rohan_devarc 8 months ago
Async is bad!
@asatorftw 8 months ago
Seeing the last two videos really makes me question if AMD was the right choice for the GPUs in TinyBox. I know his point is to go away from the "tyranny" of Nvidia, but damn this seems like a lot of unnecessary pain.
@PetWanties 8 months ago
the right path isn't always the easiest one, this will advance the field and markets
@Bromon655 8 months ago
Is this dude still relevant? Last I heard he was trying to make a Tesla clone, then he vanished into obscurity
@evokanivo 8 months ago
Dude started a company that built the best ADAS/self-driving software that exists. What are you on?
@HonestGenius 8 months ago
TinyBachs
@kuperrr6776 8 months ago
which is the last song George plays with the piano?
@Brettscott1 8 months ago
Tinybox come back
@bandr-dev 8 months ago
how did he get to the level to understand this i dont get it
@reen6904 6 months ago
College
@bandr-dev 6 months ago
@reen6904 you did not just suggest college to get actual computer science knowledge
@yunuszenichowski 8 months ago
I don't know if it got fixed yet, but it should be (blockIdx.x * blockDim.x + threadIdx.x) * 4 not + 4 in the copy kernel !!!
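The difference between `* 4` and `+ 4` in the comment above can be checked with a small host-side simulation. This is only a sketch under a stated assumption: that each thread in the copy kernel handles 4 consecutive elements starting at its computed index. The stream's actual kernel is not reproduced here, and the function names and launch sizes are hypothetical.

```python
def simulate_copy(src, grid_dim, block_dim, index_fn, width=4):
    """Simulate a copy kernel on the host: every thread copies `width` elements."""
    dst = [None] * len(src)  # None marks an element no thread ever wrote
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            start = index_fn(block_idx, block_dim, thread_idx)
            for k in range(start, min(start + width, len(src))):
                dst[k] = src[k]
    return dst


def index_times_4(block_idx, block_dim, thread_idx):
    # Mirrors (blockIdx.x * blockDim.x + threadIdx.x) * 4
    return (block_idx * block_dim + thread_idx) * 4


def index_plus_4(block_idx, block_dim, thread_idx):
    # Mirrors (blockIdx.x * blockDim.x + threadIdx.x) + 4
    return (block_idx * block_dim + thread_idx) + 4


GRID_DIM, BLOCK_DIM = 2, 8  # hypothetical launch: 16 threads in total
src = list(range(GRID_DIM * BLOCK_DIM * 4))  # 4 elements per thread

good = simulate_copy(src, GRID_DIM, BLOCK_DIM, index_times_4)
bad = simulate_copy(src, GRID_DIM, BLOCK_DIM, index_plus_4)

print("'* 4' copies every element:", good == src)
print("'+ 4' leaves elements unwritten:", bad.count(None))
```

With `* 4`, neighbouring threads start 4 elements apart and their ranges tile the buffer. With `+ 4`, they start 1 element apart, so their ranges overlap and most of the buffer is never touched.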