
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math 

Umar Jamil
41K subscribers
41K views

Published: 26 Sep 2024

Comments: 137
@andrewhaslam8785 8 months ago
Brilliant - you are easily one of the most lucid and accessible teachers of deep learning.
@ItsRyanStudios 8 months ago
This is absolutely FANTASTIC. I watched Albert Gu's Stanford lecture on state space models/Mamba, and it was a great high-level overview. But I really appreciate you taking it slower and going further into detail on the basic/fundamental concepts. A lot of us aren't mathematicians or ML engineers, so it's much appreciated to be helped along with those concepts.
@umarjamilai 8 months ago
Thank you for your kind words. Please share the video in your network, it would help me a lot. Thanks!
@danaosama4247 6 months ago
I rarely comment on videos, but this one was worth it. Thank you so much for such a clear explanation. You explained all the nuances that I previously did not understand in a very clear way. God bless you.
@anirudh514 8 months ago
Your teaching approach is very good. You started from fundamental concepts and went deeper. This helped in gaining intuition and understanding, and avoided confusion in the later parts. Brilliant!
@SatyanarayanSenapati-b1s 1 month ago
Words fall short to appreciate the work you put into creating these videos. Simply BRILLIANT.
@trungquang1581 6 months ago
I just read about Mamba and wanted to find a detailed explanation video. Everything you covered here is exactly what I needed, thank you so much, keep on cooking.
@흰강아지-s4v 4 days ago
this is just pure art; thanks so much
@sid-prod 8 months ago
I'm so glad I found this channel, you are a gold mine for such content, please keep them coming.
@jiegong529 3 months ago
You are just too amazing! You understand this stuff in great detail, and then you take the time to explain it to us in educational videos. A true gem of a channel!
@aruns.v9248 8 months ago
The whole lecture was very intuitive. Thanks for the efforts put into building this video!
@purohitadey-bc9bg 4 months ago
Understanding Mamba couldn't get better than this!
@remyshootingstars 8 months ago
🙌 Still working through Transformers from scratch. Hopefully a Mamba from scratch is in the future!
@sari54754 8 months ago
After I saw this lecture, I subscribed to your channel. It is the easiest-to-understand Mamba lecture I've seen.
@AUTO-g7s 8 months ago
As a university student from Beijing, thank you for sharing this analysis of the paper! Best wishes!
@myfolder4561 5 months ago
Thank you so much. Lots of useful details, yet you move through them at such a good tempo, with easy-to-follow examples.
@a123s1l 3 days ago
Thanks for your clear explanation of Mamba; coming from a control theory background, I very much appreciate its usage in LLMs. One small error that I noted: the A matrix must be N x N to map the previous N-dimensional hidden state h(t-1) to h(t). I believe the A matrix is also time-varying, to produce selective output tokens.
@TheFitsome 28 days ago
some people are just born to teach.
@trevorhobenshield 8 months ago
Very high quality, this is great. Hard to find good content like this. Thanks Umar!
@mudassirkhan9054 8 months ago
Thanks for explaining it in a way that anyone with some high school math background can understand, keep this up!
@arvyzukai 8 months ago
This is gold! I really appreciate the attention to detail. Thank you Umar!
@celestchowdhury2605 7 months ago
Thank you so much for your detailed video and for thoughtfully anticipating that we would need help with the equations! You are a savior!
@我我-p3z 2 months ago
The clearest explanation!
@Mirai12377 7 months ago
very good video!!! thanks a lot for your efforts!!!!
@selayan4985 3 months ago
Such brilliant work you have done. Really learned a lot, thanks!!!
@The_bioinformatician 6 months ago
This is the best deep learning video I've ever seen. I will surely use some of your slides to teach my students
@akshikaakalanka 5 months ago
This is really helpful for another talk I am doing on Mamba. Thank you very much for putting this out.
@fabiogomez8250 8 months ago
Best MAMBA video at the moment!
@wayneqwele8847 7 months ago
Thank you. I appreciate the approach you took in explaining the major concepts.
@beincheekym8 4 months ago
Brilliant video! Really clear and with just the right amount of details!
@ankush4617 8 months ago
Thanks for the amazing work as usual! Keep it up - this is probably some of the highest-quality content on LLMs on YouTube.
@optomosprime 8 months ago
Excited for the video. I was searching for a video on Mamba and today I saw this. Your Transformer video helped me a lot previously. Keep it up!
@danamics 1 month ago
Great job on this video! I learned a lot
@majidemami577 8 months ago
Excellent video! Thank you. I have watched a few videos about mamba and this one was by far the best.
@mcHsyu 8 months ago
Great explanation!! This is the first video that makes me comprehend the whole Mamba paper.
@nishanthshetty435 7 months ago
Thanks a ton! Excellent explanation and great analogies to introduce the more advanced material. This is an absolute masterclass on how to teach advanced material.
@prashlovessamosa 8 months ago
Salute to your consistency. Thanks, Umar sir.
@GenAiWarrior 8 months ago
Thank you so much for your efforts to make such an amazing video on Mamba architecture !!
@ActualCode0 8 months ago
This is one of the best ML explanations I've seen. Even though I didn't understand all of it, I definitely learnt something new.
@shoubhikdasguptadg9911 6 months ago
Ohhh Man, why did I discover this gem so late :( This guy is a rockstar!
@BooleanDisorder 7 months ago
Even I understood much of this. I have no education. Thank you! Mamba looks really cool. Especially like the long context and further refinement. It looks like a model that could be made to learn as it goes. Plasticity potential
@Erosis 8 months ago
As others have mentioned, you have a keen ability to explain difficult topics succinctly and completely. Keep up the awesome work! I could have used this when I took a class on time-series modeling! Hah!
@SpandanMishra-z4r 8 months ago
OMG! This is such an amazing description, you made my day.
@TheRohit901 7 months ago
Amazing explanation. I love this video because it covers sufficient depth and explains each concept with proper examples. I've subscribed instantly, and look forward to more such videos on recent papers.
@allengeng6660 5 months ago
Very nice talk, thank you.
@raaminakbari 6 months ago
Thank you for this great and smooth explanation. I think the model you are showing at 36:14 is valid if matrix A (and also B, to send each input directly to the corresponding SSM) is diagonal. In this way each hidden-state component along a different canonical direction (i.e., a different element of the vector) is independent of the others. So if A is not diagonal, then, assuming an eigendecomposition exists, we may say there exists an equivalent SSM whose components are independent (if we change the basis to the eigenbasis).
@kwanhowong5065 7 months ago
Really an amazing video! You save me a lot of time! Thank you!
@eafadeev 6 months ago
You're making very useful content, thank you!!! Maybe you could consider using larger text, so that one could read easily from a phone. Also a plus would be if the presentation were white on black (or bright color on black), it is less tiring to look at a dark screen for long periods of time.
@luisrperaza 7 months ago
I did learn a lot! Many thanks for making this video.
@pcwang7803 5 months ago
Great lecture! It is easier for me to understand the work with your lecture. Can you give one for Reinforcement learning?
@whisperlast6548 6 months ago
This video is of great help!! Thank you very much.
@soroushmehraban 6 months ago
Love it! Keep up the amazing work.
@bulat_15 7 months ago
Thanks man! This helped me a lot
@팽도리-v6s 3 months ago
Amazing video.
@EkShunya 8 months ago
I always eagerly wait for your explainers. They are 🤯. Thank you :)
@tunatuncer5639 6 months ago
Wow, that's a great explanation, thanks for the efforts!
@m1k3b7 6 months ago
Brilliant explanations. Thanks.
@artaasadi9497 7 months ago
Thanks a lot that was very useful!
@walidmaly3 7 months ago
One of the best! I have one question: if we apply the convolution in S4 on a sequence of length L, what will the size of the conv kernel be?
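A sketch of the answer based on the S4 formulation, with hypothetical small sizes: unrolling h_t = Ā h_{t-1} + B̄ x_t, y_t = C h_t with h_{-1} = 0 gives y_t = Σ_{k=0..t} C Ā^k B̄ x_{t-k}, i.e. a causal 1-D convolution whose kernel has one entry per time step, so the kernel is as long as the sequence: K = (C B̄, C Ā B̄, ..., C Ā^(L-1) B̄).

    import numpy as np

    N, L = 4, 8                                   # hypothetical state size and sequence length
    rng = np.random.default_rng(0)
    A = rng.normal(size=(N, N)) * 0.1             # stands in for the discretized A_bar
    B = rng.normal(size=(N, 1))                   # discretized B_bar
    C = rng.normal(size=(1, N))

    # Convolution kernel of length L: K[k] = C @ A^k @ B
    K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(L)])

    x = rng.normal(size=L)

    # Recurrent view
    h = np.zeros(N)
    y_rec = []
    for t in range(L):
        h = A @ h + B[:, 0] * x[t]
        y_rec.append((C @ h).item())

    # Convolutional view (causal): y_t = sum_k K[k] * x[t - k]
    y_conv = [sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)]

    assert np.allclose(y_rec, y_conv)

So the kernel has L entries per channel; S4's contribution is computing this length-L kernel efficiently from the structured (A, B, C) rather than forming the matrix powers one by one.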
@rezagholipoor7900 1 month ago
It was very informative
@alainrieger6905 2 months ago
Awesome video as usual
@amitshukla1495 8 months ago
Absolutely amazing 🎉
@buh357 5 months ago
you are the best.
@sayandas13 2 months ago
Awesome explanation. Really appreciate such content. Can you please make a similar explanation video on the Mamba-2 paper?
@akashkumar-jg4oj 8 months ago
Great explanation!
@НикитаБуров-ъ6р 6 months ago
I've just started watching, but I guess this vid'll be very useful
@nguyenhuuuc2311 8 months ago
Thanks for the awesome content! Hope the next one will be about DPO and coding it from scratch ❤
@umarjamilai 5 months ago
You're welcome: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-hvGa5Mba4c8.html
@nguyenhuuuc2311 5 months ago
@@umarjamilai Thank you!!! You're so talented at research and teaching!!!!
@GoogleColab003 7 months ago
absolutely fantastic
@divgill6062 8 months ago
Amazing! So detailed. Well done sir
@810602jay 8 months ago
Thanks Umar! 🥰Very amazing learning material for Mamba!
@dotori-hj 6 months ago
Fantastic
@杨辉-l2g 8 months ago
excellent work! Thank you
@Charles-Darwin 7 months ago
Thank you
@ShubhamAshokGandhi 7 months ago
Great explanation. Very thorough. Loved it. I struggled with understanding the SSM paper. You explained all the bits beautifully.
@HosseinKhosravipour 2 months ago
very great
@toxicbisht4344 7 months ago
Amazing explanation. Waiting for a new video, please upload soon.
@edsonjr6972 8 months ago
Excellent video! I'm looking forward to it if you do a coding one. Thank you so much for your work for the AI community.
@umarjamilai 8 months ago
A coding one is not very interesting, because the most interesting part is the selective scan algorithm, which is a CUDA kernel. The architecture is not so different from any other language model. Of course it would be super cool to code the CUDA kernel from scratch ;-)
@SandeepS-i4e 1 month ago
Great❤
@123456ewr 8 months ago
Thanks, I hope you explain RWKV
@passarodavide 8 months ago
Beautiful video, thank you!
@umarjamilai 8 months ago
Thank you too!
@umuthalil5001 6 months ago
Hi, I was wondering if you could explain 36:40 a bit more, where you talk about multi-head attention. From what I understand, each head in multi-head attention looks at the whole input vector. Our key, value and query matrices are all of size D x head_size, with D being the dimension of the embedding, so when we compute the key we do key = X @ key_matrix, where X is a C x D matrix and C is the context length. This means each head looks at the whole dimension D of the embedding and represents it as a head_size vector, meaning that the arrows going into each head should point at every single input dim.
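For what it's worth, a minimal shape check of standard multi-head attention (the sizes and variable names here are hypothetical, not taken from the video): each head's projection matrix has shape D x head_size, so every head reads the full D-dimensional embedding and only its output is head_size-dimensional.

    import torch

    C, D, n_heads = 8, 512, 8                    # hypothetical context length, embed dim, heads
    head_size = D // n_heads
    X = torch.randn(C, D)

    W_q = torch.randn(D, head_size)              # one head's query projection: reads all D dims
    W_k = torch.randn(D, head_size)
    W_v = torch.randn(D, head_size)

    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # each is (C, head_size)
    attn = torch.softmax(Q @ K.T / head_size**0.5, dim=-1)   # (C, C) attention weights
    out = attn @ V                               # (C, head_size) output of this one head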
@mdbayazid6837 8 months ago
Jazakallah Khairan (may God reward you with goodness).
@周毅-b1h 1 day ago
I'm very thankful for your explanation of this article, best wishes for you!
@samuelbeaussant3097 7 months ago
Very good lecture! Thank you very much for putting this up for free on YouTube :) I have a question though: if my understanding of the HiPPO framework is correct, the A matrix is built to uniformly approximate the input signal (named HiPPO-LegS in the paper): "Our novel scaled Legendre measure (LegS) assigns uniform weight to all history [0, t]". However, at 41:49 you explain that it is decaying exponentially, similarly to HiPPO-LagT. Do they opt for HiPPO-LagT when moving to S4 and Mamba, or am I missing something?
@kunchangli9319 8 months ago
Brilliant! Amazing!
@umarjamilai 8 months ago
Thank you!
@belamipro7073 8 months ago
Thanks!
@umarjamilai 8 months ago
Thank you very very very much for your generous support! Let's connect on LinkedIn!
@bryanbocao4906 3 months ago
Thanks for the video! Very informative! Just to check: at 1:03:42, should step 3 be "... save back the result to HBM"?
@alex-beamslightchanal8743 1 month ago
Thanks!
@venkateshr6127 8 months ago
Please can you make a video on optimizers like Adam, Adagrad, ...?
@LukasSmith827 8 months ago
you're extremely underrated, I don't think I'll be able to use much valuable info tbh.
@heewoongchoi27 7 months ago
you are so smart!
@jason988081 1 month ago
Dear Umar, referring to 53:50: the recurrent SSM is indeed similar to a prefix sum (i.e., y = x_0 + x_1 + ... + x_N), but the difference is that h_t = A h_{t-1} + B x_t, where h_{t-1} depends on h_{t-2}. I know how the Blelloch parallel prefix scan works for calculating the sum of constants, but I do not know how a parallel scan works for h_t = A h_{t-1} + B x_t. Could you please elaborate on it? Thank you. @Umar
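A minimal sketch (with made-up sizes) of how the Blelloch-style scan extends from sums to this recurrence: each scan element is the pair (A, B x_t), and the associative operator (A1, b1) ⊕ (A2, b2) = (A2 A1, A2 b1 + b2) composes two recurrence steps into one, so the elements can be combined pairwise in a tree just like a prefix sum.

    import numpy as np

    def combine(e1, e2):
        """Compose two affine steps h -> A h + b; associative, so it can be tree-reduced."""
        A1, b1 = e1
        A2, b2 = e2
        return (A2 @ A1, A2 @ b1 + b2)

    rng = np.random.default_rng(0)
    N, L = 4, 6                                   # hypothetical state size and sequence length
    A = rng.normal(size=(N, N)) * 0.1
    Bx = rng.normal(size=(L, N))                  # each row plays the role of B @ x_t

    # Reference: the sequential recurrence h_t = A h_{t-1} + B x_t, with h_0 = 0
    h = np.zeros(N)
    h_seq = []
    for b in Bx:
        h = A @ h + b
        h_seq.append(h)

    # Inclusive scan with the associative operator (written left-to-right here;
    # on a GPU the same operator is applied as an O(log L)-depth tree)
    acc = (A, Bx[0])
    h_scan = [acc[1]]
    for t in range(1, L):
        acc = combine(acc, (A, Bx[t]))
        h_scan.append(acc[1])

    assert np.allclose(h_seq, h_scan)

Because the operator is associative, any bracketing gives the same result, which is what allows the logarithmic-depth parallel evaluation; in Mamba the per-step Ā_t is diagonal and input-dependent, so the matrix products above reduce to elementwise multiplications, but the idea is the same.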
@andreanegreanu8750 2 months ago
Hi Professor! Very good explanation as always. However, I have huge difficulties understanding the dimensions of the objects. Why would the A matrix be of (D, N) dimensions, since it is used to project a vector h_{t-1} of N dimensions into N dimensions? By the way, why is it written "Represents structured N x N matrix"?
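One way to see where the (D, N) shapes come from; this is a rough sketch with hypothetical sizes, not the exact Mamba discretization: each of the D channels carries its own N-dimensional hidden state, and because A is diagonal only its N diagonal entries are stored per channel, giving a (D, N) parameter that still "represents" an N x N (diagonal) matrix.

    import torch

    D, N, L = 512, 16, 10                        # hypothetical model dim, state size, seq length
    A_log = torch.randn(D, N)                    # diagonal entries of A, one row per channel
    x = torch.randn(L, D)
    delta = torch.rand(L, D)                     # input-dependent step size
    B = torch.randn(L, N)                        # input-dependent B_t

    A_bar = torch.exp(-torch.exp(A_log) * delta.unsqueeze(-1))       # (L, D, N), still diagonal
    Bx_bar = delta.unsqueeze(-1) * B.unsqueeze(1) * x.unsqueeze(-1)  # (L, D, N)

    h = torch.zeros(D, N)                        # the hidden state itself is (D, N)
    for t in range(L):
        h = A_bar[t] * h + Bx_bar[t]             # elementwise, because A is diagonal per channel

So the h_t on the slides is (D, N) because it stacks D independent N-dimensional states, one per channel of the input.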
@aamir122a 8 months ago
As a suggestion for your next video, you could cover a GPT-decoder-based multi-modal model.
@Huawei_Jiang 6 months ago
I have one question about the example which you provided, 'the number of buddies'. I think the function should be like this: b(t) = 5·sqrt(3)^(λt). Please comment if I am wrong.
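For reference, the general form is fixed by the ODE itself, independent of the specific numbers used in the video:

    db(t)/dt = λ b(t)   ⇒   b(t) = b(0) · e^(λt)
    5 · sqrt(3)^(λt) = 5 · e^((λ ln sqrt(3)) t)

so any other base is just a rescaled growth rate, and both expressions belong to the same exponential family, differing only in the constant inside the exponent.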
@RaghaVamsi 5 months ago
You are amazing! How did you learn all this?
@baiyouheng5365 8 months ago
great😀😀
@andreanegreanu8750 2 months ago
Can anyone tell me what the dimension of h_t is? I thought it was a vector (D, 1), but according to the slides it seems it is a matrix of shape (D, N)!!?? Thanks in advance!
@AkhoNdlodaka 3 months ago
PLEASE explain spacetimeformer
@RahulPrajapati-jg4dg 8 months ago
Hi Umar, can you please upload a video with a detailed explanation of the GPT architecture?
@GrifinsBrother 8 months ago
Need more code from scratch videos!