This is absolutely FANTASTIC. I watched Albert Gu's Stanford lecture on state space models/Mamba, and it was a great high-level overview. But I really appreciate you taking it slower and going further into detail on the basic/fundamental concepts. A lot of us aren't mathematicians or ML engineers, so it's much appreciated to be helped along with those concepts.
I rarely comment on videos, but this one was worth it. Thank you so much for such a clear explanation. You explained all the nuances that I previously did not understand in a very clear way. God bless you.
Your teaching approach is very good. You started from fundamental concepts and went deeper. This helped in building intuition and understanding, and avoided confusion in the later parts. Brilliant!
I just read about Mamba and wanted to find a detailed explanation video. Everything you covered in this video is exactly what I needed; thank you so much, keep on cooking.
You are just too amazing! You understand this stuff in great detail, and then you take the time to explain it to us in educational videos. A true gem of a channel!
Thanks for your clear explanation of Mamba. Coming from a control theory background, I very much appreciate seeing it used in LLMs. One small error I noted: the A matrix must be N x N to map the previous N-dimensional hidden state h(t-1) to h(t). I believe the A matrix is also time-varying to produce selective output tokens.
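To illustrate the shape argument, here is a tiny numpy sketch of one recurrence step (my own toy example, not code from the video):

```python
import numpy as np

N = 4                            # hidden state size
A_bar = np.random.randn(N, N)    # discretized state matrix, N x N
B_bar = np.random.randn(N, 1)    # input matrix, N x 1
C = np.random.randn(1, N)        # output matrix, 1 x N

h = np.zeros((N, 1))             # previous hidden state h(t-1)
x_t = np.array([[0.7]])          # scalar input at time t, shape (1, 1)

h = A_bar @ h + B_bar @ x_t      # h(t) = A_bar h(t-1) + B_bar x(t), stays N x 1
y_t = C @ h                      # scalar output y(t)
print(h.shape, y_t.shape)        # (4, 1) (1, 1)
```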
Thanks a ton! Excellent explanation, and great analogies to ease into the more advanced material. This is an absolute masterclass in teaching.
Even I understood much of this, and I have no formal education. Thank you! Mamba looks really cool. I especially like the long context and the room for further refinement. It looks like a model that could be made to learn as it goes. Plasticity potential!
As others have mentioned, you have a keen ability to explain difficult topics succinctly and completely. Keep up the awesome work! I could have used this when I took a class on time-series modeling! Hah!
Amazing explanation. I love this video because it goes into sufficient depth and explains each concept with proper examples. I subscribed instantly and look forward to more such videos on recent papers.
Thank you for this great and smooth explanation. I think the model you are showing at 36:14 is valid if the matrix A (and also B, so that each input goes directly to its corresponding SSM) is diagonal. In that case, each hidden-state component along a canonical direction (i.e., each element of the vector) evolves independently of the others. If A is not diagonal, then, assuming an eigendecomposition exists, there is an equivalent SSM whose states are independent if we change the basis to the eigenbasis.
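To make the diagonal case concrete, a small numpy sketch (my own toy example, just to show the decoupling):

```python
import numpy as np

N = 4
a = np.random.rand(N)        # diagonal entries of A, one per state component
b = np.random.randn(N)       # entries of B, one per state component

h = np.zeros(N)
for x_t in [0.5, -1.0, 2.0]:
    # With a diagonal A, each component of h is updated independently (no mixing):
    h = a * h + b * x_t
print(h)
```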
You're making very useful content, thank you!!! Maybe you could consider using larger text, so that it's easy to read from a phone. It would also be a plus if the presentation were white on black (or a bright color on black); a dark screen is less tiring to look at for long periods of time.
Coding one up wouldn't be very interesting, because the most interesting part is the selective scan algorithm, which is a CUDA kernel. The rest of the architecture is not so different from any other language model. Of course, it would be super cool to code the CUDA kernel from scratch ;-)
Hi, I was wondering if you could explain 36:40 a bit more, where you talk about multi-head attention. From what I understand, each head in multi-head attention looks at the whole input vector. The key, query, and value matrices are all of size D x head_size, where D is the embedding dimension, so to get the keys we compute key = X @ key_matrix, where X is a C x D matrix and C is the context length. This means each head looks at the whole embedding dimension D and compresses it into a head_size vector, so the arrows going into each head should point at every single input dimension.
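Here is a tiny numpy sketch of the shapes I mean for a single head (my own toy code, not the video's):

```python
import numpy as np

C, D, head_size = 8, 32, 8   # context length, embedding dim, per-head dim

X = np.random.randn(C, D)            # input sequence
W_q = np.random.randn(D, head_size)  # one head's query projection
W_k = np.random.randn(D, head_size)  # one head's key projection

Q = X @ W_q   # (C, head_size): every output column mixes ALL D input dims
K = X @ W_k   # (C, head_size)
scores = Q @ K.T / np.sqrt(head_size)   # (C, C) attention scores for this head
print(Q.shape, scores.shape)
```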
Very good lecture! Thank you very much for putting this on YouTube for free :) I have a question, though. If my understanding of the HiPPO framework is correct, the A matrix is built to uniformly approximate the input signal (named HiPPO-LegS in the paper): "Our novel scaled Legendre measure (LegS) assigns uniform weight to all history [0, t]". However, at 41:49 you explain that it decays exponentially, similarly to HiPPO-LagT. Do they opt for HiPPO-LagT when moving to S4 and Mamba, or am I missing something?
Dear Umar, referring to 53:50, the recurrent SSM is indeed similar to a prefix sum (i.e., y = x_0 + x_1 + ... + x_N), but the difference is that h_t = A h_{t-1} + B x_t, where h_{t-1} in turn depends on h_{t-2}. I know how the Blelloch parallel prefix scan works for computing the sum of constants, but I do not understand how a parallel scan works for h_t = A h_{t-1} + B x_t. Could you please elaborate on it? Thank you. @Umar
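From what I have read elsewhere, the trick seems to be that affine updates compose associatively, so they can be fed to a prefix scan over (a, b) pairs instead of plain numbers. A rough numpy sketch of what I mean (my own toy code with a scalar state; the loop itself is sequential and only demonstrates the combine operator):

```python
import numpy as np

# Toy scalar version of h_t = a_t * h_{t-1} + b_t
# (a_t plays the role of A_bar, b_t the role of B_bar * x_t).
a = np.array([0.9, 0.8, 0.7, 0.6])
b = np.array([1.0, 2.0, 3.0, 4.0])

def combine(e1, e2):
    # Composing h -> a1*h + b1 followed by h -> a2*h + b2
    # gives h -> (a2*a1)*h + (a2*b1 + b2); this operator is associative.
    a1, b1 = e1
    a2, b2 = e2
    return (a2 * a1, a2 * b1 + b2)

# Inclusive scan with the associative combine. The loop below is sequential,
# but because combine is associative the same result can be computed with a
# Blelloch-style parallel scan in logarithmic depth.
acc, h_scan = (1.0, 0.0), []
for e in zip(a, b):
    acc = combine(acc, e)
    h_scan.append(acc[1])          # acc[1] equals h_t (assuming h_{-1} = 0)

# Check against the plain sequential recurrence.
h, h_seq = 0.0, []
for a_t, b_t in zip(a, b):
    h = a_t * h + b_t
    h_seq.append(h)
print(np.allclose(h_scan, h_seq))  # True
```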
Hi Professor! Very good explanation as always. However, I have huge difficulty understanding the dimensions of the objects. Why would the A matrix have dimensions (D, N), given that it is used to map a vector h_{t-1} of N dimensions to N dimensions? And why is it written "Represents structured N x N matrix"?
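In case it helps anyone else, this is how I currently make sense of the (D, N) shape (my own sketch, the real implementation may differ): the (D, N) tensor stores the diagonals of D separate structured N x N matrices, one per input channel, and the hidden state is likewise one N-dimensional vector per channel.

```python
import numpy as np

D, N = 3, 4                    # D input channels, N-dim state per channel
A = np.random.rand(D, N)       # each ROW is the diagonal of one structured N x N matrix
B = np.random.randn(D, N)
h = np.zeros((D, N))           # one N-dim hidden state per channel -> shape (D, N)

x_t = np.random.randn(D)       # one scalar input per channel at time t
# Per channel d: h[d] = diag(A[d]) @ h[d] + B[d] * x_t[d]; vectorized over all channels:
h = A * h + B * x_t[:, None]
print(h.shape)                 # (3, 4)
```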
I have a question about the example you provided, 'the number of bunnies'. I think the function should be b(t) = 5 * sqrt(3)^(λt). Please let me know if I am wrong.
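For reference, the textbook solution of that growth ODE is (general math, not the exact numbers from the video):

```latex
\frac{db(t)}{dt} = \lambda \, b(t)
\quad\Longrightarrow\quad
b(t) = b(0)\, e^{\lambda t},
\qquad \text{e.g. with } b(0) = 5:\; b(t) = 5\, e^{\lambda t}.
```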
Can anyone tell me the dimension of h_t? I thought it was a vector of shape (D, 1), but according to the slides it seems to be a matrix of shape (D, N)!? Thanks in advance!