The ML Tech Lead!
My name is Damien. I'm a former ML Tech Lead at Meta with more than 10 years in the field of AI/ML. I share my knowledge of the field to help prepare the next generation of ML engineers.
The Position Encoding In Transformers
13:22
2 months ago
Understanding How LoRA Adapters Work!
12:05
2 months ago
The Backpropagation Algorithm Explained!
13:58
2 months ago
How to Approach Model Optimization for AutoML
10:35
3 months ago
Understanding CatBoost!
13:49
3 months ago
What is the Vision Transformer?
14:17
3 months ago
Understanding XGBoost From A to Z!
26:41
3 months ago
The Gradient Boosted Algorithm Explained!
5:59
3 months ago
Understanding How Vector Databases Work!
12:25
4 months ago
What is Perplexity for LLMs?
4:36
4 months ago
Getting a Job in AI: The Different ML Jobs
12:59
4 months ago
Working in AI as a Software Engineer!
49:12
11 months ago
Let's Talk about AI with Etienne Bernard!
1:37:20
11 months ago
Comments
@bhaskartripathi 5 days ago
Great video. But I would have loved it if you had also spent a minute on why float32 vs bfloat16 is used in backpropagation. The video is still brilliant, as always!
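For context on the float32 vs bfloat16 question above: bfloat16 keeps float32's 8-bit exponent (same dynamic range) but truncates the mantissa to 7 bits (less precision), which is why gradients tend to survive it better than float16. A minimal illustrative sketch, not from the video:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round-trip a float through a bfloat16-style truncation.

    bfloat16 is simply float32 with the low 16 bits dropped:
    same 8-bit exponent (same range), only 7 mantissa bits
    (less precision).
    """
    # Reinterpret the float32 bits as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Drop the low 16 bits (the extra mantissa precision).
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bfloat16(3.14159265))  # 3.140625 -- precision is lost
print(to_bfloat16(1e-30))       # still nonzero: the exponent survives
```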
@chrisogonas 6 days ago
Incredibly useful! Thanks Damien.
@Zoronoa01 11 days ago
Thank you for this great explanation!
@AnshumanAwasthi-kd7qx 27 days ago
I clicked to learn, but you're speaking Embeddings. Speak English 😂
@MahSan-nv4jv 29 days ago
Solid explanation and a well-explored problem. Please share diverse problems when possible. Thank you so much
@rachadlakis1 1 month ago
Thanks
@subodhsharma5097 1 month ago
Very insightful!
@nikhilshaganti5585 1 month ago
Thank you for the video. In the code at 06:20, shouldn't we subtract 1 from cumcount to make sure we are not counting the current row?
@Pedritox0953 1 month ago
Great video!
@WillMoody-crmstorm 1 month ago
Holy moly. Thank you. I thought these concepts were beyond me until watching this video. You have a serious gift for explanation.
@beincheekym8 1 month ago
Thank you for the clear and concise video!
@madhu819-j6o 2 months ago
How do you convert a decimal number to bfloat16 format in Verilog?
@TheMLTechLead 2 months ago
I have no idea!
@bougfou972 2 months ago
Wow, very clear explanation. Thank you very much for this format (much clearer than a Medium article).
@math_in_cantonese 2 months ago
I have a question: for pos=0 and "horizontal_index"=2, shouldn't it be PE(pos,2) = sin(pos/10000^(2/d_model))? I believe you used the same symbol "i" for two different ways of indexing, right? 7:56
@TheMLTechLead 2 months ago
Yeah, you are right, I realized I made that mistake. I need to reshoot it.
@AlainDrolet-e4z 4 days ago
Thank you Damien, and math_in_cantonese. I'm in the middle of writing a short article discussing position encoding. Damien, feel proud that you are the first reference I quote in the article!

I was going crazy trying to nail down the exact meaning of "i". In Damien's video it is clear he means "i" as the dimension index, and the values shown with sin/cos match. But I could not reconcile that understanding with the usual formulation:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

If PE(pos, 0) refers to the first column (column zero) and, say, PE(pos, 5) to the sixth column (column 5), then 5 = 2i+1 implies i = (5-1)/2 = 2. So "i" is more like the index of a (sin, cos) pair of dimensions, and its range is d_model/2.

The original sin (😄, pun intended) is in "Attention Is All You Need", where they simply state: "where pos is the position and i is the dimension". That seems wrong: 2i and 2i+1 are the dimensions.

In any case, a big thank you Damien. I have watched many of your videos; they have been quite useful in ramping me up on LLMs and the rest. Merci beaucoup
Alain
@TemporaryForstudy 2 months ago
Nice, but I have one doubt: how does adding sine and cosine values ensure that we are encoding the positions? How did the authors arrive at this choice, and why not other values?
@TheMLTechLead 2 months ago
The sine and cosine functions provide smooth and continuous representations, which help in learning relative positions effectively. For example, the encodings for positions k and k+1 will be similar, reflecting their proximity in the sequence.

The frequency-based sinusoidal functions allow the encoding to generalize to sequences of arbitrary length without needing to re-learn positional information for different sequence lengths. The model can understand relative positions beyond the length of sequences seen during training.

The combination of sine and cosine functions ensures that each position has a unique encoding, and the orthogonality property of these functions helps distinguish between different positions effectively, even for long sequences.

The different frequencies used in the positional encodings allow the model to capture both short-term and long-term dependencies within the sequence: higher-frequency components help capture local relationships, while lower-frequency components capture global structure.

Also, sinusoidal functions are differentiable, which is crucial for backpropagation during training. This ensures that the model can learn to use the positional encodings effectively through gradient-based optimization.
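The sinusoidal scheme discussed in this thread can be sketched in pure Python (an illustrative snippet with toy dimensions, not the video's code; real implementations typically vectorize this with NumPy or PyTorch):

```python
import math

def positional_encoding(seq_len: int, d_model: int) -> list:
    """Sinusoidal positional encodings: each pair of dimensions
    (2i, 2i+1) shares one frequency.

      PE(pos, 2i)   = sin(pos / 10000 ** (2i / d_model))
      PE(pos, 2i+1) = cos(pos / 10000 ** (2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for dim in range(0, d_model, 2):  # dim = 2i
            angle = pos / 10000 ** (dim / d_model)
            pe[pos][dim] = math.sin(angle)
            if dim + 1 < d_model:
                pe[pos][dim + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
print(pe[0][:4])  # position 0 -> [0.0, 1.0, 0.0, 1.0]
```

Note how adjacent positions get similar encodings in the low-frequency dimensions, which is the proximity property described above.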
@do-yeounlee7202 2 months ago
Thanks for the clear explanation. I've watched a few of your videos and follow you on LinkedIn, and I can say that you're killing it, brother. I also love the simplicity of the infographics in your videos. Do you get them from elsewhere or do you make them yourself?
@TheMLTechLead 2 months ago
I make them myself. Takes me most of my time!
@do-yeounlee7202 2 months ago
@TheMLTechLead Respect! What do you use to make them?
@TheMLTechLead 2 months ago
@do-yeounlee7202 I use canva.com
@karnaghose4784 2 months ago
Great explanation 👍🏻
@bassimeledath2224 2 months ago
Excellent. Good ML system design videos are hard to find on YouTube, so I really appreciate this!
@TheMLTechLead 2 months ago
Glad it was helpful!
@adityagupta4465 2 months ago
Really well explained. You've earned a subscriber 🎉
@passportkaya 2 months ago
Not really. I'm a US citizen and I've been all over Europe. I'd say it's the same.
@TheMLTechLead 2 months ago
How long have you lived in Europe, and which countries exactly?
@sebastianguerrero5626 2 months ago
Nice content, keep it up!
@TheMLTechLead 2 months ago
Thanks, will do!
@EmpreendedoresdoBEM 2 months ago
Very clear explanation, thanks
@naatcollections7976 2 months ago
I like your channel
@TheMLTechLead 2 months ago
Thank you!
@godzilllla2452 2 months ago
I've got it now. I wonder why we can't calculate the gradient with respect to x by starting the backward pass closer to x instead of going through all the activations.
@TheMLTechLead 2 months ago
I am not sure I understand the question.
@mateuszsmendowski2677 2 months ago
One of the best explanations on YouTube. Substantively and visually at the highest level :) Are you able to share those slides, e.g. via Git?
@TheMLTechLead 2 months ago
I cannot share the slides, but you can see the diagrams in my newsletter: newsletter.theaiedge.io/p/understanding-the-self-attention
@milleniumsalman1984 2 months ago
Too good
@milleniumsalman1984 2 months ago
Great video
@milleniumsalman1984 2 months ago
Good video
@Snerdy0867 2 months ago
Phenomenal visuals and explanations. Best video on this concept I've ever seen.
@TheMLTechLead 2 months ago
I like reading that!
@IkhukumarHazarika 3 months ago
Is it an RNN? 😅
@IkhukumarHazarika 3 months ago
Love the way you teach every point. Please keep teaching this way.
@IkhukumarHazarika 3 months ago
More good content indeed, a good one ❤
@AbuzarbhuttaG 3 months ago
💯💯💯
@faysoufox 3 months ago
Thank you for your videos
@math_in_cantonese 3 months ago
I will use your videos as an interview refresher... It is so easy to forget the details when everyday work floods in over the years.
@TheMLTechLead 3 months ago
I am glad to read that!
@math_in_cantonese 3 months ago
Thanks, I forgot some details about the gradient boosted algorithm and I was too lazy to look them up.
@vivek2319 3 months ago
Please make more videos
@TheMLTechLead 3 months ago
Well, I do!
@jairjuliocc 3 months ago
Thank you. Can you explain the entire self-attention flow (from positional encoding to final next-word prediction)? I think it will be an entire series 😅
@TheMLTechLead 3 months ago
It is coming! It will take time.
@CrypticPulsar 3 months ago
Thank you, Damien!!
@va940 3 months ago
Very good advice ❤
@va940 3 months ago
Awesome
@elmoreglidingclub3030 3 months ago
Excellent!! Very good explanation. I need to work on my ear for French, but pausing and backing up the video helped. Great stuff!!
@TheMLTechLead 3 months ago
My accent and my speaking skills are my weaknesses. I am working on them and I think I am improving!
@elmoreglidingclub3030 3 months ago
@TheMLTechLead Thanks for your reply, but absolutely no apology necessary!! I think it is an excellent video with helpful information. Much appreciation for posting. I am a professor in a business school, always looking for insights into how to teach the technical side of technology in the context of business. Your explanation has been very helpful.
@Gowtham25 3 months ago
It's really good and useful... Hoping for "training an LLM from scratch" next, and I'm interested in the KAN-former...
@astudent8885 3 months ago
ML is a black box, but boosting seems (potentially) more interpretable if we can make the trees more sparse and orthogonal.
@TheMLTechLead 3 months ago
Tree-based methods can naturally be used to measure Shapley values without approximation: shap.readthedocs.io/en/latest/tabular_examples.html
@astudent8885 3 months ago
Do you mean that the new tree is predicting the error? In that case, wouldn't you subtract the new prediction from the previous predictions?
@TheMLTechLead 3 months ago
So we have an ensemble of trees F that predicts y such that F(x) = ŷ. The error is y - F(x) = e. We want to add a tree that predicts the error: T(x) = ê = e + error = y - F(x) + error. Therefore F(x) + T(x) = y + error.
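The residual-fitting loop described in this reply can be sketched as a toy gradient-boosting implementation (illustrative only: one-split stumps on 1D data, with made-up sample values; real libraries like XGBoost do far more):

```python
def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (a single split) to the residuals."""
    best = None
    for threshold in x:  # candidate splits at the data points
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_trees=20, lr=0.5):
    """Each new tree T fits the current residuals e = y - F(x);
    the ensemble is updated as F(x) <- F(x) + lr * T(x)."""
    trees = []
    pred = [0.0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + lr * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * tree(xi) for tree in trees)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2]
model = gradient_boost(x, y)
# predictions converge toward y as trees fit the shrinking residuals
```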
@siddharthsingh7281 3 months ago
Please share the resources in the description.
@MCroppered 3 months ago
Why
@MCroppered 3 months ago
"Give me the exam solutions pls"
@py2992 3 months ago
Thank you for this video!