@yuxiangwang9624 He is likely a beginner, and your explanations are for someone who already has some idea of how these architectures work. There will always be issues trying to match "impedance" between the teacher and the student, because one must match precisely what needs to be understood with what is already understood, with how to understand it, and with how to explain it, and this is context dependent (it depends on the student and the teacher). You are just giving a basic overview of the process, and beginners generally need to have their hand held and be given many examples with many ideas (since they are blind). Examples are the best way for beginners to learn, since the words used to explain things are meaningless to them (they do not understand or know them yet, so the words do not resonate). Basically, you teach a child by showing them rather than explaining to them. If you try to explain to them how something works, they will not understand it the way you think they will. So ultimately it depends on your goal. Ideally there would be some way for YouTube to have a setting with which the student can find videos that match exactly what he needs to learn optimally, but that doesn't happen under capitalism (capitalism profits off inefficiencies, and an impedance mismatch is an inefficiency). So you have to accept those inefficiencies (as someone trying to teach, you have to understand that such things exist, as the student likely won't) and realize that some people (maybe many) will not understand your explanations for a multitude of reasons, while a few will (because they have the right "impedance"). In general though, as someone trying to explain something to someone else, all I can say is "Know your audience" ("Knowing your audience is all you need"). This means that if you are targeting someone who is blind, you must go through every little detail and treat them as a kid. If you are targeting someone who knows X, then you can assume X and focus on Y. The more you assume, the less reach you have.
IMO, your explanation won't do much good for someone who has never done any actual NN training or used the basic models. Knowledge is also progressive: if you want to understand calculus well you need to know algebra, and to understand algebra you need to know arithmetic. Understanding topology likewise needs calculus, but it also helps one understand calculus. One of the problems with education is that it's not really streamlined, so it is very inefficient. Everyone is at a different level, with very different abilities and very different lives, so it can be hard for things to "match up". The good news is that anyone can create videos to try to teach others... the bad news is that the student has to sift through it all (wasting time = inefficiency) to find what works. But you won't fix this problem, so you just have to do your best. I guess the "best way" to deal with this currently is to "state your assumptions" at the start. E.g., "I assume you understand neural networks, have done some basic work with them such as training RNNs, and are comfortable with linear algebra jargon. The more foreign these things are to you, the more you will struggle to grasp what I'm talking about". But it is also sometimes OK for someone to listen to others even when it is above their head, as familiarity brings awareness (and becoming aware is all that learning is).
Thank you for the very valuable explanation. But in what f ucking world do laymen speak with dot product, cosine, and e to the power of time and time prime? 😅😅😂😂
This is such a great explanation. Do you plan to cover the "DiT: Scalable Diffusion Models with Transformers" paper sometime soon? Thanks a lot for such wonderful and insightful explanations...
pretty okay until andrew's attention slide, then when it comes to your own explanations things become murky, and when you get to "explain" the decoder, and then the full codec, you're sweeping everything under the rug in a few short seconds when in fact this is exactly the section you should have spent most of the time on. all in all, a nice video until andrew's slide, basically worthless afterwards
Thanks for the feedback! Will learn to improve :) Would you mind explaining in more detail which parts of the encoder details I was missing? I can look into those and see if I can add some later!
@yuxiangwang9624 darn, i got a notification that you responded to my comment, but only the first line of your reply was shown ("Thanks for the feedback! Will learn to improve :)"), and i didn't actually open it to see your full reply until now. I will get back to you with the details, sorry for the delay...
Nice vid. Could be that when you upscale it works better because then it's like the model is looking at smaller patches. An interesting ablation would have been to try a smaller patch size and check.
Awesome video! I've always wanted to delve into ViT but haven't had the time. This video really did help reinforce my understanding, as well as add some really insightful details into all of these new methods. Thanks!