Yuxiang "Shawn" Wang

Comments

@user-io4sr7vg1v 1 day ago

Excellent work my friend.

@yuxiangwang9624 1 day ago

Thank you very much!

@xinyaoyin2238 3 days ago

Thanks for the explanation :)

@yuxiangwang9624 2 days ago

Thanks for the kind words, hahaha~~

@deepakkhirey7156 6 days ago

excellent explanation.. Thank you Shawn!!!

@yuxiangwang9624 5 days ago

Thank you Deepak!

@xiaojinyusaudiobookswebnov4951 11 days ago

I learned a lot from your videos. Please keep them coming. They are worth all the time and effort it takes to produce them.

@yuxiangwang9624 11 days ago

Thank you, will do!

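@zeyisun3093 14 days ago

Doesn't multi-head attention just slice the high-dimensional token into several pieces, process them in parallel, and concatenate them at the end? Could you explain why it can be understood as asking multiple questions (asking different words different questions)?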

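@yuxiangwang9624 14 days ago

Ah, my understanding is this: multi-head attention doesn't directly slice the high-dimensional token into several lower-dimensional pieces and process them separately. Instead, it uses different linear projections to map the same token into different subspaces for the computation, so each head is effectively asking a different "question". Does that make sense to you?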

@zeyisun3093 13 days ago

@yuxiangwang9624 Then how are they combined at the end? If you concatenate them, the dimension would be too large.

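@yuxiangwang9624 13 days ago

@zeyisun3093 Here's how I think about it: say the model dimension is 512. When computing Q, K, and V, the 512-dimensional input is mapped by 8 different projections into 8 heads of dimension 64 each; after attention, concatenating the 8 heads of dimension 64 gives 512 again; finally, one more linear projection maps 512 to 512.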
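
The shape bookkeeping in the reply above can be sketched in NumPy. This is only an illustration under stated assumptions: a single unbatched sequence of 4 tokens, random weights, no biases, and no masking. The function and variable names (`multi_head_attention`, `w_q`, `w_o`, and so on) are made up for this sketch and do not come from the video.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); all weight matrices: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Every head sees the full 512-dim token through its own 512 -> 64 projection.
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    heads = softmax(scores, axis=-1) @ v                 # (heads, seq, d_head)

    # Concatenate 8 heads of 64 back into 512, then apply the final 512 -> 512 map.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 512, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
)

out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (4, 512)

# Head 0's query is a projection of the whole token, not a slice of the token:
# slicing happens on the projected output, i.e. on the columns of w_q.
print(np.allclose((x @ w_q)[:, :64], x @ w_q[:, :64]))  # True
```

The last check is the point of the exchange: one 512-to-512 projection followed by a split into 8 chunks is the same as 8 separate 512-to-64 projections, so each head reads all 512 input dimensions and the output dimension stays at 512.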

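@michaelzap8528 6 days ago

@yuxiangwang9624 Your explanation is very reasonable. It seems that Chinese people's ways of thinking and understanding are largely alike. My understanding is that once the embedding of the input data is fixed, the so-called dimension no longer changes. During learning, the machine uses different filters to extract different information, and then uses different Add & Norm blocks to generate different new information. But this so-called new information is really just combinations of the input data, and the original input information can be recovered from it. Because each pass filters out some information, meaning part of the information is lost, sometimes the input has to be fed in again, or a different filter is needed, to recover what was lost. It's like adding water when there is too much flour, and adding flour when there is too much water. Judging from the Westerners' replies to your posts, they clearly don't really understand this Chinese way of thinking, even though some of the repliers are obviously practitioners.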

@nxlamik1245 25 days ago

Work on explaining things simply. It seems you have enough knowledge, but you made it difficult.

@yuxiangwang9624 22 days ago

Thank you for the feedback! I'd love to work on it. Could you kindly share an example? I'll take a shot in my next video.

@MDNQ-ud1ty 20 days ago

@yuxiangwang9624 Likely he is a beginner and your explanations are for someone who already has some idea of how these architectures work. There will always be issues trying to match "impedance" between the teacher and the student, because one must match precisely what needs to be understood with what is already understood, with how to understand it, and with how to explain it, and this is context dependent (student/teacher dependent). You are just giving a basic overview of the process, and beginners generally need to have their hand held and be given many examples with many ideas (since they are blind). Examples are the best way for beginners to learn, since the words used to explain things are meaningless to them (they do not understand or know them yet, so the words do not resonate). Basically, you teach a child by showing them rather than explaining to them. If you try to explain to them how something works, they will not understand it the way you think they will. So ultimately it depends on your goal. Ideally there would be some way for YouTube to have a setting with which the student can find videos that match exactly what he needs to learn optimally, but that doesn't happen under capitalism (capitalism profits off inefficiencies, and an impedance mismatch is an inefficiency). So you have to accept those inefficiencies (as someone trying to teach, you have to understand such things exist, as the student likely won't) and realize some people (maybe many) will not understand your explanations for a multitude of reasons, while a few will (because they have the right "impedance"). In general though, as someone trying to explain something to someone else, all I can say is "Know your audience" ("Knowing your audience is all you need"). This means that if you are targeting someone who is blind, you must go through every little detail and treat them as a kid. If you are targeting someone who knows X, then you can assume X and focus on Y. The more you assume, the less reach you have.
IMO, your explanation won't do much good for someone who hasn't ever done any actual NN training or used the basic models. Knowledge is also progressive: if you want to understand calculus well you need to know algebra, and to understand algebra you need to know arithmetic. Understanding topology also needs calculus, but it in turn helps one understand calculus. One of the problems with education is that it's not really streamlined, so it is very inefficient. Everyone is at a different level, with very different abilities and very different lives, so it can be hard for things to "match up". The good news is that anyone can create videos to try to teach others... the bad news is that the student has to sift through it all (wasting time = inefficiency) to find what works. But you won't fix this problem, so you just have to do your best. I guess the "best way" to deal with this currently is to "state your assumptions" at the start. E.g., "I assume you understand neural networks and have done some basic work with them, such as training RNNs, and are comfortable with linear algebra jargon. The more foreign these things are to you, the more you will struggle to grasp what I'm talking about". But it is also sometimes OK for someone to listen to others even when it is over their head, as familiarity brings awareness (and becoming aware is all that learning is).

@bubblesaur89 4 days ago

Your explanation is easy to follow for me

@aga5979 27 days ago

Thank you for the very valuable explanation. But in what f ucking world do laymen speak with dot product, cosine, and e to the power of time and time prime? 😅😅😂😂

@matin2021 1 month ago

Hi, I am very happy that I was able to find your channel on YouTube. I hope you will make more videos about computer vision. Keep going ✌

@yuxiangwang9624 1 month ago

Thank you for your support!

@abhranilchandra2775 1 month ago

This is such a great explanation, do you plan to cover the "DiT: Scalable Diffusion Models with Transformers" paper sometime soon? Thanks a lot for such wonderful and insightful explanations...

@yuxiangwang9624 1 month ago

Thank you for the kind words! That's a good idea and let me look into it. :)

@matthewritter1117 1 month ago

Incredible content and your style is a perfect mix of confident and relatable. Keep it up!

@yuxiangwang9624 1 month ago

I appreciate the encouragement :)

@isiisorisiaint 1 month ago

pretty okay until andrew's attention slide, then when it comes to your own explanations things become murky, and when you get to "explain" the decoder, and then the full codec, you're sweeping everything under the rug in a few short seconds when in fact this is exactly the section you should have spent most of your time on. all in all, a nice video until andrew's slide, basically worthless afterwards

@yuxiangwang9624 1 month ago

Thanks for the feedback! Will learn to improve :) Would you mind explaining in more detail which parts I was missing in the encoder details? I can look into those and see if I can add some later!

@isiisorisiaint 1 month ago

@yuxiangwang9624 darn, i got a notification that you responded to my comment, but only the first line of your reply was shown ("Thanks for the feedback! Will learn to improve :)"), and i didn't actually open it to see your full reply until now. I will get back to you with the details, sorry for the delay...

@oo_wais 2 months ago

one of the very few videos i found on youtube that explains the architecture very well

@yuxiangwang9624 2 months ago

Thank you so much for the recognition!

@chriso8285 2 months ago

Great voice. For fun, audition for a voice actor gig. Would look great on resume. Or on a date or at a conference. Lol

@yuxiangwang9624 2 months ago

Lol thanks for the compliment!

@420_gunna 2 months ago

Seems like a great video, subbed! 🙂

@yuxiangwang9624 2 months ago

Thanks for the sub! Appreciate the recognition ❤️

@s8x. 2 months ago

please do more videos like this

@yuxiangwang9624 2 months ago

Thank you! Will do :)

@OEDzn 2 months ago

amazing video!

@yuxiangwang9624 2 months ago

Thank you!

@MrMusk-it5nz 2 months ago

You definitely aren't a layman

@tk-og4yk 2 months ago

Another Video! Looking forward to watching.

@yuxiangwang9624 2 months ago

Haha thank you for your support! It was an old deck I made a year ago, so I might as well record it :)

@tk-og4yk 2 months ago

amazing video.

@yuxiangwang9624 2 months ago

Glad you liked it!

@EliasVansteenkiste 2 months ago

Is there some audio missing around 29:12? Nuancing the best positional embedding. Factorized(+)

@ryuku4966 2 months ago

Nice vid. Could be that when you upscale it works better because then it's like the model is looking at smaller patches. An interesting ablation would have been to consider a smaller patch size and check.
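
The arithmetic behind this suggestion can be sketched as follows. The concrete numbers (a 224-pixel square image, 16-pixel patches, 2x upscaling) and the helper name `num_patches` are assumptions chosen for illustration; they are not taken from the video or the comment.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2

# Hypothetical baseline: 224x224 image, 16x16 patches.
baseline = num_patches(224, 16)

# Upscale the image 2x but keep the patch size fixed.
upscaled = num_patches(448, 16)

# Keep the image size but halve the patch size (the proposed ablation).
smaller_patch = num_patches(224, 8)

print(baseline, upscaled, smaller_patch)  # 196 784 784
```

Under these assumptions, both variants produce the same number of tokens, and each token covers the same fraction of the scene (1/28 of the image width instead of 1/14), which is why the smaller-patch ablation would be a natural way to test the explanation.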

@yuxiangwang9624 2 months ago

Aha that's a good explanation! Makes perfect sense to me. I appreciate the reply & feel happy that I learned more through sharing!

@AlejandroAristizabal-wo2zg 2 months ago

Awesome video! I've always wanted to delve into ViT but haven't had the time. This video really did help reinforce my understanding, as well as add some really insightful details into all of these new methods. Thanks!

@yuxiangwang9624 2 months ago

Thanks for appreciating and leaving a comment! :)