I left ML to do web development many years ago because it pays more and I kinda needed the money. I'm really glad this channel exists, as it helps me keep in touch. Please keep making videos after you become Dr. Yannic Kilcher :)
@@jean-pierrecoffe6666 The money is good. I just bought an RTX card. I think life as a hobbyist researcher is for me. Not having to worry about tenure, citations, or an advisor is a good thing. I can focus on what to do and when to do it without someone commanding me all the time.
It's great that you are uploading daily... How do you manage to keep up with first reading the paper and then recording it? This looks like a time-consuming task. Do you have a job as well?
Great videos, but I won't be telling anyone from my family about your videos. It's a secret between you, me... and all of my ML friends who are now watching :D
Even though they acknowledge it, I still don't understand why they cite LTH in the title (maybe because it's trending right now?). Aside from that, I found the heatmaps of the BERT modules very interesting from a "module selection" perspective, but the good/bad submodel part unhelpful, because that is not what LTH actually says.
I have 3 remarks. FYI, I haven't read the original paper, because I like you and your explanations more than papers ;-)
- At 15:49, you said that in the pruning process of the Lottery Ticket Hypothesis it could start with an accuracy of 100%, and at the beginning of pruning the accuracy could even improve a bit. What does an accuracy higher than 100% mean?
- At 27:32, we should pay attention to the fact that the darkness of these heatmaps is not comparable across panels. For example, a mean of 3.00 in figure (c) is yellowish, but a mean of 3.00 in figure (a) is orangish.
- At 33:25, if I understand correctly, there is a big assumption that more shared heads or MLPs means two tasks have more in common. But it only means these tasks profit from almost the same architecture, not the same weights. It's like concluding that two classification tasks that reach good accuracy with the same MLP architecture have something in common. (At 38:36 you mentioned that the results are without retraining after pruning; however, the models are fine-tuned with different seeds for each GLUE task (paper page 4, second paragraph), so the weights cannot be the same.)
- Sorry, I meant 100% of the original baseline accuracy, not 100% in total :)
- True, very well observed, thanks.
- It's true what you're saying. I guess the assumption is that fine-tuning doesn't change the weights too much, so if two tasks use the same attention heads, they are likely to share the same information.
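To illustrate the first point: a minimal sketch (with made-up accuracy numbers, not from the paper) of reporting pruned-model accuracy relative to the unpruned baseline, which is how a value above "100%" can appear when a lightly pruned subnetwork slightly beats the full model:

```python
# Hypothetical accuracies: the full model and three increasingly pruned subnetworks.
baseline_acc = 0.84
pruned_accs = [0.85, 0.84, 0.80]

# Accuracy expressed as a percentage of the unpruned baseline.
relative = [100 * a / baseline_acc for a in pruned_accs]

for pct in relative:
    print(f"{pct:.1f}% of baseline")
```

Here the first pruned model scores above 100% of baseline even though its raw accuracy is still below 1.0.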
#AI vs #Powerball: 5 of 5 w/Powerball: 100%; 1 star: 100%; time: 80%; brute force: 0.0136%; win money: 197733%. We use artificial intelligence forecasts for the sake of money. Can you imagine how many problems we could solve with AI?