
When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained) 

Yannic Kilcher
266K subscribers
30K views

Published: 27 Oct 2024

Comments: 26
@herp_derpingson 4 years ago
I left ML to do web development many years ago because it pays more and I kinda needed the money. I am really glad this channel exists, as it helps me keep in touch. Please keep making videos after you become Dr. Yannic Kilcher :)
@jean-pierrecoffe6666 4 years ago
Hey, how do you like your choice so far?
@herp_derpingson 4 years ago
@jean-pierrecoffe6666 The money is good. I just bought an RTX card. I think life as a hobbyist researcher is for me. Not having to worry about tenure, citations, or an advisor is a good thing. I can focus on what to do and when to do it without someone commanding me all the time.
@ivishal1990 4 years ago
It's great that you are uploading daily. How do you manage to keep up with first reading the paper and then recording it? This looks like a time-consuming task. Do you also work?
@vinayreddy8683 4 years ago
He is doing a PhD at ETH Zürich!!
@dippatel1739 4 years ago
@vinayreddy8683 That sums up everything. 😅
@ivishal1990 4 years ago
I am also doing a PhD, but it is tough to manage time. I am genuinely looking for tips here.
@norik1616 4 years ago
Great videos, but I won't be telling anyone from my family about your videos. It's a secret between you, me... and all of my ML friends, who are now watching :D
@YannicKilcher 4 years ago
Don't worry, they'll come around ;)
@mikhaildoroshenko2169 4 years ago
Just FYI, there is high-frequency noise in the video all the way until 47:45. Nothing critical, but still a little bit annoying.
@YannicKilcher 4 years ago
I know, I've already tried filtering it out, but thanks :)
@videoby1994 4 years ago
Even though they acknowledge it, I still don't understand why they cite the LTH in the title (maybe because it's trending right now?). Aside from that, I found the heatmaps of the BERT modules very interesting from a "module selection" perspective, but the good/bad-submodel part unhelpful, because that is not what the LTH actually says.
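For context on how such per-module heatmaps are typically produced: the paper scores attention heads with the gradient-based importance estimate of Michel et al. (2019). A minimal sketch, assuming a hypothetical model whose forward pass multiplies each head's output by its entry in `head_gates`:

```python
import torch

def head_importance(model, batch, loss_fn, n_layers=12, n_heads=12):
    # Gates start at 1.0, so the forward pass is unchanged; their gradient
    # tells us how sensitive the loss is to turning each head off.
    gates = torch.ones(n_layers, n_heads, requires_grad=True)
    loss = loss_fn(model(batch, head_gates=gates))  # hypothetical gated forward
    loss.backward()
    return gates.grad.abs()  # (n_layers, n_heads) heatmap of head importance
```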
@meysam010101 4 years ago
I have 3 remarks. FYI, I haven't read the original paper, because I like you and your explanations more than the papers ;-).
At 15:49, you said that the Lottery Ticket Hypothesis pruning process could start at an accuracy of 100%, and that at the beginning of pruning the accuracy could even improve a bit. What does an accuracy higher than 100% mean?
At 27:32, we should pay attention to the fact that the color scales of these heatmaps are not comparable: for example, a mean of 3.00 in figure (c) is yellowish, but a mean of 3.00 in figure (a) is orangish.
At 33:25, if I understand correctly, there is a big assumption that two tasks sharing more heads or MLPs means they have more in common. But it only means these tasks profit from almost the same architecture, not the same weights. It's like concluding that two classification tasks that reach good accuracy with the same MLP architecture have something in common. (At 38:36 you mentioned that the results are without retraining after pruning; however, the models are fine-tuned with a different seed for each GLUE task (paper, page 4, second paragraph), so the weights cannot be the same.)
@YannicKilcher 4 years ago
- Sorry, I meant 100% of the original baseline accuracy, not 100% in absolute terms :)
- True, very well observed, thanks.
- It's true what you're saying. I guess the assumption is that fine-tuning doesn't change the weights too much, so if two tasks use the same attention heads, they likely share the same information.
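To make the relative-accuracy framing concrete, here is a minimal sketch of iterative magnitude pruning that tracks accuracy as a fraction of the unpruned baseline. `model` and `evaluate` are hypothetical stand-ins, and the 90% floor is in the spirit of the paper's cutoff for separating "good" from "bad" subnetworks:

```python
import torch

def global_magnitude_prune_(model, fraction):
    """Zero out the smallest-magnitude `fraction` of all weight-matrix entries."""
    all_w = torch.cat([p.detach().abs().flatten()
                       for p in model.parameters() if p.dim() > 1])
    cutoff = torch.quantile(all_w, fraction)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p[p.abs() < cutoff] = 0.0

def prune_to_floor(model, evaluate, step=0.1, floor=0.9):
    baseline = evaluate(model)                 # the full model: this is the "100%"
    frac = 0.0
    while frac + step < 1.0:
        frac += step
        global_magnitude_prune_(model, frac)
        rel = evaluate(model) / baseline       # accuracy relative to baseline
        print(f"pruned {frac:.0%}: {rel:.1%} of baseline accuracy")
        if rel < floor:                        # below the "good subnetwork" cutoff
            break
    return model
```

Early in this loop, `rel` can exceed 1.0 (pruning sometimes acts as a regularizer), which is what "above 100%" refers to in the exchange above.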
@smnt 4 years ago
Hey Yannic, amazing videos, I have really learned a lot from you. What software do you use for these videos that lets you mark up the PDFs?
@YannicKilcher 4 years ago
I use OneNote.
@sharadchandakacherla8268 1 year ago
Just great.
@ДмитрийЛжетцов 4 years ago
Can we combine several different fine-tuned "winning tickets" into one model?
@YannicKilcher 4 years ago
Sure, why not :D
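A naive way to try this, sketched under the assumption that each fine-tuned ticket is stored as a dict of binary masks over the shared pre-trained weights (`mask_a` and `mask_b` are hypothetical): take the union of the masks and apply it to the pre-trained model. Whether the merged subnetwork serves both tasks well would need to be verified empirically.

```python
import torch

def union_masks(mask_a, mask_b):
    # Keep a weight if either task's winning ticket kept it.
    return {name: torch.clamp(mask_a[name] + mask_b[name], max=1.0)
            for name in mask_a}

def apply_mask(model, mask):
    # Zero out every weight that the combined ticket pruned away.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in mask:
                p.mul_(mask[name])
```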
@rachitbansal7485 4 years ago
Yannic, what is your PhD thesis about?
@YannicKilcher 4 years ago
TBD ;)
@JoshFlorii 4 years ago
Oh, what the heck, I subscribed. Do you work at Google?
@YannicKilcher 4 years ago
I was an intern there previously.
@glennkroegel1342 4 years ago
I hope you're dual-booting Linux lol.
@imatimetraveler5760 4 years ago
Only post these kinds of videos after you win first 🤔🤔🤔
@forecastinglottery6153 3 years ago
#AI vs #Powerball 5 of 5 w/ Powerball 100% 1 star 100% time 80% brute force 0,0136% win money 197733% We use artificial intelligence forecasts for the sake of money. Can you imagine how many problems we could solve with AI