
09L - Differentiable associative memories, attention, and transformers 

Alfredo Canziani
39K subscribers
9K views

Published: 28 Aug 2024

Comments: 12
@user-co6pu8zv3v · 3 years ago
Thanks, Alfredo! :) Two hours of lecture passed like a few moments
@alfcnz · 3 years ago
Aww 🥰🥰🥰
@anondoggo · 1 year ago
Timestamps:
00:00:00 - Motivation for reasoning & planning
00:09:11 - Inference through energy minimization
00:18:08 - Disclaimer
00:19:02 - Planning through energy minimization
00:32:59 - Q&A: Optimal control diagram
00:39:23 - Differentiable associative memory and attention
01:01:03 - Transformers
01:08:14 - Q&A: Other differentiable attention architectures
01:10:32 - Transformer architecture
01:27:54 - Transformer applications: 1. Multilingual transformer architecture XLM-R
01:30:16 - 2. Supervised symbol manipulation
01:32:14 - 3. NL understanding & generation
01:36:51 - 4. DETR
01:46:47 - Planning through optimal control
01:55:37 - Conclusion
@alfcnz · 1 year ago
Thanks a bunch!
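
The 00:39:23 segment in the timestamps above treats attention as a differentiable associative memory, i.e. a soft key-value lookup. As a minimal sketch of that idea (my own illustration of scaled dot-product attention, not code from the lecture), retrieval returns a softmax-weighted blend of stored values:

```python
import torch
import torch.nn.functional as F

def soft_attention(query, keys, values):
    """Differentiable associative memory: retrieve a convex combination of
    stored values, weighted by how well the query matches each key."""
    d = query.shape[-1]
    scores = keys @ query / d**0.5        # similarity of the query to every key
    weights = F.softmax(scores, dim=-1)   # soft "address" over memory slots
    return weights @ values               # blended retrieved value

# Toy memory with 4 stored key/value pairs of dimension 8
keys = torch.randn(4, 8)
values = torch.randn(4, 8)
query = torch.randn(8)
print(soft_attention(query, keys, values).shape)  # torch.Size([8])
```

Because the softmax is smooth, gradients flow through the lookup, which is what makes this memory usable inside a transformer.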
@buoyrina9669 · 2 years ago
It felt like being in a philosophy class :)
@alfcnz · 2 years ago
😮😮😮
@SanataniAryavrat · 3 years ago
Thanks for sharing, Alfredo, you are awesome!
@alfcnz · 3 years ago
🥰🥰🥰
@mortezism · 1 year ago
ChatGPT's answer to the question Yann asked: "Germany shares a border with several countries, including Austria, Belgium, Czech Republic, Denmark, France, Luxembourg, Netherlands, Poland, and Switzerland. It is difficult to say which of these countries has the largest commercial exchanges with China, as this can change over time and may vary depending on the specific goods and services being traded. Furthermore, without access to current information, I am unable to provide a definitive answer."
@AdityaSanjivKanadeees · 2 years ago
For masking, is there a strategy that removes specific words instead of masking at random? If the object of interest (e.g. the curtain at 1:29:19) were removed from both the English and the French sentence, wouldn't that make the prediction task much more difficult, since many different objects could be substituted in its place? (See the sketch below.)
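
For reference, the pre-training objective this question refers to uses random token masking. A minimal sketch (illustrative only; the function names and toy sentence are mine) contrasting random masking with masking a chosen word, as the commenter proposes:

```python
import random

def random_mask(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """BERT-style masking: each token is independently replaced with
    [MASK] with probability mask_rate."""
    return [mask_token if random.random() < mask_rate else t for t in tokens]

def targeted_mask(tokens, target, mask_token="[MASK]"):
    """Mask every occurrence of a chosen word (the object of interest),
    which makes the prediction task harder, as the commenter suggests."""
    return [mask_token if t == target else t for t in tokens]

sentence = "the cat sat behind the curtain".split()
print(random_mask(sentence))
print(targeted_mask(sentence, "curtain"))
```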
@oguzhanercan4701 · 2 years ago
Yann gets sad at 1:26:09 while talking about how the attention mechanism might take the place of convolution for images :/
@alfcnz · 2 years ago
🥺🥺🥺