MetaFormer is Actually What You Need for Vision 

Soroush Mehraban
Views: 865

In this video, we explore MetaFormer, a general architecture abstracted from the Transformer, which originated in natural language processing and was later carried over to computer vision. In MetaFormer the token mixer (attention, in the original Transformer) is left unspecified, and the paper argues that this general block structure, rather than any particular token mixer, accounts for much of the strong performance of Transformer-like models on vision tasks.
To support this claim, the authors of the paper present an embarrassingly simple architecture called PoolFormer, which uses plain average pooling as the token mixer. The paper reports that PoolFormer outperforms well-tuned Transformer and MLP-like baselines such as DeiT and ResMLP while using fewer parameters and less computation.
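As a rough illustration of the PoolFormer block idea, here is a minimal pure-Python sketch on a single-channel feature map. It assumes a 3x3 average pool with stride 1 that skips out-of-bounds neighbours and subtracts the input (the residual connection adds it back). The helper names (avg_pool_mixer, metaformer_block, zero_mlp) are hypothetical, and normalization and the channel MLP are reduced to placeholders, so this is not the authors' implementation.

```python
from typing import Callable, List

Grid = List[List[float]]  # one channel of an H x W feature map


def avg_pool_mixer(x: Grid, pool_size: int = 3) -> Grid:
    """Stride-1 average pooling minus the input (assumed pooling token mixer)."""
    h, w = len(x), len(x[0])
    r = pool_size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Collect in-bounds neighbours only; padding is not counted.
            window = [
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            out[i][j] = sum(window) / len(window) - x[i][j]
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids (used for residual connections)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(x: Grid) -> Grid:
    """Placeholder for the normalization layer (an assumption of this sketch)."""
    return [row[:] for row in x]


def zero_mlp(x: Grid) -> Grid:
    """Placeholder channel MLP that contributes nothing (an assumption)."""
    return [[0.0 for _ in row] for row in x]


def metaformer_block(
    x: Grid,
    token_mixer: Callable[[Grid], Grid],
    channel_mlp: Callable[[Grid], Grid],
    norm: Callable[[Grid], Grid] = identity,
) -> Grid:
    """General block: token-mixing sub-block, then channel-MLP sub-block."""
    y = add(x, token_mixer(norm(x)))      # residual around the token mixer
    return add(y, channel_mlp(norm(y)))   # residual around the channel MLP


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = metaformer_block(x, avg_pool_mixer, zero_mlp)
```

With the zero placeholder MLP, the block output equals the average-pooled input, which makes the role of the subtraction inside the mixer easy to see. Passing a different callable as token_mixer (for example, an attention function) leaves the rest of the block unchanged, which is the point of the MetaFormer abstraction.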
Join us to learn what MetaFormer implies for the design of computer vision models across a range of applications.
Paper link: arxiv.org/abs/2111.11418
Table of Contents:
00:00 Introduction
01:46 ConvNeXt resemblance
02:48 PoolFormer block
06:10 Model Architecture
08:16 Results
Icon made by Freepik from flaticon.com

Published: 28 Jun 2024

Comments: 6
@francisferri2732 (1 year ago)
Very good video. It is incredible to know how pooling can make the network more efficient
@Raulvic (1 year ago)
Thank you for sharing
@soroushmehraban (1 year ago)
Glad you liked it!
@rohollahhosseyni8564 (9 months ago)
Well explained
@shilashm5691 (11 months ago)
Swin-Mixer is another variant of MLP-Mixer, which uses a Swish activation layer and a GLU-MLP layer. So basically they are right. Swin Transformer is different from Swin-Mixer.
@alihadimoghadam8931 (1 year ago)
great job pal