MetaFormer is Actually What You Need for Vision

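In this video, we explore MetaFormer, a general architecture abstracted from the transformer, the model behind many advances in natural language processing and, more recently, computer vision. A MetaFormer block keeps the overall transformer structure (normalization, a token mixer, and a channel MLP, each wrapped in a residual connection) but leaves the token mixer unspecified. The paper argues that this general structure, rather than any particular token mixer such as attention, is what largely accounts for the strong performance of transformer-like vision models.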
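The general MetaFormer block can be sketched in a few lines: normalize the tokens, apply an arbitrary token mixer with a residual connection, then apply a channel MLP with a second residual connection. This NumPy sketch is illustrative only, not the authors' code. The parameter-free layer norm, the tanh approximation of GELU, the 4x MLP hidden width, and the toy `mean_mixer` are all assumptions chosen for brevity.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token over its channel axis (no learnable scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def channel_mlp(x, w1, b1, w2, b2):
    # Two-layer MLP applied to every token independently: it mixes channels, not tokens.
    return gelu(x @ w1 + b1) @ w2 + b2


def metaformer_block(x, token_mixer, mlp_params):
    """x has shape (num_tokens, channels); token_mixer maps (N, C) -> (N, C)."""
    y = x + token_mixer(layer_norm(x))                  # sub-block 1: token mixing + residual
    return y + channel_mlp(layer_norm(y), *mlp_params)  # sub-block 2: channel MLP + residual


def mean_mixer(t):
    # Hypothetical toy mixer: every token receives the average of all tokens.
    return np.broadcast_to(t.mean(axis=0, keepdims=True), t.shape)


rng = np.random.default_rng(0)
N, C = 16, 8
hidden = 4 * C  # assumed 4x expansion in the channel MLP
params = (rng.normal(size=(C, hidden)) * 0.02, np.zeros(hidden),
          rng.normal(size=(hidden, C)) * 0.02, np.zeros(C))
x = rng.normal(size=(N, C))

out = metaformer_block(x, mean_mixer, params)
print(out.shape)  # (16, 8)
```

Swapping `mean_mixer` for attention, a spatial MLP, or pooling changes only the `token_mixer` argument; the rest of the block stays the same, which is the structural point the paper makes.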
To support this claim, the researchers deliberately build an embarrassingly simple model called PoolFormer, which replaces attention with plain, parameter-free spatial pooling. Despite its simplicity, PoolFormer reaches 82.1% top-1 accuracy on ImageNet-1K, surpassing the well-tuned DeiT-B and ResMLP-B24 baselines by 0.3% and 1.1% while using 35% and 52% fewer parameters and 48% and 60% fewer MACs.
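The pooling token mixer is small enough to write out directly. A minimal NumPy sketch, assuming an odd 3x3 window, stride 1, and padded positions excluded from the average (mirroring `count_include_pad=False` in PyTorch's `nn.AvgPool2d`). It returns `avg_pool(x) - x`, because the surrounding block already adds the input back through its residual connection. The function name `pool_token_mixer` is hypothetical. Note that the mixer has no learnable parameters at all, which is where much of PoolFormer's saving comes from.

```python
import numpy as np


def pool_token_mixer(x, pool_size=3):
    """Pooling token mixer (hypothetical name) for x of shape (channels, height, width).

    Averages each position over a pool_size x pool_size window (stride 1, same
    output size), ignoring padded positions, then subtracts the input.
    Assumes pool_size is odd.
    """
    _, h, w = x.shape
    r = pool_size // 2
    padded = np.pad(x.astype(float), ((0, 0), (r, r), (r, r)))  # zero padding
    valid = np.pad(np.ones((h, w)), r)                          # 1 = real pixel, 0 = padding
    total = np.zeros((x.shape[0], h, w))
    count = np.zeros((h, w))
    for dy in range(pool_size):
        for dx in range(pool_size):
            total += padded[:, dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    # Subtract the input: the block's residual connection adds it back.
    return total / count - x


x = np.arange(9.0).reshape(1, 3, 3)
out = pool_token_mixer(x)
print(out[0, 0, 0], out[0, 1, 1])  # 2.0 0.0
```

In the corner, only four real pixels (0, 1, 3, 4) fall inside the window, so their mean of 2.0 minus the input 0.0 gives 2.0; in the centre, the mean of all nine values equals the input, so the output is 0.0.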
Join us to see what MetaFormer suggests about designing vision models, and how PoolFormer performs on image classification, object detection, and segmentation.
Paper link: arxiv.org/abs/2111.11418
Table of Contents:
00:00 Introduction
01:46 ConvNeXt resemblance
02:48 PoolFormer block
06:10 Model Architecture
08:16 Result
Icon made by Freepik from flaticon.com

Published: 28 Jun 2024

Comments: 6

@francisferri2732 1 year ago

Very good video. It is incredible to know how pooling can make the network more efficient

@Raulvic 1 year ago

Thank you for sharing

@soroushmehraban 1 year ago

Glad you liked it!

@rohollahhosseyni8564 9 months ago

Well explained

@shilashm5691 11 months ago

Swin-Mixer is another variant of MLP-Mixer that uses a Swish activation layer and a GLU-MLP layer, so basically they are right. Swin Transformer is different from Swin-Mixer.