Tan Nguyen - Transformers Meet Image Denoising: Mitigating Over-smoothing in Transformers 

One world theoretical machine learning

Abstract: Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model’s depth grows. In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. We then propose a novel regularizer that penalizes the norm of the difference between the smooth output tokens from self-attention and the input tokens to preserve the fidelity of the tokens. Minimizing the resulting regularized energy functional, we derive the Neural Transformer with a Regularized Nonlocal Functional (NeuTRENO), a novel class of transformer models that can mitigate the over-smoothing issue. We empirically demonstrate the advantages of NeuTRENO over the baseline transformers and state-of-the-art methods in reducing the over-smoothing of token representations on various practical tasks, including object classification, image segmentation, and language modeling.
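To make the abstract's idea concrete, below is a minimal PyTorch sketch of a fidelity-corrected self-attention layer in the spirit of NeuTRENO: the usual softmax attention output is augmented with a term proportional to the difference between the first layer's value vectors and the current layer's. This is an illustration under assumptions, not the paper's reference implementation; the module name NeuTRENOAttention, the coefficient lam, and the convention of passing the first layer's values as v0 are choices made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuTRENOAttention(nn.Module):
    """Self-attention plus a fidelity term (sketch, not the official code).

    The attention output, which the talk argues acts as a smoother, is
    corrected by lam * (v0 - v), pulling deep-layer tokens back toward
    the value vectors of the first layer to counter over-smoothing.
    """
    def __init__(self, dim, lam=0.6):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.lam = lam  # regularization strength (value assumed here)

    def forward(self, x, v0=None):
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        if v0 is None:          # first layer: remember its value vectors
            v0 = v
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        attn = F.softmax(scores, dim=-1)
        # smooth attention output plus a pull toward the initial values
        out = attn @ v + self.lam * (v0 - v)
        return out, v0

# Usage: the first layer produces v0, which deeper layers reuse.
x = torch.randn(2, 16, 64)          # (batch, tokens, dim)
layer1 = NeuTRENOAttention(64)
layer2 = NeuTRENOAttention(64)
h, v0 = layer1(x)
h, _ = layer2(h, v0=v0)
```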

Science

Published: Feb 2, 2024
