
MotionAGFormer (WACV2024): Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network 

Soroush Mehraban

In this video, I review the MotionAGFormer paper for the task of monocular 3D human pose estimation.
Paper link: arxiv.org/abs/2310.16288
GitHub link: github.com/TaatiTeam/MotionAG...
Table of contents:
00:00 Intro
00:10 MetaFormer and GCNFormer
01:25 MotionAGFormer
03:52 GCNFormer's Adjacency Matrix
06:34 MotionAGFormer Variants
06:56 Results and Comparison

Published: 30 Jun 2024

Comments: 9
@ericsy78 · 6 months ago
Your videos are amazingly engaging and one can learn a lot from them; they are also very underrated. I hope everyone recognizes your videos and has a chance to learn from you. Thanks a lot, keep up the good work!
@soroushmehraban · 6 months ago
Thanks 🙂 Appreciate it!
@yiqian22 · 6 months ago
Amazing work and great explanation, thanks a lot! 👏
@soroushmehraban · 6 months ago
Thanks Yiqian 🙂
@user-pi8vj1yy8h · 5 months ago
Great work! But I don't know how to train using the 2D ground truth of the Human3.6M dataset as input; I couldn't find it in the README.md. Can you tell me? Thanks a lot!
@soroushmehraban · 5 months ago
Thanks. Answered the question here: github.com/TaatiTeam/MotionAGFormer/issues/12
@jialiangxu1657 · 10 days ago
Hi, I'm still a bit confused, so could you please tell me how you solve the 3D pose jitter? The 2D poses contain jitter, but I can't see it in the 3D poses after lifting in the demo video of your code. Thank you.
@soroushmehraban · 8 days ago
Hi Jialiang, throughout training the model also sees 2D poses with jitter, but as the ground-truth output it sees motion-capture 3D, and we have a velocity loss (we multiply it by 20 to make it 20 times more important than MPJPE) that pushes the model's estimates to have the same velocity as the ground truth and penalizes jitter. So in addition to lifting the input from 2D to 3D and inferring the underlying 3D structure, the model also has to denoise the input.
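For readers wondering what such a velocity loss could look like in code, here is a minimal PyTorch sketch based only on the description in the reply above. It is not the repository's exact training code; the tensor layout (batch, frames, joints, 3), the placement of the 20x weight, and the function names mpjpe, velocity_loss, and total_loss are illustrative assumptions.

    import torch

    def mpjpe(pred, gt):
        # Mean per-joint position error: Euclidean distance per joint,
        # averaged over batch, frames, and joints. Shapes: (B, T, J, 3).
        return torch.mean(torch.norm(pred - gt, dim=-1))

    def velocity_loss(pred, gt):
        # Frame-to-frame velocity of each joint, then an MPJPE-style error
        # on those velocities. Jitter absent from the ground truth is penalized.
        pred_vel = pred[:, 1:] - pred[:, :-1]  # (B, T-1, J, 3)
        gt_vel = gt[:, 1:] - gt[:, :-1]
        return torch.mean(torch.norm(pred_vel - gt_vel, dim=-1))

    def total_loss(pred, gt, lambda_vel=20.0):
        # Combined objective as described in the comment: the velocity term
        # is weighted 20x relative to the position (MPJPE) term.
        return mpjpe(pred, gt) + lambda_vel * velocity_loss(pred, gt)

    # Example usage (hypothetical shapes: 2 clips, 243 frames, 17 H3.6M joints).
    pred = torch.randn(2, 243, 17, 3)
    gt = torch.randn(2, 243, 17, 3)
    print(total_loss(pred, gt).item())
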
@alihadimoghadam8931 · 6 months ago
Great job, thanks!