
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper 

@Scale

In this talk we present how we trained a 530B parameter language model on a DGX SuperPOD with over 3,000 A100 GPUs and a high-speed InfiniBand interconnect, and how we can scale to even larger models. We explore three types of parallelism (data, tensor, and pipeline) and show how they can be composed to achieve maximum efficiency. Our approach allows us to perform training iterations on a model with 1 trillion parameters at 502 petaFLOP/s on 3072 GPUs (per-GPU throughput of 52% of theoretical peak). We discuss challenges we faced when training the 530B Megatron-Turing NLG model and give practical advice on how to successfully train very large language models.
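As a rough sanity check on the figures in the abstract, 502 petaFLOP/s spread over 3072 GPUs is about 163 TFLOP/s per GPU, which is roughly 52% of an A100's BF16 tensor-core peak of about 312 TFLOP/s. The sketch below works through that arithmetic and also shows how the three parallelism degrees multiply up to the GPU count; the 312 TFLOP/s peak and the 8 x 64 x 6 tensor/pipeline/data split are assumptions for illustration, not taken from the talk itself.

# Back-of-the-envelope check of the reported throughput.
# Figures from the abstract: 502 PFLOP/s aggregate on 3072 GPUs, ~52% of peak.
# The A100 BF16 peak (312 TFLOP/s) and the parallelism degrees below are
# illustrative assumptions, not confirmed by the talk.

AGGREGATE_PFLOPS = 502          # petaFLOP/s across the whole cluster
NUM_GPUS = 3072
A100_BF16_PEAK_TFLOPS = 312     # assumed per-GPU theoretical peak

per_gpu_tflops = AGGREGATE_PFLOPS * 1000 / NUM_GPUS
fraction_of_peak = per_gpu_tflops / A100_BF16_PEAK_TFLOPS

print(f"per-GPU throughput: {per_gpu_tflops:.0f} TFLOP/s")   # ~163 TFLOP/s
print(f"fraction of peak:   {fraction_of_peak:.0%}")         # ~52%

# Hypothetical composition of the three parallelism types: the product of the
# tensor-, pipeline-, and data-parallel degrees must equal the GPU count.
tensor_parallel = 8      # GPUs within a node splitting each layer's matmuls
pipeline_parallel = 64   # consecutive groups of layers assigned to stages
data_parallel = 6        # replicas of the whole model-parallel configuration

assert tensor_parallel * pipeline_parallel * data_parallel == NUM_GPUS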

Published: 29 Sep 2024

Comments: 3
@prajyot2021 · 4 months ago
Need more such detailed content, Jared. Appreciate your work. Thanks, mate.
@voncolborn9437 · 8 months ago
Being an old-timer in computer ops (from back in the 80s), I find this whole new world of computer operations totally fascinating. It really is hard for me to wrap my head around the size and performance of these systems. My hat is off to you guys. I'm watching and learning a little, too.
@kazimejbaulislam9185 · 9 months ago
Amazing explanation! Thanks.