
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection 

Soroush Mehraban

Large language models (LLMs) demand substantial GPU memory: pre-training a 7-billion-parameter model with Adam requires at least 58 GB for the weights, gradients, and optimizer states, making it impractical on a single consumer GPU. The GaLore paper addresses this by projecting gradients into a low-rank subspace, so the optimizer states shrink enough for the model to fit on a single GPU. Notably, this full-parameter approach not only reduces memory but also outperforms parameter-efficient tuning methods such as LoRA.
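At its core, the method keeps full-parameter training but stores the Adam moments for each weight matrix in a rank-r subspace. Below is a minimal NumPy sketch of that idea; the function name, hyperparameter defaults, and moment-handling details are illustrative assumptions, not the paper's official code:

```python
import numpy as np

def galore_adam_step(W, grad, state, rank=4, update_proj_gap=200,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step whose optimizer states live in a rank-r subspace (sketch)."""
    t = state["step"] = state.get("step", 0) + 1

    # Refresh the projector P_t from the top-r left singular vectors of the
    # gradient every `update_proj_gap` steps (the paper's periodic SVD).
    if "P" not in state or t % update_proj_gap == 0:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                          # m x r

    P = state["P"]
    R = P.T @ grad                                        # project: r x n

    # Adam moments are r x n instead of m x n -- this is the memory saving.
    if "m" not in state:
        state["m"] = np.zeros_like(R)
        state["v"] = np.zeros_like(R)
    state["m"] = beta1 * state["m"] + (1 - beta1) * R
    state["v"] = beta2 * state["v"] + (1 - beta2) * R**2
    m_hat = state["m"] / (1 - beta1**t)
    v_hat = state["v"] / (1 - beta2**t)
    N = m_hat / (np.sqrt(v_hat) + eps)                    # r x n

    # Project the normalized update back to full size and apply it.
    return W - lr * (P @ N)                               # m x n
```

In the full method this runs per 2-D weight matrix, and the paper additionally scales the projected-back update by a factor alpha.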
paper link: arxiv.org/abs/2403.03507
Table of Contents:
00:00 Intro
02:17 LoRA
03:18 Limitations of LoRA
05:58 GaLore
18:18 Adam with GaLore
21:01 8-Bit Optimizers
22:50 LOMO
24:48 GaLore vs LoRA
26:20 Rank vs Perplexity
27:07 Results
Icon made by Freepik from flaticon.com

Published: 25 Jun 2024

Comments: 11
@yashmandilwar8904 · 3 months ago
"mr" is the size of Projector P_t I think. In the algorithm they calculate R_t = P_t.T G_t Great video by the way! Thanks.
@soroushmehraban · 3 months ago
Yes, you're right. Not sure how I missed that, lol. Thanks!
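For context, a quick shape check of that projection step, with purely illustrative dimensions:

```python
import numpy as np

m, n, r = 8, 6, 2              # illustrative sizes only
G_t = np.random.randn(m, n)    # full gradient, m x n
P_t = np.random.randn(m, r)    # projector, m x r
R_t = P_t.T @ G_t              # R_t = P_t^T G_t
print(R_t.shape)               # (2, 6), i.e. r x n
```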
@iliyasindikar4695 · 1 month ago
Well explained.
@user-gl5ys8nr2u · 21 days ago
Excellent video! Would you recommend any resources that explain the theorems they propose for low-rank gradients and their convergence in depth? Also, what tools do you use to create such cool animations?
@gamerfawaz1234 · 3 months ago
Love❤, keep sharing, and shining
@savanthtadepalli3968 · 3 months ago
Your explanation is truly awesome! Keep making more, please!
@soroushmehraban · 3 months ago
Thanks 🙂
@alihadimoghadam8931 · 3 months ago
Great
@alinaderiparizi7193 · 3 months ago
Interesting!