
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection 

Soroush Mehraban

Large language models (LLMs) demand substantial GPU memory: pre-training a 7-billion-parameter model with Adam requires at least 58 GB for the weights, gradients, and optimizer states, making it impractical on a single consumer GPU. The GaLore paper addresses this by projecting gradients into a low-rank subspace, so the optimizer states shrink enough for the model to fit on a single GPU. Notably, this full-parameter approach not only reduces memory but also outperforms parameter-efficient tuning methods such as LoRA.
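At its core, the method keeps full-parameter training but stores the Adam moments for each weight matrix in a rank-r subspace. Below is a minimal NumPy sketch of that idea; the function name, hyperparameter defaults, and moment-handling details are illustrative assumptions, not the paper's official code:

```python
import numpy as np

def galore_adam_step(W, grad, state, rank=4, update_proj_gap=200,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step whose optimizer states live in a rank-r subspace (sketch)."""
    t = state["step"] = state.get("step", 0) + 1

    # Refresh the projector P_t from the top-r left singular vectors of the
    # gradient every `update_proj_gap` steps (the paper's periodic SVD).
    if "P" not in state or t % update_proj_gap == 0:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                          # m x r

    P = state["P"]
    R = P.T @ grad                                        # project: r x n

    # Adam moments are r x n instead of m x n -- this is the memory saving.
    if "m" not in state:
        state["m"] = np.zeros_like(R)
        state["v"] = np.zeros_like(R)
    state["m"] = beta1 * state["m"] + (1 - beta1) * R
    state["v"] = beta2 * state["v"] + (1 - beta2) * R**2
    m_hat = state["m"] / (1 - beta1**t)
    v_hat = state["v"] / (1 - beta2**t)
    N = m_hat / (np.sqrt(v_hat) + eps)                    # r x n

    # Project the normalized update back to full size and apply it.
    return W - lr * (P @ N)                               # m x n
```

In the full method this runs per 2-D weight matrix, and the paper additionally scales the projected-back update by a factor alpha.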
paper link: arxiv.org/abs/2403.03507
Table of Contents:
00:00 Intro
02:17 LoRA
03:18 Limitations of LoRA
05:58 GaLore
18:18 Adam with GaLore
21:01 8-Bit Optimizers
22:50 LOMO
24:48 GaLore vs LoRA
26:20 Rank vs Perplexity
27:07 Results
Icon made by Freepik from flaticon.com

Published: 25 Jun 2024

Comments: 11
@yashmandilwar8904 · 3 months ago
"mr" is the size of Projector P_t I think. In the algorithm they calculate R_t = P_t.T G_t Great video by the way! Thanks.
@soroushmehraban · 3 months ago
Yes, you're right. Not sure how I missed that, lol. Thanks!
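For context, a quick shape check of that projection step, with purely illustrative dimensions:

```python
import numpy as np

m, n, r = 8, 6, 2              # illustrative sizes only
G_t = np.random.randn(m, n)    # full gradient, m x n
P_t = np.random.randn(m, r)    # projector, m x r
R_t = P_t.T @ G_t              # R_t = P_t^T G_t
print(R_t.shape)               # (2, 6), i.e. r x n
```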
@iliyasindikar4695 · 1 month ago
Well explained.
@user-gl5ys8nr2u · 21 days ago
Excellent video! Would you recommend any resources that explain the theorems they propose for low-rank gradients and their convergence in depth? Also, what tools do you use to create such cool animations?
@gamerfawaz1234 · 3 months ago
Love❤, keep sharing, and shining
@savanthtadepalli3968 · 3 months ago
Your explanation is truly awesome! Keep making more, please!
@soroushmehraban · 3 months ago
Thanks 🙂
@alihadimoghadam8931 · 3 months ago
Great
@alinaderiparizi7193 · 3 months ago
Interesting!