
Aligning LLMs with Direct Preference Optimization 

DeepLearningAI
334K subscribers
25K views

Published: Sep 5, 2024

Comments: 20
@eliporter3980 6 months ago
I'm learning a lot from these talks, thank you for having them.
@NitinPasumarthy 6 months ago
The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to DLAI for organizing this event.
@PritishYuvraj 5 months ago
Excellent comparison of PPO and DPO! Kudos.
@vijaybhaskar5333 6 months ago
Excellent topic, well explained. One of the best videos on this subject I have seen recently. Continue your good work 😊
@amortalbeing 7 months ago
This was amazing, thank you everyone. One thing though, if that's possible: it would be greatly appreciated if you could record in 1080p so the details/text on the slides are visible and easier to consume. Thanks a lot again.
@MatijaGrcic 7 months ago
Check out notebooks and slides in the description.
@amortalbeing 6 months ago
@MatijaGrcic Thanks a lot, downloaded the slides.
@katie-48 6 months ago
Great presentation, thank you very much!
@user-rx5pp3hh1x 6 months ago
cut to the chase - 3:30
questions on DPO - 27:37
practical deep-dive - 30:19
question - 53:32
@jeankunz5986 6 months ago
Great presentation. Congratulations.
@dc33333 1 month ago
Very good, thank you.
@PaulaLeonova 6 months ago
At 29:40 Lewis mentions an algorithm that requires fewer training samples, what is the name of it? I heard "data", but don't think that is correct. If anyone knows, would you mind replying?
@user-rx5pp3hh1x 6 months ago
Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning", arxiv.org/pdf/2402.06094.pdf
@ralphabrooks 6 months ago
I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?
@TheRilwen 6 months ago
I'm wondering why simple techniques, such as sample boosting, increasing errors for high-ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing, which convinces me that I'm missing something :-)
@austinmw89 6 months ago
Curious if you compared SFT on all data vs. training on completions only?
@iseminamanim 7 months ago
Interested
@trisetra 1 month ago
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-QXVCqtAZAn4.html The details in the Llama3 paper seem to validate the claim that DPO works better than RL at scale.
@MacProUser99876 6 months ago
How DPO works under the hood: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ju-pFJNfOfY.html
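For readers following up on "how DPO works under the hood": a minimal sketch of the standard DPO objective from Rafailov et al. (2023). This is not code from the talk or its notebooks; the function and variable names are illustrative, and the inputs are assumed to be summed per-sequence log-probabilities of the chosen/rejected completions under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the DPO loss: a logistic loss on the margin between
    implicit rewards of the chosen and rejected completions."""
    # Implicit reward = beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Encourage the chosen completion to out-score the rejected one
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Unlike PPO-based RLHF, there is no separate reward model or sampling loop; the preference signal is optimized directly through this loss, with beta controlling how far the policy may drift from the reference model.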