
Aligning LLMs with Direct Preference Optimization 

DeepLearningAI
334K subscribers
25K views

Published: Sep 5, 2024

Comments: 20
@eliporter3980 6 months ago
I'm learning a lot from these talks, thank you for having them.
@NitinPasumarthy 6 months ago
The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to DLAI for organizing this event.
@PritishYuvraj 5 months ago
Excellent comparison of PPO and DPO! Kudos.
@vijaybhaskar5333 6 months ago
Excellent topic, well explained. One of the best videos on this subject I have seen recently. Continue your good work 😊
@amortalbeing 7 months ago
This was amazing, thank you everyone. One thing though, if that's possible: it would be greatly appreciated if you could record in 1080p so the details/text on the slides are visible and easier to consume. Thanks a lot again.
@MatijaGrcic 7 months ago
Check out notebooks and slides in the description.
@amortalbeing 6 months ago
@MatijaGrcic Thanks a lot, downloaded the slides.
@katie-48 6 months ago
Great presentation, thank you very much!
@user-rx5pp3hh1x 6 months ago
cut to the chase - 3:30
questions on DPO - 27:37
practical deep-dive - 30:19
question - 53:32
@jeankunz5986 6 months ago
Great presentation. Congratulations.
@dc33333 1 month ago
Very good, thank you.
@PaulaLeonova 6 months ago
At 29:40 Lewis mentions an algorithm that requires fewer training samples, what is the name of it? I heard "data", but don't think that is correct. If anyone knows, would you mind replying?
@user-rx5pp3hh1x 6 months ago
Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning", arxiv.org/pdf/2402.06094.pdf
@ralphabrooks 6 months ago
I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?
@TheRilwen 6 months ago
I'm wondering why simple techniques, such as sample boosting, increasing errors for high-ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing, which convinces me that I'm missing something :-)
@austinmw89 6 months ago
Curious if you compared SFT on all data vs. training on completions only?
@iseminamanim 7 months ago
Interested
@trisetra 1 month ago
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-QXVCqtAZAn4.html The details in the Llama3 paper seem to validate the claim that DPO works better than RL at scale.
@MacProUser99876 6 months ago
How DPO works under the hood: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ju-pFJNfOfY.html
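For readers following up on "how DPO works under the hood": a minimal sketch of the standard DPO objective from Rafailov et al. (2023). This is not code from the talk or its notebooks; the function and variable names are illustrative, and the inputs are assumed to be summed per-sequence log-probabilities of the chosen/rejected completions under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the DPO loss: a logistic loss on the margin between
    implicit rewards of the chosen and rejected completions."""
    # Implicit reward = beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Encourage the chosen completion to out-score the rejected one
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Unlike PPO-based RLHF, there is no separate reward model or sampling loop; the preference signal is optimized directly through this loss, with beta controlling how far the policy may drift from the reference model.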