
PaLM Pathways Language Model explained | 540 Billion parameters can explain jokes!? 

AI Coffee Break with Letitia
49K subscribers
21K views

Published: 18 Sep 2024

Comments: 39
@Ma2rten 2 years ago
I am a coauthor of the PaLM paper. Thanks for choosing to cover it!
@AICoffeeBreak 2 years ago
Thanks for the visit! And congrats on the cool work. 👏 I'm eager to see what you have lined up next.
@michaelfischer841 2 years ago
Thank you for your brilliant work.
@michaelfischer841 2 years ago
When you are training these things, are you also using the contents of university-level reference materials in PDF format, which can be converted using pdftotext on the command line?
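For readers unfamiliar with the tool mentioned here: pdftotext (from poppler-utils) converts a PDF to plain text from the command line. Below is a minimal sketch of how such a preprocessing step could be wrapped from Python; the file names are hypothetical and the sketch assumes the pdftotext binary is installed and on PATH.

```python
# Minimal sketch: convert a PDF to plain text by shelling out to `pdftotext`.
# Assumes poppler-utils is installed; file names are placeholders.
import subprocess
from pathlib import Path

def pdf_to_text(pdf_path: str, txt_path: str | None = None) -> str:
    """Run `pdftotext` on pdf_path and return the extracted text."""
    txt_path = txt_path or str(Path(pdf_path).with_suffix(".txt"))
    # `-layout` asks pdftotext to keep the physical layout of each page.
    subprocess.run(["pdftotext", "-layout", pdf_path, txt_path], check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="ignore")

if __name__ == "__main__":
    # Hypothetical input file, for illustration only.
    print(pdf_to_text("reference_textbook.pdf")[:500])
```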
@sabofx 2 years ago
@Maarten Bosma: I've viewed several videos on PaLM like this one and one by Dr Alan D Thompson. Is there any way that I could have a conversation/chat with the PaLM AI? I would love to test its reasoning capabilities myself. Is there somewhere where I can sign up for access? Looking forward to your reply! Cheers, Joost.
@anthonyrepetto3474 2 years ago
In regard to PaLM developing certain capabilities only once it reaches a threshold: we now know that even random graphs, of sufficient size and connectivity, undergo a 'phase change' into states of higher order, as explained in Quanta's recent article, "Elegant Six-Page Proof Reveals the Emergence of Random Structure". So, even though the model is not an innovation, it does provide a potential insight: making models bigger can cross *thresholds* into sudden new abilities!
@Mutual_Information 2 years ago
I'm glad you chose PaLM. It felt like DALL-E was sucking up all the attention while PaLM was doing some seriously impressive things we hadn't seen before. Very nice video, as always :)
@iliemihai949 2 years ago
Very cool, Letitia, one of the best channels to follow on NLP. In the coming months we will release a GPT2-780M model for Romanian, trained on 40 GB of text.
@AICoffeeBreak 2 years ago
Wow, I can't wait to see it. 👀
@HoriaNeagu 2 years ago
Hi! Did this project ever materialize?
@hannesstark5024 2 years ago
Fantastic! Thank you for this summary which prevents me from having to read slightly boring papers :7
@fedelopez77 1 year ago
"Few-shot learning, as we see it from GPT-3 onwards, is just glorified pattern completion" --> Standing ovation, just awesome
@bazejmarciniak5682 1 year ago
Your channel is a gem! Thanks for your great work!
@michaelfischer841 2 years ago
Your commentary and insight are top notch.
@JuliusSmith 2 years ago
I have to watch all your videos now! Your style is perfect for me - thanks for making them!
@AICoffeeBreak 2 years ago
Glad you found us! 😁
@mrshankj5101 2 years ago
I don't think AI language models are boring! PaLM and GPT-3 are awesome!
@JuliusSmith 2 years ago
Maybe "few shot orientation" would be a better term
@AICoffeeBreak 2 years ago
🤣
@JM-ln2zm 1 year ago
Great video, Letitia! I have a question. PaLM was trained on 6100 TPUs. Say you created a language translator using PaLM: in order for me to use this newly created translator, do I still need access to the 6100 TPUs, or can it run on fewer TPUs once the model has been trained?
@AICoffeeBreak 1 year ago
Hi, thanks for the question. Maybe someone knows this more thoroughly than me, but no: the parallelization over more than 6k TPUs is there to speed up training and to store the gradients. For inference, the gradients are not needed; only the parameters have to be loaded. Given the enormous parameter count, inference surely still needs more than one TPU, because of the memory it takes. If you are happy to wait a bit (I do not know how long "a bit" is for such an enormous model), you could even run inference on a CPU with enough RAM. 😅
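To make the training-vs-inference distinction concrete, here is a minimal sketch of CPU-only inference with a decoder-only language model. PaLM's weights are not public, so GPT-2 stands in as an assumed substitute; a 540B-parameter model would need vastly more RAM and would be very slow on CPU, but the principle is the same: load the trained parameters, skip all gradient bookkeeping, and generate.

```python
# Sketch: inference needs only the trained weights, not the training-time
# gradient state or a 6k-TPU pod. GPT-2 is used here as a small stand-in,
# since PaLM itself was never released.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for a much larger decoder-only model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # evaluation mode: no dropout, no gradient updates

prompt = "I will explain this joke step by step:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # disable gradient tracking entirely for inference
    output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```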
@tildarusso 2 years ago
Nice wrap-up. As you said, it is XXXL-large but nothing new, boring as usual imho. Thank you for saving a lot of people the 87-page reading time!
@federicolusiani7753 2 years ago
Thank you so much for these videos!! The quality of the explanations and insights you provide is unmatched.
@AICoffeeBreak 2 years ago
Thanks, so nice of you! :)
@Skinishh 2 years ago
Thank you for the great video, as always! I wonder why these large LMs are all decoder-only like GPT and not encoder-decoder like T5? 🤔
@Skinishh 2 years ago
Answering my own question: these kinds of models are only interested in next-token text generation, not in fine-tuning tasks or mask completion like T5. Therefore, only a decoder is needed for text generation.
@Ma2rten 2 years ago
@Skinishh Google Research has also done work on large encoder-decoder models, most recently ST-MoE-32B. Decoder-only models tend to work best for open-ended text generation and few-shot learning; encoder-decoder models for classification and closed-ended text generation (e.g. machine translation).
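For readers who want to see the two architectures side by side, here is a rough illustration using small openly available models as stand-ins (PaLM and ST-MoE-32B are not public): a decoder-only model simply continues a prompt, while an encoder-decoder model encodes the full input and generates a separate output, which suits closed-ended tasks such as translation.

```python
# Sketch: decoder-only (GPT-style) vs encoder-decoder (T5-style) usage.
# Small open checkpoints are used purely as stand-ins for the models discussed.
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Decoder-only: free-form continuation of a prompt.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt_tok("The PaLM model is", return_tensors="pt")
print(gpt_tok.decode(gpt.generate(**ids, max_new_tokens=20)[0]))

# Encoder-decoder: input is encoded once, output is generated separately,
# a good fit for closed-ended tasks like machine translation.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
ids = t5_tok("translate English to German: The cat sat on the mat.", return_tensors="pt")
print(t5_tok.decode(t5.generate(**ids, max_new_tokens=20)[0], skip_special_tokens=True))
```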
@DerPylz 2 years ago
I wonder what the output would be without the "few-shots", so not giving the 2 examples of correctly solved tasks before the prompt. Do you think there would be no answer at all, or just a very bad one?
@odysseashlap 2 years ago
There would be an irrelevant answer
@scottpulver 2 years ago
Irrelevant followed by 1 perfect
@Abdulazizab2 2 years ago
Check out the GPT-3 paper "Language Models are Few-Shot Learners": they evaluate the 'few-shot' setting and also 'zero-shot', where you don't provide any solved examples for a given task. For some tasks zero-shot does well; for other tasks the model needs to be driven by at least one example, i.e. 'one-shot'.
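To make the zero-shot/one-shot/few-shot distinction concrete at the prompt level, here is a small sketch: the only difference between the settings is how many solved examples are prepended before the query. The task and examples below are made up for illustration; the resulting string would then be fed to any language model.

```python
# Sketch: building zero-shot, one-shot, and few-shot prompts.
# The arithmetic examples are invented purely for illustration.
examples = [
    ("Q: What is 2 + 2?", "A: 4"),
    ("Q: What is 7 + 5?", "A: 12"),
]
query = "Q: What is 9 + 6?"

def build_prompt(num_shots: int) -> str:
    """Prepend `num_shots` solved examples before the query."""
    shots = "\n".join(f"{q}\n{a}" for q, a in examples[:num_shots])
    return (shots + "\n" if shots else "") + query + "\nA:"

print(build_prompt(0))  # zero-shot: just the query
print(build_prompt(1))  # one-shot: one solved example, then the query
print(build_prompt(2))  # few-shot: several solved examples, then the query
```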
@Micetticat 2 years ago
"Boring models" with new exciting hardware tricks!
@jeanpicard1844 2 years ago
I'm confused about what you mean by toxicity, and why or how the model is being toxic. Is there an example of something you can point me to? Maybe I'm just missing the meaning of the term as it is used in the AI/language space.
@AICoffeeBreak 2 years ago
Maybe you can read more about it here. 🔗 GPT-3 examples of toxic behaviour: venturebeat.com/2022/01/27/openai-rolls-out-new-text-generating-models-that-it-claims-are-less-toxic/
@micknamens8659 1 year ago
"toxic" means it's an unwanted fact - i.e. denied & forbidden by cancel culture
@wilfredomartel7781 1 year ago
How can I test it?
@AICoffeeBreak 1 year ago
Sadly, PaLM was not made available to the public. We can read about it in the paper.
@chriswondyrland73 2 years ago
... Is she real, or an AI, or just imitating an AI?!
@Diegorussod.r 1 year ago
How can I try this AI?