
How Large Language Models Work 

IBM Technology
758K subscribers
428K views

Learn about watsonx → ibm.biz/BdvxRj
Large language models, or LLMs, such as the generative pre-trained transformer (GPT) family, can create human-like text and code. There's a lot of talk about GPTs and LLMs lately, but they've actually been around for years! In this video, Martin Keen briefly explains what an LLM is, how LLMs relate to foundation models, and then covers how they work and how they can be used to address various business problems.
#llm #gpt #gpt3 #largelanguagemodel #watsonx #GenerativeAI #Foundationmodels

Published: 27 Jul 2023

Comments: 121
@mindofpaul9543 · 2 months ago
I don't know what is more impressive, LLMs or this guy's ability to write backwards perfectly.
@patmil8314 · 2 months ago
The whole thing is flipped, I guess. He's "writing left-handed" and we all know that is impossible.
@djham2916 · 2 months ago
It's mirrors and a screen
@catherinel7718 · 1 month ago
I have a teacher who can write backwards perfectly. It's creepy lol
@chrismartin9769 · 1 month ago
There are videos that show you how people do this - it's a visual trick, not a dexterity master class ;)
@gatsby66 · 1 month ago
@djham2916 And smoke!
@dennisash7221 · 9 months ago
Very nice explanation, short and to the point without getting bogged down in detail that is often misunderstood. I will share this with others.
@surfercouple · 3 months ago
Nicely done! You explain everything very clearly. This video is concise and informative. I will share it with others as an excellent foundational resource for understanding LLMs.
@DilshanBoange · 8 months ago
Great video presentation! Martin Keen delivers a superbly layman-friendly elucidation of what is otherwise very 'high tech talk' to people like me who do not come from a tech-based professional background. This type of content is highly appreciated, and in fact motivates further learning on these subjects. Thank you IBM, Mr. Keen & team. Cheers to you all from Sri Lanka.
@user-oq2lz4ux3c · 6 months ago
P ppl
@saikatnextd · 4 months ago
Martin Keen, awesome as usual... so natural. I love his talks, and somehow I owe to him my understanding of complicated subjects in AI. Thanks...
@KageManTV · 3 months ago
Really, really enjoyed this primer. Thank you, and great voice and enthusiasm!
@Pontie66 · 4 months ago
Hey, nice job!!! Yeah, I'd like to see more of these kinds of subjects in the present and the future as well!!!
@rappresent · 4 months ago
Great presentation, feels like a personal assistant, great!
@SuperRider-RS · 29 days ago
Very elaborate explanation. Thank you
@evgenii.panaite · 22 days ago
tbh, I just love his voice and am ready to listen to all his videos 🤗
@dsharma6694 · 1 month ago
Perfect for learning LLMs
@dmitriyartemyev3329 · 1 month ago
IBM, big thanks to you for all these videos! These videos are really helpful.
@vicweast · 3 months ago
Very nicely done.
@peterprogress · 4 months ago
I've liked and subscribed and done it again a thousand times in my mind
@SatishDevana · 4 months ago
Thank you for posting this video. What are the other architectures available apart from the Transformer?
7 months ago
Great explanation ❤
@amparoconsuelo9451 · 8 months ago
Can a subsequent SFT and RLHF with different, additional, or less content change the character of, improve, or degrade a GPT model?
@ChatGPt2001 · 2 months ago
Large language models like GPT-3 work by using deep learning techniques, specifically a type of neural network called a transformer. Here's an overview of how they work:
1. **Data Collection**: Large language models are trained on vast amounts of text data from the internet, books, articles, and other sources. This data is used to teach the model about language patterns, grammar, syntax, semantics, and context.
2. **Tokenization**: The text data is tokenized, which means breaking it down into smaller units such as words, subwords, or characters. Each token is assigned a numerical representation.
3. **Training**: The model is trained using a process called supervised learning. During training, the model learns to predict the next word or token in a sequence based on the preceding context. It adjusts its internal parameters (weights and biases) through backpropagation to minimize prediction errors.
4. **Transformer Architecture**: Large language models like GPT-3 use a transformer architecture, which is highly effective for handling sequential data like language. Transformers include attention mechanisms that allow the model to focus on relevant parts of the input sequence while generating output.
5. **Fine-Tuning**: After pre-training on a large dataset, language models can be fine-tuned on specific tasks or domains. This process involves additional training on a smaller dataset related to the target task, which helps the model specialize in that area.
6. **Inference**: Once trained, the language model can generate text by predicting the most likely next tokens given an input prompt. It uses the learned patterns and context from training to generate coherent and contextually relevant responses.
7. **Continual Learning**: Some language models support continual learning, which means they can be updated with new data over time to improve their performance and adapt to changing language patterns.
Overall, large language models combine sophisticated neural network architectures, extensive training data, and advanced training techniques to understand and generate human-like text.
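A rough sketch of the inference loop described in step 6 above: an autoregressive model repeatedly scores candidate next tokens and appends the most likely one. The "model" here is a hand-written toy stand-in over a five-word vocabulary, not an actual trained transformer.

    # Toy autoregressive generation loop (greedy decoding), Python.
    VOCAB = ["the", "sky", "is", "blue", "."]

    def toy_next_token_probs(context):
        """Pretend transformer: score every vocabulary token given the context."""
        if context and context[-1] == "is":
            return {"blue": 0.7, "the": 0.1, "sky": 0.1, "is": 0.05, ".": 0.05}
        if context and context[-1] == "blue":
            return {".": 0.9, "blue": 0.025, "the": 0.025, "sky": 0.025, "is": 0.025}
        return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

    def generate(prompt, max_new_tokens=3):
        tokens = prompt.split()                    # step 2: (very crude) tokenization
        for _ in range(max_new_tokens):            # step 6: autoregressive inference
            probs = toy_next_token_probs(tokens)   # model scores every candidate token
            next_tok = max(probs, key=probs.get)   # greedy: pick the most likely token
            tokens.append(next_tok)
            if next_tok == ".":
                break
        return " ".join(tokens)

    print(generate("the sky is"))   # -> "the sky is blue ."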
@EmpoweredWithZarathos2314 · 6 months ago
Such a great video
@vainukulkarni1936 · 2 months ago
Very nice explanation. Are these foundation models proprietary? How many foundation models exist?
@cushconsultinggroup · 6 months ago
Intro to LLMs. Thanks
@eddisonlewis8099 · 3 months ago
Interesting explanation
@NicholasDWilson · 12 days ago
Lol. I only knew Martin Keen from Brulosophy. This is sort of mind-blowing.
@GuyHindle · 10 months ago
What is meant by "understanding" when referring to "sequences of words"? I mean, what does "understanding" mean in that context?
@chetanrawatji · 3 months ago
Thank You Sir ❤
@kevnar · 5 months ago
Imagine a world where Wikipedia no longer needs human contributors. You just upload the source material, and an algorithm writes the articles and all sub-pages, listing everything it knows about a certain fictional character because it read the entire book series in half a second. Imagine having a conversation with the world's most eminent Star Wars expert.
@korgond · 5 days ago
I got a remote job offer. The duty is AI training for an LLM. Shall I go for it? What do you think?
@Pontie66 · 4 months ago
Hi Martin, are you around? Could you please talk about the "Emerging LLM App Stack"? Thanks in advance!
@narayanamurthy5397 · 2 months ago
Good to know about how LLMs work, Mr. Martin Keen. Can you focus more on LLM modelling and what exact related skills (programming skills) are required? Thank you so much, it was a pleasant video, I appreciated it.
@CyberEnlightener · 9 months ago
The term "large" cannot be taken to refer to large data; to be precise, it is the number of parameters that is large. So, a slight correction.
@dennisash7221 · 9 months ago
I do believe that "large" in LLM refers both to the large amount of data and to the large number of parameters, so both are correct, but there is a prerequisite that the data be large, not only the parameters.
@TacoMaster07 · 8 months ago
There's a lot of params because of the huge dataset.
@ApPillon · 2 months ago
Thanks dude
@mandyjacotin8321 · 3 months ago
That's amazing! Our company has a great project that can benefit from this and then use the proceeds to benefit mankind. How can we speak more about this? I am very intrigued.
@She_cooks2023 · 2 months ago
Amazing!
@Nursultan_karazhigit · 3 months ago
Thanks. How much does it cost to build your own LLM?
@ArgumentumAdHominem · 3 months ago
Nice explanation! But I am still missing the most important point. How does one control the relevance of the produced results? E.g. ChatGPT can answer questions. So far, what you explained is a model that can predict -> generate the next word in a document, given what has already been written. However, given a set of existing sentences, there is a multitude of ways to produce the next sentence that would be somewhat consistent with the rest of the document. How does one go from plausible text generators to desired text generators?
@Leonhart_93 · 2 months ago
Statistical likelihood based on the training data. And then there is a random seed, so that there is a little variation between inputs and outputs, and the answer isn't always exactly the same for the same prompt.
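To illustrate the point about statistical likelihood plus a bit of randomness, here is a small sketch of temperature-based sampling over next-token probabilities; the probability table is made up for the example, and a fixed seed makes the draw repeatable.

    import math
    import random

    def sample_next_token(logprobs, temperature=0.8, seed=None):
        """Sample one token from a {token: log-probability} table.
        Higher temperature -> more variation; a fixed seed makes it repeatable."""
        rng = random.Random(seed)
        # Scale log-probabilities by temperature, then renormalize (softmax).
        scaled = {t: lp / temperature for t, lp in logprobs.items()}
        z = sum(math.exp(v) for v in scaled.values())
        probs = {t: math.exp(v) / z for t, v in scaled.items()}
        # Draw a token according to those probabilities.
        r, cumulative = rng.random(), 0.0
        for tok, p in probs.items():
            cumulative += p
            if r <= cumulative:
                return tok
        return tok  # fallback for floating-point rounding

    # Made-up next-token distribution after the prompt "The sky is"
    table = {"blue": math.log(0.6), "clear": math.log(0.3), "falling": math.log(0.1)}
    print(sample_next_token(table, temperature=0.8, seed=42))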
@shravanardhapure4961 · 9 months ago
What is a quantized version of a model, and how is it created?
@tonyhawk123 · 9 months ago
A model consists of lots of numbers. Those numbers would be smaller. Fewer bits per number.
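A minimal sketch of that idea: post-training quantization maps 32-bit float weights onto a small integer range (8-bit here), trading a little precision for roughly 4x smaller storage. The weight values below are illustrative only.

    import numpy as np

    def quantize_int8(weights: np.ndarray):
        """Symmetric 8-bit quantization: store int8 values plus one float scale."""
        scale = np.abs(weights).max() / 127.0                    # map the largest weight to 127
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale                      # approximate original weights

    w = np.array([0.12, -0.5, 0.33, 0.01], dtype=np.float32)     # toy "layer" weights
    q, scale = quantize_int8(w)
    print(q, dequantize(q, scale))   # int8 storage, close-to-original values back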
@mauricehunter7803 · 1 month ago
Other than the physical limitation of space, like any other computer has, it seems to me that technology like this should be applicable to robotics and allow for the creation of much smarter and more adaptive robotics projects and creations.
@Private-qg5il · 10 months ago
In this presentation, there was not enough detail on foundation models as a baseline to then explain what LLMs are.
@Gordin508 · 10 months ago
The foundation model is trained on a gigantic amount of general text data on a very general task (such as language modeling, which is next-word prediction). The LLM is then created by fine-tuning a foundation model (a specific case of "pretrained model") on a more specific dataset (e.g. source code), sometimes also for a more specific task. The foundation model is basically a stem cell for LLMs. It does not yet fulfill a specific purpose, but since it saw tons of data it can be adapted to (pretty much) anything. Training the foundation model is extremely expensive, but it makes the downstream LLMs much cheaper as they do not need to be trained from scratch.
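As a sketch of that adaptation step, the outline below freezes a hypothetical already-pretrained backbone and trains only a small task head on a domain-specific dataset. It uses plain PyTorch; the layer sizes, data, and task are placeholders, not any particular model's real configuration.

    import torch
    import torch.nn as nn

    # Hypothetical pretrained "foundation" backbone and a new task-specific head.
    backbone = nn.Sequential(nn.Embedding(50_000, 256), nn.Flatten(),
                             nn.Linear(256 * 16, 512), nn.ReLU())
    head = nn.Linear(512, 2)                      # e.g. classify source code vs. prose

    for p in backbone.parameters():               # freeze the expensive pretrained part
        p.requires_grad = False

    model = nn.Sequential(backbone, head)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # Placeholder fine-tuning data: batches of 16-token sequences with labels.
    tokens = torch.randint(0, 50_000, (8, 16))
    labels = torch.randint(0, 2, (8,))

    for step in range(3):                         # tiny fine-tuning loop
        logits = model(tokens)
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(step, loss.item())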
@yaguabina · 4 months ago
Does anyone know what program he uses to sketch on screen like that?
@sebbejohansson · 4 months ago
It's a glass window. He is physically writing on it. For it to show the correct way (and for him not to have to write backwards) they just flip the image!
@hatersgonnalovethis · 3 months ago
Wait a minute. Did he really write in mirror handwriting?
@michaelcharlesthearchangel · 2 months ago
AI was used to make it appear that he can write on your screen.
@penguinofsky · 1 month ago
He writes it normally, but the video is flipped horizontally.
@nuwayir · 9 months ago
So are transformers only for language and text-related things?
@user-vo5gv1tk1m · 5 months ago
No, for image processing too.
@user-vo5gv1tk1m · 5 months ago
Transformer models, originally developed for natural language processing tasks, have been extended to computer vision tasks as well. Vision Transformer (ViT) is an example of a transformer model adapted for image processing. Instead of using convolutional layers, ViT uses self-attention mechanisms to capture relationships between different parts of an image.
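A rough sketch of the patch step ViT relies on: the image is cut into fixed-size patches, each patch is flattened and linearly projected into a token embedding, and those tokens then pass through the same self-attention layers a text transformer uses. The sizes and the random projection below are illustrative stand-ins for learned weights.

    import numpy as np

    def image_to_patch_tokens(image, patch=16, dim=64, seed=0):
        """Split an HxWxC image into (patch x patch) tiles and project each to a vector."""
        h, w, c = image.shape
        rng = np.random.default_rng(seed)
        projection = rng.normal(size=(patch * patch * c, dim))       # learned in a real ViT
        tokens = []
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                tile = image[y:y + patch, x:x + patch, :].reshape(-1)  # flatten the patch
                tokens.append(tile @ projection)                       # linear embedding
        return np.stack(tokens)      # shape: (num_patches, dim), ready for self-attention

    img = np.zeros((224, 224, 3), dtype=np.float32)   # dummy image
    print(image_to_patch_tokens(img).shape)           # -> (196, 64), i.e. 14x14 patches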
@rangarajannarasimhan341 · 4 months ago
Lucid, thanks
@sheikhobama3759 · 4 months ago
1 PB = 1024 TB
1 TB = 1024 GB
1 GB = 1024 MB
1 MB = 1024 KB
1 KB = 1024 B
1 B = 8 bits
So 1 PB = 1024 * 1024 * 1024 * 1024 * 1024 bytes. Multiply it again by 8 to get the number of bits. Guys, do correct me if I'm wrong!!
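That arithmetic checks out; a quick verification:

    petabyte_bytes = 1024 ** 5          # 1 PB expressed in bytes
    print(petabyte_bytes)               # 1_125_899_906_842_624 bytes
    print(petabyte_bytes * 8)           # 9_007_199_254_740_992 bits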
@tekad_ · 3 months ago
How did you learn to write backwards?
@kiyonmcdowell5603 · 3 months ago
What's the difference between large language models and text-to-speech?
@schonsospaet22 · 1 month ago
Thank you for explaining! 🪲 Min. 3:37 is the major "bug" 🐞 within the learning system: *it does not start off with a related guess, it's random.* 🌬 I can't wait until the *brain slice chips* can last longer and get trained like a real human brain that actually learns by feelings and repetition instead of random guessing and then correcting itself until the answer is appropriate. They could soon replace A.I. technology completely, so maybe we shouldn't hype too much about it. After all the effort, energy and money we put into A.I. and new technology, there is no doubt that *we could have educated our children better* instead of creating a fake new world based on pseudo-knowledge extracted from the web. 👨‍👩‍👧‍👦👨‍👩‍👧‍👧 Nobody wants to be replaced without having the benefit of the machine. General taxes on machines and automated digital services could fund better education for humans. Dear A.I.: You know what is real fun? Planting a tree in real life! 🍒
@AIpowerment · 10 months ago
Did you just mirror the screen so it looks like you can write right-to-left? Wow
@IBMTechnology · 10 months ago
See ibm.biz/write-backwards
@VRchitecture · 9 months ago
Something tells me "The sky is the limit" here 👀
@krishnakishorenamburi9761 · 2 months ago
@2:15 a different sequence. This is just for fun.
@RC19786 · 7 months ago
Could have been better; most of it was speculative when it came to application building, not to mention the laws governing it.
@niket1231 · 2 months ago
Need one use case
@eregoldamite8739 · 1 month ago
How are you able to write that way?
@7890tom7890 · 17 days ago
My chemistry professor does videos with one and explains it in a video: Chemistry with Dr. Steph (that's her channel); it's the featured video on her page.
@uhrcru · 3 months ago
Nice vid o7
@WarpRulez · 9 months ago
How does ChatGPT know about itself and its own behavior? If you ask questions about those topics, it will answer intelligently and accurately about itself and its own behavior. It will not just spout random patterns from the internet. How does it know this?
@dennisash7221 · 9 months ago
To start with, ChatGPT does not "know itself"; it is not self-aware. What you are seeing when GPT answers the question "Who are you?" is a pre-programmed response that has been put there by the trainers of the model, something like a toy with prerecorded messages that you can hear when pressing a button or pulling a string. ChatGPT does not "know" anything; it simply responds to your prompts, or as you see them your questions, with the appropriate answers.
@Joyboy_1044_ · 6 months ago
GPT doesn't possess genuine awareness, but it can certainly mimic it to some extent
@devperatetechno8151 · 10 months ago
But how is it possible for an LLM to innovate when it is trained within the boundaries of human knowledge?
@mauricehunter7803 · 1 month ago
I'm far from an expert on the matter, but the simple answer to your question is that it's programmed to be able to learn and adjust according to many various inputs. Arguably it's probably where robot technology should be headed next: having the ability to learn and react to that learning.
@saadanees7989 · 3 months ago
Is this video mirrored?
@pdjhh · 5 months ago
So LLM-based AI is just language, not 'intelligence'? Based on what it's read it knows or guesses what usually comes next? So zero intelligence?
@mauricehunter7803 · 1 month ago
From what I can tell of the subject matter, it's more of a mimicked intelligence. That's why the analogy of a parrot was used. This technology can learn, repeat back and, to a limited extent, guess what's coming next. But there's a certain level of depth and nuance that a human possesses that parrots and ChatGPT-style tech do not.
@dirkbruenner · 4 months ago
How does this presentation work? You are not mirror-writing behind a glass pane, are you?
@sebbejohansson · 4 months ago
Yea, it's a glass window! He is physically writing on it. For it to show the correct way (and for him not to have to write backwards) they just flip the image!
@lmarcelino555 · 3 months ago
I don't even know where to begin. 😵‍💫
@shshe6515 · 3 months ago
Still don't get it
@varghesevg5 · 4 months ago
Getting hallucinations!
@Secret4us · 3 days ago
How many 'parameters' does the human brain have, I wonder.
@boriscrisp518 · 9 months ago
Ugh, corporate videos..... the horror
@user-en4zy4xh7i · 7 months ago
Why does a gigabyte have more words than a petabyte? I am lost already!!! 1 GB = 178 million words, 1 petabyte is 1.8x10^14 words, and there are only 750,000 words in the dictionary?
@turna23 · 6 months ago
I got this far, stopped the video and searched for a comment like this. Why isn't this the top comment?
@abdulmueed2844 · 6 months ago
It's not total unique words... basically it's text from different websites, different sentences... So let's say you want the LLM to answer you about coding: you train it on all the data on Stack Overflow, LeetCode, etc., every available resource... so it knows that when users asked how to run a loop in Java, the replies were x, y, z... It's more of a glorified and better Google search that feels like intelligence...
@dasikakn · 6 months ago
He said 178m words in a 1 GB sized file. And a petabyte sized file has 1 million _gigabytes_ in it. So, loosely speaking, you multiply 178m by 1 million to get the number of words in an LLM's training data. But... it's not being fed unique words. It's getting word patterns. Think about how we speak... our sentences are word patterns that we use in mostly predictable structures, and then we fill in the blanks with richer words as we get older to convey what we want to say with synonyms etc.
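A quick check of that multiplication (the 178-million-words-per-GB figure is the one quoted from the video, and decimal units are used for simplicity):

    words_per_gb = 178_000_000          # ~178 million words in 1 GB of text (from the video)
    gb_per_pb = 1_000_000               # ~1 million GB in a petabyte (decimal units)
    print(words_per_gb * gb_per_pb)     # ~1.78e14 words, matching the ~1.8x10^14 figure above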
@ereinei · 5 months ago
1 PB = 1024 TB = 1,048,576 GB
@jks234 · 4 months ago
What makes knowledge so complex is not the words, but the way the words are used. Choose any word and you will see that it is linked with hundreds of topics and contexts. If I say "draw", I could be talking about:
drawing water
drawing class
drawing during class
drawing my friend
drawing a dog
drawing for a long time
a drawing that sold for a lot of money
I like drawing
And so on. These all code for a different idea. And it is these "ideas" or relationships that foundation models encode. With these relationships, you now have the probabilistic weights that allow you to construct realistic and correct-sounding sentences that are also likely accurate because of the enormous dataset the model was trained on. Another context idea: you want to connect "fish" to "swim". This is highly weighted in the LLM.
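A toy illustration of those "probabilistic weights": even simple counts of which words follow which in a tiny corpus already make "swim" the most likely continuation of "fish". Real models learn far richer, context-dependent versions of the same idea; the corpus below is made up.

    from collections import Counter, defaultdict

    corpus = "fish swim in the sea . birds fly in the sky . fish swim fast".split()

    # Count how often each word follows each other word (bigram counts).
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    print(following["fish"].most_common())   # -> [('swim', 2)]
    print(following["in"].most_common())     # -> [('the', 2)]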
@TheLeppus28 · 5 months ago
What to do if a large language model, after putting all those petabytes of data into it, is still talking nonsense?
@jonitalia6748 · 3 months ago
$PLTR
@Balthazar2242 · 9 months ago
How is he writing backwards?
@IBMTechnology · 9 months ago
See ibm.biz/write-backwards
@karolinasobczyk-kozowska3717 · 5 months ago
Wow! It's a clever idea 😊
@thiruvetti · 9 months ago
You could have finished the video by saying an LLM like ChatGPT could have produced the entire explanation for this video. (I think you hinted at the same.)
@MichaelDomer · 1 month ago
Hire someone next time who can explain it to the average John and Jane. Talk about 7 billion parameters and you already have John and Jane scratching their heads like crazy at what he's talking about. Oh, yeah, some in the comment section understand it... but they're not the average John and Jane... they're often familiar with coding, data, business processes, computers, etc.
@AGI-001 · 2 months ago
This is scary
@Blazeww · 8 months ago
Isn't it using the most likely thing that humans defined, and just using patterns of what's most expected based on how humans interact and the info put in? That's not complicated. How do they not understand how it works?
@ricardog.p2610 · 2 months ago
If IBM knows that, why didn't they implement it in Watson, which was useless 😂😂😂
@spadaacca · 7 months ago
Not a very good video. It really didn't explain much. You could have said so much more in 5:33 than slowly drawing things and talking about business applications.
@yaroslavnaidyon · 9 months ago
1 petabyte is not 1m gigabytes, it is 1,000 gigabytes. I thought this speech was coming from an engineer, but perhaps it is just a hired actor.
@cgoode1057 · 9 months ago
1k terabytes, 1m gigabytes
@Admlass · 8 months ago
It's funny how it's always the most ignorant and arrogant one who points out the mistakes of others.
@DrGray-ds6ki · 5 months ago
You fool, 1000 GB is one TB, not a PB.
@sunnymon1436 · 1 month ago
Explaining the constituent parts, the end product, is not the same as explaining how something works. Bad video.