
The Truth about ChatGPT and Friends - understand what it really does and what that means 

Gerben Wierda

On 10 October I gave a talk at the EABPM Conference Europe 2023, making clear what ChatGPT and friends actually do (addressing the technology in a non-technical but correct way) and what that means. At the end, one attendee remarked "You made my head explode a bit. Why isn't this story everywhere?" Luckily, the talk was recorded on video, and this is the end result. The presentation and a single exchange from the Q&A are included.
There are many explanations of Large Language Models like GPT out there. Some go in deep and explain the transformer architecture to you, but that is completely irrelevant for most of us. Many talk about the results, good, bad, or imagined. This talk fills the gap between the tech and the results. At the end you will understand what these models really do in a practical sense (so not the technical how) when they handle language, see not only how impressive they are but also how the errors come to be (with a practical example), and understand what that means for what we may expect from this technology in the future.
Note: almost everything I have created (mostly blogs) about AI can be found here: ea.rna.nl/the-...
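
For readers who want a concrete feel for two of the chapter topics below, producing text one token at a time (autoregression) and the 'temperature' setting, here is a minimal, self-contained Python sketch. It is my own illustration and not material from the talk: the toy 'model' is a hand-made table of scores, whereas a real LLM computes such a next-token distribution over tens of thousands of tokens from the entire preceding context, using billions of learned parameters.

```python
import math
import random

# Toy "model": for a given last token, unnormalised scores (logits) for each
# possible next token. A real LLM computes these from the whole preceding context.
TOY_LOGITS = {
    "the": {"cat": 2.0, "dog": 1.5, "idea": 0.5},
    "cat": {"sat": 2.5, "ran": 1.0, "the": 0.2},
    "dog": {"ran": 2.2, "sat": 1.1, "the": 0.3},
    "idea": {"sat": 0.4, "ran": 0.4, "the": 0.4},
    "sat": {"quietly": 1.8, "the": 0.5, "ran": 0.1},
    "ran": {"quietly": 1.2, "the": 0.8, "sat": 0.2},
    "quietly": {"the": 1.0, "sat": 0.3, "ran": 0.3},
}

def sample_next(last_token: str, temperature: float) -> str:
    """Softmax with temperature over the toy logits, then sample one token."""
    logits = TOY_LOGITS[last_token]
    exp_scores = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(exp_scores.values())
    tokens = list(exp_scores)
    weights = [exp_scores[tok] / total for tok in tokens]
    return random.choices(tokens, weights=weights)[0]

def generate(prompt: str, n_tokens: int = 6, temperature: float = 1.0) -> str:
    tokens = prompt.split()
    for _ in range(n_tokens):
        # The core loop of GPT-style generation: append the sampled token and repeat.
        tokens.append(sample_next(tokens[-1], temperature))
    return " ".join(tokens)

print(generate("the", temperature=0.2))  # low temperature: nearly always the likeliest path
print(generate("the", temperature=1.5))  # high temperature: more varied, more 'creative'
```

Run it a few times: at temperature 0.2 the output is nearly always the same likeliest continuation; at 1.5 it varies much more, which is the 'increasing creativity' trade-off the talk describes.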
Chapters:
• Presentation start
• The Lie to Children concept (Pratchett)
• The Wager
• Neural nets work on numbers (section start)
• Text has a variable length (sequence)
• One word at a time (RNN)
• Context and Long Context
• Transformers enable very large models
• GPT (section start)
• How GPT produces text (autoregression)
• How GPT calculates (attention to context)
• There is no 'prompt and reply'
• Increasing creativity (temperature)
• The Biggest 'Lie-to-Children' (section start)
• Representing words (embedding)
• Playing dictionary
• Handling text. Not words, but...
• Handling numbers
• Reasoning and logic
• Prompt engineering and plugins
• Being harmless (safety concerns)
• Alchemy
• Will growing more help?
• Useful, provided...
• GPT-fever
• Understanding
• It's the context, stupid!
• Expectations (section start)
• Code generation
• Issues (section start)
• "Bewitchment by language"
• GPT quality overview
• Why isn't this story everywhere?
If you want to find more of my 'realistic insights' see the other videos or visit ea.rna.nl/ for more written content.
Yes, I know, the Transformer encoder-decoder depiction/explanation around 12:24 is erroneous. This is not very important for the story and conclusions. Bugs me, though.

Published: 11 Sep 2024

Comments: 14
@JustIsold (1 day ago)
Thanks so much for putting this talk online, I now have a much better grasp of what generating sentences by using the next likeliest 'word' (really token) means in practice and why ChatGPT works and doesn't work at the same time.
@GerbenWierda (1 day ago)
Thank you. I appreciate people spreading it around.
@x--. (10 months ago)
Absolutely fascinating talk. Compelling, well-crafted. I do have to wonder where the line is. Listening to young children learning language really quickly demonstrates that they are basically statistics machines. They learn the words and the patterns far quicker than they gain the understanding of all those words. I've seen kids just throw words together with no clear understanding of what they are actually saying, simply seeking a desired response. I've always wondered how much of us remain those statistics machines that mostly speak in set patterns. This talk actually alludes to it when our presenter speaks on how "conviction" often precedes observation and reason. We go shopping for the facts that fit our currently held beliefs. Our pattern-detection systems can certainly be disrupted, but it's often challenging.
What does this mean? I don't know. I want to believe that LLMs aren't conscious; it's a far more comforting thought, and the evidence here is strong, but how far are they from understanding? Are the centers, the hubs, in our own brains that form around languages or other skills just LLM-like statistics machines that work to predict the patterns of stimuli we encounter in the environment? Could we string multiple LLMs together, trained and tuned differently, in a way that started to approach some rudimentary 'intelligence'? After all, one of the reasons teachers and students "show their work" with math problems is that our own prediction machines sometimes skip steps or intuit a wrong answer. Is a math-tuned model possible? Maybe not as an LLM, but as a separate type of system that an LLM could draw upon?
I certainly don't know the answers here, but I do feel my intelligence is emergent and not so special as to be impossible to reproduce. Very thought-provoking. (And speaking of prediction machines, it's funny that YouTube put this in my suggested videos even though it has only 580 views at the moment and none of my other suggested videos are AI-related; seems YT's machine learning is working pretty well.)
@GerbenWierda (10 months ago)
Thank you. The remark on children learning makes me think about how Uncle Ludwig handles meaning in (hence understanding of) language. It boils down to "Meaning of language elements lies hidden in what a community of people with shared experiences consider correct use". Basically, children learn by *using* elements and finding out from feedback whether they are considered *correct*. (Uncle Ludwig really was spot-on if you ask me.)
If that is correct, then token prediction resembles understanding because it is directed at 'correct' use. That is why it is so often so close. But it is not correct use of the same elements we see as elements (words and phrases, and even intonations and such), but of something very technical: the tokens (n-grams; I recall that in the early 1990s there even was a patent on using n-grams, but I digress). While the correct use of words and phrases is how we humans learn understanding (answerable with correct/incorrect, so true/false), the 'guessing-type' use of tokens (effectively out of a set of close alternatives) is how these models do it.
There are some deep questions lurking here. For instance, can there be understanding without experience? (My guess would be 'no, or almost entirely no', which would have consequences for LLMs as well.) Can we build something better out of multiple LLMs? We might (the cool Cicero application from Meta bears some resemblance to that approach), but the issue of size explosion might make it impractical, in the same way that an almost infinite set of rules (GOFAI's Cyc approach) runs into physical limitations quickly (especially on performance). We might have to build things that are 10k-100k times as large, and will that ever be feasible? Digital approaches are quickly overwhelmed, which is a fundamental problem too if you ask me.
More on the conviction story here: ea.rna.nl/2022/10/24/on-the-psychology-of-architecture-and-the-architecture-of-psychology/
And if you like it (which you do) I ask people to actively spread it in their circles. I (and others) think it might help if this explanation were a little more widely seen.
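
To make the word-versus-token distinction in the reply above concrete, here is a toy sketch (my own illustration, with a made-up vocabulary; it only mimics the flavour of real learned tokenisers such as BPE) of a greedy longest-match split that turns one human 'word' into several model 'tokens':

```python
# Made-up subword vocabulary; real tokenisers (e.g. BPE) learn theirs from data.
TOY_VOCAB = {"under", "stand", "ing", "token", "s", "un", "der",
             "u", "n", "d", "e", "r", "t", "a", "i", "g", "k", "o"}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match split: take the longest vocabulary entry that matches
    at the current position; fall back to a single character otherwise."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("understanding"))  # ['under', 'stand', 'ing'] - one word, three tokens
print(tokenize("tokens"))         # ['token', 's']
```

The point is only that the units the model predicts are such subword fragments, not the words and phrases humans attach meaning to.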
@ppulkkinen5308 (10 months ago)
Thank you, very interesting and well presented. I would have loved to hear something on multi-modality. Stephen Wolfram thinks that GPT-4 has a language-independent metamodel of "words". So no sparks of AGI yet?
@GerbenWierda (10 months ago)
@ppulkkinen5308 Thank you. I wonder what "a language-independent metamodel of 'words'" would mean (I could not find it), unless that stands for the embeddings of tokens. I ask everyone who thinks this is worthwhile to watch to actively spread it in their own circle. It's pretty hard to get realism a hearing, given the fever currently out there.
@GerbenWierda (6 months ago)
Sorry @ppulkkinen5308, I forgot AGI. No. These models 'understand' token/pixel distributions. That is like understanding the 'ink distributions' of a printed text. With this understanding of 'ink distributions' they can approximate the results of 'text/image that can be understood by human understanding'. There are no correct and erroneous results (no 'hallucinations'), only successful and failed approximations. See ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/ (and more in the series of which that post is a part, and that grew out of this presentation).
@twinsgardening896 (1 day ago)
Thank you for sharing this with us all!
@GerbenWierda (1 day ago)
Thank you. I appreciate people spreading it around.
@tjeerdandringa1648 (10 months ago)
Worth your time!
@GerbenWierda (10 months ago)
Thank you.
@niederrheiner8468 (7 months ago)
Very good presentation, but the sound should be clearer and louder.