Scaling laws are explained by memorization and not intelligence - Francois Chollet 

Dwarkesh Patel
328K subscribers
12K views

Full Episode: • Francois Chollet - LLM...
Transcript: www.dwarkeshpa...
Apple Podcasts: podcasts.apple...
Spotify: open.spotify.c...
Follow me on Twitter: x.com/dwarkesh_sp

Published: 14 Oct 2024

Comments: 78
@SrikarDurgi 4 months ago
"Generality is not specificity scaled up" 💯
@alst4817 1 month ago
Hmm, this is far from obvious I think. It’s not clear that general intelligence actually exists in nature, never mind in AI
@feignit 4 hours ago
Why not? Maybe it's a definition issue, but an NFL quarterback is very good at throwing a football to a moving target. All the mechanics and predictors inside the brain are probably pretty specific, but they can most certainly be transferred to other physical applications "generally". I would consider the layer that orchestrates that to be the key general intelligence.
@memetb5796 4 months ago
Thank you, really appreciate the podcast. One note I kept making to myself as I listened is that the examples Francois was giving, and that you were rebutting, were perhaps too pedestrian to be striking (specifically arguments like "this day is completely novel"). Imagine instead asking GPT-6 to "solve climate change", "engineer a chloroplast into human skin cells", or "come up with a novel combustion engine that is neither a reciprocating engine nor a rotating detonation engine nor a turbine"... all of these pose exactly the type of novelty challenge Francois is raising, and none of them sounds so plausible that a person could scoff and say "of course GPT-6 will be able to solve that". I also appreciated that the decimation of the workforce for automatable tasks was mentioned: it raises a much more important problem, which is that if we do end up doing that (sending talent packing and relying solely on solving automatable tasks), we may as a species kill any ability to innovatively solve challenges for which we have never seen a distribution.
@julkiewicz 4 months ago
I think the notion of "this day is completely novel", even though exaggerated, was meant to highlight that even things we find mundane in everyday life actually consist in part of completely novel experiences (at least novel to that one specific person, if not to humanity as a whole). I think this is what the other arguments he brought up highlight, like the fact that self-driving cars need to be trained for each specific city, and they still at times stumble and do horribly stupid things because they run into one of those micro-holes in their training set, and their poor generalization skills don't let them plug that hole with intelligence.
@julkiewicz 4 months ago
A simple example of what Chollet is talking about is that even with all the advancement, LLMs still cannot add two very large numbers together. Clearly, just learning the algorithm is the most efficient and most compact way to store that skill, and it also has incredible predictive performance. Yet give the newest ChatGPT two sufficiently large numbers to add and it'll be unable to produce a good result. It won't even get the number of digits right. If you scale that, all you'll get is more memorization for larger and larger numbers in the training set. But there's no way you can train on all possible additions of two operands.
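A minimal sketch of the compactness point (Python, purely for illustration, not from the video): the grade-school addition algorithm fits in a dozen lines and handles operands of any length, whereas memorized input-output pairs can never cover all cases.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Grade-school addition on decimal digit strings of arbitrary length."""
    a, b = a[::-1], b[::-1]          # process least-significant digit first
    digits, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        total = da + db + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# The same dozen lines handle 5-digit and 500-digit operands alike.
assert add_decimal_strings("99999", "1") == "100000"
```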
@615bla 4 months ago
This is actually not very different from the human mind. We also can't really multiply or add two large numbers; when we do, it is not a pure mental effort, we use tricks, notes and methods that we were taught. It's mechanical and external, not a natural action for human brains. I would even say most humans would not find the required tricks if never taught them.
@воининтернета 4 months ago
Various transformer models trained specifically for addition have been studied by mech interp researchers; with sufficient training they learn a general algorithm and not just memorization. I haven't seen this done for an arbitrary number of digits, but given that the learned algorithms for a fixed number of digits were quite clever, I don't have much doubt that it's possible. But the main point still holds: all the claims that DL is just memorization are easily refuted by existing mech interp research.
@sloth_in_socks 4 months ago
I just tried adding 2 random 10 digit numbers in Gemini and it gave the right answer. Are you claiming it's memorized hundreds of billions of combinations?
@julkiewicz 4 months ago
@615bla What do you mean I can't add or multiply? I most definitely can if, like an LLM, I have the ability to step through it, especially if I had as much context and memory as an LLM. Children reinvent addition algorithms on their own all the time. I did that when I was in preschool. It's not that impressive.
@julkiewicz 4 months ago
@sloth_in_socks 10 digits is not enough. It stumbles after a certain point, around 20-25 digits; certainly ChatGPT does, all versions. And it's not like that's a harder problem, it's just harder to get away with memorizing semi-patterns. E.g. for longer numbers the beginning is often fine and the end is fine, but the middle is garbage, and not even the number of digits is correct, off by 1 or 2. That's a mistake that's hard to justify if we claim it "understands addition".
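If anyone wants to reproduce this claim, a rough harness would be: generate random operands of increasing length, compare the model's answer against exact arithmetic, and record which digit positions go wrong. `ask_model` below is only a placeholder for whatever chat interface you use; it is not a real API.

```python
import random

def ask_model(prompt: str) -> str:
    """Placeholder: send the prompt to whatever LLM you are testing."""
    raise NotImplementedError

def probe_addition(n_digits: int, trials: int = 10) -> None:
    for _ in range(trials):
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        truth = str(a + b)
        reply = ask_model(f"Compute {a} + {b}. Answer with digits only.").strip()
        wrong = [i for i, (x, y) in enumerate(zip(truth, reply)) if x != y]
        print(f"{n_digits} digits | length ok: {len(reply) == len(truth)} "
              f"| wrong positions: {wrong}")

# e.g. probe_addition(25) to test the 20-25 digit range mentioned above
```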
@ironeagle4274 4 months ago
Memorizing solutions using a set of programs from its existing database of samples isn't necessarily a form of intelligence but it is a valuable tool. If the AI begins to generate unique code to solve novel problems, then we've got a truly remarkable development to discuss.
@stevo7220 4 months ago
The point is it will never be able to generate unique code to solve novel problems, because it has never been exposed to the patterns those novel problems require in order to be solved. It can only solve problems with patterns it has been exposed to in the training data. Scaling will increase its memorization of pattern application or reasoning patterns; it just applies them. And to the untrained eye, scaling will look like general intelligence, because it will perform pretty well within the training data. It cannot improve, it cannot be flexible and change its output beyond that data... It will never be able to solve problems with novel patterns, like breakthroughs or Millennium Prize problems. All the problems not yet solved by humans involve some percentage of novel patterns that even Transformers cannot find, because they cannot recognize novel patterns; they can only perform very well with the patterns of human discoveries.
@ironeagle4274 4 months ago
@stevo7220 I see your point, but I would ask whether these systems will ever reach real intelligence in the form of creative responses, or whether you suspect AI will only ever efficiently master the knowledge and skills that we humans have developed. The question is: will AI gain the ability to think in a unique sense, or will it merely be a tool that rearticulates what we humans have collectively given it as input? This question is at the core of the fears many have over AI. Our imagination of the dangers of AI runs wild, to the point where podcasters like Rogan fear that AI will be able to self-improve, thus approaching a god-like state, whatever that means. Though I must admit many from this group don't seem to understand what LLMs actually are, and I must acknowledge the limits of my own understanding as well. In any case, unlike Joe, I view AI, in its current state at least, far more as a tool than as a threat to humanity, and if anything the question of whether it will ever be conscious, self-editing, or creative poses interesting side implications, such as the nature of what consciousness and thought even are. But I digress…
@stevo7220 4 months ago
@ironeagle4274 Yeah, very nice question. First off, I am a college student in Biochemistry, majoring in Machine Learning specifically for the correlates of Neuroscience and Intelligence. What everyone is afraid of is AGI (Artificial General Intelligence). The problem is you can't achieve general intelligence like in humans with generative transformers like GPT-4, or with neural networks alone; that's not how natural intelligence does it. There are commonly two networks: one is intuitive/automatic, what Chollet has been calling "memory"; the other is "active inference", when you're thinking or actively reasoning through tasks, some kind of tree search, or search in short. Whether we will be able to achieve AGI is not certain, but nature has told us it is possible. We know it is possible because we've achieved superhuman level in intelligence domains that are closed: systems like chess, Go, and any closed system. The Stockfish 16.1 chess engine can be said to be the smartest single agent in any intellectual domain ever created. For generative transformers to achieve superhuman intellectual capacity, they need experience and training, computationally.
@HarpreetSingh-xg2zm 3 months ago
@stevo7220 Yep, true general intelligence would be able to; the current scaling process doesn't seem to be headed that way.
@BadjaBeats 4 months ago
I want Leopold vs Francois
@Gigasharik5 4 months ago
slop vs kino
@rongbingmu3758 4 months ago
Not even close. Francois is a deep learning OG with countless big projects under his name. Leopold is a 2021 college graduate who hopped on the OpenAI hype train at the right time. He doesn't have any accomplishments other than having briefly worked at OAI.
@BadjaBeats 4 months ago
Leopold seems like a good ambassador for the scaling-maxi camp, not because of deep experience in the field, but because he seems well informed, standing on the shoulders of Sutskever, Hassabis, etc. He is also highly motivated, energetic and entertaining to listen to.
@mttiii 4 months ago
Francois Chollet is just doing on the fly speech synthesis.
@wei-ching_lin 3 months ago
hahhahaha🤣you make my day
@aaronbeach 6 days ago
Maybe someone has already done this: it would be interesting to develop a set of questions that better test for AGI as a benchmark. An example of the kind of question I mean, which I tested on GPT-4o-mini:
#1. After asking the LLM how it solves new problems, I asked it to use this information to generate a question it could NOT solve.
#2. I then asked another instance of the LLM to solve the problem generated by #1.
#3. I then asked another instance whether the answer from #2 correctly solved the problem from #1.
LLM #3 said that LLM #2 correctly answered LLM #1's question (I also agreed that it did). This means the LLM failed to generate a question it could not answer. This is maybe just one example of the type of automated questioning that could form an AGI benchmark...
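The three-step procedure above can be written as a small loop. `query_llm` is a stand-in for separate, fresh model instances (no real API is implied), and the prompt wording is only a guess at how one might phrase it.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a call to a fresh model instance."""
    raise NotImplementedError

def self_stumping_round() -> bool:
    """One round of the benchmark idea: did the model manage to stump itself?"""
    # 1. Ask instance A to produce a question it believes it cannot solve.
    question = query_llm(
        "Based on how you solve new problems, write one question "
        "you yourself could NOT answer correctly."
    )
    # 2. Ask a second, independent instance B to answer that question.
    answer = query_llm(f"Answer this question:\n{question}")
    # 3. Ask a third instance C to judge whether B's answer is correct.
    verdict = query_llm(
        f"Question:\n{question}\n\nProposed answer:\n{answer}\n\n"
        "Reply YES if the answer is correct, otherwise NO."
    )
    # If the judge says YES, the model failed to generate an unanswerable question.
    return verdict.strip().upper().startswith("NO")
```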
@ayanchoudhary044 4 months ago
I think what that means is that LLMs are very good at system 1 thinking but not at system 2 thinking. For that we need further advancements in the field.
@benprytherch9202 2 months ago
I love Chollet. The case for "memorization" is overwhelming. First, it's what LLMs are designed to do! They learn to generate text by practicing guessing the next token on existing text. Beyond that, there are piles of research papers showing that LLMs perform much better on "cognitive" or "reasoning" tasks when the particulars look like what's in their training, as opposed to when the particulars are changed in ways that are arbitrary with respect to the task at hand. For instance, LLMs do a vastly better job at deciphering encrypted text that was created using a very popular algorithm vs. some slight variation on that algorithm. Performance on any task is strongly related to how often that task shows up in the training. Humans, on the other hand, can apply a general algorithm using very few examples, or sometimes just a set of instructions with no examples. Hmm...
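One concrete way to set up the cipher comparison alluded to above: encrypt the same sentence with ROT13 (extremely common in training data) and with a shift of, say, 11 (same algorithm, rarer parameter), then ask a model to decrypt both. The choice of shifts here is my own illustration; the papers the comment refers to may have used different variants.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift cipher over ASCII letters, preserving case and other characters."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)

plain = "generality is not specificity scaled up"
print(caesar(plain, 13))  # ROT13: the variant LLMs usually decode well
print(caesar(plain, 11))  # shift-11: same algorithm, rarer in training data
```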
@AlexanderMoen 3 months ago
Eh, I don't buy these arguments so much. I lean toward what Dwarkesh brought up at the end: humans need templates and training as well. If you never taught a human how to read, they wouldn't just magically conjure it up without training, no matter how good they are at everything else. If you provide an AI application a sufficient number of templates, allow it to test and play around with both known and unknown ideas, and give it some ability to generalize, I think you've pretty much nailed intelligence (given sufficient memory, processing, etc.) and will even get new and creative ideas. I think a lot of people who argue against the potential of machine intelligence posit some sort of intangible and undefined thing sitting at the center of human intelligence. Because we have some idea of how current AI was built up to what it is, but cannot decompose human intelligence into 100% intelligible parts, they seem to treat this gap as insurmountable.
@jodiak 3 months ago
But the difference between you and me and an LLM is that we can generally be shown a new example or "template" we've never seen before and be pretty good at reasoning about how to solve it, while this isn't the case with LLMs. For example, even after consuming all known books on mathematics, GPT-3.5 still could not reverse a list of numbers until it was trained on hundreds of thousands of examples showing how to do it. I don't think we need to define intelligence in every aspect in order to replicate it, but a key component of intelligence is being able to solve new problems you haven't seen before, which LLMs currently cannot do.
@ThePowerLover 3 months ago
@jodiak Our training set is still too different to make comparisons like that.
@benprytherch9202 2 months ago
@jodiak Bingo. A human can read a single book on mathematics and then apply the general methods to brand new problems. Maybe someday an AI will be able to do that too, but as of right now those don't exist.
@ToonamiAftermath 21 days ago
Synthesizing a new template is a form of metacognition. Who says you can't train an LLM to perform metacognition in this way? And even if you don't specifically train an LLM to do this, it may be an emergent property as you scale; it would make sense as a form of compression.
@Pixelarter 4 months ago
So according to Chollet, are humans AGI or not? To me it seems we don't pass his definition, because we too can only create up to a finite horizon beyond our previous, mostly memorized factual and algorithmic knowledge. Otherwise humanity would have made massive technological leaps instead of our recorded, gradual progress. Most of our new knowledge comes from small mutations and recombinations of previous knowledge, with a lot of trial and error leading to discovery, rather than from deep understanding of totally unseen problems and finding a solution on the first try.
@tristanwegner 4 months ago
I would argue humans DO have general intelligence, but it is not needed that often, and it can be argued that a person who is good at learning/memory/using the right patterns can live an almost entirely successful life without ever using it.
@dpasek1 4 months ago
"...On the first try?" Do you expect everything to be deterministic? Nature is random, and also chaotic. Many problems that are deterministic cannot be solved in closed form.
@mikezooper 3 months ago
Humans do more inference with less data. That's because they use intelligence to fill the gaps. LLMs need all the data because they can't intelligently infer when there are gaps.
@ThePowerLover 3 months ago
@tristanwegner Who has shown "general intelligence"? Da Vinci? Von Neumann? Einstein? Besides moments of supposed "illumination", no human seems to have that.
@ThePowerLover 3 months ago
@dpasek1 Nature? GR is deterministic, and the Schrödinger equation is a linear equation (and the Born rule is derived from its pure form).
@gunnerandersen4634 4 months ago
DISCLAIMER: I'm sharing ideas from a naive perspective and a lack of understanding, in the hope that someone with deeper knowledge can either shape the idea, feel inspired by it, or help me understand why it makes no sense, and in doing so help me learn more about such an amazing field!

What if we combined Transformers with something like an EBM that is explicitly designed and trained for reasoning, and that uses something like A* heuristic search (or another approach) to break the problem down into a set of "basic blocks"? The approach could break any problem into its most basic core problems in a systematic, efficient way (perhaps using an existing algorithm such as A*), then use the transformer model for the pure knowledge-retrieval part, the way we recall from memory before writing down a formula we memorized. Then, back in the EBM, it could dynamically adapt its energy function (instead of using a predefined one) to come up with a solution based not on parallels with existing knowledge alone, but on reasoning about the problem as well.

Let me explain how I see it with the school-kids distribution and percentage example: instead of looking for a similar problem with different values, it would break the problem down into more basic knowledge. It would separate the equation knowledge from the statistics knowledge and so on, then combine these parts, more like Dwarkesh proposes rather than how Francois says it might work (filling in a template). That combination and "reasoning" would be performed by a model better suited to the task (perhaps an EBM or something like it), where the core foundational ideas of equations, statistics, etc. are combined in many possible ways until a minimum-energy state is reached (the problem is solved), and that is the response the model gives you: not only based on knowledge, but also on a process of breaking down the task, combining fundamental knowledge, and reaching a minimal-energy state that represents the final answer to a complex task.

Would love to hear thoughts on this possibility of combining models and whether it might or might not be interesting.
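Reading the idea above as pseudocode, it might look roughly like the sketch below. Everything here is hypothetical: the decomposition search, the retrieval step, the recombination generator and the energy function are all stand-ins I made up to mirror the comment, not existing libraries or APIs.

```python
from typing import Callable, List

def decompose(problem: str, heuristic: Callable[[str], float]) -> List[str]:
    """Hypothetical A*-style search that splits a problem into 'basic blocks'."""
    raise NotImplementedError

def retrieve_knowledge(sub_problem: str) -> str:
    """Hypothetical transformer call used purely for knowledge recall."""
    raise NotImplementedError

def combine_candidates(facts: List[str]) -> List[str]:
    """Hypothetical generator of ways to recombine the recalled pieces."""
    raise NotImplementedError

def energy(candidate: str, problem: str) -> float:
    """Hypothetical energy function; low energy = the pieces fit the problem well."""
    raise NotImplementedError

def solve(problem: str) -> str:
    parts = decompose(problem, heuristic=len)        # break into core sub-problems
    facts = [retrieve_knowledge(p) for p in parts]   # recall, like citing a memorized formula
    best, best_e = "", float("inf")
    for candidate in combine_candidates(facts):      # reason by recombining the pieces
        e = energy(candidate, problem)
        if e < best_e:                               # keep the minimum-energy combination
            best, best_e = candidate, e
    return best
```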
@dpasek1 4 months ago
"Generality is not specificity scaled up" Right! Generality is specificity abstracted, not accreted. To put it as an example; generality is like a differential equation and specificity is like a particular solution to that differential equation. It takes memorization ability to accrue a collection of particular solutions, but it takes something like general intelligence to derive the underlying principle from that collection of solutions. You need to be able to hold a sufficient number of particular solutions in working memory to be able to find that underlying principle. I would also say that a higher general intelligence will be able to arrive at an underlying principle with a smaller number of specific examples. AI is not intelligent. Memorization ability is *not* intelligence, and many people do not know the difference. AI looks to me like an Nth generation database management system combining a massively effective memorization and retrieval system with a sophisticated probability algorithm used to accomplish the retrieval instead of a traditional DBMS index file. AI so far does not seem to accomplish any synthesis of general underlying principles, but it seems to have replaced the DBMS back end programmer quite handily. ChatGPT, for example, is incredibly stupid but has a massive memory with rapid retrieval. I have encountered numerous examples where it is unable to correct its obvious errors even when they are pointed out, and it continues to fail through multiple iterations of attempted correction. One example is that it fails spectacularly at balancing simple chemical reaction equations. One frustrating aspect of ChatGPT is that it is unable to cite its sources of information even when it reasonably could be expected to be able to point to a primary reference. (This might be just a training requirement defect.) So, for casual research, it is of only limited use. Newer LLMs might be getting better, but I doubt that they will make the jump into synthesis of abstraction. If they do, then watch out. I am guessing that it will take a different model other than LLM to accomplish synthesis. It might eventually be possible.
@antonymossop3135 4 months ago
Yes, I agree. The present artificial intelligence approaches make good students, able to solve problems that have been taught to them. Scholarship rather than creative thought...
@natzos6372 4 months ago
No existing intelligence is really general if you take it this strictly; humans are specialized, from my perspective. These kinds of discussions seem useless when there is no established definition of these concepts.
@jgonsalk 4 months ago
He does give the example of the ARC testing framework, which five-year-old humans can figure out and LLMs really can't. So it's more that LLMs can combine patterns from the training data but can't learn wholly new patterns. The question then is whether the scaling laws give a reasonable approximation of AGI, which he argues they won't, because we see novel situations every day but are so well equipped to handle them that we don't even need to think about them. The rest of the talk covers it a bit better.
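For readers who haven't seen it: ARC (the Abstraction and Reasoning Corpus) tasks are small grids of integers 0-9 (colors), given as a few input/output demonstration pairs plus a test input. The grid below is a made-up toy task in the same spirit and format, not an actual ARC item; the rule to infer is mirroring each row.

```python
# A toy task in ARC's grid style (integers 0-9 are colors).
# Rule to infer from the demonstrations: mirror the grid left-to-right.
toy_task = {
    "train": [
        {"input": [[1, 0, 0],
                   [1, 0, 0]],
         "output": [[0, 0, 1],
                    [0, 0, 1]]},
        {"input": [[2, 2, 0],
                   [0, 2, 0]],
         "output": [[0, 2, 2],
                    [0, 2, 0]]},
    ],
    "test": [{"input": [[3, 0, 0],
                        [3, 3, 0]]}],   # expected output: each row reversed
}

def apply_rule(grid):
    """The general rule a five-year-old infers from two examples."""
    return [row[::-1] for row in grid]
```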
@julkiewicz 4 months ago
Intelligence is a spectrum. The more intelligent something is, the more generalization power it has, the fewer examples it requires, and the more complex the generalization structure can be. That's in contrast to just having more and more examples to generalize from. It's like the difference between a really great lawyer and a lawyer who is mediocre but has an incredibly robust database of templates for every conceivable type of legal document. They may perform on the same level for some tasks, but give them a precedent case to argue in front of the Supreme Court and one will excel while the other falls apart. Of course, as he also says in another podcast, all intelligence is specialized intelligence to a degree. For instance, humans are particularly good at spatial reasoning but really bad at long-term planning. But we can still manage way, way more generalization power than any LLM, while an LLM just has an absurd number of templates to reuse.
@ThePowerLover 3 months ago
@jgonsalk Very bad example.
@mikestaub 4 months ago
LLMs are generating programs on the fly, though.
@julkiewicz 4 months ago
That's not what he meant if you listen to the whole thing. It's not about generating text as in "producing some output". What he means is synthesizing a novel program that solves a problem and that isn't just a rehash of one of the existing programs that was present somewhere in the training set. What he means is that in response to an input-output problem that's resistant to memorization, AGI should internally synthesize an algorithm and then apply it.
@float32 4 months ago
I would claim most of it is memorized. If you ask them something remotely out of the ordinary, they fall apart.
@mikestaub 4 months ago
@float32 I think that is mostly a product of the RLHF; the jailbroken versions of Llama 3, for example, seem far more capable than the standard ones.
@julkiewicz 4 months ago
@mikestaub Well, a person with a calculator is not more intelligent than the same person without a calculator. More capable, yes, but not more intelligent. Giving access to external tools just obfuscates the measurement process.
@mikestaub 4 months ago
@julkiewicz I think the distinction is not useful and is biased towards carbon-based intelligence. The brain is a tool.
@kevinwoods9274 2 days ago
I just tried it. I had ChatGPT add together trillions and trillions and it did just fine. Super quick.
@KP-fy5bf 4 days ago
This is not really true. Memorization would entail zero adaptability: as soon as an example is seen that was not in the training set, the LLM would falter. Yet you have LLMs that take user input that is inherently stochastic and come up with novel outputs. Sure, it might have memorized patterns that enable that, but that is literally everything. Reasoning cannot exist without memorization: you either have to memorize low-order truths close to the axioms and derive all others through logic, or you memorize higher-order truths. But LLMs are able to build off knowledge in a way that isn't brute-force applied memorization.
@deadeaded 1 day ago
Memorization only entails zero adaptability if you think the LLMs are memorizing a list of facts. What they're memorizing is a (very complicated) probability distribution. That distribution can be sampled to generate novel output, but it's still the product of memorization.
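A tiny illustration of "memorizing a distribution rather than a list of facts" (a bigram model standing in for an LLM; obviously a massive simplification, and my own example rather than the commenter's):

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Memorize" the conditional distribution P(next word | current word).
counts = defaultdict(lambda: defaultdict(int))
for w, nxt in zip(corpus, corpus[1:]):
    counts[w][nxt] += 1

def sample_next(word: str) -> str:
    options = counts[word]
    return random.choices(list(options), weights=list(options.values()))[0]

# Sampling can produce sentences never seen verbatim ("the cat sat on the rug"),
# yet everything generated is still drawn from the memorized distribution.
word, out = "the", ["the"]
for _ in range(6):
    word = sample_next(word)
    out.append(word)
print(" ".join(out))
```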
@mttiii 4 months ago
LLMs may be able to give you the algorithm, but they are not good at executing it. (yet)
@tubestreamkyki 4 months ago
The only proof will be GPT-5: whether it's memory or real intelligence.
@garrettmillard525 4 months ago
Brilliant
@marshallmcluhan33 4 months ago
Compression is intelligence
@joseph_thacker 4 months ago
no
@LeonidKotelnikov-jg9fi 4 months ago
Francois is feeding us a line again )))))
@Gigasharik5 4 months ago
He's right about everything
@mttiii 4 months ago
Francois Chollet is wrong. You heard it from me first 😂
@jgonsalk 4 months ago
Haha! I think he's right, so at least one of us will win!
@dpasek1 4 months ago
@jgonsalk I also think he is right, and I have been saying something almost identical for over a year. The problem with acceptance of his ideas is that most people confuse memorization ability with fluid intelligence.
@jgonsalk 4 months ago
@dpasek1 You are well ahead of me then! I only recently noticed the massive gaps. I think it'll become more widely known in the coming years. Basically, when it doesn't happen.