I think this is a fantastic episode. I know I'm personally biased, as I was the fan who was mentioned at the end of the talk, but something to keep in mind is that Brandon has been doing this for years: the better part of two decades. As mentioned many times, the nuts and bolts are his jam, and if you are someone who is thinking about going into ML/data science for a career or school, this is an example of someone who has treated ML as a long-term project, in addition to applying those frameworks to real-world problems and applications. Additionally, he's been trying to synthesize and democratize that core knowledge into bite-sized packets of information that are very understandable, even for those with a limited amount of formal mathematical training. Personally, I don't think I've seen an MLST episode where Tim or his guest(s) have spent more time smiling and laughing while staying on topic. That enthusiasm is so infectious! These are complicated and serious topics, but that injection of light-hearted, fun exploration makes them seem approachable. Here's a question for us to think about: did anyone feel lost during this episode? Now, pick another MLST episode at random, and repeat the previous question to yourself. Few of the people who appear on this channel (experts in this field, mind you) could honestly say they've never experienced that kind of confusion, to varying degrees, in a randomly chosen episode. With that thought out of the way, I feel I could present this episode to my mother or a high school student, and they would, for the most part, be able to follow along reasonably well and have a blast while doing so; that says something! So, major kudos to Tim for a fantastic episode, and kudos to Brandon for being a part of it!
I don't think I've seen so much cope this year; this pair will still be saying LLMs can't do X after they are better at everything than any human on earth. "But it's still just matrix multiplications, it can't smell"...yet.
Great comment. Never has a subject made me feel smart quite the way AI does. I've never in my life seen so many seemingly intelligent people talking complete nonsense.
Some good points, but I have to say I disagree with the consensus expressed in this conversation. It would be nice to include someone with the counter-hypothesis, because without that, it devolves into a strawman. LLMs as described, out of the box, are analogous to a genius with a neurological defect that prevents reflection before saying the first thing that comes to mind. Various LLM cluster models have addressed that weakness. Tools such as tree of thought have greatly improved the ability to solve problems which, when done by a human, require intelligence. If you want to know if these systems can reason, I recommend starting with the paper "GPT-4 Can't Reason". Then follow up with any of several papers and videos that utterly debunk the paper. (Edit: accidentally clicked send too soon)
3:10 Disagreed, you can come up with an abstract riddle or puzzle that doesn't exist but requires logical reasoning to solve and GPT4 will do it just fine. It has the capability to reason, not just to store and retrieve information.
The jury is still out on this one. Didn't GPT-4 solve 10/10 coding problems from a well-known coding test from 2020, but 0/10 from the 2022 edition of the same test?
@@Srednicki123 Memorization certainly helps, but it _can_ reason. It can't reason consistently, but the fact that it sometimes can says enough. If a capability is demonstrated like this, getting the consistency up is just an engineering problem that will be solved.
13:12 "I would include other sequences of other types of inputs". Yes, LLMs are limited to text, a linear sequence of tokens. We also think in pictures, at least 2D (a projection of the 3D we see), and we can infer the 3D and have a mental model of that, even with a time dimension (4D spacetime), but most of us are very bad at thinking about 4 or 5 spatial dimensions, since we can never see them. But my point is: while we CAN traverse e.g. 2D images (or matrices) sequentially, pixel by pixel across, then line by line, that's not at all how we do it, so are linear sequences limiting? To be fair, we do NOT actually see the whole picture; the eyes jump around (saccades) and your mind gives the illusion that you're seeing it whole, so maybe images, and thoughts in general, are represented linearly after all? When you play music, it's linear in a sense, but per instrument, or e.g. per finger that plays the piano. So again, sequences are OK, at least parallel sequences. How do the text-to-image (or text-to-video) models work? The input is linear, but the output is 2D, and we also have the reverse process covered. Those diffusion processes start with pure random noise, but I've not understood yet how each successive step toward a clear picture works. Does it work linearly, because of the linear prompt? [Sort of in the way of a painter's brush.]
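On the diffusion question: the denoising steps are not linear over pixels, like a painter's brush. Each step updates the entire 2D array in parallel. A toy sketch of my own (NOT a real diffusion model, which would use a learned noise predictor instead of a known target) illustrating that whole-image-per-step structure:

```python
import random

def toy_denoise(target, steps=10, seed=0):
    """Toy sketch of iterative denoising: start from pure noise and, at each
    step, nudge EVERY pixel toward the target simultaneously. A real diffusion
    model replaces the known `target` with the output of a learned noise
    predictor, but the update is likewise whole-image-parallel, not
    pixel-by-pixel."""
    rng = random.Random(seed)
    h, w = len(target), len(target[0])
    # Step 0: pure random noise, as in the comment above.
    img = [[rng.random() for _ in range(w)] for _ in range(h)]
    for t in range(steps):
        alpha = (t + 1) / steps  # blend weight grows to 1.0 at the final step
        img = [[(1 - alpha) * img[y][x] + alpha * target[y][x]
                for x in range(w)] for y in range(h)]
    return img
```

So the linear prompt conditions every step, but the image itself is refined all at once, not traversed.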
ChatGPT is not designed to be boring; it's boring because its training objective is to predict the next token, and a token is easier to predict when it's more mundane and obvious.
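This shows up directly in decoding: the lower the sampling temperature, the more the model collapses onto its single most probable (most mundane) continuation. A minimal softmax-with-temperature sketch (the function and toy logits are my own illustration, not any particular library's API):

```python
import math
import random

def sample(logits, temperature=1.0):
    """Sample a token index from logits. As temperature approaches zero, the
    distribution collapses onto the argmax, i.e. the 'mundane' choice."""
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the resulting distribution.
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With temperature near zero, the highest-logit token is returned essentially every time; higher temperatures spread probability onto less obvious tokens.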
The story of the robot arm reaching under the table is exactly what humans do when they hack bugs in games. Like the old ex-army gunner back in the '80s whom my managers challenged with an artillery-game problem. He immediately loaded the maximum charge, shot the shell directly through the mountain, then went back to his job muttering "Stupid game" as he departed.
16:06 You can also strip a human brain down to its neurons and find no consciousness. Unless this guy has solved the hard problem of consciousness, his statement doesn't really mean anything...
This is such an underrated comment. I skipped to this timestamp and decided not to watch the video. These guys clearly aren't the sharpest tools in the shed 😂
LLMs used for coding can sometimes be dangerous and lead you down the completely wrong pathway to doing something. I asked ChatGPT, with some back and forth, to write some code to read a .wav file and run a very basic dynamic range compressor on the samples. It had no concept of the fact that .wav files are going to be PCM blocks and just assumed that samples would be []float64. Jetbrains AI assistant was much more helpful, in my experience, and knew that you would need a library to decode PCM blocks (and directed me to the most popular one!). It's a bit of a niche subject, but it was rather alarming to me.
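For what it's worth, the decoding step the model skipped over is small. A minimal sketch using only Python's stdlib `wave` and `struct` (16-bit mono PCM assumed; the function names and compressor parameters are my own illustration, not the library the commenter was pointed to):

```python
import struct
import wave

def read_pcm_samples(path):
    """Decode 16-bit PCM frames from a WAV file into floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM handled in this sketch")
        raw = wf.readframes(wf.getnframes())
    # '<h' = little-endian signed 16-bit, the usual WAV PCM layout.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints]

def compress_dynamics(samples, threshold=0.5, ratio=4.0):
    """Very basic compressor: attenuate the portion of each sample's magnitude
    that exceeds the threshold by the given ratio."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out
```

The point stands, though: the model assumed the samples were already floats, and that assumption is exactly what the decode step above exists to handle.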
I love the description of a car in terms of the personality it seems to possess. Yes, most people who are into cars have had such an experience. In fact, I used to race mine on the race track. There were days when I did not feel like pushing very hard, but the car seemed to "want it". The result was an unexpectedly enjoyable and surprisingly exciting drive. It was the pleasant surprise I needed at the time.
As regards agency in LLMs, Searle's Chinese Room Argument springs to mind. You cannot reduce agency to syntax. Agency is a response (individually, socially and evolutionarily) to the problems, cares and concerns that embodiment entails for the organism.
Rohrer describes our ability to excessively imbue things with agency. Rohrer is guilty of making this very mistake in his assumptions about human intelligence and understanding. Language models are certainly not humans, but humans are almost certainly more similar to language models than Rohrer would like to think. Much of human learning and processing doesn't play out via formal or symbolic systems, rather, we are also very skilled statistical learners. Rohrer seems immune to decades of research coming from Connectionism. There is considerable evidence that we are capable of learning not just surface features, but also abstract category structures and models via statistical learning. The extent to which these processes underlie human cognition is an open, empirical question. I do agree with Rohrer that our definitions of intelligence have been too anthropocentric. We need not use human intelligence as a lens for viewing a model. If we choose to use that lens, as Rohrer has done here, we should do so in an informed way.
No, LLMs are not super-intelligence. On the other hand, they can summarize context from large corpora and basically have access to everything they've been trained on. People don't have those abilities. So if your point is that LLMs and the agents that run them aren't perfect, you have a point.
I don't understand this confusion over whether it is intelligent or not. It clearly is. The intelligence is the prediction of the next word. The understanding is the statistical inference. The words we speak betray the structure of the world through their statistical correlations. Our subjective experience of the world *is* what statistical correlations feel like, viscerally. It's really not that difficult to understand. Yet everyone seems to be simultaneously confused by and unable to let go of this notion of Cartesian dualism. Experience is ambient and implicit in the physical world; it doesn't need to be conjured, like some biblical miracle. It just is. And so it is obvious that GPT-4 or whatever has subjective experience just as everything else does, AND it is obvious that much of that subjective experience aligns with our own due to the isomorphic set of statistical correlations within the machine and our own minds. It's absolutely obvious to me and it really bugs me that no one else can see it. Maddening.
I don't think it's that people refuse to understand, I think it's that "smart" people have been told they're smart their whole life and their paycheck relies on them being blind to it. They will be replaced.
Okay, so we have to make a distinction here. There are two main kinds of definitions to use for “intelligent”: an outcome-oriented definition and a process-oriented definition. In other words, the ends and the means. Now, the ends of both human and LLM intelligence are pretty similar: We can both learn and solve problems. However, humans are still superior at solving novel problems, aka problems that are outside of the dataset. Okay, why is this? It’s because of the means. The means that humans use to learn and solve problems are very different and more applicable to novel problems than what LLMs use. The means by which LLMs are intelligent is simply by predicting the next word. What comes after “scrambled?” Eggs, obviously. So what are the means by which humans are intelligent? Well, researchers are still actively attempting to figure this out, and they are still not even close to understanding the full complexity of the answer to this question. However, there is one pretty strong consensus in the research: Relational reasoning is central to human intelligence. LLMs are already engaging in one form of relational reasoning: understanding covariance. They understand which words usually go together and which don’t. However, merely understanding covariance is not enough to replicate the extraordinary abilities of human intelligence. There are many types of relationships that LLMs must use at a foundational architecture level in order to replicate our intelligence like opposition, comparison (including temporality), causality, hierarchy, and spatial. Expanding on what I said earlier about relational reasoning being central to human intelligence, human intelligence is in large part the efficiency with which we build, maintain, and compare sets of relational bindings in working memory. What does this mean? Example from Chess: 1. **Building**: Recognizing patterns and relationships between pieces, like how they can move and work together. 2. 
**Maintaining**: Keeping track of the game's changing dynamics, like remembering opponent moves and adjusting strategies. 3. **Comparing**: Evaluating different moves by imagining their outcomes and choosing the best one based on strategy and potential future positions. As for whether AI is conscious or not, I truly do not know. It’s far too early to say.
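Step 3, "comparing", is the easiest to make concrete in code: it is essentially minimax search, imagining each move's outcome and picking the best one. A minimal generic sketch (the game interface here, `moves`/`apply_move`/`evaluate`, is hypothetical, not taken from any chess library):

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Compare candidate moves by imagining their outcomes to a fixed depth,
    assuming the opponent also plays its best reply."""
    if depth == 0 or not moves(state):
        return evaluate(state), None
    best_move = None
    if maximizing:
        best = float("-inf")
        for m in moves(state):
            score, _ = minimax(apply_move(state, m), depth - 1, False,
                               moves, apply_move, evaluate)
            if score > best:
                best, best_move = score, m
        return best, best_move
    else:
        best = float("inf")
        for m in moves(state):
            score, _ = minimax(apply_move(state, m), depth - 1, True,
                               moves, apply_move, evaluate)
            if score < best:
                best, best_move = score, m
        return best, best_move
```

The "building" and "maintaining" steps correspond to the representation of `state` and to updating it as the opponent moves; those are the parts current models find much harder to do reliably.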
@@ChannelHandle1 I agree this is a valid point, but to be fair, few who want to argue these types of semantics bother to establish this context, which I think is a kind of moving of the goalposts when somebody is making the case that, say, an LLM has sufficient intelligence to model language. So discussions turn into people talking past each other about what is philosophically significant in that example.
Except statistical correlation is not what we do when we're "being intelligent." The thing we value in intelligence is the ability to jump from statistics and data to conceptual, rule-based understanding. Humans are able to abstract over multiple instances to generate flexible, rule-governed concepts. For example, if you teach a bright kid how to add/multiply 4-digit numbers, they will be able to use that algorithm on numbers that are much, much bigger. They understand the concept "number" and know what rules apply to it. The same is not true of a statistical learner: if you give them things far outside their training set, they will not be able to use what they have "learned".
@@didack1419 Probably yes. I have a growing suspicion, the more I study AI and neurology, that consciousness arises at a level where the system can focus the channeling of output (and/or short-term memory) to one particular corner of the network (one brain region) at a time, and so this is where higher-level properties necessary for our consciousness, like selective attention, awareness and decision-making, seem to emerge. As all good neurologists and psychologists realize by now, we do not consciously perceive everything going on in the brain simultaneously, and if we did it wouldn't work. Some mechanism takes all this and funnels it into a single output, and the input-output model of AI has held up in recreating this at at least the lowest level. But many in the field have made the mistake of assuming it ends there, thinking the collapsing of (sensory) input is the end goal. Now we're realizing that we need some other process to handle a bunch of different outputs as its inputs and "decide" what to do with them. Easier said than done, but it seems to be the only path forward.
I forgot to mention, crucially, that the final bundle of outputs is a continuous process which feeds back into the network at at least the lower levels, resulting in a continual feedback loop. This is where self-awareness and the immediate perception of the flow of time come in.
In today's academia they do not care about fundamentals; the only thing they care about is the number of publications, the more the better. It is not their job to explain to stats and ML students how a DNN works at a fundamental level, because most professors and TAs do not know, so they ignore those uncomfortable questions. At the end of the day students have to figure it out by themselves, and MIT is no exception. After college you have some fuzzy concept of ML, but you do not understand it at a fundamental level. You end up with the trial-and-error method, and that is a waste of 8 years of your life.
Great conversation! I really agree with almost all of Brandon's points. A general agent need not be optimal in every scenario to be considered intelligent. RL is brittle and hard to train! Great insights.
I consider curiosity and exploration to be identical. The trick is making it intrinsically rewarding to discover novelty at higher and higher levels of abstraction. After you've learned all the basic possible sensory inputs and motor outputs to be had, they cease to be rewarding, but then you chain them together to do something new that's rewarding unto itself because it's a new pattern, but at a higher level of abstraction. Then you build hierarchies of these reciprocal sensory/action chains (which includes just internal volition within an internal model to form internal thought or an internal monologue) to achieve new conceptually novel activities and actions. This naturally requires a capacity for even detecting abstract patterns. It's the learning itself and detection of successively more complex and abstract patterns that drives curiosity/explorative behavior.

What I've been interested in more recently is all of the research about the basal ganglia (striatum, globus pallidus, etc.) and the interaction between the cerebellum and cortex. Cerebellum literally means "little brain", and they've discovered that it's not just a motor output stabilizer, but is critical for the cortex to do everything it does, from vision to audition to thought and of course controlled and refined motor output. If the neocortex is like a logarithmic spatiotemporal hierarchical pattern detector/generator, then the cerebellum is a sort of linear time window for associating virtually anything going on in the cortex via feedback through the thalamus, and with feedback from the basal ganglia for reward prediction/integration into behavior.

Seeing that someone like Brandon here is thinking a lot of the same things that I have about a general intelligence and robotics makes me really excited that we're definitely super close to a future of infinite abundance and humans being able to transcend preoccupation with sustaining our biological existence (aka 'The Singularity').
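The "intrinsically rewarding to discover novelty" idea has a standard minimal form in RL: a count-based exploration bonus that decays as a state (or a higher-level abstraction of one) becomes familiar. A sketch of my own (1/sqrt(n) is one common decay choice, not the only one):

```python
import math
from collections import Counter

def novelty_bonus(visits: Counter, state) -> float:
    """Count-based intrinsic reward: the more often an (abstracted) state has
    been seen, the smaller the bonus, so learned patterns 'cease to be
    rewarding' and the agent is pushed toward new ones."""
    visits[state] += 1
    return 1.0 / math.sqrt(visits[state])
```

Applying the same bonus at successive levels of abstraction (chains of actions, then chains of chains) is one way to read the hierarchy described above.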
This was a great MLST, thanks Tim!
Great chat, very enjoyable! Brandon is quite well-spoken! PS. For the uninitiated (if there even are any such following this channel): Brandon, I guess for the sake of keeping things as accessible as possible, mostly describes language modelling in the very last segment, not really the Transformer. The Transformer is just one of the ways to do it.
Many seem to be making arguments about human intelligence analogous to the God-of-the-gaps argument: human intelligence is whatever machines and animals cannot (yet) do. This will likely leave us less and less room as technology progresses.
I concur, it all boils down to semantics... if all of these intelligent opposing positions to your point can conceive of this, the model will break. What is the empirical truth of how an LLM can operate? It needs a computer, something to compute it. How does a computer compute? By the simple cycling of power, on and off, 0s and 1s, continuously, in incredibly complex patterns at light speed. How does a human brain function fundamentally? Well, it's provided blood by a pump, and the pump is in a feedback loop with the brain and other organs. Some organs provide the ability to convert nutrients, distribute, filter, etc. In the end this system creates ELECTRICITY. This electricity is cycled on and off, 0s and 1s, in a highly complex algorithmic pattern... "Conscious experience" is the process of stimulus and reaction, on and off. I could elaborate virtually endlessly, but food for your brain at the least. 37:01 - 39:01
The thing that is critically wrong with these AI models is that they are trained on text. After watching the video, I have a lot in common with this guy you're interviewing, and I agree with the idea of open-sourcing code because it can often be vague and hard to understand. I have also built AI systems and found that the AI can have all the text-vision it needs to understand the environment, but it's still critically limited by the input of text.
But why should processing a symbolic representation of the world be a limitation to seeing? Humans don't process light waves in the brain. Light hits a photoreceptor in the eye which triggers a synaptic signal that travels down the optic nerve. It's a symbolic representation of light. There is no obvious reason ai shouldn't be able to see using a similar mechanism.
@darylallen2485 Because the words themselves might be limited in how much information they convey; the reason we can communicate content to other humans is that they have also learned about physical reality. The relations between tokens could be too underdetermined for this to be possible.
Newer AI systems, such as Tesla's, are using vision only as input to train. @@darylallen2485 The AI can do well with text-vision input, but it wouldn't be able to truly experience the world beyond the words in the text-vision description, which is just not how we experience the world at all.
@@didack1419 If anything, the trajectory of language models over the last 30 years suggests that the knowledge that can be encoded by and is recoverable from language well exceeds what we previously thought. It seems like we're still discovering those limits. Your objection raises a couple of important points, though: Language models are trained on text generated by humans whose knowledge and language use reflect their embodied experience and what they've learned about the physical world. They may miss out on aspects of knowledge by only borrowing from human embodied experience via language. And just because a language model can learn about physical reality from language doesn't mean that humans must have learned it that way. Results showing surprising alignment in color judgments by blind and sighted people suggest that humans can learn a lot about the physical world from language alone. This suggests that maybe language isn't such a limited signal.
GPT knows more about physics and color than you. It knows more about physics and color than me... but you think it can't do a metaphor? 12:00 Absolutely braindead 😂 but we're clearly not using the same tool.
It 'knows' so much about physics that it actually also 'knows' just as many incorrect things that have been written down in various places as it does know accurate facts. Some would argue that effectively makes it more of a highly sophisticated search engine (in regard to theoretical information - the same is obviously not the case for writing stories etc) than an entity with any consistent knowledge about the world whatsoever.
@@videotrash "'knows' just as many incorrect things that have been written down in various places as it does know accurate facts" I disagree. I'm with Geoffrey Hinton, who said about LLMs "these things do understand".
8:12 "... you don't do something worthy with capabilities, if ChatGPT can replace you..." I think there is the big issue... what if people don't have the capabilities? How will they earn money and find purpose? And as the bar set by ChatGPT and agents gets higher, more and more will fall into this category... what will people do to earn money and find purpose if they are not needed?
I've seen lots of bad history essays, including those written by LLMs like ChatGPT. It's usually shallow, generic, and beautifully written beyond what most undergraduates are capable of. However, simply regurgitating facts is not what historical thinking or good historical writing are about regardless of what data scientists might think. Teachers who are particularly enamored of using ChatGPT for everything may not be qualified in their subject to be in front of a classroom.
52:14 I conjecture some natural functions like refractive index or diffusion of subsurface scattering algorithms from nature might be that nudge. My research project aims to repurpose ray tracing render engines to find physics based rendering functions that apply to activation/cost functions in machine learning. It'd be awesome if Rohrer investigated that bridge from physics to apply to exploration.
12:00 "[Won't understand range of embodied experiences] unless it is explicitly represented in text, which is just a very narrow drinking straw." Indeed!!!
Hey guys, you do know that what AIs do is called associative reasoning, right? You can even make new data a reasoning path if you introduce it gradually in context, aka a phase change from spatial into semantic. You're welcome.
As a mechanical engineer and also programmer, I see the same thing. You always have to do real world tests because your best designs always meet unexpected hiccups. Also, my AI results on chemical formulations for geopolymers always need to be cross checked, many errors.
But this is EXACTLY what the new artificial neural networks DO. LLMs are just the most recent, trendiest subset. There is NOT a bank of programmers keying in all the millions of responses and associations that the current "AI"s are called upon to produce. The "programmers" basically set up a curriculum (a database of tokens) and craft a reward/punishment metric that, in the lingo of the industry, is used to "train up" an AI. This is experiential learning, just a little more abstracted than how we experience it as children. Just as our children are taught about poisonous snakes from books rather than direct experience in the wild.
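The "craft a reward metric and train up" loop can be shown in miniature with the simplest possible learner: a bandit that is never told the right answer, only scored. (The function name and the deterministic reward table are my own toy, not any framework's API.)

```python
def train_bandit(rewards, episodes=100):
    """Learn action values purely from reward feedback, with no hand-coded
    responses: try each action once, then repeatedly exploit the best
    estimate, updating a running average of observed reward."""
    q = [0.0] * len(rewards)  # estimated value per action
    n = [0] * len(rewards)    # times each action was tried
    for t in range(episodes):
        # Forced exploration for the first len(q) steps, greedy afterwards.
        a = t if t < len(q) else max(range(len(q)), key=q.__getitem__)
        r = rewards[a]  # the 'reward metric'; deterministic for simplicity
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental mean update
    return q
```

No response is ever keyed in; the behavior (always picking the high-reward action) falls out of the reward signal alone, which is the point being made above.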
“religious belief in the capability of language models”. Amen brother. These things are wonderful tools for some very specific use cases, but suggesting they have human intelligence is an injustice to the miracle of human cognition.
Agency: If you put an LLM into a robot that needs to charge itself every so often and has a sensor saying it's low, then what is different between our basic agency of solving hunger and its agency of solving power? Surely, in a team of robots, it would figure out it needs to go off and charge soon but recognise another robot is using the charging point. We look to have this already in the latest OpenAI 1X robot video, no? These robots look to have agency. Sure, there aren't higher-level objectives yet, but this all just seems like a scaling problem; the basics are solved and done.
I keep looking at The Assembly Hypothesis. It looks kind of like Forward-Forward, and seems so biologically plausible. Apparently, random networks of neurons create GPTs, i.e., embeddings and next-token prediction, emergently.
What's the point of discovering something that isn't new? So if you are trying hard, you are working on the edge. Yes, LLMs can't write code that's never existed before, but they can plan. And if they plan something new by cobbling together code that isn't new, that's still pretty good.
I am a huge fan of this podcast! It's great to catch them just a little behind current events. It's crazy that, going from pre-simulation to the real world, there is already at least one company that has created a robot that is doing dishes, folding laundry, sweeping floors, and cooking in a new environment it has never been in, by teaching itself. This podcast is two days old🤔; it's amazing how fast technology is moving.😉😉 It seems to me the problem of agency is as good as solved; it's just a matter of time, and technologically, time is moving very fast. It took billions of years for us to acquire agency. It's also good to remember that human babies take quite some time, usually a year and a half, just to learn how to walk. I guess I don't think agency is a magic only biology inhabits.
@@gustavoalexandresouzamello715 I can't remember which university, but it's a startup company out of a university. I was blown away when I saw it. My guess is that another company, such as Tesla with Optimus, will buy them out and we will see this in mass production; many large industrial companies already have contracts for 2024.
I wonder if anyone will ever read this, but instead of humans creating an AI, why don't we make a computer design an AI, like its own baby, similar to the science behind deep neural networks? Couldn't we give a computer (like a quantum computer) code for deep neural networks, and maybe some biology stuff (or something along those lines)? Give it a blank slate and let it run wild. Essentially my idea is to let robots create robots with no human interference.
@@xegodeadx What you've hypothesised already exists; they're called evolutionary algorithms. Look them up: we use them to design antennae, and the best they've been used for is scheduling. They are usually slow to converge to the correct answer; when they aren't, it's because of the way they are designed, which is difficult to get right. And if you give them millions of neurons... combinatorial explosion, i.e., solving the problem with a computer is only slightly better than solving it by hand, and both are painfully slow.
@@ea_naseer If humans can't comprehend consciousness and a computer is too slow at recreating it, where do we go from there? Could you give a quantum computer consciousness equations or questions and see what its answers are? If a computer itself can't create/birth its own computer, I don't think there will ever be AI, unless ether or dark matter or some weird science thing is discovered and mastered.
Well, you'd have to define consciousness. And a computer can birth another computer; it's called recursion, but it takes up too much memory so no one does it. I think there is a disconnect between the public view of AI, which, forgive me for saying, is like yours, a cult-like expression of wanting a brain in a vat, and what goes on in academia, which is just building more capable machines than the ones we have now. @@xegodeadx
The current state of the art chatbots suck tremendously at simple arithmetic. Building logic and reasoning ability atop such a shaky foundation seems like folly. They lack the ability to inspect and check their own statements.
From the perspective of a hard-sciences university student: most English and literature majors suck tremendously at geometric proofs and differential equations. They seem to lack the focus and rigorous deductive reasoning skills. Is this because they are an inferior architecture? Or is it maybe because, during the "Reinforcement Learning with Human Feedback" phase of the extensive training of their neural nets, those qualia and responses were not heavily weighted?