
ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview) 

AI Explained
285K subscribers
168K views

Published: 18 Sep 2024

Comments: 703
@WALDtoon
@WALDtoon 5 дней назад
the only review i was waiting for 😂
@MarkosMiller15
@MarkosMiller15 5 дней назад
yup all others "reviewers" show their bias one way or the other nearly immediately, but not AI Explained!
@morelricht8013
@morelricht8013 5 дней назад
100%
@dustinbreithaupt9331
@dustinbreithaupt9331 5 дней назад
His and David Shapiro's are literally the only channels worth listening to for AI.
@homeyworkey
@homeyworkey 5 дней назад
Where would we be without this guy LOL we'd be so clueless.
@spectralanalysis
@spectralanalysis 5 дней назад
Other youtubers: OPEN AI JUST UNVEILED STRAWBERRY AS RUMORED: AGI??? NO, ASI AND IT SHOCKS THE INDUSTRY TO ITS CORE! BILLIONS OF JOBS LOST OVER NIGHT! Oh and by the way, the rest of this 40 minute video essay will be about fantasies of a world with superintelligent robots.
@Maks4739i
@Maks4739i 5 дней назад
Instead of watching OpenAI's own videos, I actually wanted this channel to explain the new model to me. 😂😄😄
@000Gua000
@000Gua000 5 дней назад
Same.
@zivzulander
@zivzulander 5 дней назад
I watch both. Seeing what a company claims about their products/services is useful info even if it's rosy marketing.
@codycast
@codycast 5 дней назад
Cool story
@randfur
@randfur 5 дней назад
Their videos were more choresome to watch this time around, I wish they made them either more slick or more natural and authentic.
@BlakeEM
@BlakeEM 5 дней назад
Here is what OpenAI won't tell you. It does some more chain-of-thought stuff, even though they say it's not that, and it's still super dumb with very simple things. It's a small improvement using brute force, because they are desperate and have no new ideas. Their voice AI is still not out because it keeps talking like the users, and Sora was beaten by the Chinese, so they are grasping at releasing whatever they can to keep themselves on people's minds. They've been losing coders to Claude 3.5, and this will start bringing them back.
@mattbelcher29
@mattbelcher29 5 дней назад
Please don’t burn yourself out getting your video out immediately. I think, many of us who watch your videos are always looking forward to your in-depth analysis but we also understand that it might take you a little bit longer to put your information out due to the amount of thought and work you put in.
@AnasOmer-hw4is
@AnasOmer-hw4is 4 дня назад
Let him cook
@AprezaRenaldy
@AprezaRenaldy 4 дня назад
Let him cook
@oliverstanley2177
@oliverstanley2177 4 дня назад
Let him slow-cook TBH I’m here for the all-day crock pot stew of AI analysis.
@pauljones9150
@pauljones9150 5 дней назад
26:30 such a good phrase "stochastic parrots can fly so high"
@blengi
@blengi 5 дней назад
sounds like an alternative ending to original blade runner _I've seen things, you people wouldn't believe, mm_ _Attack ships on fire off the shoulder of Orion_ _I've watched C-beams glitter in the dark, near the Tannhauser gate_ _All those moments will be lost in time, like tears in rain_ *_stochastic parrots can fly so high_* _time to die_
@TheGreatestJuJu
@TheGreatestJuJu 4 дня назад
Try and get Apple voice recognition to understand and type "Stochastic". Impossible! Apple is falling behind on all AI.
@clray123
@clray123 2 дня назад
A yet better phrase would be "we taught two new tricks to our Clever Hans horse".
@mcfarlangeoffrey9413
@mcfarlangeoffrey9413 5 дней назад
This review is MILES better than the others I have watched.
@aiexplained-official
@aiexplained-official 5 дней назад
Thank you man, very kind
@medicalaiexplained
@medicalaiexplained 5 дней назад
A new model released? Philip covers every detail within 24h. Just impressive work!
@jan.tichavsky
@jan.tichavsky 5 дней назад
I haven't even noticed there's a new model. Apparently I'm not following AI development closely enough because no smart algorithms have pushed the info to me. But YouTube did, and rightly so, because I'm always happy to watch your videos, at this time of day I even have time to watch them. 😄
@OnigoroshiZero
@OnigoroshiZero 5 дней назад
After testing the model for a lifetime... He probably lives outside the simulation, so the concept of time in our universe does not affect him.
@Disent0101
@Disent0101 5 дней назад
i think he uses AI as part of his workflow....
@squamish4244
@squamish4244 5 дней назад
Cocaine's a hell of a drug.
@HAL9000.
@HAL9000. 5 дней назад
Wow. Quick. And the only non-hype opinion that matters. Time to settle down and listen to reason . . .
@orterves
@orterves 5 дней назад
I'm afraid I can't do that Dave
@sagetmaster4
@sagetmaster4 5 дней назад
I want to sincerely thank you for the effort you put into these videos
@mAny_oThERSs
@mAny_oThERSs 5 дней назад
OH MY GOD THIS GUY IS THE FINAL BOSS OF AI ANALYSIS HOLY SHIT. The model is out for 1 day. Meanwhile AI Explained: 0:29
@user-sl6gn1ss8p
@user-sl6gn1ss8p 5 дней назад
I used to follow Two Minute Papers quite closely, but with the whole AI boom it started to sound a little too hype-based to me. For one, repeatedly calling this one "Einstein in a box". AI Explained is so much more balanced.
@zactamzhermin1434
@zactamzhermin1434 5 дней назад
@@user-sl6gn1ss8p that channel's content sounds like it's made by GPT-2 plus trashy text-to-speech; it's actually insufferable
@njpm
@njpm 3 дня назад
Why does that get you so excited...
@mAny_oThERSs
@mAny_oThERSs 3 дня назад
@@njpm Excited is the wrong word; it's impressed. You can try finding someone else who goes through hours of reading material on the day a product gets released and uploads a detailed 20-minute report video on it the next day, early in the morning.
@BlakeEM
@BlakeEM 5 дней назад
This is not a paradigm shift for my test cases.

1. It failed my custom ball physics test (not the one everyone else uses, since that is making its way into the fine-tuning/training, as seen on the Two Minute Papers video), because it still doesn't understand physics intuitively like a human. It assumes a ball would not fall out of an upside-down ceramic coffee cup that is held above a table. The model says "2. Cup is held upside down above the table: Assuming the ball doesn't fall out, it remains inside the inverted cup."

2. It was unable to solve a react.js viewport lazyload bug. The issue is that more and more images keep loading as you scroll, so the viewport jumps around, because it loads more images before the previous ones have loaded. It doesn't understand this without me doing most of the work and reasoning for it. It kept trying to adjust image sizes and CSS that were not the problem.

3. It failed to find the optimal solution for setting a square post at a 45 degree angle to a wall. I said I had only a tape measure, but it kept wanting me to mark out spots and measure to the center of the hole. My solution involved simply turning the post and measuring until both corners are the same distance from the wall. It wanted me to triangulate the hole on the other side of the wall and assumed the post would be in the center of the hole (it's at a 5 degree lean, and is not). It did realize that my solution was better and more direct.

4. I asked it "What is the prime factorization of 1,090,101, providing the prime factors and their respective exponents?" and it got it wrong. It does not have access to Python or a calculator; once it does, it will likely be a big improvement. ChatGPT-4o gets this one correct by running Python code. I've seen ChatGPT-4 do some impressive stuff by telling it to solve a problem with code using brute force, even better than ChatGPT o1 is now.

As for what it's good at: it can understand and output much more code. It does better at refactoring and dealing with code that involves multiple files. It's definitely an improvement, but about the same as Claude 3.5 with its big context length vs ChatGPT-4, when it comes to coding ability.

If you want a real paradigm shift, have it analyze its own text for assumptions, and follow those up with a question before continuing. This one change made it solve the ball and cup problem 100% of the time. Just ask "Did you make any assumptions? If so, correct them, or follow up with a question if more information is needed." Response when using this method: "Assumption Made: I initially assumed the ball stayed inside the cup. Reality: Unless the cup has a lid or the ball is held in place, gravity would cause the ball to fall out when the cup is inverted." Also "Follow-Up: Was there anything preventing the ball from falling out when the cup was inverted (e.g., a lid, the ball being stuck, or someone holding it in place)? Is there additional information about the ball's behavior during the inversion?" All very important questions about my scenario that I was not 100% clear about, although I was specific about a ceramic coffee cup because they have no lid and can't be squeezed to hold the ball in. It solves many of the weird responses, because it has a chance to notice them and correct them. It's trained on noticing poor responses just as it's trained on giving the correct response the first time, which means you can increase its ability by tapping into this knowledge.

This will lead to much improved answers, and actual interaction to break down and solve a problem, rather than it taking a best guess each time. Most people don't use it to solve the types of problems they are training it on. They need to make it more deductive rather than more predictive. This way it's better at doing research and finding the correct answer in the noise, rather than knowing the answer outright. It needs a BS detector and to be good at using tools. This is why the LLM will never be better than a calculator, so why are they making it do math from memory? You're welcome, OpenAI.
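For anyone who wants to check a factorization like that themselves, here is a rough Python sketch of the kind of brute-force trial division a model can run when it is allowed to write code (just an illustrative sanity check; the function name and printout are mine, not what ChatGPT actually executes):

def prime_factorization(n: int) -> dict[int, int]:
    # Trial division: returns a {prime: exponent} map for n.
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1 if d == 2 else 2  # after 2, only odd candidates need testing
    if n > 1:
        factors[n] = factors.get(n, 0) + 1  # whatever remains is prime
    return factors

if __name__ == "__main__":
    # Compare this output against the model's from-memory answer.
    print(prime_factorization(1_090_101))

A dozen lines like this settle the question instantly, which is the whole point of giving the model a calculator or code tool instead of making it do arithmetic from memory.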
@ekstrajohn
@ekstrajohn 5 дней назад
nice tests
@mirek190
@mirek190 5 дней назад
You know that model is not the full o1? From the table, the full o1 is around 60% better in reasoning than the preview version.
@korozsitamas
@korozsitamas 5 дней назад
"so why are they making it do math from memory?" because this is an early preview, and the features you are asking for will be coming later
@BlakeEM
@BlakeEM 5 дней назад
@@mirek190 I was responding to the video that is about the preview, and it says "-preview" on the model name, so yes I know. "60% better in reasoning" there is no standard measure as to what this even means. It can't reason its way out of these problems, because it's not reasoning at all, just predicting text output. Calling it "reasoning" is a joke.
@ThatPsdude
@ThatPsdude 5 дней назад
Fantastic testing and suggestions. This was a great read!
@alexclark7518
@alexclark7518 5 дней назад
Your channel was the first that I had ever subscribed to, just over 18 months ago. Because of the quality, such as today's video, I have never missed an episode. Thanks for explaining things so clearly and without the hype.
@jackfarris3670
@jackfarris3670 5 дней назад
I'm glad they are trying ideas other than simply scaling. These unique ideas on top of scaling will be needed to reach human level reasoning and analysis.
@jake9764
@jake9764 5 дней назад
Human level reasoning will never be reached with an LLM, we need a different technology entirely or in combination with LLMs.
@jan.tichavsky
@jan.tichavsky 5 дней назад
Yeah, scale smart, not brute force. Reasoning and world-understanding should probably have a different system architecture than an LLM. But I'm not afraid that we won't get there. There will be a paradigm shift eventually thanks to all the geniuses working on AI research and of course also the AI helpers.
@wildfotoz
@wildfotoz 5 дней назад
@@jake9764 I agree 100%. We should be calling this Cognitive Automation instead of Artificial Intelligence.
@squamish4244
@squamish4244 5 дней назад
DeepMind is betting on other architectures to reach AGI. Not LLMs.
@maciejbala477
@maciejbala477 5 дней назад
@@wildfotoz too bad, the name already stuck. I, for one, don't mind calling the field Artificial Intelligence, because I'm too used to video game "AI", which is also just algorithms, so I guess I got desensitized to that term being used for something not actually intelligent. And I think a lot of the time, at least something like GPT can give you suspension of disbelief on that one, especially if one's not familiar with LLMs (or ones of this caliber at least). The same definitely couldn't be said about video game "AI"... on that note, someone should definitely use neural networks and whatnot in video games; it's a big problem that the computer never really acts that smart, and the ways to increase difficulty usually involve giving handicaps to the player or cheats to the computer rather than making the computer act better.
@AgentStarke
@AgentStarke 5 дней назад
I have to admit, I never thought LLMs could be pushed this far
@maciejbala477
@maciejbala477 5 дней назад
I already had that realization multiple times in my lifetime at this point, ahaha. Amazing technology, despite all its flaws
@richardyu5283
@richardyu5283 День назад
Wait till next year!
@Landgraf43
@Landgraf43 5 дней назад
I have been waiting for this notification. He only uploads if something important has happened.
@wagnsprinter
@wagnsprinter 5 дней назад
I saw another video pop up which said that they released o1. I didn't watch it but waited to watch your video on it; you are the best, and also the most reliable and super fast in releasing these videos.
@rhaedas9085
@rhaedas9085 5 дней назад
It seems every model so far, because of how they are trained, simply cannot conclude that it isn't sure or doesn't know something. Not remarking that it can't pull real URLs, and making some up instead, is a great example. That inability shouldn't be punished; the realization that one can't do or doesn't know something is crucial to figuring out the facts. I've even tried on an uncensored local model to break this habit with constant reminders that it's okay to disagree or say it doesn't know, but it always leans to the "give the human what they want to hear" mode.
@wileysneak
@wileysneak 5 дней назад
i mean, when have you ever seen someone online or in a book just say "i don't know", maybe the problem is the training data
@maciejbala477
@maciejbala477 5 дней назад
Yeah, I agree. I think it's part of the broader positivity bias as well, where the LLM is expected to provide a helpful, constructive response, but isn't good at figuring out when it can't really give one, and therefore it tries to force a solution of some sort, sometimes leading to agreeing with you when it should simply tell you you're wrong, or, yeah, just state that it doesn't know.
@ticketforlife2103
@ticketforlife2103 5 дней назад
Ask your LLM "how would a man without arms wash his hands?" this is the answer I got from chatgpt 4o. "A man without arms could wash his hands using assistive devices or by adapting techniques. For example, he could use his feet, specialized prosthetics, or mouth to operate faucets or use automated systems like sensor-activated sinks. He might also use tools specifically designed for individuals with disabilities that enable greater independence in personal care tasks."
@AprezaRenaldy
@AprezaRenaldy 4 дня назад
Can people without fingers play rock, paper, scissors?
@TechnoMinarchist
@TechnoMinarchist 4 дня назад
@@ticketforlife2103 o1 still makes this same mistake.
@NimTheHuman
@NimTheHuman 4 дня назад
Ah, that's a good one. Thanks for sharing. I just tried 3 different LLMs. None of them pointed out that a person without arms likely doesn't have hands.
@TechnoMinarchist
@TechnoMinarchist 4 дня назад
@@ticketforlife2103 o1 and 4o can solve it if you ask it to examine the question. Otherwise they fail
@TechnoMinarchist
@TechnoMinarchist 4 дня назад
@@ticketforlife2103 The reason these LLMs fail at this question has less to do with their capabilities and more to do with limitations. Understanding such questions requires the AI to be capable of inferring deceit, which requires it to be capable of deceit. OpenAI is actively trying to avoid an AI that can be deceptive, ergo it is unlikely that they will ever let an AI be able to answer a question like this.
@Kleddamag
@Kleddamag 5 дней назад
Love your videos! You're hands down by far the best AI channel-just like the leap from GPT-3.5 to o1-ioi. Straight to the point, without the noise. Keep up the great work!
@musaran2
@musaran2 3 дня назад
Humans also give wrong explanations of good answers: how planes fly, or how bicycles stay upright. Is thinking just selecting good rationalizations?
@ScientiaFilms
@ScientiaFilms 5 дней назад
In the movie "Her", the AI that powered Samantha was called OS1, and knowing how Altman likes that movie so much...
@Hexanitrobenzene
@Hexanitrobenzene 4 дня назад
Dunno, I think the naming is at least a little confusing after GPT4o.
@leegaul8250
@leegaul8250 5 дней назад
Your videos are the best in the AI domain. Just when I'm doing my own research, reading the papers, and formulating my impressions, you release a video that often mirrors my own impressions. Thanks as always!
@aiexplained-official
@aiexplained-official 5 дней назад
Thanks Lee
@GoldenBeholden
@GoldenBeholden 3 дня назад
Your history of criticising OpenAI's claims makes it easy for me to share your excitement.
@thomasmitchell2514
@thomasmitchell2514 5 дней назад
3 for 3 randomly catching AI Explained video drops within minutes, in the evening, first time opening youtube. Love it!
@Milark
@Milark 5 дней назад
Good feeling
@MarkosMiller15
@MarkosMiller15 5 дней назад
hahaha, same here
@Falkov
@Falkov 5 дней назад
It’s happened for me a handful of times for this channel..enthusiastic warm-fuzzies!
@Dannnneh
@Dannnneh 5 дней назад
BOSS, you are _on_ this already! I was secretly hoping that there would be an upload from you already, but it seems I was just underestimating you. Relentless!
@solaawodiya7360
@solaawodiya7360 5 дней назад
Thanks for your thoughtful analysis as always Philip 👏🏿♥️. Looks like OpenAI is back to shipping something cool. Only a matter of time before Anthropic launches something big
@MusketeerYanick
@MusketeerYanick 5 дней назад
Amazing video, amazing ending and quote “LLMs are dumpsters and we attach rockets to them”. I’m also almost sure GPT5 will indeed be an avatar based model.
@keeganpenney169
@keeganpenney169 5 дней назад
I'm quite impressed with the o1 preview, but I'm a little bewildered on using it properly.
@ZenBen_the_Elder
@ZenBen_the_Elder День назад
When AIE waxes on about how this is a step-change, I pay attention. British understatement, mate. 3:28-4:20
@michaelwoodby5261
@michaelwoodby5261 5 дней назад
Man, I have been saying for months that the scratch pad is the way to get to the next level. That's why people sleeping on the efficiency gains is so annoying. You turn that into multiple passes per given response, and ta-da!
@amkire65
@amkire65 5 дней назад
Your first reaction is probably more detailed and informative than many people's reviews will be.
@maxidaho
@maxidaho 5 дней назад
AI can already do my job better and much much faster than I can. The missing component is a user interface. Once someone figures out how to input the data, I'm toast.
@MatthewKelley-mq4ce
@MatthewKelley-mq4ce 2 дня назад
"In the last 24 hours... " I hope that involved some sleep my guy. Thanks for sharing what you do.
@anonymes2884
@anonymes2884 4 дня назад
The video I was waiting for (how dare you be abroad when o1 was released - holidays aren't allowed ! :) and it didn't disappoint. Balanced and informative as always. 22:34 Yeah, this is what struck me about o1. Designing the system to "reflect" on its responses and "aspire" to better answers is a step _closer_ to it having an actual goal. AGI seems unlikely/impossible _without_ being goal driven (and self-reflective) but clearly _that_ will be when it starts to get really dangerous (in the 'Skynet' sense rather than "just" the economic meltdown and civil unrest sense :).
@harrysvensson2610
@harrysvensson2610 5 дней назад
13:37 "They don't look like they are leveling off to me" I wonder if it has anything to do with the fact that the X axis is in log while the Y axis is linear.
@Ockerlord
@Ockerlord 5 дней назад
The y axis isn't really linear though.
@danberm1755
@danberm1755 5 дней назад
Thanks so much for the really time consuming work you put into testing these LLMs. Much appreciated 👍
@mehdihassan8316
@mehdihassan8316 5 дней назад
Now we just have to wait for scale through increasing compute and energy so there's longer inference time? Then combine that with bigger training data to get PhD level GPT 5 by late 2025?
@reinerbraun9995
@reinerbraun9995 5 дней назад
That's what I was going to say 😅
@24-7gpts
@24-7gpts 5 дней назад
Amazing how this is just the *preview*
@rickandelon9374
@rickandelon9374 5 дней назад
The GOAT AI youtuber that first predicted GPT 4 capabilities is here and is explaining a new type of model. Your analysis is on a different plane. Been waiting for this one since the moment o1 was released. Finally ❤❤🎉
@codycast
@codycast 5 дней назад
Slurp slurp. Dude. Settle down.
@_supervolcano
@_supervolcano 5 дней назад
Thank you for being one of the reasonable and level headed people in this space.
@williamjmccartan8879
@williamjmccartan8879 5 дней назад
Thank you Phillip for this update on o1 mini, and for getting it out as quickly as you have. I can't imagine the amount of work you have to do to stay on top of these constant evolutions in AI; it really is greatly appreciated. Looking forward to your forthcoming follow-ups on this product, and I imagine the other announcements that might pull you off track for a moment. Thank you, and whoever the elves are who might assist you in this work. Take care of yourself and be safe, peace
@nicholasgerwitzdp
@nicholasgerwitzdp 5 дней назад
Incredible job as always explaining the technical details in a way anyone can understand! Super underrated skill!
@aiexplained-official
@aiexplained-official 5 дней назад
Thanks nicholas!
@pneumonoultramicroscopicsi4065
@pneumonoultramicroscopicsi4065 5 дней назад
Model | Score | Company
Human (avg) | 92% | heaven
@Roma88572
@Roma88572 5 дней назад
o2 will be very close to AGI at this point. The results that are coming in are insane.
@gbrailsford
@gbrailsford 5 дней назад
Great video, I love your informed take on these releases. Interestingly, I just tried the ice cubes in a frying pan question on this model and it got the right answer (unlike your test in which it got it wrong). I ran it 3 times in separate chats and got the same correct answer each time. Here are the last steps in its reasoning: ... "Ice cubes in a hot frying pan (especially one frying a crispy egg) will melt quickly-typically within a minute. By the end of minute 3, any ice cubes placed before or during that minute would have melted. Calculate Remaining Ice Cubes at the End of Minute 3: Considering rapid melting, it's realistic to assume that no whole ice cubes remain in the pan by the end of minute 3. Choose the Most Realistic Option: Given the melting rate of ice in a hot pan, the most realistic number of whole ice cubes remaining is 0.
@darylallen2485
@darylallen2485 5 дней назад
Can't tell you how excited I was when this video popped into my feed! Time to click play and say thanks for your efforts!
@DreckbobBratpfanne
@DreckbobBratpfanne 5 дней назад
The fact that we have two scaling laws on top of each other now is so mind-boggling. Plateau? Yeeeaaah, I don't think so anytime soon. There is so much crazy stuff this has now unlocked. Imagine a GPT-5 class Omnimodel where any output modality has an o1-like reasoning chain. That's a totally different world of capabilities. 2025 is gonna be fun, although I doubt we get such a full-force system before the end of '25, if at all; who knows if gov officials deem it too dangerous (or costly?)
@maciejbala477
@maciejbala477 5 дней назад
I hope so, but in truth we never know if there's a plateau. Many people say there is, many others say there isn't. I'm just observing from afar and following, because I don't really know the answer, and I'll wait to see where it goes
@6681096
@6681096 5 дней назад
I listen to YouTube at 2x speed, except for AI Explained, which goes back to normal speed and pausing.
@aiexplained-official
@aiexplained-official 5 дней назад
High praise!
@TheEhellru
@TheEhellru 5 дней назад
In the US, primary school is called elementary school. Keep up the good work, we love you!
@kellymoses8566
@kellymoses8566 5 дней назад
If you ask about the chain of reasoning steps OpenAI will email you to stop and threaten to end your access to o1
@jeff__w
@jeff__w 5 дней назад
17:34 “As models become larger and more capable, they produce less faithful reasoning on most tasks we study.” I don’t even get why people, especially those in the field, would think that what these models are outputting as “chains of thought” (a complete misnomer to me) _are_ the steps the model is taking. People can give reports of what they “think” when responding to this or that question and it’s not clear even in _those_ cases what connection, if any, those thoughts have to the response. These LLMs are a step removed from that (if not more)-they don’t have inner verbal behavior, which, again, may or may not be really relevant as “steps.” To use a human analogy, it’s all going on at a “neuronal” level-these LLMs _can’t_ report what is going on there (it’s like expecting people to give accurate verbal reports on some unconscious physiological process in the brain). It’s almost like some pre-scientific theory of reasoning.
@YT-gv3cz
@YT-gv3cz 5 дней назад
I think that's the problem with "interpretable AI" in general. The word "interpretable" seems to be used in two different senses.

One is mechanical, in the sense of reducible to some mathematical model or axiomatic foundation - like a lower-dimensional description of the neural net, precisely characterizable dynamics in the parameter space, results exactly verifiable by logical systems like Lean, or even interpretability by another AI system (like the study by OpenAI using GPT-4 to "interpret" GPT-2 neuron behaviors, which I find quite ironic in that it seems to hint at some kind of infinite regress. Though by analogy you can say it's like how mathematical logic accepts analyzing arithmetic in axiomatic set theory as standard practice, so it technically still counts as a type of mechanical reducibility).

The other, of course, is human understandability, which by itself is hopelessly nebulous. Language is just a passable approximation to the amorphous goop sandwiched by neuronal activity from below and all possible mathematical descriptions from above. If we run an LLM which feeds formal outputs to Lean to verify, then we have this precise structure where the system is tethered to the mechanical computation of neural nets on one end and a precise axiomatic system on the other. The linguistic manifestation of "deduction" in the middle does not have any precise status, but just imitates the arbitrary constraints of human thought we bootstrapped it from.

Reducing to either end feels insufficient, but what else can we do? Often one has to give up either precision or "understandability". And even if the individual logical steps in a precise reduction feel understandable, the whole thing could still very well not be, due to the combinatorial explosion of logical complexity. Just like how natural language has theoretically unbounded recursion depth, but use more than 3 layers and you start to lose track.

In the end these CoT steps are like comments written by human programmers for other humans to read: they usually provide some approximation, in the goop layer, to the underlying precise logical process, but ultimately are not causally linked to it. They might help debugging. That's all there is to it.
@jeff__w
@jeff__w 5 дней назад
@@YT-gv3cz Thanks! I appreciate that highly technical explanation, even if it is well beyond my capacity to understand. (My background is in behavioral science, not computer science.) “In the end these CoT steps are like comments written by human programmers for other humans to read…” I’m not even sure _that’s_ true-again, the steps might just be _emulating_ those. The closest things I can think of in human terms-and it’s not all that close-are (1) how, in split brain experiments, the verbal hemisphere of the brain produces plausible, yet _wholly fictitious,_ explanations of what the non-verbal hemisphere is doing (it _can’t_ know because the connections between the two hemispheres are severed) _except_ here the LLM is not even observing its own output (it’s not clear _what_ it’s observing, if it _is_ observing anything or if it _can_ observe anything, i.e., “reflect”) and, (2) more generally, as I said in the first comment, how people _can’t_ give accounts how the neurons are collectively firing to give rise to whatever behavior they give rise to-human brains are simply not wired like that. I guess these models _could_ be reporting the interim results of what the various internal calculations are as “steps”-does the architecture even _allow_ that?-but it seems a lot more likely (to me, anyway) that they’re just producing what would be the most plausible steps (whether or not those have any connection to the actual processes), just as they produce _any_ verbal output. Obviously, _CoT_ _works_ (that’s why people are interested in it) but that the model is “reasoning” better is at best a description (and it might be a highly misleading one at that) and not an explanation as to what is going on. I’m not sure what we _can_ do but what I think would be good to _stop_ doing is viewing these steps as necessarily having any connection with how these models come up with the outputs they produce. The best we can say, it seems to me, is when the models are prompted for “steps,” _that’s_ the output they produce. It’s simply more LLM output to be explained.
@TechnoMinarchist
@TechnoMinarchist 5 дней назад
o1 is gpt4 turbo with reasoning capabilities. It's why it costs similarly to run as turbo does in the api, why turbo jailbreaks that don't work on gpt4o still work on o1, and why it costs 3x more to run than gpt4o.
@FortWhenTeaThyme
@FortWhenTeaThyme 4 дня назад
Wouldn't be surprised. GPT 4 Turbo is a significantly smarter model than 4o. I don't care how many benchmarks 4o wins at, Turbo's reasoning skills are just better.
@samuelluz9241
@samuelluz9241 5 дней назад
I will always upvote before watching until proven otherwise
@pauljones9150
@pauljones9150 5 дней назад
3:40 Holy shit that's a big jump
@AprezaRenaldy
@AprezaRenaldy 4 дня назад
Estimate what score o1 will get
@slm6873
@slm6873 4 дня назад
Very nice review. I love how much more nuanced and critical you've gotten over time.
@aiexplained-official
@aiexplained-official 4 дня назад
Thanks slm
@Philbertsroom
@Philbertsroom 4 дня назад
Comparing David Shapiro's video about the same subject shows how much more value you bring. Thank you.
@willfrank961
@willfrank961 5 дней назад
Been refreshing the page for this one 😅
@mukunda33
@mukunda33 5 дней назад
yay, Phillip dropped another vid.
@user-sl6gn1ss8p
@user-sl6gn1ss8p 5 дней назад
"has reigned supreme for quite a while" aka, a few months
@danypell2517
@danypell2517 2 дня назад
Man, there are not that many things greater than technological progress. An upgrade to an AI model that significantly pushes the frontier is almost orgasmic :) Can't wait for the next significant upgrade
@gemstone7818
@gemstone7818 5 дней назад
damn thats a highly impressive update, can't wait to see what the competition brings in response
@dmytroshchotkin2939
@dmytroshchotkin2939 12 часов назад
The best phrase to express my thoughts about the future comes from a Russian meme: "Scary, very scary; if only we knew what this is, but we don't know what it is."
@bluetensee
@bluetensee 4 дня назад
brilliant work. as always. you never hype unless it really is appropriate. my most trusted channel on AI out there. keep up the good work💪 cheers from Germany. Mat
@Bryghtpath
@Bryghtpath 4 дня назад
AI shocked the world when Deep Blue beat Garry Kasparov at chess. Now GPT-o1 is handling complex logic puzzles.
@AIForHumansShow
@AIForHumansShow 5 дней назад
another banger. just the best place to get DEEP info on new AI models
@IngieKerr
@IngieKerr 4 дня назад
The only AI news channel I literally recommend to current members of my country's government. OK, I'm from a tiny country, but all the same. :)
@aiexplained-official
@aiexplained-official 4 дня назад
Oh wow! What country?
@IngieKerr
@IngieKerr 4 дня назад
​@@aiexplained-official I'd call it Mannin, but I'm sure you know it by its English name of Isle of Man [tho Isle of _Mann_ is, indigenous-culturally at least, our preferred spelling] As we're only a population of
@aiexplained-official
@aiexplained-official 4 дня назад
I am likely going to Isle of Mann next year! My dad loves it. So thank you for spreading the word
@JamesOKeefe-US
@JamesOKeefe-US 3 дня назад
This is the best AI channel. No joke.
@aiexplained-official
@aiexplained-official 3 дня назад
:))
@sp00l
@sp00l 4 дня назад
Was waiting to hear from you about this before looking at other "omgosh" videos. Cheers mate!
@Degenerate_o7
@Degenerate_o7 4 дня назад
I am nowhere close to fully keeping up with the AI game, but your vids are so perfect for helping someone like me stay abreast of the industry. Thank you!
@aiexplained-official
@aiexplained-official 4 дня назад
Thanks chaos
@integralyogin
@integralyogin 5 дней назад
if this video kept going, id keep on watching. very well done.
@Xilefx7
@Xilefx7 5 дней назад
I want to see what the other big LLM companies announce in the near future. Happy weekend
@-mwolf
@-mwolf 4 дня назад
From memorizing answers to memorizing answer programs - a step change from implicit to explicit. It is still limited to what's in its dataset though, and can't handle novelty (the ARC-AGI blog post explains this well).
@theterminaldave
@theterminaldave 5 дней назад
Phillip, I think you should consider converting some of your videos/work into podcasts. I don't actually listen to podcasts yet, but many other people do, and I realized, as I was listening to your video while doing dishes, that your commentary is so good that it doesn't always require the screenshots. I'm so happy for you man, you're kickin' ass.
@aiexplained-official
@aiexplained-official 5 дней назад
Thanks man, could anyone recommend a good way of doing this?
@ginogarcia8730
@ginogarcia8730 5 дней назад
here in the Philippines, I literally said can't wait to go to sleep and then wake up to AI Explained on o1 preview. You're so quick dude! You're the bomb.
@MachineCode0
@MachineCode0 4 дня назад
"If your domain doesn't have starkly correct 0/1, yes/no, right answers/wrong answers, then improvements will take far longer." I feel like this is an interesting area for discussion. So the obvious question is; if there aren't 0/1, yes/no, right/wrong answers in these domains, then how are we adjugating the results? If there are no objective results that we can reference, then how do we know that our assessments of the “correctness” of the outputs are, themselves, objectively correct?
@aiexplained-official
@aiexplained-official 4 дня назад
Human labels, but of course they are not infallible
@MachineCode0
@MachineCode0 4 дня назад
@@aiexplained-official Yeah. it seems like an interesting sort of, 'Gordian knot' to mull over, no?
@cyanophage4351
@cyanophage4351 5 дней назад
Thanks for doing this video and the research so thoroughly. Looking forward to more
@Ecthelion3918
@Ecthelion3918 5 дней назад
Very good video my friend, was eagerly awaiting your first impression on this. Looking forward to more in depth testing Also, take care of yourself always :)
@koningsbruggen
@koningsbruggen 3 дня назад
Great review, thank you
@epokaixyz
@epokaixyz 5 дней назад
Consider this your cheat sheet for applying the video's advice:
1. Understand that o1 excels in STEM fields and can be used to solve complex problems in areas like physics, math, and coding.
2. Test o1's capabilities by throwing challenging problems at it, but remember to double-check its reasoning as it is still under development.
3. Stay informed about future updates and advancements in o1's capabilities.
4. Exercise caution when deploying o1 in sensitive domains, as it's important to use this technology responsibly and ethically.
@sanesanyo
@sanesanyo 5 дней назад
Was waiting for this review. Thanks for doing this.
@nicholaslemosdecarvalho5328
@nicholaslemosdecarvalho5328 5 дней назад
Finally, I was waiting for this one. I'll be able to carry on with the rest of my life afterwards
@FahimSattar
@FahimSattar 5 дней назад
Really was waiting for your take on the new model!
@markmorgan5568
@markmorgan5568 5 дней назад
I’ve done one prompt so far with o1-mini. Basically, “this code runs/works, but isn’t doing quite what I want. What I want is [several things it’s not doing now].” Claude had failed this enough that I had decided I’d fix it myself. Then o1 dropped, and fixed it in one shot. It’s an anecdote, but a promising one. And honestly, I don’t care if it can’t do things that are easy for me if it can do things that are hard, annoying, or tedious. A hammer can’t wash the dishes - doesn’t mean it’s useless.
@AllisterVinris
@AllisterVinris 2 дня назад
Was gone for the weekend, I come back to this. Crazy how fast things go in AI huh? No but seriously that's kinda crazy because now they can use this new method on a 100x system and if that doesn't yield crazy results, I don't know what will.
@billykotsos4642
@billykotsos4642 2 дня назад
TLDR: Do not get ahead of yourselves
@JustFacts81
@JustFacts81 3 дня назад
I have a simple bench on AI YouTubers 😅 AI Explained is not allowed to participate, simply too strong 💪 Thx for the excellent and best review!
@aiexplained-official
@aiexplained-official 3 дня назад
Haha nice one, and thank you!
@Schyhol
@Schyhol 4 дня назад
We've been waiting for your review so much
@leatherindian
@leatherindian 3 дня назад
Keep up the analysis. It cuts through the hype. Thank you
@aiexplained-official
@aiexplained-official 3 дня назад
Thanks leather
@rapauli
@rapauli 3 дня назад
Be sure to set your YouTube resolution quality to full 1440p
@sergeyromanov2751
@sergeyromanov2751 5 дней назад
There is one important thing to understand about o1: judging by everything, it is a rather small model. I think less than a hundred billion parameters. It is clearly smaller than Omni. And this means that the basic model itself is rather weak; it obviously does not have a very developed model of the world, etc., simply due to its small size. The Strawberry algorithm is good, but if it works on small "brains", then the performance of the entire system will not be very high.
@TechnoMinarchist
@TechnoMinarchist 5 дней назад
It's just gpt4 turbo with reasoning capabilities. It's why it costs similarly to run as turbo does in the api, why turbo jailbreaks that don't work on gpt4o still work on o1, and why it costs 3x more to run than gpt4o.
@codycast
@codycast 5 дней назад
How do you know any of this? Which model. How many parameters. Etc.
@bossgd100
@bossgd100 5 дней назад
How did you make your estimation?
@Yobs2K
@Yobs2K 5 дней назад
Why is it clearly smaller than Omni?
@BlackTakGolD
@BlackTakGolD 5 дней назад
@@TechnoMinarchist That makes less sense than gpt4o being the base still, since they did add in training for the model to be capable innately of CoT, it's possible for it to deviate from gpt4o at the seams while being based on it. There is slim chance that they would go back to the older massive gpt4 model to use this inference costly method; that is why it costs more, it's literally generating so much more data, not necessarily that it's based on gpt4. Add in the naming and there's a 'mini' model, it's just less reasonable to think it's gpt4 based.
@felipefairbanks
@felipefairbanks 5 дней назад
as always, a pleasure to see you go through any subject! I confess I wasn't hyped for this one. but after this video, I think I'll subscribe for chatgpt plus for a month and try it again. last time I did so, was about a year ago and it was pretty underwhelming. hopefully now I can get some cool things done with this new model (mainly interested in programming, being not a programmer)
@haileycollet4147
@haileycollet4147 5 дней назад
FYI o1-mini is better at many coding tasks than the current o1-preview. Depending on the task, you may prefer results from Sonnet 3.5
@reza2kn
@reza2kn 5 дней назад
🥰🥰🥰You always keep delivering!❤❤ Thanks!! Also, @26:27 look at the first comment on this post 😂😂😂🤣🤣🤣
@profsjp
@profsjp 4 дня назад
Excellent, balanced and informative discussion. Thank you. 👏🏻👏🏻👏🏻
@aiexplained-official
@aiexplained-official 4 дня назад
Thanks prof
@stephenrodwell
@stephenrodwell 5 дней назад
Thanks! I’ve been looking forward to this! 🙏🏼
@rantmarket
@rantmarket 4 дня назад
Excellent as always. Thank you, sir.
@themikematrix
@themikematrix 5 дней назад
Can you imagine what the world will be like 10 years from now? Feels like everything is going to change drastically, in an existential way. I believe we are in a truly existential paradigm shift. I know most of you reading this agree. It's starting to become real though, which is crazy. Feels like a movie
@fcaspergerrainman
@fcaspergerrainman 5 дней назад
Yup! I've been saying that since 2017… I met the head of Google AI in 2017, and he said it would be 50 yrs before we have an AI that can reason, do logic, chain of thought, see, hear, and talk (that's how AGI was defined back then), and now we have it! What a time to be in
@vladimirLen
@vladimirLen 5 дней назад
The most honest and comprehensible review, despite my reservations about your safetyism.
@MemesnShet
@MemesnShet 5 дней назад
Yes finally THE video i was waiting for
@bhargavk1515
@bhargavk1515 4 дня назад
My only final analysis is how it interacts with me...that's the true marker of AGI
@AustinThomasPhD
@AustinThomasPhD 5 дней назад
I would love to hear how models based on proper first-principles reasoning are being developed. Clearly, this stochastic parrot has room to fly higher, but at the end of the day, it is still a stochastic parrot. If we want models capable of developing new science, then we need actual first-principles reasoning. I suspect that won't be transformer-based. I know people are working on this.
@SimonCarpio09
@SimonCarpio09 5 дней назад
It's been trained to do its reasoning the exact same way I was trained to do math and science in Asia:
1. Copy the teacher's methodology exactly.
2. Show your solutions or you don't get credit.
3. Right minus wrong in your final score.