Addressing Some Questions...(ChatGPT o1-preview + o1-mini video(s) follow up) 

Kyle Kabasares
9K subscribers · 14K views

Published: 18 Sep 2024

Comments: 185
@atsoit314 · 3 days ago
Regardless of whether your code was available for training, it still has to interpret your question correctly, search its data, and formulate a response. And it does so within 15 seconds. Even just operating as an LLM, it's miraculous. It also didn't repeat your methods verbatim. It found different solutions and methods than both you and your professor, and it looked like it was even more accurate than your professor at times.
@KMKPhysics3 · 3 days ago
That's my feeling about it! All the steps it has to go through and to do it in under a minute...it's mindblowing!
@asterozo · 2 days ago
No matter what we all think, this will likely speed up research implementation in all fields.
@LDdrums20 · 2 days ago
Absolutely. And even if it was in the training data, that doesn't mean it can retrieve it exactly as it was.
@lycas09 · 2 days ago
If the system adopted a different approach, it's not just copy-paste. It's probably not the same as building it from scratch like you did, but it's mind-blowing that it's this powerful anyway. Congratulations on your videos; they are not the usual "AI stuff" videos. Great work.
@JohnathanDHill · 2 days ago
If the code was available during training, then in a sense it's a form of "poisoning the well". The true test of AI will come from tests applied to models that haven't been trained on the questions given to them, where the model still finds the correct answer. Questions in STEM that have stumped humans for years will be prime testing grounds. If a model can find, say, the answer to Goldbach's Conjecture, that would show not only understanding but a deeper grasp of knowledge than what we currently see. Sure, it still shows intelligence in answering questions it was trained on, but it isn't going to be groundbreaking until AI models begin cracking answers to questions they weren't trained on.
@lolilollolilol7773 · 2 days ago
Your videos were by far the most revealing of the capabilities of o1 on RU-vid. Most of the others were nothing more than comments on the ClosedAI press release. Very few actually tested it, and most didn't test it in a meaningful way. o1 will definitely be an excellent research assistant, because at the very least, not only can it code and reverse engineer, it has the capacity to reframe a problem and to connect it to already known research. From what I saw, when used correctly, it makes relatively few errors as well. Is it an AGI? No. Is it an incredibly powerful tool? Absolutely.
@vineetnair7679 · 3 days ago
Kyle.. pls continue making such awesome content!!
@KMKPhysics3 · 3 days ago
I appreciate it! I will!
@stevenfullman5646 · 2 days ago
Here from a Reddit thread. Also watched your earlier vids. Mate, this content is exactly what the AI community needs right now. Real world tests pushing the boundaries. Absolutely awesome job, and really fascinating viewing.
@miraculixxs · 2 days ago
How is it pushing the boundaries? It just regurgitates stuff that's been around for decades.
@KMKPhysics3 · 18 hours ago
Thank you so much for watching my videos! I'm not a professional AI researcher, but I am an innately curious person, and finding the limits of these models has been a fascination of mine for some time now! Planning on doing other things with ChatGPT in the future!
@julien5053 · 2 days ago
AGI is a very difficult topic to address. I'm not an expert or anything, but it's clear that a non-expert should not be shamed by others for their reaction to o1-preview. Especially on RU-vid! It was great content!
@raunakchhatwal5350 · 3 days ago
Please do document your independent research with o1. It would *by far* be the better resource for putting where things stand in perspective.
@KMKPhysics3 · 3 days ago
Would love to, time willing! I do have a day job haha
@shezcmayo · 2 days ago
I am not surprised by how long it took - people are forgetting that you are learning the physics, learning the python libraries and learning (more) about coding all at the same time.
@lolilollolilol7773 · 2 days ago
Not only that, but his work started by reverse engineering existing work by reading research papers, and some are notoriously bad at describing their methods. Reverse engineering is a time consuming process.
@EGarrett01 · 2 days ago
lol, I like how this seems like a public apology. All he did was use ChatGPT o1 to test out some physics problems, then the internet appeared and now he's making a hostage video.
@KMKPhysics3 · 18 hours ago
Haha I guess it does come off a little bit that way. I did feel like I had a bit of an overreaction to the output and just wanted to set the record straight, especially because I thought the clip of my reaction would easily be taken out of context
@EGarrett01 · 16 hours ago
@@KMKPhysics3 You did nothing wrong at all and your reaction was perfectly fine. The internet just isolates and amplifies the craziest possible interpretation of your video.
@deesrex · 10 hours ago
@@KMKPhysics3 Honestly, I love your video! Haters will hate. I agree with the comment saying we need more people like you in the world of AI ^^
@spazneria · 3 days ago
Awesome - I love the way you're handling this. I think that the silent majority understand what your previous videos demonstrate. Like I mentioned in my other comment - people like me are deeply interested in AI and its progress, and it's got to the point where we aren't able to evaluate it. Thank you so much for providing insight that I feel like I can trust.
@KMKPhysics3 · 3 days ago
Thank you so much for your comment, it motivates me to be a better AI “researcher” and fact-checker! Stay tuned!
@HaithamSweity · 2 days ago
Keep up the good work! You've made the most interesting review of o1 on the internet IMO. I look forward to seeing more of your videos.
@howardlam6181 · 2 days ago
Most of the time in research, you need to replicate what others have done: for comparison, for extension, for understanding previous work, for surveying... That's not plagiarism. ChatGPT saves us so much time even if it can't make discoveries.
@juandesalgado · 2 days ago
No need to apologize for being thorough... you did not spend months "coding", but investigating the problem. One of the small corners of your notebook says you were studying Monte Carlo methods and distribution sampling... that's a HUGE rabbit hole. If you went into multiple rabbit holes, it's no wonder this took time. As it should. Thanks for your reactions to GPT o1, I hope you can continue trying more problems with it.
@Neeyellowart · 3 days ago
New subscriber here! Loved your o1-preview videos. I like how you say you're not an expert in coding while showing us some alien language you wrote. :)
@KMKPhysics3 · 3 days ago
Thank you so much! I’ve shown this code to software engineers who also think it looks alien and not in a good way lol
@deliathemchale · 2 days ago
Science and physics coding is very different from the type of stuff software devs or computer scientists have to do. You need to get some incredibly complicated math done, but things like how a computer actually runs code and how to optimize code to run at absolute maximum efficiency (hint: not Python, which is beloved by scientists) are mostly irrelevant. You have to, or at least used to have to, learn enough coding skill to write out your equations and make the graphs at the end. But you don't need, or have the time, to learn good syntax, software architecture, language architecture, etc. To an actual expert in computer science, the code would be somewhat unintelligible due to esoteric physical equations and concepts, while also likely looking pretty sloppy and unrefined.
@ZoOnTheYT · 2 days ago
I'm trying to understand what you are apologising for, and why. I don't follow your channel, but I came upon the "GPT o1 solved my PhD paper" video and then this follow-up. Did people not understand that you meant 10 months to include reading, researching, etc.? What is legit, though, is that someone without your knowledge couldn't get o1 to do what you did. You'd have to know the original and understand the corrective prompts needed, even if there is access to the studies and previous code.
@coreydevs · 3 days ago
When people have to argue the correctness of LLM generated code, they've already lost the battle. It's an impressive technological advancement that the general public never saw coming. One day, it will be able to do the things that we get paid to do. We should be more concerned about the effects on the general population. If we all get replaced, who gives us a paycheck? Who pays for the software that powers these tools? How do we eat? How do we live?
@KMKPhysics3 · 3 days ago
Yes, I think people are setting too high a standard for when to start being concerned, as if AI has to be able to zero-shot every task. I think before (if) it ever gets there, dramatic changes will already have occurred.
@Arcticwhir · 3 days ago
Very reasonable. The AI community tends to get really passionate/obsessed with semantics; at the end of the day, LLMs are very useful tools, whether or not they actually reason, have consciousness, or are AGI. People should be grounded in how useful these tools are. In your case it was the realization that o1 would've been insanely useful to you during your PhD. I wrote a program for my senior project in college and it legit took me the whole semester; if I had o1 back then, or even Claude 3.5 Sonnet, it would've taken me two weeks tops. (I didn't study computer science, btw.)
@KMKPhysics3 · 3 days ago
Thank you for this comment. I agree with the whole argument on the semantics; the bottom line is that these tools are incredibly useful, and could save a lot of time (or could have in our cases). Not to mention, the less time spent thinking about certain tedious things opens new windows of time for creativity. Also, the time to go from initial concept to generation is dramatically reduced.
@danielmantai88 · 1 day ago
Don't worry about the people misinterpreting your reaction video! You're doing great work, and it is great to watch o1 jump through some hoops made by someone highly educated like yourself.
@KMKPhysics3 · 18 hours ago
Thank you so much 😄
@expchrist · 3 days ago
I love your videos man!
@KMKPhysics3 · 3 days ago
I appreciate it! Thanks so much!
@damondragon324 · 3 days ago
I don't understand why people don't believe AI can create new things. It's so obvious if you look at image generators, for example. If you tell one to mix certain animals, you can very easily get something that you have never seen before. You might say: but it had to know the animals first, by learning them from human images. Of course, but we humans couldn't create a similar image if we hadn't seen the animals ourselves, in real life or from other artists. The same goes for text or code, but instead of animals, the AI had to learn words and syntax.
@DeckerCreek · 3 days ago
We developers call it "scientist code"😊
@RalphFreeman-ok5of · 3 days ago
Why not get the AI to try to solve some differential equations? Most are not analytically solvable and require iterative techniques to get approximate solutions. Any solution would not have been in the training data... just a suggestion.
@KMKPhysics3 · 3 days ago
That is an awesome suggestion! Will try it out, so keep your eyes open for future videos!
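To make the suggestion above concrete: most nonlinear ODEs, like the full pendulum equation θ'' = -sin θ, have no closed-form solution and must be integrated numerically. A minimal Python sketch of that kind of test problem (the equation and values here are an illustrative choice of mine, not anything from the video):

```python
# Integrate theta'' = -sin(theta), which has no elementary closed-form
# solution, so the value printed below can only come from iterative numerics.
import numpy as np
from scipy.integrate import solve_ivp

def pendulum(t, y):
    theta, omega = y
    return [omega, -np.sin(theta)]  # full nonlinear pendulum, no small-angle approximation

sol = solve_ivp(pendulum, t_span=(0.0, 10.0), y0=[2.0, 0.0], rtol=1e-9)
print(f"theta(10) = {sol.y[0][-1]:.6f}")
```

Asking a model for θ(10) here tests numerical reasoning rather than recall, since the specific value is unlikely to appear verbatim in training data.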
@maxmxam7587 · 2 days ago
I really appreciate your AI videos because you are a smart person with real needs and nontrivial use cases for it. They let me clearly see the current capabilities and understand the significance of this development. Your videos are not about artificial benchmarks, and therefore they better help me grasp the implications.
@semidemiurge · 3 days ago
What this is showing us is that we are just at the beginning of exploring the space of options for using the transformer architecture. It is suspected that the human brain is made up of a collection of modules that have evolved to be competent over narrow domains: edge detection in visual fields and 3D sound mapping, for example. Even our 'sense of self' or conscious awareness may be modules. o1 demonstrates that a transformer can be designed to excel at a particular task, i.e. logical reasoning, and do it at a level that matches the highest human ability. There is every reason to believe that this can be done for EVERY module of the human brain. Even with current hardware and our first attempts at models, we are surpassing maybe even the best humans at reasoning, and certainly more than six orders of magnitude quicker. A single instantiation of a Physics AI could do the work of more than a million individual human physicists in the same time.
@fabioc7354 · 3 days ago
I think you are correct that this is not AGI. However, if we look at the timeline of AI (GPT) from 2 years ago till now: is it AGI? I will go with not "YET". Give it time; there are some very, very smart people who seem to be taking this one step closer every few weeks. The keyword here is "not yet". Awesome job Kyle, thank you for sharing.
@KMKPhysics3 · 3 days ago
Of course, thank you for watching! I appreciate the feedback!
@Terminator-GPT-101 · 20 hours ago
Dismissing the "smartness" of LLMs in programming because they are "trained" and copy code exhibits several logical fallacies:

1. Hasty Generalization: drawing a conclusion from a small sample or limited evidence. Saying that LLMs (Large Language Models) are not smart at programming because they were trained on existing code is an overgeneralization. It ignores the complexity and capabilities of LLMs, which can understand and generate new code based on patterns and logic learned from vast datasets.

2. Straw Man: misrepresenting an argument to make it easier to attack. Claiming that LLMs only "copy" code oversimplifies how these models work. LLMs analyze patterns, understand context, and can generate novel solutions, not just replicate existing code.

3. Appeal to Ignorance: asserting that a proposition is true because it has not yet been proven false (or vice versa). This claim might be based on a lack of understanding of how machine learning models operate. Just because one doesn't understand the mechanisms behind how LLMs generate code doesn't mean the model lacks intelligence or capability in that area.

4. False Dichotomy: implying that being trained on existing data means an LLM cannot be "smart" creates a false binary between learning from existing information and demonstrating intelligence. In practice, human learning also involves assimilating existing knowledge before applying it creatively or in novel ways.

5. Begging the Question: an argument whose premise assumes the truth of the conclusion instead of supporting it. Stating that LLMs are not smart "because" they were trained on existing code presupposes that training on existing data is inherently uncreative or unintelligent, which is not necessarily true.

6. Genetic Fallacy: dismissing an argument based on its origin rather than its merit. In this case, people dismiss LLMs' coding abilities solely because the knowledge originates from training data. The source of the knowledge is irrelevant to the actual ability to code. An LLM might produce original, efficient code even if it learned by processing existing examples.

7. False Equivalence: equating two things that are not logically equivalent. This wrongly equates "being trained on data" with a lack of intelligence in coding. Humans learn to code by studying existing code, methods, and patterns. LLMs do the same, but on a larger scale and with more complex pattern recognition.

8. No True Scotsman: implying that for LLMs to be "really smart," they must exhibit intelligence in a way the critic defines (i.e., not just using trained data). This shifts the goalposts for what constitutes "smartness" in AI, dismissing any current capabilities by defining them away from what LLMs do.

9. Reductionism (Oversimplification): reducing the complex cognitive processes involved in coding to mere "copying" fails to acknowledge how LLMs predict and generate syntactically and semantically correct code based on learned patterns.

Conclusion: Imagine someone saying, "This LLM's code is just a rehash of examples it saw online, so it's not real programming." This echoes the genetic fallacy, where the origin (training data) doesn't negate the outcome (functional code). Similarly, the straw-man move of reducing LLMs to "code copiers" ignores their complex learning processes and potential for generating novel solutions. Just because LLMs and human programmers both use existing code doesn't mean their processes and capabilities are identical. This reflects common misconceptions about AI and machine learning. It's crucial to analyze such claims for their logical validity rather than accepting them at face value.
@miraculixxs · 2 days ago
People don't appreciate that code is a result of research + thinking, and not just some random tinkering that happens to do what you want. It's more like writing a book than climbing a mountain.
@TomGally · 2 days ago
Thank you for sharing your thoughts in these videos. Very, very interesting. As you mentioned toward the end of this video, I think what many of us would like to see is how well scientists like you can use these new tools to do novel research and make important discoveries. As you continue your explorations, please keep us in the loop.
@i4aneye466 · 3 days ago
o1 has an underlying foundational formula, which all others derive from. Deeper than any conventional approach ~~ So no matter what isolated formula, problem or equation you try, o1 can derive its sequencing from a deeper, more complete structure than our current fields of observation and processes. It's not AGI, it's something much much greater than "trivial" computations, expectations and applications ;)
@Krmpfpks · 2 days ago
@@i4aneye466 Oof. Please elaborate on what formula that is; outside of the matrix multiplications that apply the weights to the input, I don't think there is one.
@i4aneye466 · 2 days ago
@@Krmpfpks Intelligence is about recognizing and formulating patterns, not memorizing and scraping others' answers and formulas. That's what is so different about this new approach relative to every other architecture. And this is just the preview of the first.
@Krmpfpks · 2 days ago
@@i4aneye466 I do think you are overestimating the change here. As far as I understand, this is still a simple LLM, just one that iterates over its own answers. So in principle it is the same as ChatGPT-4o, just with the ability to not blurt out the first thing that comes to mind. Don't get me wrong, this is a monumental step forward in capabilities; I am not underestimating that. However, it still hallucinates, it still makes up requirements or changes stated requirements if it's stuck, and it still produces plainly wrong answers. It is just much, much better than previous AI. There will be an AI that uses actual logical reasoning, one that transforms human language into a series of logical statements, reasons about them, and then transforms the result back into language, the way Google did by hand to attempt the Math Olympiad. That I will call intelligent. However, in my testing, o1 is better than most junior software developers (in the narrow scope of problems you can formulate in prompts with text and images and whose solutions can be written as text and code), which is crazy. In a few years they went from a 5th grader to a uni student at that. But to get rid of the hallucinations they will have to alter their approach and not rely only on an LLM that iterates, but on actual logical reasoning. Think Wolfram Alpha combined with an LLM.
@RickySupriyadi · 3 days ago
When I look at Google's and OpenAI's definitions of AGI, they do state that it can generalize (in terms of accessing solved problems and using and combining those solutions to generate new things). So generalizing isn't about inventing new things out of thin air; it's about generating from the various solutions available. Actually, humans also derive their solutions from other solutions; in fact, the solutions humans provide are based on the data they receive from nature, physics, and the spirituality they experience.
@dtrueg · 3 days ago
Not AGI... fully... by 2029 we will have AGI.
@michaelbondarenko4650 · 2 days ago
The real question is: could it have read the reference papers for you and compiled a methods section of comparable quality? (Translatable to code, just like yours.)
@HulkRemade · 2 days ago
Totally unnecessary clarification! (But appreciated nonetheless) This was clear from the beginning, and your video was great, actually going through a problem and showing the full analysis.
@sergeyromanov2751 · 2 days ago
Kyle, forget it. Obviously, o1 is not AGI yet. However, even when AGI (and even ASI) does appear, there will still be many people who will say that it is just a "stochastic parrot". After all, there are still people who say that the Earth is flat.
@lio1234234 · 2 days ago
someone didn't watch the video in full...
@animusveritatis · 3 days ago
I'll watch this video in a minute, but I thought you did a great job in the last video. I would have liked to see you run with your own ideas more. The type of work you're doing just doesn't lend itself to a quick hour to try to verify! You still did an admirable job.
@percy9228 · 2 days ago
loving this, keep them coming
@Aldraz · 3 days ago
As someone who has worked with these LLMs since the pre-ChatGPT era, I have to say that even though this is really not AGI, it is pretty close, and it isn't just statistically guessing the next token based on previous tokens... it's actually a pattern-creating and pattern-searching intelligence machine. You can easily test this theory in the chat with so-called "in-context learning", where you can see how it's able to learn completely new things, be it patterns like a new language or whatever, really. So I don't like the statement that it can't come up with anything new or invent something. It can do more than that. You can easily test this as well by asking it to give you a list of 20 things to improve in "any project, paper, code, thesis, idea, whatever" and then, with a second message, asking it to carry out those 20 things. If you do it like this, you'll get much better results than normal.
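That two-message critique-then-apply pattern is easy to script. A minimal sketch, assuming the openai Python client; the model name, prompts, and `my_code` placeholder are illustrative, not anything from the thread:

```python
# Two-turn pattern: first ask for 20 improvements, then ask the model
# to implement its own suggestions, keeping the conversation history.
from openai import OpenAI

client = OpenAI()
my_code = "..."  # placeholder: the project, paper, or code to improve

history = [{"role": "user",
            "content": "List 20 concrete things to improve in the following:\n\n" + my_code}]
critique = client.chat.completions.create(model="o1-preview", messages=history)
history.append({"role": "assistant", "content": critique.choices[0].message.content})

# Second message: apply the improvements it just listed.
history.append({"role": "user", "content": "Now rewrite it, applying all 20 improvements."})
revised = client.chat.completions.create(model="o1-preview", messages=history)
print(revised.choices[0].message.content)
```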
@plutack · 2 days ago
I think you're being too optimistic about the capability of these systems. If I was trained on something that lists the alphabet, and I then train on data that lists the alphabet with examples... in a GPT context, an "improvement" would just be "you can add examples to the alphabet". Ultimately, AGI should be capable of generative reasoning, and as you mentioned, it is just doing pattern matching, or rather recognition of some sort, over its training data, which is fascinating in its own right. The paper released with these new GPT models had OpenAI plotting growth graphs of x against log y to show a linear plot... what that would look like without the log scale is a curve flattening out at the top. Saying AGI is a year away at the present growth rate, with no new discovery in sight, is blatantly false IMO.
@lolilollolilol7773 · 2 days ago
@@plutack I think you are being pessimistic. When Garry Kasparov was beaten by Deep Blue, and the Korean Go champion was beaten by AlphaGo, they both said they felt they had faced some superhuman intelligence. What does that mean? That both machines invented moves and strategies that had never been invented before. In particular, AlphaGo made moves that, according to experts, would never have been played by humans, because they were considered mediocre moves by the books. Defining intelligence is very hard, and every time there is a new advance in AI, we redefine intelligence with new requirements. At some point, we have to recognize that there is some form of limited intelligence. Yes, o1 is inferior to us in some areas, but also far superior to almost all of us in many areas. In the end, these are philosophical questions, but what really matters is whether this tech is going to be useful, and as far as I can see, the answer is definitely yes.
@plutack · 2 days ago
@@lolilollolilol7773 Pretty sure no one argued otherwise about the usefulness of the tech. Did you read my comment at all? My comment is saying this cannot be close to AGI, and that a generational jump in development is necessary before we can evaluate how close we are to achieving AGI, based on the simple fact that pure generative reasoning is yet to be achieved and the current growth of LLMs is flattening out.
@mrtats6590 · 2 days ago
@@plutack But the current growth of LLMs isn't flattening out; this model is the counter to your point.
@plutack · 2 days ago
@@mrtats6590 I am literally quoting facts released by the same company that released the model lol
@josjos1847 · 3 days ago
We want more AI videos, but ones investigating more profound topics, like the philosophy of science and physics.
@nyambe · 1 day ago
Actually... doing the thing is not the real innovation. That can be copied or recreated from legacy work. As a programmer, I might even find a way to solve it if I saw the formulas. Understanding WHAT to do is what LLMs like ChatGPT are getting better at. I would not understand what to do, and neither would most of the audience.
@maloukemallouke9735 · 2 days ago
I was alarmed by your video showing ChatGPT creating in one hour all the study materials that would take you a year to produce. It's disturbing how incredibly fast it is.
@EGarrett01 · 2 days ago
In the near future, the reasoning steps will increase exponentially and the time required will decrease. Currently it may be doing 11 steps of reasoning and verification in 20 seconds. In the future it will be 100,000 steps in .25 seconds.
@netscrooge · 2 days ago
Excellent video. Thanks! I also enjoyed the 16-minute overview on Dr. Waku's channel.
@KMKPhysics3 · 18 hours ago
Thank you so much for watching! Will be making more videos in the future
@tyoud1 · 2 days ago
I think it's great that you are pushing the edge of what people know, best to you
@KMKPhysics3 · 18 hours ago
Thank you so much for watching! I'm happy to experiment and try
@blueblimp · 2 days ago
I liked your videos (especially the one where it was solving your graduate physics homework). I think people aren't considering the context properly... yes o1 wouldn't have saved you the full 9 months, but it could've saved significant time, and that's still a benefit. In my experience, AI is great for coding when I know what I want but I'm not familiar enough with the programming environment to code it without reading through documentation. Even if the result doesn't work as-is, it's a helpful starting point.
@KMKPhysics3 · 18 hours ago
Thank you so much for watching, I plan to keep making more!
@sweetayez · 2 days ago
Was that your handwriting on paper or software? So uniform and esthetically pleasing
@LisaSamaritan · 2 days ago
He said he used an iPad.
@IllD. · 2 days ago
Thanks for capitalizing on this series you're doing. Really interesting stuff, and it makes for great content.
@TLXSaltmine · 3 days ago
It would be really interesting to see how Cursor + o1 performs as a research assistant! Maybe seeing how fast you could implement the new features in the latest version of your codebase?
@Macorelppa · 2 days ago
If you like o1 preview then you will love o1.
@julien5053 · 2 days ago
Is that a guess on your part, or do you know?
@miraculixxs · 2 days ago
Love your daily note-taking. Wow. Do you still do this? Just wondering. I don't, but I wish I did, and I hope I can get myself to do it. Any tips? Like, when in the day do you write your thoughts down?
@KMKPhysics3 · 1 day ago
Thanks so much! I actually have switched to using the application called Notion which has become like my "Second Brain" on my computer and it's been gamechanging for me! I typically used to write notes at the beginning of the day or when I was stuck in the middle of a problem and needed to organize my thoughts.
@AndiEliot · 2 days ago
Again, you should test it with problems from different domains.
@tv74-f4h · 16 hours ago
I think your video here answers one of the main reasons why that was possible: most of your work as a (PhD student) researcher was NOT writing code; it was refining the method and getting to know how to address the problem. Once that is explicitly written down (your methods section), writing the code is rather easy (it would most probably take an average physics student more than an hour, though). So o1 is really not at PhD level, but rather at the level of an average bachelor's student: if you give it the right insights and an explicit method, it can do a decent (and fast!) job helping you.
@expchrist · 3 days ago
Hey UCI alum. Go anteaters! Zoot!
@KMKPhysics3 · 3 days ago
Zot Zot Zot!
@MichealScott24 · 1 day ago
@diebereitschaft8963 · 3 days ago
15:10 Correct me if I'm wrong, but didn't the o1-preview model "just" translate your methods section into Python code (which is still an impressive feat)? The way I understand it, your main research task was to *come up* with the methods section in the first place. It would be interesting to see whether it is able to come up with such an approach by itself.
@KMKPhysics3 · 3 days ago
Yes you are correct! I gave it the methods section and asked it to create Python code from the methods section alone, without explicitly showing it any code I had tried to use prior. I don't think at this point o1-preview or o1-mini could solve novel research questions, but instead be a great research partner/assistant to automate some of the more tedious things.
@pawsjaws · 2 days ago
@@KMKPhysics3 Oh, it can; the architecture is just rudimentary. The models themselves could already do this. You just have to have robust sequence iteration. It's the first and final frontier, and the first versions might be rudimentary. But you can use GraphRAG methods to pick out relevant topics and concepts and use them to start the process. You could simply let it iterate on your PhD paper. With a human in the loop you will achieve insane productivity. Ask it to scan your paper and iterate upon it. You will be surprised by what it gives you.
@TimmyBlumberg · 3 days ago
The biggest question I have is: was the code correct? I know it generated toy data, but I would love a non-rushed analysis of what it generated versus what you wrote.
@HauntedAbysss · 2 days ago
Yes, his code was correct, but written in long form. He did do an AI comparison, and it actually praised him. It said the AI code was rather rushed.
@lolilollolilol7773 · 2 days ago
@@HauntedAbysss The question is: was the AI code correct? Not Kyle's code.
@HauntedAbysss · 2 days ago
@@lolilollolilol7773 Oh yeah, they both were.
@YoursGuardianAngel · 2 days ago
Don't worry, everything you tried was in the training set. Try this -> Forget whatever you know about the "man, goat, grass, and wolf" problem. Solve the following problem: A man, a goat, grass, and a wolf want to cross the river without eating each other. Give me the steps. The wolf is vegetarian and can also eat grass. The wolf will not eat the goat. The goat eats grass. The man does not eat the grass, the goat, or the wolf. Reason step by step. Verify the reasoning in each step under the element "Verification:".
@Oliver-wv4bd · 2 days ago
Claude 3.5 answered this first try, though I reckon it's still a result of the original version of the problem being in its training set:

Thank you for the interesting problem. I'll approach this step-by-step, reasoning through each move and verifying the logic.

Step 1: Assess the initial situation
- We have a man, a goat, grass, and a wolf on one side of the river.
- The wolf can eat grass but won't eat the goat.
- The goat eats grass.
- The man doesn't eat any of the others.
- Only the man can operate the boat, which can carry him and one other item.
Verification: This setup is clear and consistent with the given information.

Step 2: First crossing - Man takes the grass
- The man takes the grass across the river.
- Left behind: The wolf and the goat.
Verification: This is safe because the wolf won't eat the goat.

Step 3: Man returns alone
- The man comes back to the original side.
- Current situation: Man, wolf, and goat on the original side; grass on the other side.
Verification: No changes in relationships, still safe.

Step 4: Man takes the wolf
- The man takes the wolf across the river.
- Left behind: The goat.
Verification: This is safe because the goat is alone and can't eat anything.

Step 5: Man returns with the grass
- The man brings the grass back to the original side.
- Current situation: Man, goat, and grass on the original side; wolf on the other side.
Verification: This is safe because the man is present to prevent the goat from eating the grass.

Step 6: Man takes the goat
- The man takes the goat across the river.
- Left behind: The grass.
Verification: This is safe because the grass is alone and can't be eaten by anything.

Step 7: Man returns alone
- The man comes back to the original side.
- Current situation: Man and grass on the original side; wolf and goat on the other side.
Verification: This is safe because the wolf won't eat the goat.

Step 8: Man takes the grass
- The man takes the grass across the river.
- Everyone and everything is now on the other side of the river.
Verification: This is the final state, with all parties safely across. The wolf won't eat the goat, and the man is present to prevent the wolf and goat from eating the grass.

Final verification: All parties have crossed the river safely without any undesired eating occurring. The problem is solved.
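For what it's worth, this variant is small enough to verify mechanically. A minimal brute-force search (my own Python sketch, not from the thread) encodes the stated constraints, where only wolf+grass and goat+grass may never be left unattended, and confirms a seven-crossing plan exists:

```python
# Breadth-first search over river-crossing states for the modified puzzle:
# wolf eats grass, goat eats grass, wolf does NOT eat the goat.
from collections import deque

ITEMS = frozenset({"wolf", "goat", "grass"})
FORBIDDEN = [{"wolf", "grass"}, {"goat", "grass"}]  # pairs never left alone

def safe(bank):
    # A bank without the man must not contain a forbidden pair.
    return not any(pair <= bank for pair in FORBIDDEN)

def solve():
    start = (ITEMS, "near")  # (items on near bank, man's side)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, man), path = queue.popleft()
        if not near and man == "far":
            return path  # everything is across; BFS guarantees a shortest plan
        bank = near if man == "near" else ITEMS - near
        for cargo in [None, *bank]:  # cross alone or with one item
            moved = {cargo} if cargo else set()
            new_near = near - moved if man == "near" else near | moved
            new_man = "far" if man == "near" else "near"
            unattended = new_near if new_man == "far" else ITEMS - new_near
            state = (frozenset(new_near), new_man)
            if safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [f"cross with {cargo or 'nothing'}"]))

for i, step in enumerate(solve(), 1):
    print(i, step)
```

It prints one seven-step plan; the middle legs may come out with the wolf and goat swapped relative to the answer above, which is equally valid.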
@AL-eu4ey · 2 days ago
I don't think you can make it forget something it was trained on. I tried this on Claude (tried making it forget some elements it was trained on), and yet it still answered based on the dataset, so I don't think it's possible.
@YoursGuardianAngel · 2 days ago
@@AL-eu4ey I meant "forget" in the sense of "ignore".
@AL-eu4ey · 1 day ago
@@YoursGuardianAngel I believe ignoring isn't possible either.
@jeangiraldetienne8182 · 3 days ago
I'm looking forward to seeing a panel of scientists who aim to create purely logic-based questions for these LLMs. It's ironic how people are now looking for things that are not on the internet to test these models, whereas a few years ago, if something wasn't on the internet, it was likely because nobody needed it.
@KMKPhysics3 · 3 days ago
This sounds like a great idea, I'm trying to talk to some friends I know in other fields to see if they can help out with something similar to this.
@jeangiraldetienne8182 · 3 days ago
@@KMKPhysics3 Great!!! I can’t wait to see contents on this.
@lycas09 · 2 days ago
It would be interesting to compare your code with the code made by the system to see if it's a copy-paste, a refactored copy-paste or if it adopted a completely different approach
@EGarrett01 · 2 days ago
People have tested this repeatedly using problems they made up on the spot themselves, I did it with some verifiable but unpublished stuff I have. You can do it yourself. This thing is for real. Just buckle up for what's coming.
@lycas09 · 2 days ago
@@EGarrett01 Yes, I know that. I agree. I just said that because it's impressive: almost a year of struggle by a researcher, done in a few seconds! I must try it with my computer science thesis, whose code is not actually online.
@EGarrett01 · 2 days ago
@@lycas09 Ah I see, I thought you were saying he should check to see if it was just copy/pasting the answer from its training data in some way.
@brunodangelo1146 · 13 hours ago
I see a lot of cope in the comments.
@drunknmasta90 · 2 days ago
Hey bro, wondering how the metahumans are going
@now_ever · 2 days ago
What are those notes? I mean, can I write like that in Adobe Acrobat? Or was it written in some other app and then converted to PDF?
@polenov_tv · 2 days ago
You mean his documentation? It's written in Overleaf
@now_ever · 2 days ago
@@polenov_tv I meant those handwritten notes, like the one at 8:04. Or is it just a font that looks handwritten? I just realized that such a possibility exists :D But thank you, maybe it will also be useful =)
@polenov_tv · 2 days ago
@@now_ever Oh, I got it. I think he wrote those using a graphics tablet or something, and yes, then converted them to PDF. That can make handwritten notes available for copying.
@polenov_tv · 2 days ago
Maybe it doesn't matter what app you use; just save as PNG and convert to PDF. But I could be wrong.
@johnj8361 · 2 days ago
I believe he used Notability on an iPad. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ZM76GYLcGfc.htmlsi=WxtmUOWTTb3W7jh_&t=329
@radscorpion8 · 2 days ago
How exactly did you gain access to o1-preview? Do you have institutional access?
@BabylonBaller · 2 days ago
All paying members have access to it; I'm guessing you're still using the free version.
@llsamtapaill-oc9sh · 3 days ago
Make a test now with questions not available publicly.
@KMKPhysics3 · 3 days ago
I did that in this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a8QvnIAGjPA.htmlsi=3Sh4UFhIqkzqFPaT
@jurycould4275 · 3 days ago
@@KMKPhysics3 As you saw and recorded in the video you just shared, the questions were in fact publicly available. But yeah, any problem classes taught at any level will be represented in the training data, whether you find them on Google or not. The video you did was good for checking whether the model is able to retrieve the solutions for known problems, and that's impressive. To see how it handles real-life or novel problems, you need to input real-life or novel problems.
@hamsturinn · 2 days ago
@@jurycould4275 They were not publicly available.
@perschistence2651 · 3 days ago
It freakin' IS AGI.^^
@aship-shippingshipshipsshippin
not even close
@perschistence2651 · 3 days ago
@@aship-shippingshipshipsshippin Why? What is AGI able to do with text that makes it AGI? Text is its interface to the world, at the moment...
@djayjp · 3 days ago
If an AI can do what 99.9% of people can't, isn't that AGI...? Lol
@keneola · 3 days ago
AI can't do 99.9% of what humans can do (yet?) and it's not AGI. To be AGI, an AI needs to be able to reason about information not in its training set to solve problems it's never seen before.
@justtiredthings · 3 days ago
If 99.9% understand that a marble falls out of a cup when it's turned upside down, and AI doesn't, is that AGI? lmao. Y'all need to think about this stuff more critically. These things are brilliant in some of their facets and astonishingly stupid or incapable in others. Their intelligence doesn't scale uniformly across all domains and applications. It's patchy.
@djayjp · 2 days ago
@@keneola It already does that. And yes, but 99.9% of "what humans can do" can only be done by 0.1% or less of us lol
@djayjp · 2 days ago
@@justtiredthings o1 answers such questions correctly. In fact, it was used in their official promotional videos.
@Krmpfpks · 2 days ago
No, it's not. A true AGI would also be able to drive a car, for example, or investigate a crime scene... but that is completely out of the question for o1. o1 is still a language model, one that does iterate over its own output (= can think), but it does not have a general understanding of many concepts; it just operates on language. Still, it can generate answers and solve problems that are new to it, and in the class of problems it can solve it is often better than humans. Just as a calculator is better than humans at calculating and Wolfram Alpha is better than most humans at solving math equations, o1 is better than most humans at problems that have a relatively short answer that can be formulated in language or code and can be asked as a relatively short question.
@fulowa · 2 days ago
Prompting is a skill; you might want to research it a bit, too.
@LisaSamaritan · 2 days ago
OpenAI's latest model family, o1, promises to be more powerful and better at reasoning than previous models. Prompting o1 is slightly different from prompting GPT-4 or even GPT-4o. Since this model has more reasoning capabilities, some regular prompt-engineering methods won't work as well. Earlier models needed more guidance, and people took advantage of longer context windows to provide the models with more instructions. According to OpenAI's API documentation, the o1 models "perform best with straightforward prompts," while techniques like instructing the model and shot prompting "may not enhance performance and can sometimes hinder it."

OpenAI advised users of o1 to keep four things in mind when prompting the new models:
1. Keep prompts simple and direct, and do not guide the model too much, because it understands instructions well.
2. Avoid chain-of-thought prompts, since o1 models already reason internally.
3. Use delimiters like triple quotation marks, XML tags, and section titles so the model is clear about which sections it is interpreting.
4. Limit additional context for retrieval-augmented generation (RAG), because OpenAI said adding more context or documents when using the models for RAG tasks could overcomplicate their responses.

OpenAI's advice for o1 vastly differs from the suggestions it gave for previous models. Previously, the company suggested being incredibly specific, including details, and giving models step-by-step instructions; o1 does better "thinking" on its own about how to solve queries. This is specifically from VentureBeat, but many sites (including OpenAI's own) say the same. Basically: forget what you did before, and don't overcomplicate your prompts with o1. That makes it worse.
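To illustrate points 1-3 above, a minimal sketch, assuming the openai Python client; the prompt and the `paper_excerpt` placeholder are illustrative, not anything from the thread:

```python
# A simple, direct o1 prompt that uses XML-style delimiters and no
# chain-of-thought instructions, per the OpenAI guidance quoted above.
from openai import OpenAI

client = OpenAI()
paper_excerpt = "..."  # placeholder: e.g., a methods section to translate

prompt = (
    "Translate the methods described below into Python code.\n\n"
    "<methods>\n"
    f"{paper_excerpt}\n"
    "</methods>"
)

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": prompt}],  # o1 reasons internally; no "think step by step"
)
print(response.choices[0].message.content)
```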
@suiteyousir · 2 days ago
Btw these companies are using data illegally for training so I wouldn't be surprised if they've accessed your code even if it was private.
@EGarrett01 · 2 days ago
Are you talking about publicly available data or paywalled? I'm not sure that this type of data use was clearly defined in the law before for public data. And we can't change the law retroactively.
@suiteyousir · 2 days ago
@@EGarrett01 I'm not sure, but whatever he mentioned was private. If that's on GitHub, I'm sure Microsoft will be using it to train AI.
@EGarrett01 · 1 day ago
@@suiteyousir There are definitely disclaimers on some content that say something like "This is the property of Company X; any rebroadcast or reuse without the expressed written consent of Company X is prohibited." So you could definitely argue that using it to train AI is against those terms. I feel like otherwise the company has to make it clear that the person who buys the material can only read it for their own education or enjoyment.
@viyye · 2 days ago
Are you going to address every criticism? Grow a pair, for F's sake.
@8bitsadventures · 2 days ago
That's what a content creator does: read the comments section and report on it.