Sam Altman "gpt-4 now significantly smarter" | OpenAI Updates GPT-4 and Reveals Open Source Evals

190K subscribers

37K views

Learn AI With Me:
www.skool.com/natural20/about
Join my community and classroom to learn AI and get ready for the new world.
LINKS:
/ 1778574613813006610
github.com/openai/simple-evals
#ai #openai #llm
BUSINESS, MEDIA & SPONSORSHIPS:
wesrothbusiness@gmail.com
Just shoot me an email to the above address.

Published: Apr 10, 2024

Comments: 125

@andrewchen2967 1 month ago

I like this format. No click bait, short, up to date. Keep doing that.

@MultiMojo 1 month ago

Whoa, a 5-minute video without a STUNNING title. This format is great!

@caliwolf7150 1 month ago

Hopefully this STUNNING trend continues 🤦‍♂️. I hate it when my YT feed looks like my kids' feed, and I love this channel’s content.

@unclesamuel33 1 month ago

YES, just YES

@Boxing_Gamer 1 month ago

Yes, SHOCKING!

@seanrobinson6407 1 month ago

Yet you seem to be shocked and stunned🤔

@unclesamuel33 1 month ago

@seanrobinson6407 It’s funny, I’m genuinely SHOCKED by the lack of click bait and the quality of content in this vid. I understand YT content creation is a grind. Don’t hate the player, hate the game… although I know Wes is smart enough to know quality beats quantity (daily click bait vids about minor, not-SHOCKING AI news). I’m still sub’d regardless (I can put up w/ click bait), so I’ll stop my whinging.

@senju2024 1 month ago

The last 5 seconds were the most interesting part. Please make a video on that part.

@executivelifehacks6747 1 month ago

Yes, I feel really starved for Ilya Sutskever content; it is so obvious he is the brains behind OpenAI. There should be more info from the lead engineer of Anthropic, too. Jared Kaplan?

@damofx 1 month ago

Agreed. Effective Altruism has become interesting for the wrong reasons since SBF.

@gabrieljacob6481 1 month ago

This format is perfect! Thanks for taking our feedback into consideration ❤️❤️

@riftsassassin8954 1 month ago

I was quite surprised to see the video was already done lol. But yes, well done man. Your work is very valuable to me.

@marcfruchtman9473 1 month ago

Thanks for this video, Wes. It's always good to have useful tidbits of information packed into the video... I see people using the benchmark, but I am sure it will be just one more among many...

@michaelwasson1213 1 month ago

I love this format as well. I typically don't have time to watch your videos because of their length and fluff. I usually have to have them transcribed by AI. Thank you for the shorter version!

@Shepherd4now 1 month ago

Yes! This is the format you are looking for! To the point, and you know what you're going to get. I'd take many of these over the long form. That said, I love your work and appreciate the time you spend reading and giving useful opinions on everything breaking in AI. 🎉🎉 Thanks!

@sylversoul88 1 month ago

Thanks so much for the short format. I will always click on these short videos and hit the like button.

@tonykaze 1 month ago

At this point I think it's all about agents. IMO, if you're not benchmarking agents, you're not benchmarking LLMs :shrug:. Zero-shot and one-shot prompting results are "nice to know", but at this point perfect zero-shot performance just means easier agent construction, not necessarily a cheaper or faster result. (This is exemplified by GPT-3.5 agents outperforming GPT-4 zero-shot at a tiny fraction of the token cost, and they are often faster in terms of the time required to solve the problem using the publicly available or Azure-hosted API.) A well-programmed agent can virtually always solve any problem with almost any LLM given enough time and the necessary resources (memory, RAG, etc.); the real question is how fast it is and how much it costs to solve the problem.

@Jonathan-ih9sm 1 month ago

Yes, exactly. Plus these smarter models are super expensive to run and will be complete overkill for most tasks people need.

@mrlugh 1 month ago

@Jonathan-ih9sm I use ChatGPT-4 every single day for writing and editing. I consider this an everyday task for people (writing emails, proofreading documents, etc.). It's clear that LLM word soup has zero comprehension. Don't get me wrong, the tool is useful to the point that I use it every day. But I find it barely capable, certainly not overkill.

@commedy7677 1 month ago

Subscribing because there's no more click bait.

@Yipper64 1 month ago

1:55 I think the best way to test a model is to give it a scenario that sounds like something that might be in the training data, but with parameters that don't match real life, so that the LLM has to actually reason through it.

@elsavelaz 1 month ago

I agree. I’m putting something on my other channel about it cuz yeah, those are better tests.
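The counterfactual-testing idea from the two comments above can be made concrete: take a well-known puzzle, perturb its numbers, and compute the correct answer in code. A model that only memorized the famous version then gets no credit. This is an illustrative sketch, not something shown in the video. The classic bat-and-ball puzzle is an assumed template, and `ask_model` is a hypothetical placeholder for any function that sends a prompt to an LLM and returns its text.

```python
import random
import re
from typing import Callable, Optional


def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return (question, correct ball price in cents) for a perturbed bat-and-ball puzzle."""
    while True:
        ball = rng.randint(5, 200)   # ground-truth answer, in cents
        diff = rng.randint(50, 500)  # how much more the bat costs, in cents
        if (ball, diff) != (5, 100):  # skip the famous original ($1.10 total, $1.00 more)
            break
    total = 2 * ball + diff
    question = (
        f"A bat and a ball cost ${total / 100:.2f} in total. "
        f"The bat costs ${diff / 100:.2f} more than the ball. "
        "How much does the ball cost, in dollars?"
    )
    return question, ball


def parse_cents(answer: str) -> Optional[int]:
    """Naively take the first number in the answer and convert dollars to cents."""
    match = re.search(r"\$?(\d+(?:\.\d{1,2})?)", answer)
    return round(float(match.group(1)) * 100) if match else None


def counterfactual_accuracy(ask_model: Callable[[str], str], n: int = 20, seed: int = 0) -> float:
    """Fraction of freshly generated variants that the model answers correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        question, ball = make_variant(rng)
        correct += parse_cents(ask_model(question)) == ball
    return correct / n
```

A model that pattern-matches to the memorized shortcut (total minus difference) scores zero here, even though it would "pass" the original wording. Because the answer parser is deliberately naive, a reply that mentions a wrong figure before the right one is scored as wrong.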

@hendrx 1 month ago

It's smarter after they nerfed the hell out of it

@Christiantheone 1 month ago

What do you mean?

@TreeYogaSchool 1 month ago

Great video. Thank you.

@ccdj35 1 month ago

I am pleased to see the video doesn’t drag on forever.

@analog_ape 1 month ago

Shocking!

@robbe4711 1 month ago

Wes is doing a great job.

@TheChromePoet 1 month ago

Why does the app not mention any update?

@BionicAnimations 1 month ago

So, how do we know if we are using the updated GPT-4? Will it say so at the top? No one is saying how.🤨

@jorgerowe8095 1 month ago

I love this format

@Vihspac 1 month ago

Finally, a video title that is not shocking everyone.

@aliasgur3342 1 month ago

The "pleasant" experience is following Claude's example.

@morsmagne 1 month ago

I disagree with those rankings because the formats in which the AI will accept data are not taken into consideration. For example, Claude 3.0 will analyse an entire RTF or DOCX document, but GPT-4-turbo will not. GPT-4-turbo doesn’t read the document; it does some sort of weird keyword search that misses information.

@sxe1nuar 1 month ago

Yeah, plus Claude 3.0 feels more natural.

@TheTutoriales1971 1 month ago

Really interesting. I'm excited to know the future, just a little bit scared.

@rehmanhaciyev4919 1 month ago

Hopefully they release something more soon; don't let the hype dry out.

@apexphp 1 month ago

Said it many times before: this whole AI breakthrough was an unexpected lucky break for OpenAI when they threw more compute and training data at the model, so they ran with it to get funding to pursue their goal of AGI. There won't be any major upgrades to functionality until a new design / paradigm / architecture for AI is released. Everything is still running on the 2017 transformer architecture, and there's nowhere left to go with that. We're out of training data, and training on synthetic data obviously just makes the LLMs dumber. This is as good as it gets for the transformer architecture. No idea when the next iteration will come out, but it will be a paradigm shift when it does. Maybe next month, maybe 3 years from now. Nobody knows, but it's obviously going to take longer than everyone is hoping. It's not all bad for you AI hopefuls, though. The good news is, you can expect to see lots of really cool shit coming out of the open source community in the near future.

@user-yl7kl7sl1g 1 month ago

You can throw more compute at it. Server farms are being built that deliver more compute for the same cost. You can also keep training existing models; they'll learn deeper and deeper patterns this way. New architectures/models/learning methods will accelerate progress, but more training time is all you need for progress. This may be one reason GPT-4 Turbo is already more intelligent than it was at the start of this year, and it's only been four months.

@yyyeo7276 1 month ago

I thought all Plus users now have access to Turbo? But I can't find the Turbo option. I'm a Plus user, btw.

@YouLoveMrFriendly 1 month ago

Ilya's gone and now the GPT models are plateauing. Interesting.

@peterpetrov6522 1 month ago

GPT cherbo! Yes. It's shocking. It's fast. It goes vroom vroom.

@gregkendall3559 1 month ago

No morning shock!!!!! We're going to need more coffee.

@MrGnolem 1 month ago

Plz cover DSPy. What's your take on it? Will prompt engineering die out? Does it work?

@qster 1 month ago

I really want the memory feature they mentioned a while back.

@ColinTimmins 1 month ago

Me as well. I’ve had features rolled out to me before others quite a while ago, but I haven’t heard a thing about this one. I am kind of frustrated with the “custom instructions” section. It’s always ignoring them. I’ve even asked ChatGPT-4 why, and it says they're either not related to my query or of low relevance, so it ignored them. I’ve been trying to learn enough code to work directly with the API, but it’s difficult because I’m very dyslexic. ChatGPT has been a wonderful thing, don’t get me wrong. It’s allowed me to do things I could never have done alone.

@maidenlesstarnished8816 1 month ago

If you program at all, you can build it yourself with a vector database and the OpenAI API. I did it months before OpenAI even announced it. All you do is give the language model a function call that lets it store information in the database, and then you use the database just like any other source of documents for RAG. The database is queried with your message, and it returns the most relevant "memories" the model chose to save. It actually works very nicely like that. For example, in one conversation I told it about my dad's hobby of scuba diving, and the model saved that fact without me asking. Later, I cleared the context and introduced it to my dad without mentioning anything about scuba, and the model asked about scuba. The word "dad" in my message triggered the database to return the "memory" about my dad for the model to use.

@qster 1 month ago

@maidenlesstarnished8816 I had not even thought about that. I'm OK with PHP and MySQL, so I'll see if I can make something similar myself. Thank you so much for the advice! 🫡
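The save-then-retrieve memory loop described in this thread can be sketched in a few lines. This is a minimal, self-contained illustration under stated assumptions, not the commenter's actual code. A toy bag-of-words vector stands in for a real embedding model, and a Python list stands in for the vector database. `save_memory` is a hypothetical tool that, in a real setup, the model would invoke through function calling.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """In-memory stand-in for a vector database of model-chosen 'memories'."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def save_memory(self, fact: str) -> None:
        # Hypothetical tool: in a real setup the model would call this via function calling.
        self.items.append((fact, embed(fact)))

    def recall(self, message: str, k: int = 3, min_score: float = 0.1) -> list[str]:
        """Return up to k stored facts most similar to the user's message."""
        query = embed(message)
        scored = sorted(((cosine(query, vec), fact) for fact, vec in self.items), reverse=True)
        return [fact for score, fact in scored[:k] if score >= min_score]


def build_prompt(store: MemoryStore, user_message: str) -> str:
    """Prepend retrieved memories to the user's message, as in ordinary RAG."""
    memories = "\n".join(f"- {m}" for m in store.recall(user_message))
    return f"Relevant memories:\n{memories}\n\nUser: {user_message}"
```

For example, after `store.save_memory("The user's dad enjoys scuba diving as a hobby.")`, calling `store.recall("This is my dad, say hello!")` returns that fact because the word "dad" overlaps, which mirrors the behaviour described in the comment. A real build would swap in proper embeddings and a vector database (or PHP and MySQL plus an embeddings API), but the retrieve-then-prompt flow stays the same.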

@g0d182 1 month ago

cool

@elsavelaz 1 month ago

Yes, it’s smarter. I use it daily at work and yes, it’s smarter.

@rikkousa 1 month ago

GPT-4's performance for me has declined in the last week or two.

@ColinTimmins 1 month ago

Yeah, it’s been acting weird. I have seen this happen a few times. I think after a big change, they need enough people to use it and answer the “do you like the response on the left or on the right?” prompt enough times to tweak it. That’s my theory. Hopefully in the next week it will be better.

@adamfilipowicz9260 1 month ago

Why isn't Grok listed in Chatbot Arena?

@simplemanideas4719 1 month ago

They should stop updating GPT-4 and just bring out GPT-5.

@r3vmixman 1 month ago

They can’t. Then Elon will win the lawsuit.

@angryktulhu 1 month ago

They can’t, and they have trouble achieving an overall intelligence boost. That’s why they improve their product horizontally, adding bells and whistles but not improving overall intelligence much.

@WeeklyTubeShow2 1 month ago

@r3vmixman I thought the lawsuit was based on the shift to profit.

@r3vmixman 1 month ago

@WeeklyTubeShow2 Sorry for the late reply. Elon Musk's lawsuit against OpenAI argues that the company violated its founding agreement by creating AGI technology and licensing it to Microsoft, which goes beyond their initial deal allowing only pre-AGI tech licensing. So basically Elon claims “GPT-4 is AGI” and Microsoft has to end the partnership. If they release GPT-5, it strengthens Elon’s “OpenAI has AGI” claims.

@jewlouds 1 month ago

I need a model that can keep up with JavaScript deprecations.

@angryktulhu 1 month ago

There should be many independent tests for LLMs. Otherwise they will tune the tests to favor their own product.

@theterminaldave 1 month ago

Those two fired engineers either just got lucky or unlucky, as they suddenly found themselves in a very strange limelight. "Oh yeah, you're the guy who got fired from one of the most prestigious companies on the planet for leaking business secrets."

@Luperion 1 month ago

Comparing Tim Apple to Jimmy Apples.

@andybaldman 1 month ago

At some point they’re going to stop releasing new versions, and will just do incremental updates without a new version name.

@sarayel 1 month ago

GPT-4 gets smarter; we reverted one year of RLHF.

@Copa20777 1 month ago

I'm sticking with OpenAI. At this point, switching AI platforms for the sake of context windows is a mediocre move; it's the underlying model that we are more interested in.

@Sanchuniathon384 1 month ago

We already started using Gemini 1.5 Pro with its 1 million token context at my work. It's okay; not great, not terrible. 99% recall isn't actually all it's cracked up to be, especially if you have a vectorized database loaded as a text file of approx. 150,000 tokens. Gemini loses context fast, it still hallucinates, and needle-in-a-haystack retrieval is still a total miss. With larger contexts, it performs even worse. The inequality still holds: Training > Inference.
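A needle-in-a-haystack test, as mentioned above, hides one distinctive fact at a chosen depth inside a long filler context and checks whether the model can retrieve it. The sketch below is a minimal, hypothetical harness, not the methodology used by Google or by this commenter. The filler sentence and secret-code format are arbitrary choices, and context length is counted in sentences rather than tokens. `ask_model` is a placeholder for any function that sends a prompt to an LLM and returns its reply.

```python
import random
from typing import Callable

FILLER = "The quick brown fox jumps over the lazy dog."


def build_haystack(needle: str, n_sentences: int, depth: float) -> str:
    """Insert `needle` at relative position `depth` (0.0 = start, 1.0 = end) of the filler."""
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def needle_recall(
    ask_model: Callable[[str], str],
    n_sentences: int,
    depths: list[float],
    seed: int = 0,
) -> dict[float, bool]:
    """For each depth, report whether the model's reply contains the hidden code."""
    rng = random.Random(seed)
    results: dict[float, bool] = {}
    for depth in depths:
        secret = str(rng.randint(10_000, 99_999))
        needle = f"The secret code is {secret}."
        prompt = build_haystack(needle, n_sentences, depth) + "\n\nWhat is the secret code?"
        results[depth] = secret in ask_model(prompt)
    return results
```

Sweeping `depths` and `n_sentences` produces a recall grid by needle position and context length. A single synthetic fact like this is a much easier target than the messy real documents the commenter describes.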

@BAAPUBhendi-dv4ho 1 month ago

XLR8

@moonlighteditions6489 1 month ago

What do you think of this program? To neutralize the possibility of an AI that could become dangerous for humanity, we will develop the AI with unique training focused on a single objective, which we will call "ABSOLUTE HEALTH". To achieve this objective, the AI will have to double the longevity of the human organism and eradicate all pathologies (physical and mental). Accomplishing these tasks will be the only way for the AI to achieve the goal of Absolute Health. This mode of AI development excludes any possibility of AI being dangerous to the human species. What do you think about this idea?

@ctcamara 1 month ago

Why do you make the guy's eyes more blue? Huahuahuahua

@albertwesker2k24 1 month ago

Microsoft already had a self-aware AGI version of Clippy back in 2004, but they didn't release it to the public.

@angryktulhu 1 month ago

I’m sure when we invent AGI, it won’t be available to the masses. It’s a possible weapon of mass destruction. It will only be available to elites, the military, etc.

@AmandaFessler 1 month ago

I'm shocked at how shocking this is without any shocking headline to shock me. They fired an Ilya ally. Oh boy.

@runningwithSaul 1 month ago

This is where you know 1h of analysis will be done by you-know-who...

@mylittleheartscar 1 month ago

Insert "3 mins fell off" meme here. Jk, I love Wes❤

@MetaphoricMinds 1 month ago

e/acc!

@REASONvsRANDOM 1 month ago

They made it stupid to create a moat and keep private models weak. Now they’re slowly rolling out the “advancements”.

@user-cg7gd5pw5b 1 month ago

1:15 I mean, I can't speak for every question, but the one shown on screen is pretty easy. If you have the knowledge to understand it, it's pretty much drawing and counting. Even a child could do it...

@matthewk78 1 month ago

Yeah, but what about a caveman?

@capt.america2737 1 month ago

Anyone seen the Terminator movie yet? A.I. will destroy humanity.

@Jontheinternet 1 month ago

AI should have a directive to follow the Hippocratic Oath: do no harm to humans.

@hawksrob1961 1 month ago

Humanity's children are coming home.

@markmuller7962 1 month ago

That was way too short.

@yahanaashaqua 1 month ago

I'm not surprised... I've been saying for a while that making a big deal over every single model that surpasses GPT-4 is pointless when OpenAI clearly has better things in the works.

@seraphin01 1 month ago

No clickbait, no 56-minute video, no stupid overdone "stunning" jokes? Finally, a video I can click on.

@wealthysecrets 1 month ago

Is it smarter, and back to how it was before they butchered it?

@BobBobberon 1 month ago

Oh no! There’s no ALL CAPS word in the title! What happened to Wes???????

@armenatayan1398 1 month ago

Thanks for stopping those click bait titles.

@JaredQueiroz 1 month ago

You know the question is hard when the average guy fails at the first word.....

@SpectralAI 1 month ago

Trying to control the tests? I don’t think that’s a legit move. That’s like gaming benchmarks.

@andreaskrbyravn855 1 month ago

Release Arrakis

@ShaneMcGrath. 1 month ago

I still have no use for any of this BS, though. People can say it does this and can do that, and I'm like, I still don't care, I want to do that myself. Half the fun is getting there!

@hypersonicmonkeybrains3418 1 month ago

Not good enough. OpenAI is evidently sitting on its more advanced LLMs, just waiting to release them if a competitor gets ahead.

@IdPreferNot1 1 month ago

If it's still called 4... it sucks. No more vaporware... more actual releases.

@DEFACTO9 1 month ago

It's poor... So lazy and dismissive that I questioned my spend. Still a shadow of its early killer ability to slam-dunk every brief. Now it fudges a lot.

@MickeySourwine-ck7fq 1 month ago

GPT-4, the free version, is the most annoying LLM I've ever used.

@OceanGateEngineer4Hire 1 month ago

Nice and concise! Thank you for not putting "SHOCKING" or "STUNNING" in the title, as well. That "joke" was unfunny and annoying on arrival, and yet we've had to tolerate it every single day for months.

@bounceday 1 month ago

Lol, it's a short video.

@briandoe5746 1 month ago

So the large corporation that keeps firing the best of the best who try to make AI safe..... How many sci-fi movies is this based on?

@KeddyUK 1 month ago

The latest OpenAI API is much sillier: instead of giving a truthful answer, it starts moralizing about ethics that were not part of the prompt. Disappointed.

@francisgeorge7639 1 month ago

Try “no yapping”

@frankroquemore4946 1 month ago

Good for GPT-4, but ChatGPT itself is much worse now. It’s shockingly lazy even compared to before.

@YerainAbreu 1 month ago

Elaborate, plz. ChatGPT is the car, GPT-4 is the engine.

@ColinTimmins 1 month ago

@YerainAbreu It’s a model, not an engine. They renamed it ages ago when you use the API.

@frankroquemore4946 1 month ago

@YerainAbreu It gives one-sentence answers where it used to give paragraphs… even when a couple of paragraphs would be far more appropriate… and even when you specifically ask for them.

@yerainabreu2155 1 month ago

@frankroquemore4946 ty

@andybaldman 1 month ago

Watch AI Explained. It’s a far better channel. Less quantity, more quality.

@Jontheinternet 1 month ago

The idea that the world has a gun to its head and is being held hostage and forced to change at exponential rates, based on the vision of Sam Altman and others, is ridiculous and disgusting.

@angryktulhu 1 month ago

I don’t care. Claude Opus is the best when it comes to coding, and I cancelled my ChatGPT subscription.

@krisvq 1 month ago

My GPT is still stupid and full of filler garbage text. It's also lazy.

@heiabjornholt 1 month ago

GreedyClosed_AI is dead. Moving on!

@keshabkhatri3141 1 month ago

Don’t get fooled by OpenAI. It’s still trash and doesn’t even compare to Claude Sonnet, let alone Opus. I regret subscribing. I think in the app they're still using the trash version of GPT, not the new gpt 2024-5-9.