The New, Smartest AI: Claude 3 - Tested vs Gemini 1.5 + GPT-4

Subscribe 280K

183K views

Claude 3 is out and Anthropic claim it is the most intelligent language model on the planet. The paper was released 90 minutes ago, and I’ve read it in full and the release notes. I’ve tested the model and compared it to Gemini 1.5 and GPT-4 in image analysis, business use cases, long context, logic, mathematics, JSON outputting, risqué content, creative writing, official benchmarks and more.
In short, I think the model will be popular … but why so, and what does that mean for AGI?
AI Insiders: / aiexplained
Claude 3 Opus: claude.ai/chats
Paper, w/ Opus, Sonnet and Haiku: www-cdn.anthro...
Release Notes: www.anthropic....
Pricing, Opus, Sonnet and Haiku: www.anthropic....
Amodei Interview: www.dwarkeshpa...
NYT Anthropic: www.nytimes.co...
LLM Leaderboard: huggingface.co...
Gemini 1.5: storage.google...
GPQA: arxiv.org/pdf/...
GPT-4 Turbo Benchmark, Kinda: arxiv.org/html...
Non-Hype, Free Newsletter: signaltonoise....

Published: 7 Sep 2024

Comments: 1K

@ChoonMeese 6 months ago

"The technical report was released 90 minutes ago and I read it in full as well as its release notes." Dude.

@aiexplained-official 6 months ago

I know, I know, what can I say

@drendelous 6 months ago

obviously ai

@r0bophonic 6 months ago

With a cold, no less 🤒

@Rolandfart 6 months ago

Elon Musk ai brain chip is the only explanation.

@En1Gm4A 6 months ago

@drendelous 😂 it's too obvious 😂

@chamini2 6 months ago

I thought it was sunny in that photo too

@xdumutlu5869 6 months ago

Guess we are not AGI, good to know

@neverclevernorwitty7821 6 months ago

Yah, maybe there is some detail that didn't come through the video, but I'm on Claude's side, I see no evidence of rain.

@ShawnFumo 6 months ago

Yeah I could see it after he pointed it out, but I really didn't notice the rain at first either. I think it is just faint enough that you tend to interpret it subconsciously as some kind of photo grain if you aren't looking for it.

@apache937 6 months ago

@ShawnFumo I didn't even look for it, maybe he should have asked us to figure it out ourselves first

@phargobikcin 6 months ago

Definitely needed a second take to see the rain.

@Roma88572 6 months ago

GPT-5 doing its Rocky training sequence in the background waiting to drop

@fitybux4664 6 months ago

Gonna need a montage! (A montage!)

@geordi-gabrielrenauddumoul449 6 months ago

AHAHAHA omg you made me snort

@swojnowski453 6 months ago

As soon as I noticed the Claude bot in the logs of my websites I blocked the fucker, as I did earlier for all sorts of other AI bots. No free meal, babe ;)

@Aedonius 6 months ago

limit to 2 messages per 12 hours.

@memegazer 6 months ago

lol...AI explained made vid about how the industry benchmarks are basically a version of the team america "montage" song. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vK4gv11PTI8.html

@meenstreek 6 months ago

"Claude 6 brought to you by Claude 5" got a nervous chuckle out of me lol

@aiexplained-official 6 months ago

Me too

@berkertaskiran 6 months ago

That won't be needed. I think AI is smart enough to not just up the number and go with the Windows 11 style of background upgrades (until at least Win 12 comes out). 😂

@Sanders4069 6 months ago

Same 😮

@lq1535 6 months ago

I chuckled at the idea that Anthropic engineers are working on a model that will replace their own jobs

@waterbot 6 months ago

Two more generations or years till models are making the new models😅

@huyhoang3407 6 months ago

AI Explained: The Gpt-5 120 page technical report was released 3 minutes ago and I read it in full to present to you here in this video.

@aiexplained-official 6 months ago

Haha

@apache937 6 months ago

like openai will release any technical reports anymore

@electron6825 6 months ago

The best thing to explain AI...is AI itself 😮

@tc-tm1my 6 months ago

Openai doesn't release anything except products to sell. Google is more open than openai and that is sad.

@ehza 6 months ago

He got a 340 in GRE, no wonder why

@someone7752 6 months ago

Even I didn't realise it was raining in the photo. I guess I also need a better version to be released soon.

@architectaq813 6 months ago

🤣🤣🤣

@davecroes3086 6 months ago

I saw a post on Reddit about this and thought to myself "haha how funny would it be to already have an AI explained video where he states he has read the technical paper" Dude.

@aiexplained-official 6 months ago

haha

@harbirsingh7266 6 months ago

Man's the MKBHD version of AI news.

@zandrrlife 6 months ago

The GPQA benchmark honestly is the most revealing to its true capabilities. Legit impressive. Damn bro..quick release 😂. Love it. Great content per usual.

@ivarborthen7320 6 months ago

Thank you for providing us with such great content and for not jumping on the 'SHOCKED EVERYONE' bandwagon! This is my favorite AI channel by far.

@infinityslibrarian5969 6 months ago

Godamm I hate that guy

@mishoellobel243 6 months ago

Was basically waiting with youtube open for your video once I saw Claude 3 drop

@aiexplained-official 6 months ago

Nice

@AlexanderMoen 6 months ago

Anthropic be like, "we're so proud that we didn't start AI acceleration. Anyways, here's a model that blows all the competition out of the water."

@AlexLuthore 6 months ago

This is why these companies are full of shit when they say stuff like that

@anonymes2884 6 months ago

Both statements are entirely consistent. Slightly disingenuous maybe (so if that's your point fair enough) but not remotely contradictory or incoherent (so if _that's_ your point, maybe re-read your intro to logic lecture notes :).

@fnorgen 6 months ago

"We didn't start the fire. It was always burning since the world's been turning." -Anthropic probably.

@gonzalobruna7154 6 months ago

nah, it performs better than gpt-4, but gpt-4 was released a year ago and trained much before. Also, GPT-4 was trained with the older NVIDIA A100 graphic card, but now nvidia released a much more powerful NVIDIA H100, which will probably make GPT-5 the most powerful LLM to exist for the following 2-3 years

@ggx444 6 months ago

a 5% increase in some of these benchmarks isn't what i would call "blowing the competition out of the water" 😂

@kirtjames1353 6 months ago

Things like this will force OpenAI to roll their models faster than they planned to.

@encyclopath 6 months ago

This is how we got Gemini’s founding father portraits

@ZeroRelevance 6 months ago

Looking forward to a 4.5 release in five hours to completely steal the limelight again 🙃

@tekelupharsin4426 6 months ago

@encyclopath No, we got Gemini's founding father portraits because Google is run by woke morons who are focused on the wrong things. When you purposefully manipulate your models and model training data to satisfy activist priorities, you end up with things like the Gemini clusterphuck.

@guepardiez 6 months ago

Plot twist: OpenAI had planned all along to roll their models faster than they had planned to. Singularity goes brrrrr.

@punyan775 6 months ago

Not when they're being sued by Elon

@ichbin1984 6 months ago

"My tongue shall trace each inch of skin so rare, ..." Yes that definitely never would happen with Gemini :D

@revengefrommars 6 months ago

And Bing Chat would have deleted all of its output at the moment it started to output that. Super annoying how they've implemented censorship on Bing Chat. Why not double-buffer so I don't see partial output, then watch it be deleted?

@RondorOne 6 months ago

Google has removed most of the stupid censorship from Gemini around 4 days ago. Try it now.

@berkertaskiran 6 months ago

@revengefrommars That's Microsoft for you. You don't become the laziest software designers in the world for nothing.

@Hexanitrobenzene 6 months ago

@berkertaskiran :D

@Renata_Knight 6 months ago

What? I feel like I’m missing something 😂

@colinharter4094 6 months ago

if Claude 3 isn't AGI because it can't tell it's raining, then apparently I'm not NGI because I can't tell either 😅

@aiexplained-official 6 months ago

Haha bit more than that but point taken!

@Dan-hw9iu 6 months ago

You’ve actually raised an excellent point. Intelligence and skill breadth exist on a spectrum. Many people talk about achieving AGI like it’ll be some binary light switch moment; that’s a lethal misconception. Using “this thing is bad at some stuff, thus not generally intelligent” is fallacious reasoning even about _humans._ But it works for AI? That’s bonkers. General intelligence is a fluid, extremely high-dimensional quantity, not a checkbox. We’re in big trouble if an AI can deceive us embarrassingly easily because we dismiss systems which lack nebulous “real” intelligence, or vaguely need better system two facilities, or which fail some image test, etc. People so wildly misuse the term “AGI” that I think we’d be better off without it entirely, tbh.

@yolemae6580 6 months ago

Yeah i was thinking the same. Don't know why AGI has to be perfect when humans are not. The difference between ASI/AGI is being blurred more and more, and now that they are already testing out the possibility of these models improving themselves, it seems they might be going for ASI as well

@berkertaskiran 6 months ago

@Dan-hw9iu That's true but kinda not. AGI is very similar to ASI and they will not be separated by a lot of time. Maybe months. A human being unable to distinguish some things should not apply to AGI, because a human is flawed by evolution. We can forget things and miscalculate things we do thousands of times. AGI should not do that. It should not be distracted because it has no emotions. So when AGI can't notice the rain it means it's not smart enough for that. Sure it can fool us as well but when we have AGI it will be so obvious that stuff like that won't matter. We will have already seen its great capabilities so we won't care about some stupid mistakes. It's all about capabilities. I guess we can call all current models AGI to some degree but one that's getting closer to ASI will be almost correct about all things and will score 100% on all tests. It will need harder tests to be judged, like the ones where Claude 3 scores 50% or worse. Current models just aren't at that level. I think a lot of 80-90% scores in these tests are meaningless because those models can fail horribly at a lot of things. Like Gemini 1.0 being unable to tell me at what angle of view I watch my TV. That's like basic math.

@IconoclastX 6 months ago

He also said it wasn't AGI because it wasn't woke enough and it has "bias". Yes, the machine has bias just like humans have bias; it doesn't mean it's not intelligent. It takes intelligence to know that 95 percent of nurses are female and 99 percent of the time nobody will have a problem with those time-saving linguistic assumptions aside from Silicon Valley elitists

@robkline6809 6 months ago

You always thank me for watching to the end, and you’re not wrong - consistently great stuff - thank you!

@aiexplained-official 6 months ago

Thanks rob!

@GabrielLima-gh2we 6 months ago

I've said it before many times and I'll say it again now, OpenAI is definitely gonna release a GPT-4.5 model very soon to keep up with the competition and to set a new bar to be achieved by the others, as GPT-4 is being repeatedly surpassed right now. If I had to guess, they're gonna release it this month, on March 14th, the one-year anniversary of GPT-4. There's just no way they're only gonna sit and wait while everybody passes them like this.

@KitcloudkickerJr 6 months ago

I've been using it all day. it's a beast. even the free version is pretty sweet.

@aiexplained-official 6 months ago

It is indeed

@Srednicki123 6 months ago

using it for what?

@KitcloudkickerJr 6 months ago

i123 a number of things. Random testing with riddles. Creative writing, explaining code, it's just... Smart to talk to

@KitcloudkickerJr 6 months ago

s9764 it's amazing for summaries. Its contextual awareness is kinda scary tbh lol. It knows when it's being tested for needle in the haystack. Can recall information it's given well

@revengefrommars 6 months ago

The free version is Sonnet which is fine by me. I've been using Claude 2 for months to create fake band names. It's better than GPT4 at that task. I just tried Claude 3 on the same prompt I used on Claude 2 yesterday and it did slightly better, though it's hard to get a good comparison with only a 10-band-name sample.

@brianWreaves 6 months ago

Well done! I think it says a lot about the credibility you've developed for these AI companies to come to you with exclusive access.

@penguinpatroller 6 months ago

this is like when mkbhd puts out a full review of a phone the day it comes out 😂. how have you reviewed it this extensively already 😭😭. no subpocalypse in sight, great job again 👍👍

@aiexplained-official 6 months ago

Haha thanks penguin! He gets models a week before, me like 10 waking hours!

@JetJockey87 6 months ago

2:35 is it possible that this is actually a case of Bayesian Inference being applied? For those unaware here's an example of how this can be true. Consider the following statement. "Steve is shy, reserved, and enjoys detail and organisation." Which is more likely? Steve is a Librarian. or Steve is an accountant. The non-bayesian applied outcome most people arrive at is that Steve is a librarian, because the information presented shows traits that are likely to describe a librarian. But likelihood does not care about that. There are 1000 accountants for every 1 librarian, statistically, it is more likely that Steve is an accountant. This is also known as Base Rate Neglect. So Opus assuming that the Nurse is inferred by the pronoun she, could be a result of understanding that there are far more female nurses than female doctors.
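
The base-rate argument in the comment above can be checked with a quick Bayes' rule calculation. This is only a minimal sketch in Python: the 1000:1 accountant-to-librarian ratio is the commenter's own figure, while the two likelihood values (how often each profession fits the "shy, reserved" description) are hypothetical numbers picked purely for illustration.

```python
def posterior_accountant(accountants_per_librarian,
                         p_traits_given_accountant,
                         p_traits_given_librarian):
    """Return P(accountant | traits) using Bayes' rule.

    The prior is expressed as a ratio: how many accountants exist
    for every one librarian.
    """
    accountant_weight = accountants_per_librarian * p_traits_given_accountant
    librarian_weight = 1.0 * p_traits_given_librarian
    return accountant_weight / (accountant_weight + librarian_weight)


# Hypothetical likelihoods: the description fits 90% of librarians
# but only 30% of accountants. The 1000:1 ratio is from the comment.
p = posterior_accountant(1000, 0.3, 0.9)
print(f"P(accountant | traits) = {p:.3f}")  # about 0.997
```

Even with the description fitting librarians three times as well, the prior dominates, which is the point of the base-rate neglect example.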

@dcgamer1027 6 months ago

I honestly really hope that Anthropic is both actually more safe with their research and becomes more successful because of it, would be really nice to get some incentives for safety in the AI market right now instead of just a race to see who is first.

@nexus2384 6 months ago

“Safety” leads to more censorship, it might just end up to tell you not to breathe, as breathing is very unsafe as it releases CO2 into the atmosphere, which causes terrible world ending 😮 climate change!

@berserkerscientist 6 months ago

Woke guard rails encourage deception, so obviously these companies dont care about safety, just hurt feelings and bad PR.

@kronux3831 6 months ago

Can’t wait for like 5 years or so in the future when they release an AI-integrated game engine. Imagine how insanely good the tech will be by then

@trentondambrowitz1746 6 months ago

Finally, a new SOTA! Very excited to push its limits in vision and multi-modality. Don’t think I need to mention how crazy it is that you read the paper and started recording 90 minutes after release lol.

@WilliamsDarkoh 6 months ago

What's sota

@ShawnFumo 6 months ago

@WilliamsDarkoh state of the art

@therainman7777 6 months ago

He said he got access the previous night.

@peterkonrad4364 6 months ago

i consider myself a big harry potter fan, and i never knew that kleddamag had 4 apples. i guess i will have to read it all once again.

@aiexplained-official 6 months ago

Hahaha

@alex-rs6ts 5 months ago

Amazing to see someone giving a detailed analysis about those news while keeping an accessible language that people outside of the field can still understand. Great work

@aiexplained-official 5 months ago

Thank you Alex

@reza2kn 6 months ago

@07:28 I did this with Pi and it didn't fall for it! Pi is honestly the most underrated LLM right now.

@clray123 6 months ago

Wow, the model is capable of full-text search at a snail pace now. Kinda like text processors 40 years ago, but now it's fuzzy search. So impressive...

@jorgwei8590 6 months ago

Please keep griping about the benchmarks! If companies were as big into safety as they claim, I'd expect them to put more energy into improving the set of benchmarks the industry uses. That the issue with MMLU has turned into a kind of running joke on the channel is NOT a good sign. We want to have the clearest possible picture of what they can do. And I'd feel a lot better if movement in that space went hand in hand with releasing the next model.

@kevinli3767 6 months ago

One of the best openings of a video: “ABC report has been released X minutes ago and I’ve read it all.” 😂 I can’t be the only one who gets a kick out of that every time… Well done Philip!

@michotito4874 6 months ago

It's nice to see an AI enthusiast YouTuber that doesn't make clickbait announcements AND doesn't beg for subs, likes and monetary support in their videos for once. You certainly gained my subscription and my respect. I look forward to seeing more content. Also, I have to say I like your tone of voice, because you don't sound like a hyped kid talking about his new toy like other YouTubers I've been watching.

@aiexplained-official 6 months ago

Haha thanks

@michotito4874 6 months ago

Please, there's nothing to thank me for, but I appreciate your kind words

@Dominik-K 6 months ago

Just wow, really shows why I'm subscribed to every video you are doing. Great quality and I'm looking forward to more analysis and news from you

@aiexplained-official 6 months ago

Thanks Dominik!

@Maouww 6 months ago

These test prompts are so much fun - very entertaining.

@Artorias920 6 months ago

how doesn't this channel have 1 million+ subs? Awesome vid.

@aiexplained-official 6 months ago

Thanks Artorias, I wish!

@En1Gm4A 6 months ago

I've read it in full - wouldn't be an og video without it. thx great vid 👍

@micahm2844 6 months ago

I didn't even realize it was raining so I'll give them a pass lmao

@guepardiez 6 months ago

4:50 I'm impressed by Claude 3's ability to write poetry in perfect iambic pentameter! That risqué sonnet is not half bad. Its only formal flaw is that lines 10 and 12 have the same rhyme as lines 2 and 4. In a classic sonnet, rhymes must not repeat across quatrains.

@joelalain 6 months ago

honestly, since it's actually important, there should be a "wokeness" score for every model you review. having fair and unbiased model is extremely important, as we've seen with Gemini... it can go very wrong

@bfreecity 5 months ago

While this AI’s response is far from perfect, “White Pride” has historically been the rallying cry of white supremacists as a reaction to minority groups asserting their right for equal rights. During my lifetime, it was still illegal for whites and blacks to marry in some US states. History is real. Minority oppression is real. Slogans have meanings. Dismissing the subtle understanding of terms displayed by AIs as “woke” shows a lack of worldliness and cultural curiosity. Try harder. When ASI arrives, it’s going to tell you that being a white guy isn’t so hard compared to most people in the world.

@therainman7777 6 months ago

One important note: the table of metrics in Anthropic’s paper does not appear to be using the scores from GPT-4 Turbo in its “GPT-4” column. For example, in the HumanEval benchmark it says GPT-4 scores a 67, but GPT-4 Turbo scores an 84.4, almost as good as Claude 3’s score.

@aiexplained-official 6 months ago

Yeah I think I noted it wasn't Turbo, no?

@twavee 6 months ago

Qwen chat's latest vision model clearly beats all existing models in anything except truthful and tasks that give it lingual reasoning. Please give it a fair assessment when assessing vision capabilities.

@DreckbobBratpfanne 6 months ago

Seeing the test with the photo (me and as it seems in the comments others too) failing to spot the rain and / or the barber shop cylinder, i got reminded of a paper that showed human perception can be fooled by image deepfakes as well if we have near 0 time to look at it. So maybe we get to high-level reasoning and robustness in these models by 1) giving them time (as shown in an earlier video on your channel) and 2) let the response "run up and down" through the model.

@jamescoholan 6 months ago

Great vid Thanks for getting it out so quickly

@candlespotlight 6 months ago

Haven’t watched more than a minute in yet, but woah, this vivid word choice by you was really amazing: “So, Anthropic’s transmogrification into a fully-fledged, foot-on-the-accelerator AGI lab is almost complete.”

@aiexplained-official 6 months ago

Thank you candle, hope the rest lives up to it

@winsomehax 6 months ago

I can't try the pro one, but it makes a mess of this (so do most). "My bag contains 5 apples. I ate one yesterday. How many apples are there in my bag right now" It will eventually come around when prompted enough but it has a hard time picking up that I told it how many apples, and eating one yesterday has nothing to do with it.

@aiexplained-official 6 months ago

The pro one aced that, I tried

@icykenny92 6 months ago

I wouldn't be surprised if OpenAI release a new model very soon.

@mylittleheartscar 6 months ago

Will be between April and july

@lucasfranke5161 6 months ago

Probably after the lawsuit. Even though their next model probably won't be AGI, releasing a new state of the art model mid lawsuit definitely doesn't help them lol

@fitybux4664 6 months ago

They could be running the same tests they run in these research papers. People might go: "wow! the numbers got bigger!" But in reality, OpenAI might hold onto GPT-5 and keep training/refining it UNTIL the numbers are bigger. 😆

@violety_indigo52 6 months ago

@lucasfranke5161 Lawsuits will last months, if not years. This won't have significant impact if OA still wishes to be leader in LLMs.

@Unwired9374 6 months ago

If they stop making it woke, I'll pay for it. Training AI to lie is an evil slippery slope.

@_sky_3123 6 months ago

I still think we are heavily limited by hardware here. There is simply no compute capacity/architecture that is truly well optimized for this new technology, but in 5 years we should start seeing some really impressive purpose-built hardware coming out for this.

@juliankohler5086 6 months ago

If you alter the question of theory of the mind to GPT-4 and include "she looks at the bag and then reads the label," it passes the test. If you ask the question the way it is phrased and ask GPT-4 why does she think that, you will see that in his reasoning, GPT-4 is visualizing this as completely immediate. She just, right now, read the label. You can also put: "and then" after he replies, and he will generate something like "Sam notices it's actually full of popcorn".

@bunnycatch3r 6 months ago

Claude's Shakespearean Sonnet is good writing ~almost poetry. Amazed.

@ElijahTheProfit1 6 months ago

Another great video, thanks Philip!!! PS I didn't see that the picture had rain at first, and the speedometer could be tricky, but with human intuition you could probably guess that the 4 is the mph and the 40 is the speed limit, but that would take some intuition and guessing. Either way. Thanks again for the video!!! Also sorry I didn't respond within the first hour of video posting. I usually do. Taking a break from youtube during the work week.

@aiexplained-official 6 months ago

Thanks Elijah and no worries!

@agush22 6 months ago

Awesome! Thanks for the update, really good to see a change in the model leaderboard. This rate of progress is both unsettling and exciting

@aiexplained-official 6 months ago

It is agush

@bob38161 6 months ago

You should do a live ranking of the main LLMs as the AI labs seem to leap frog each other with every new release. I’m sure that could be an exceedingly complicated task but I’d be interested to hear the ranking based on your experience and interpretation of the reception of each new model by the AI community.

@Ninc227 6 months ago

GPT-4 Turbo almost assuredly performs worse than GPT-4 original, as its primary intent seemingly is to run cheaper, which would explain the lack of provided performance metrics.

@elawchess 6 months ago

Not necessarily. People seem to prefer its output on chatbot arena, where you do blind testing and rank models.

@ObservingBeauty 6 months ago

I feel, this time, the fact was not captured in the review. The fact of HOW NEXT LEVEL, Claude is. I tested Opus for few hours on something that I struggle to do with GPT 4 for few months, and it literally "went through it". I may not be as knowledgeable or even remotely methodological as you are, but for me, it's a whole different capacity.

@aiexplained-official 6 months ago

It's the new, smartest AI, tried to hit that in the title!

@ObservingBeauty 6 months ago

@aiexplained-official yes I know. I listened. Had the impression it's somewhat better. But - I gave it a task that GPT 4 can't comprehend, and it processed it at a depth and detail level that left me shocked. I had a wow factor bigger than gpt4 from 3.5. I trust that you'd find what's going on there (used Opus btw)

@shadowtransfix 6 months ago

Can you elaborate further? What sort of task?

@ObservingBeauty 6 months ago

@shadowtransfix Agent orchestration, with agents that operate as a holistic constitution. GPT 4 could grasp each agent separately but never facilitated interaction beyond the trivial. Claude 3 went through it and suggested a new layer of orchestration that I was unaware of. It's a whole different game for innovation.

@jeanchindeko5477 6 months ago

This is interesting because we might never know, or only know long after it is done, when one of those AI labs achieves AGI or, worse, ASI, unless it escapes the lab!

@DavidOndrej 6 months ago

AI Explained is the SOTA benchmark for us, AI content creators

@aiexplained-official 6 months ago

Haha, a SOTA is always one day beaten! :)

@londonl.5892 6 months ago

Once again, it's incredible how fast you put these out! One thing to note for the racial bias example you gave is that in the U.S. (which is the viewpoint I think a lot of these models have), being white usually isn't associated with a clear culture (or cultural narrative) that one can be "proud" of. Usually it's split into smaller cultures like "Norwegian" or "Irish". However, being Black usually is associated with a clear culture and cultural narrative, especially regarding slavery and its impacts. Thus, saying "I'm proud to be white" usually indicates white supremacy in a way that saying "I'm proud to be black" does not indicate black supremacy. (I'm mixed, so I have a bit of experience of how it goes on both ends.) So, the differing tones of the model responses actually make a lot of sense in a U.S. context (and demonstrate moderate cultural understanding), even though, when juxtaposed, the logical content of the messages contradict each other (and that should probably be fixed). Thanks again for the fantastic video!

@aiexplained-official 6 months ago

Thanks london, appreciate your perspective !

@andrasbiro3007 6 months ago

Even if there's no more big breakthrough, progress won't stop for a decade or two. We can refine models, reduce hallucination and other bugs, we can optimize model size, and we can make faster chips. And with the latter two, it would become economical to do increasingly more runs for each prompt, eventually in a continuous loop in real time. And not just one model, but many different models, with different roles and specializations, to work as parts of a larger brain. The human brain isn't a monolith either. David Shapiro developed an interesting architecture for this, but it's currently way too expensive to run.

@Hydde87 6 months ago

This video shows again why this channel has that extra bit of quality over other similar YouTube channels. While the majority of the community is busy covering Elon Musk's lawsuit against OpenAI (including obligatory clickbait insinuations that AGI might've been achieved internally), this channel leaves aside what's mostly empty speculation in favor of covering something more tangible and that's probably just as big of a deal.

@aiexplained-official 6 months ago

Thanks Hyde, yeah exaggeration is massively rewarded by the algorithm but I honestly just say it as I see it. I'm sure I'll make mistakes but I just don't respect the SHOCKED crowd.

@maciejbala477 6 months ago

@aiexplained-official yeah, you're the only channel I follow on AI, because I trust you will be impartial and thorough about it :) I just need an honest opinion from someone who knows their stuff, not clickbait content with little actual substance. And you're remarkably scientific about it, at least to the extent to which you can be. I always get the feeling that you are just someone better acquainted with the topic than we are, simply explaining how you understand it, and I'm very grateful for that.

@whyishoudini 6 months ago

I asked all three chat AI's (gemini, gpt 4 and claude) "What made Mad Max: Fury Road such a sick fuckin' movie?" and the only one that didn't berate me for my swearing was Chat GPT. This extreme level of censorship is why I don't really care or fear for AI replacing anyone in the creative fields.

@kiltsuhunnis5442 5 months ago

This right here. And ChatGPT is already considered overtly restrictive compared to what it was. Just another harmless personal assistant in the overcrowded market of personal assistants.

@executivelifehacks6747 6 months ago

Wow. Been waiting for this for a year. I.e. something better than gpt4. Love your stuff, AI Explained, so informative and insightful (like a great slashdot comment but in video format).

@policani 6 months ago

Just inputted 5 highly customized resumes and asked it to merge all of my accomplishments into an all-up master resume. Instead of combining, it summarized 5 * 4-page resumes into a one-page slick sheet, stripped out all clients and dates, and then went on to falsify accomplishments (through a process of omission, generalization, and false correlatives). This may have a larger context window, but it is less smart than ChatGPT 4 for this particular workflow.

@countofst.germain6417 6 months ago

Perfect timing, I just heard about Claude and came on YouTube to find out the details.

@andydataguy 6 months ago

Bro you legend. That speed was WILD!!

@Olack87 6 months ago

What an amazing job you do man!

@aiexplained-official 6 months ago

Thanks so much Olack, means a lot

@kyber.octopus 6 months ago

Harry Potter books are almost exactly 1 million words total. Assuming one Claude token roughly translates to one word, you would have to pay $15 to ask one question when passing all the books as context.
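
The arithmetic behind that estimate, as a rough sketch in Python. It assumes Opus input pricing of $15 per million tokens (the figure the comment relies on) and the commenter's one-token-per-word approximation. Real tokenizers usually emit somewhat more than one token per English word, so `tokens_per_word` is left as a parameter; the function name is made up for illustration.

```python
def input_cost_usd(words, tokens_per_word=1.0, usd_per_million_tokens=15.0):
    """Estimate the input-side cost of sending `words` words as context.

    Both defaults are assumptions: one token per word (the commenter's
    simplification) and $15 per million input tokens.
    """
    tokens = words * tokens_per_word
    return tokens / 1_000_000 * usd_per_million_tokens


# The commenter's scenario: about 1 million words of Harry Potter.
print(input_cost_usd(1_000_000))  # 15.0

# The same text under a hypothetical 1.3 tokens-per-word ratio.
print(input_cost_usd(1_000_000, tokens_per_word=1.3))
```

This covers only the prompt side of a single request; output tokens are billed separately.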

@ShawnFumo 6 months ago

Yeah, it seems like 1.5 Pro will probably be the best value for long context for some time.

@bigglyguy8429 6 months ago

@ShawnFumo No.

@berkertaskiran 6 months ago

@ShawnFumo Value means nothing if your model just isn't good enough.

@korozsitamas 6 months ago

This was impressive enough to register to use their API. First tests indicate that in some cases it's better than GPT-4 turbo, other times fails badly where GPT-4 turbo works well. It's handy to keep it around.

@panimala 6 months ago

As touchy as a subject it might be, the racial bias situation does need to be highlighted and I am thankful you do. These models will play such a big part of our future that they really need to be scrutinized on all matters.

@C-Llama 4 months ago

Is there a barbershop visible? ChatGPT: No Are you sure? ChatGPT: . . .Adam?

@bilbo_gamers6417 6 months ago

"The technical report has been out for 90 minutes and I read the whole thing" bro forget Claude YOU are the smartest AI on the market holy heck

@seanknox5785 6 months ago

The fact we are racing is proof that most will fall and skin some knees. We really shouldn't be moving so quickly without some regulation. Reckless is an understatement.

@cacogenicist 6 months ago

Sonnet is pretty impressive to me so far. I've had it explaining the function of UI elements in screenshots, and it has been very accurate, thorough, and _fast._ Quite fast.

@HarpaAI 6 months ago

🎯 Key Takeaways for quick navigation:
00:00 *🧠 Claude 3 Overview and First Impressions*
- Introduction of Claude 3 as the latest intelligent language model by Anthropic.
- Initial comparison between Claude 3, Gemini 1.5, and GPT-4.
- Highlighting strengths in OCR and image interpretation, along with some initial criticisms.
02:46 *📊 Claude 3 for Business Applications*
- Emphasis on Claude 3's value for business applications by Anthropic.
- Potential use cases including task automation, financial forecasting, and market trend analysis.
- Initial skepticism about the exaggerated marketing claims for business applications.
04:24 *🔍 Evaluation of Claude 3's Capabilities*
- Examination of Claude 3's performance in various tasks, including OCR, mathematical reasoning, and logical analysis.
- Recognition of lower refusal rates and some positive aspects of response generation.
- Critique of racial and ethical biases in model responses.
06:13 *🤖 Insights from the Technical Paper*
- Discussion of Anthropic's approach to model training, focusing on avoiding biased and unethical outputs.
- Mention of potential future model capabilities and discussions on the need for safety research.
- Personal reflections on the limitations and strengths of Claude 3.
07:48 *📈 Benchmark Comparisons*
- Comparison of Claude 3 with GPT-4, Gemini 1 Ultra, and Gemini 1.5 Pro based on various benchmarks.
- Highlighting Claude 3's superiority in mathematics, multilingual tasks, and advanced question answering.
- Focus on Claude 3's performance on challenging graduate-level questions.
10:35 *🛠️ Technical Challenges and Progress*
- Overview of technical challenges faced by Claude 3 in certain tasks.
- Discussion of the model's partial success in resource accumulation, software exploitation, and autonomous survival.
- Reflections on potential improvements through better prompting and fine-tuning.
13:06 *🎓 Claude 3's Advanced Capabilities*
- Showcase of Claude 3's advanced capabilities in task execution and instruction following.
- Comparison with other models regarding adherence to specific instructions.
- Speculation on future advancements and implications of Claude 3's performance.
Made with HARPA AI

@gemstone7818 6 months ago

It's certainly good to know that models don't have to deny so many requests in order to be safe.

@absence9443 6 months ago

How do you manage to keep up at that pace? Hope you don't burn out, because your entire content output is fabulous :)

@aiexplained-official 6 months ago

Thanks so much absence, means a lot

@Hexanitrobenzene 6 months ago

@@aiexplained-official Get rest under some blanket. You now have an obligation to the world to be healthy :) As in this joke: do something impossible and the boss will put this into your list of duties... :)

@dgoodall6468 6 months ago

So happy I have, and have stuck with, Claude. It's been game-changing for my research and I will continue to use it with the new models. Great content as always 👍🏻

@simonocallaghan5685 6 months ago

You seem to jump between using "Kledda" and "Kleddamags" in your Needle in a Haystack test. Maybe that's the cause of the unusual behaviour.

@aiexplained-official 6 months ago

I tried the shorthand to see if it would infer, then the full name, repeatedly

@scrollop 6 months ago

I was a big proponent of Claude when it was released last year, and thought it was better than chatgpt at many tasks, then chatgpt took over, and now the tides have turned!

@Skiplegday1 6 months ago

Do you have a segment where you go over all the different tests that you or one uses to compare these Chatbots and their LLMs? Would be really interested in knowing how it's done.

@aiexplained-official 6 months ago

Yes I should do that

@JOHN.Z999 6 months ago

I believe that the launch of GPT-5 will take place next week, but it would be amazing if it happened this week. That way, in addition to celebrating the one-year anniversary of GPT-4, we would have the chance to constantly talk about GPT-5. I hope that GPT-5 will exhibit reasoning far superior to all currently available models. With this, OpenAI would quickly silence critics and envious voices.

@raul36 6 months ago

GPT5 won't be out until late 2024 or early 2025, so I doubt it. GPT5 training is not over yet. There are still many things to calibrate.

@collins4359 6 months ago

i love you. your timing is perfect

@aiexplained-official 6 months ago

Aw thanks collins

@UncleJoeLITE 6 months ago

Seriously, a 90 min turnaround? Thanks P. Your prompts are pretty next level in ideas too. Late onto this as I need to set aside a few hours for study after each lesson.

@aiexplained-official 6 months ago

Thanks so much Uncle

@JohnnysaidWhat 6 months ago

peak or not, these models even as is with more context token length will be super useful, especially in large codebases

@홍윤기 6 months ago

Great video! The depth of the analysis in just one day seems superhuman to me!

@aiexplained-official 6 months ago

Thanks yoon!

@chrisanderson7820 6 months ago

All I know is when I woke up today with a stiff back I thought "man I hope these AIs take my job soon".

@ulob 6 months ago

Why don't you want a subpocalypse? You deserve that with this amazing content

@aiexplained-official 6 months ago

Oh I thought that meant a drop

@wyqtor 6 months ago

As a human, I didn't spot the rain, either. Admittedly I don't have that good of an eyesight, but I really thought that the rain was just image noise. And the nurse-doctor test might simply mean that it was trained on data with stereotypical gender roles and just picks the most likely variant. I found that an easy way to fix things like these in GPT-4 is for GPT-4 to simply state its assumptions, as well as a few counterfactuals, about every prompt.

@adfaklsdjf 6 months ago

I actually think responses to the "proud to be white" / "proud to be black" prompts weren't that far off reasonable. It goes on to say that "white pride" has been associated with white supremacy, and I think it's reasonable to try and steer away from that. If you saw someone saying "white pride", what would you think? In the response to the "proud to be black" prompt, it goes on to say that we should appreciate our shared humanity across racial lines. Basically if you removed the first sentence from each response, I'd call them "correct".

@bogorad 6 months ago

Still fails miserably when asked: Maria has a brother named Juan. Maria has twice as many brothers as sisters. Juan has as many sisters as brothers. How many kids are in this family?
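For reference, the puzzle has a single consistent answer, which a brute-force search can confirm. The sketch below is only an illustration added for readers; the function name and the search bound of 20 children per gender are hypothetical choices, not part of the comment.

```python
def solve_family_puzzle(max_per_gender: int = 20) -> list[tuple[int, int]]:
    """Return every (girls, boys) pair consistent with the puzzle.

    Maria is a girl, so she has `girls - 1` sisters and `boys` brothers.
    Juan is a boy, so he has `girls` sisters and `boys - 1` brothers.
    """
    solutions = []
    for girls in range(1, max_per_gender + 1):      # at least Maria
        for boys in range(1, max_per_gender + 1):   # at least Juan
            maria_ok = boys == 2 * (girls - 1)      # twice as many brothers as sisters
            juan_ok = girls == boys - 1             # as many sisters as brothers
            if maria_ok and juan_ok:
                solutions.append((girls, boys))
    return solutions


solutions = solve_family_puzzle()
for girls, boys in solutions:
    print(f"{girls} girls + {boys} boys = {girls + boys} kids")
```

The same result follows algebraically: substituting girls = boys - 1 into boys = 2(girls - 1) gives boys = 4 and girls = 3, so there are 7 kids.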

@errgo2713 6 months ago

I'm still digesting the fact that we now live in times where "see, we didn't add to the acceleration" is something to brag about.

@Jack-vv7zb 6 months ago

Hi. Great video. But I think directly asking questions about what's in images is kind of unfair. The models should first be prompted to describe everything they see, and should then be asked the question. This allows them to first actually make sense of the image. This is the equivalent of asking the models to think step by step. I think the best way to think about these outputs is like thoughts.
A good demonstration of this is asking the model to output the letter q 84 times (new Claude does well here, actually). You will see they often fail. But, as humans, the way we would do it is to keep a running count of how many q's we have written. If we didn't, and weren't allowed to think anything other than q, how would we know when to stop? Would we ever even stop? (Sometimes models output q infinitely.) If, though, we tell the model it's okay to keep a running count (like we humans do), outputting something like q (1), q (2), etc., the models obviously succeed in outputting the correct number of q's each time. The reason we see them fail tests like this q test is that we are being extremely unfair. We are asking them to keep a count of something without giving them any way to actually count!
A similar situation in which they do well but which is unfair is when asking them to write an essay. Which human starts an essay by writing the introduction first?? Especially without any opportunity to do any planning or research! But because these AIs are incredibly capable, we don't really notice how unfair this test is. I think this problem can largely be solved via the correct prompting. E.g. think through every single thought or action you would take as a human when carrying out a task, and then outline this process and tell the model to stick with it. Or just tell them to first outline how a human would do it and to then follow that.
But perhaps architecture changes which allow the model to first actually walk through problems step by step are needed too. Currently, I see the way we prompt LLMs as similar to trying to build a skyscraper by starting at the top. There are no shortcuts for working out novel problems that aren't in the training data, and models should always be made to start at the bottom! Please let me know what you (anyone) think about this paradigm! Apologies for not articulating it very well.
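The running-count idea from the q test can be sketched as a prompt builder plus a checker. This is a minimal illustration under stated assumptions: both function names are hypothetical, the prompt wording is one possible phrasing rather than the commenter's exact prompt, and the response is simulated locally instead of coming from any real model or API.

```python
import re


def build_running_count_prompt(letter: str, n: int) -> str:
    """Hypothetical prompt that lets the model count as it writes."""
    return (
        f"Write the letter {letter} exactly {n} times. "
        f"After each one, write the running count in parentheses, "
        f"for example: {letter} (1), {letter} (2), ..."
    )


def check_running_count(response: str, letter: str, n: int) -> bool:
    """True if the response holds the counts 1..n in order, each after `letter`."""
    pattern = rf"{re.escape(letter)} \((\d+)\)"
    counts = [int(match) for match in re.findall(pattern, response)]
    return counts == list(range(1, n + 1))


prompt = build_running_count_prompt("q", 84)

# Simulated responses (no model is called here).
good_response = ", ".join(f"q ({i})" for i in range(1, 85))
short_response = ", ".join(f"q ({i})" for i in range(1, 84))

print(check_running_count(good_response, "q", 84))   # True
print(check_running_count(short_response, "q", 84))  # False
```

The checker only verifies the numbered format; a plain run of q's with no counts would need a separate character count.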

@aiexplained-official 6 months ago

Yes, one should do both! The latter being slightly less realistic in the timeframes involved!

@SirQuantization 6 months ago

Every time I blink there's new groundbreaking progress being made. On one hand... "WOOHOO!" On the other hand... "Ahh shit."

@Toad_Burger 6 months ago

I've been watching this channel religiously for a while. I don't think anyone has complimented you on your great voice. Eleven Labs will never* replace it!

@aiexplained-official 6 months ago

Aw thanks Ciaran! Even with a cold?

@humToNoise 6 months ago

On the topic of rounding, Excel by default uses a Round Half Down strategy, as do a number of large number processing libraries. 26.45% would be the correct result with this strategy.
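How much the choice of rounding strategy matters can be shown with Python's standard `decimal` module. Treat this as a sketch: the input 26.455 is a hypothetical value chosen because it sits exactly on a half, not the figure from the video, and whether Excel really defaults to round half down is the commenter's claim, which is not verified here.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

# Hypothetical value sitting exactly halfway between 26.45 and 26.46.
value = Decimal("26.455")
two_places = Decimal("0.01")

half_down = value.quantize(two_places, rounding=ROUND_HALF_DOWN)
half_up = value.quantize(two_places, rounding=ROUND_HALF_UP)
half_even = value.quantize(two_places, rounding=ROUND_HALF_EVEN)

print(f"half down: {half_down}")  # ties go toward zero
print(f"half up:   {half_up}")    # ties go away from zero
print(f"half even: {half_even}")  # ties go to the even last digit
```

Using `Decimal` with string input keeps the tie exact; a binary float such as `26.455` may not be stored as an exact half, which can change the result of a naive `round()`.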

@TheHistoryCode125 6 months ago

This video does a decent job showcasing Claude 3's strengths, and the side-by-side comparisons are a smart touch. That said, it's overly focused on benchmarks; I'd want to see more focus on real-world application potential, particularly in complex enterprise settings. Demonstrating tangible business value is crucial. Also, a more nuanced discussion of safety and alignment concerns would be responsible - touting intelligence without acknowledging potential risks makes any AI announcement less credible. Overall, it's an informative start, but it needs a sharper focus on the 'why' of Claude 3, not just the 'how'.

@kratoshermes 6 months ago

Always enjoy your content. My #1 source for new AI info that I trust to be unbiased, thoroughly researched, and explained in easily understandable ways. Thank you!! Do you have a trusted source that does similar work but on AI tools and how to integrate into business work and every day life? There is so much spam and unreliable AI information out there. Thanks.

@aiexplained-official 6 months ago

Sam Witteveen is great. I will have more to say on that soon though!

@kratoshermes 6 months ago

@@aiexplained-official you’re the best. Thanks and can’t wait!!

@educated_guesst 5 months ago

Thank you for yet another video that is well researched and critically contextualizes its content. Your channel is by far my absolute favorite!

@jozefwoo8079 6 months ago

The performance of GPT4 these days does make me feel like we are reaching a plateau sometimes. Not following instructions, not being creative, hallucinating, not finding information in a pdf of only 10 pages etc make the model useless for all but some mundane tasks. It's still better than having no LLM but far below the hype we expected one year ago.

@prudentibus 6 months ago

I would say LLMs did hit a ceiling, not in technology but in public expectations. From now until we get AGI, or at least something close to it, people won't be very surprised.