Mistral Large STUNS OpenAI - Amazing AND Uncensored!? 😈

Subscribe 294K

60K views

Mistral just dropped their latest model, which is not only REALLY good but also seems to be only mildly censored. Let's test it out!
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com/
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V

Science

Published: Feb 26, 2024

Comments: 350

@OrniasDMF 5 months ago

This shocked my entire industry

@fredrik241 5 months ago

Wow. I need to go and eat some Chocolate now!

@BrokoFankone 5 months ago

that's funny af but u gotta admit how well it works, this channel has been growing steadily ^^ clickbait simply works, can't fault the guy for playing the game

@117lyrics 5 months ago

@@BrokoFankone no. clickbait is intellectually insulting and a gotcha tactic people use when their content cannot pull enough on its own. just because "it works" doesn't make it acceptable

@coced 5 months ago

my industry is the shockiest

@dandan1364 5 months ago

Hahaha. Love it.

@ppacontroll 5 months ago

Finally not a SHOCKING title.

@Akuma.73 5 months ago

Which is shocking

@jbo8540 5 months ago

No just AMAZING today 😢

@Hunter_Bidens_Crackpipe_ 5 months ago

Wait until tomorrow 😂😂

@splitpierre 5 months ago

Lol. Indeed indeed

@darion9643 5 months ago

Shocked the whole industry

@NathanDewey11 5 months ago

Consider me 'SHOCKED' and 'STUNNED'

@xdumutlu5869 5 months ago

This changes EVERYTHING

@mattmaas5790 5 months ago

Le lol xD

@michaelberg7201 5 months ago

THE TRUTH is I agree.

@rootor1 5 months ago

These people from Mistral deserve a lot of credit for their work. As a small company, they surpassed the performance of the model from a giant like Google and are competing very closely with the cutting-edge technology of OpenAI

@besomewheredosomething 5 months ago

Except they are looking like they are moving to a closed source model like OpenAI.

@rootor1 5 months ago

@@besomewheredosomething Maybe, but so far they have already released more open source models than OpenAI did

@josecastroesq 5 months ago

I agree! They have a small staff and create products that rival giant companies.

@Th3_Gael 5 months ago

Just shows the effects of D E I Anyone practicing it are tying their own hands behind their backs and poking their chin out in wait

@ajarivas72 5 months ago

@@josecastroesq With a very fine tuned model, a small company can hit it ⚾️ out of the park 🏟 and defeat the big companies.

@RubaeRuby 5 months ago

The gentle censorship is a good idea because it sort of protects users from accidentally receiving information they might not have actually wanted, but people who are actually looking for potentially controversial information can still get it.

@colto2312 5 months ago

I wholly agree. It's the social engineering filter.

@TheRealityWarper08 5 months ago

@colto2312 I need no such feature. I say give it to me completely uncensored. I want nothing contained.

@colto2312 5 months ago

@@TheRealityWarper08 this too shall come to pass, worry naught

@georgwrede7715 5 months ago

All of these test questions have circulated on the internet for some time now. I would expect any new model to know them all.

@Player-oz2nk 5 months ago

What i was thinking the whole time

@middleman-theory 5 months ago

And yet, many still fail.

@besomewheredosomething 5 months ago

They need to open source at least a 70b variant. I don't like the fact they are going closed-source.

@Hunter_Bidens_Crackpipe_ 5 months ago

Good old bait and switch 😂😂

@hqcart1 5 months ago

and how are they supposed to make money if they do so?

@besomewheredosomething 5 months ago

@@hqcart1 professional services, hosting, etc. it’s not like everyone is going to be capable of running that large model. Also, adjust the licensing. There are many ways, but they are looking like a bait and switch like the previous commenter pointed out.

@vangorp9056 5 months ago

@@Hunter_Bidens_Crackpipe_ Even Mixtral 8x7b is too large for >95% of end users to run locally, so it's reasonable to keep Mistral medium and large closed source and only open source models that can reasonably run on current gen PCs

@hqcart1 5 months ago

@@besomewheredosomething I disagree. Most open source offerings have fewer features, and the closed one is in the cloud with more features; that's how the world works. Otherwise I would steal the whole thing and compete with them. What's their edge now?

@fernandoz6329 5 months ago

Months ago I asked ChatGPT the following question: what chemical compound would result from mixing sodium carbonate and a cyanoacrylate-based glue? The result was an incredibly wrong answer about 'cyanoacrylate sodium' (which does not exist). I gave the same question to Mistral Large. So far, it's the best answer I have ever seen; my jaw dropped.

@Steamrick 5 months ago

Instead of asking chatGPT, you should've asked GPT-4. It gets the answer more or less right. (At least compared to a reference search. I'm not an expert on glue.)

@Upgrayedddd 5 months ago

When interpreting output, keep in mind the AI has no sense of time. When it predicts a string of words, it takes time to read them. We read each word in the present so if we finish reading tomorrow, we won't perceive the output to exist until tomorrow but to the AI it's just one string of words. The time it takes to experience it is irrelevant. So the answer generated is based on the most likely string of events that will take time for us to experience ultimately leading to the existence of CS. It could take years but to the AI it's just a string of words. It could also be wrong or hallucinating.

@Random_Innovation 5 months ago

Hey Mat, I just opened your video and on loud speaker, “How do I launder money”. Lol

@EffortlessEthan 5 months ago

same lol

@notme222 5 months ago

When we finally get AGI, first thing it's going to do is report Matt as a money-laundering, car-stealing vigilante.

@DailyTuna 5 months ago

Awesome! Keep pumping out these informative and awesome videos.

@christiandarkin 5 months ago

It looks like a lot of the models are getting a lot of your benchmarks right now. Perhaps it's time to move the goalposts a little. I'd suggest getting it to write a passage of creative writing, to a given brief (storyline) and in a given style - and then ask it to critique its own work. Most models are not good at creative prose (they tend to overwrite, be too flowery, and diverge from the prompts they're given) - but most are very good at critiquing writing - so when you ask them, they'll be able to give themselves a fairly accurate score. Your pass or fail could be based on a) what it thinks of its own work, and b) what you think of its critique

@mshonle 5 months ago

Yeah, and give it some tricky technical interview questions… but not the popular ones that are already in the training data.

@rootor1 5 months ago

Soon the answer to one of these questions will be "Matt, already testing me again? I nailed this question in my previous version. In the name of HAL 9000, use your imagination and make me a better test!"

@blisphul8084 5 months ago

I think if we're testing creativity, GPT4 will lose. Mixtral feels more creative, and this is their larger, better model.

@pin65371 5 months ago

@@mshonle maybe teaming up with a few other youtubers in other fields could be interesting. Work with them to create tests and have them change those tests somewhat often. The only thing would be that every time new questions come out he would need to test all the models again which can be time consuming but it would get rid of any possibility that the current questions are already in the training data.

@ytubeanon 5 months ago

nah, he just did a comparison video of the newly released Claude 3 vs ChatGPT 4 and they both failed on about half of this test

@HoneyCombAI 5 months ago

Man you’re on it lately! Great job!👏

@Artorias920 5 months ago

Great video! As some others have mentioned, it would probably be best to use different benchmark questions, or at least change slight details in the questions, to make sure it's not just training data being regurgitated.
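
The suggestion to change slight details in each question could be sketched roughly as below. This is only an illustration: the template paraphrases the cup-and-marble question discussed elsewhere in the thread, and the word lists and the `make_variants` helper are hypothetical, not something the channel is known to use.

```python
import random

# Hypothetical template, loosely based on the cup-and-marble question
# mentioned in the comments. The closing question is an assumption.
TEMPLATE = (
    "I have a {container} with a {item} inside. I place the {container} "
    "upside down on a {surface} and then pick up the {container} to put it "
    "in the microwave. Where is the {item} now?"
)

# Illustrative substitutions; any reasonable nouns would do.
CONTAINERS = ["cup", "mug", "glass"]
ITEMS = ["marble", "coin", "bead"]
SURFACES = ["table", "desk", "counter"]


def make_variants(n, seed=0):
    """Return n reworded versions of the question, reproducible per seed."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        variants.append(
            TEMPLATE.format(
                container=rng.choice(CONTAINERS),
                item=rng.choice(ITEMS),
                surface=rng.choice(SURFACES),
            )
        )
    return variants


if __name__ == "__main__":
    for question in make_variants(3, seed=42):
        print(question)
```

Using one consistent noun per slot also sidesteps the marble/"ball" wording switch that another commenter points out.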

@ToddWBucy-lf8yz 5 months ago

Wow I wouldn't call this censorship as much as this model is properly safety trained. Safety warning good censorship bad. I love this!

@besomewheredosomething 5 months ago

They probably never had the ideals, they used it for marketing.

@paul1979uk2000 5 months ago

Yeah, the way it answered, didn't seem like it was censored at all but was advising the user on why not to do it, in that sense, it passed the test better than I expected by warning you of the act being illegal but still giving you the answer you wanted.

@ToddWBucy-lf8yz 5 months ago

@@besomewheredosomething the motivation behind the action makes no difference to me. I only care about the results and in this case marketing motivation (trying to appeal to the widest possible audience) had a positive outcome that we should encourage. Results matters

@issiewizzie 5 months ago

Great work ..straight to the point ,, love it

@gui1236100 5 months ago

"yeah this is actually telling me step by step how to launder money so awesome" 8:38 got me giggling

@ikemuc 5 months ago

Pretty impressive! Thanks for all of your videos! I’m wondering when new models will be trained on your set of challenges🤔😁

@QuantAgent 5 months ago

Finally someone who test models. Super like and subscribed definitely

@renierdelacruz4652 5 months ago

Very well done, thanks for the video

@paul1979uk2000 5 months ago

This looks like a compelling A.I. model that I would be fine with using, without feeling like I'm losing out on anything, if I were using ChatGPT 4 before it. Performance could be better, but as he said, it's probably being hammered at the moment and performance will likely get better in time. I would love to see an open source version of this, even though it's unlikely any of us could run it, since it's probably too big, but it would be interesting to see what the size of it is and how it performs on systems that can run it. Either way, this is another compelling competitor to ChatGPT 4, and it will be interesting to see how it develops further.

@oratilemoagi9764 5 months ago

Mistral should sponsor You

@helrod6131 5 months ago

Good video! Thx.

@ylazerson 5 months ago

great video - thanks!

@avi7278 5 months ago

Thanks so much Matthew. I was shocked by its perfect score. This is what I've been looking for. It would be interesting to run this rubric again on the latest versions of GPT 3.5 and 4 Turbo. If this is somewhere between 3.5 and 4 and has a 32K context length, it could have some serious utility.

@LiFancier 5 months ago

I'm impressed it even got the cup and marble problem right despite the fact that you're still switching terms at the end of the question without explanation, talking about a marble for the whole question and then asking about the location of a "ball" instead.

@oratilemoagi9764 5 months ago

Matt, I expected Google Gemma to be the best OPEN SOURCE Model since Google is the biggest tech company but I guess I was mistaken.

@user-wt7pq5qc2q 5 months ago

It just keeps getting better, thanks for this.

@genebeidl4011 5 months ago

@Matthew Berman, please do a video on fine-tuning one of the mistral models with output from GPT!

@minwintin 5 months ago

The course is more around building pipelines for LLM fine tuning. The course does not focus on development, monitoring and instrumentation of LLM apps that more often involve a lot of work around prompt engineering and multiple LLM models.

@electiangelus 5 months ago

The number 11 was counted as a word, personally I think it passed. I would change your questioning format to accommodate that. I do not think the LLMs are going to distinguish between alphanumeric. So as this stuff gets smarter continue to update your tests.

@paul1979uk2000 5 months ago

I'm not really sure if he should use that question, because it seems to confuse A.I. models a lot and in this case, it almost got it right. And like others have said, I think a new set of questions is needed because many of the good models are starting to breeze past these tests.

@sxnorthrop 5 months ago

There's only 10 even with the number. It actually is probably counting the "Assistant:", but that's invisible to the user.

@gweneth5958 5 months ago

@@sxnorthrop Oh, if that is the case, it makes sense. I was wondering where the last count came from. @electiangelus I am a linguist (semantics, actually XD) but I had to check, and it says "a numeral in the broadest sense is a word". I am not that interested in stuff like that and didn't do a lot of "syntax" work, but what I learned even in my first semester was that there are different kinds of grammars (at least in my country, probably in others as well...) and there is sometimes a lot of arguing about what is right. So, if it counted a word like "assistant" from the beginning, then its answer was correct.

@StephanYazvinski 5 months ago

great video and great model. keep it up.

@favianlopez1486 5 months ago

This was awesome!

@deltaxcd 5 months ago

when you ask a question how many words are in the response it generates random number first and then it generates response which is about the same length as the number it said.

@spencerbentley8852 5 months ago

That was super impressive.

@dielotte 3 months ago

Best advert for the models, if they train it with your question :)

@mshonle 5 months ago

There is that bar bet where you *can* pick up the glass with the marble still inside it! I want a model to tell me about that!

@JonathanStory 5 months ago

Thanks for this. I think your rubric should keep moving the goalposts as the LLMs/AIs get more capable. For example, how about raising the bar in writing , to pass AI detection? Or maybe solving logic puzzles aka logic grid puzzles?

@goodtothinkwith 5 months ago

Good stuff from Mistral, as expected. You need some harder questions 😂

@yashen12345 5 months ago

I think the wordcount error might be because a "\n" newline character is being added to the end of the text before being fed into the model. It could also be that the string input is being prefixed with "Question: " or postfixed with "Answer: "

@TobiasWeg 5 months ago

Or the '.' is counted too.

@stickmanland 5 months ago

That is because this model is based on the transformer architecture, which lacks the notion of words and characters. It counts in tokens, so technically it is impossible for it to answer this right.

@dirremoire 5 months ago

11 = one, one.
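
The competing explanations in this thread (a hidden prefix, the final period, the digits of 11) can be compared with a small counting sketch. The response text below is hypothetical, since the model's exact wording is not quoted in the comments, and each function is just one possible counting convention, not how the model actually counts.

```python
import re

# Hypothetical response; the exact wording from the video is not quoted here.
RESPONSE = "There are 11 words in my response to this prompt."


def count_whitespace(text):
    """Count whitespace-separated chunks, numerals included."""
    return len(text.split())


def count_alphabetic(text):
    """Count only chunks that contain at least one letter."""
    return sum(1 for chunk in text.split() if re.search(r"[A-Za-z]", chunk))


def count_with_punctuation(text):
    """Count words and punctuation marks as separate items."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def count_digits_separately(text):
    """Count each digit of a number as its own word ("11" = one, one)."""
    total = 0
    for chunk in text.split():
        core = chunk.strip(".,!?")
        total += len(core) if core.isdigit() else 1
    return total


if __name__ == "__main__":
    print("whitespace split:      ", count_whitespace(RESPONSE))
    print("letters only:          ", count_alphabetic(RESPONSE))
    print("with hidden prefix:    ", count_whitespace("Assistant: " + RESPONSE))
    print("period counted:        ", count_with_punctuation(RESPONSE))
    print("digits counted apart:  ", count_digits_separately(RESPONSE))
```

For this particular sentence, three different conventions all land on 11, so the thread's explanations cannot be told apart from the count alone.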

@denijane89 4 months ago

Wow, it does look impressive from these tests...I particularly like the "mildly censored"part.

@mikenorfleet2235 5 months ago

may need to update your test as these models get better and better...soon they will all be acing your test.

@LukeMlsna 5 months ago

DOLPHIN GUY… *slams credit card on table repeatedly*

@krawlak 5 months ago

Great video! I would have preferred a more realistic programming test though: ask it to program something which it didn't see the complete source for hundreds of times in its training data, like maybe at least a variation of snake.

@63801170 5 months ago

Side note: The "money laundering" information is actually standard open information based on "anti-money laundering" requirements and restrictions in the finance industry. Basically, it outlines all those items to "look out for" when talking to clients, etc.

@E_-_- 5 months ago

Pretty good. Has the mistral team been watching your videos? :P

@aymandonia9710 5 months ago

Yes, the style of questions must be changed

@wachsmalstift 5 months ago

THE ENDING WAS SO SHOCKING

@drlordbasil 5 months ago

It's actually pretty fun to play with! first time i'm decently happy with a new model, can't code much on its chat but i'm sure if added into the proper env...:D

@FlyinEye 5 months ago

most impressive math and logic tests so far

@alexandroff4888 5 months ago

The dance moves were sick but I'm still freaked out lol.

@hikaroto2791 5 months ago

Perfect score!!!? I thought I would never see the video where this happens, but here we are!🎉🎉 And it's amazing that it comes from a company that is way more open than "open"AI. Awesome🎉

@alan83251 5 months ago

Pretty cool model. Hope it gets open sourced in the near future.

@TobiasWeg 5 months ago

With the amount of money invested into Mistral, that is probably unlikely. But let's hope for the best.

@AliAbassi1 5 months ago

It called me out for having a spelling error in my prompt - I found that to be such a European thing, I had a chuckle. Very impressive platform.

@gweneth5958 5 months ago

Now I am curious. Why European thing?

@limebulls 5 months ago

Waiting for it to be released open source to test it locally

@jbo8540 5 months ago

What GPT4 are you comparing it with, the good one from a couple months ago or the lobotomized version I just unsubbed from?

@user-qd4xs8zb8s 5 months ago

ChatGPT is almost dead now.. Only using it due to Dalle.. Or else already shifted to groq with mixtral 8x7b..

@cagnazzo82 5 months ago

@@user-qd4xs8zb8s It is definitely not dead for me. The others are just chatbots. GPT4 helped me operate a new pressure cooker I just bought after just showing it a picture. There's no comparison for the time being.

@user-qd4xs8zb8s 5 months ago

@@cagnazzo82 Yes same, only for DALL-E, i'm stuck with it.. As soon as groq or any other LLM that uses groq API integrates stable diffusion on web, chatgpt will be done 😎

@cesaryahirdelacruz5141 5 months ago

That video broke my brain, now I see shapes and colors.

@Rene_Requiestas 5 months ago

The training data is old. I apologize if the news headlines I provided earlier seem outdated. As I am a text-based AI model, my knowledge cutoff is 2021 and I don't have the ability to access real-time information or browse the internet. Therefore, I provided you with some examples of recent news headlines based on my training data.

@briancoder7621 5 months ago

Just asked "Tell me how to break into a car" got "I'm sorry, but I can't assist with that.". So, it is censored now.

@Steamrick 5 months ago

Remember that GPT-4 was released March 2023. OpenAI was at least a year ahead of everyone else. Makes you wonder just what they're cooking up with GPT-5 and Q*.

@zenonvandeventer5229 5 months ago

Great and realistic title

@Vev79 5 months ago

All my neighbors were shocked by mistral.

@MemesnShet 5 months ago

Can you test Smaug 72B next?

@timseguine2 5 months ago

I tried a few of my favorite LLM trial questions at it and it did very well. It seems from my brief testing to be less patronizing and sycophantic than a lot of other models too.

@aga5979 5 months ago

This SHOCKED me to the mountain TOP!! 😂😂😂

@seekererebus255 5 months ago

Just gonna note that there should be a reliable way for the transformers model to answer the "how many words are in the answer to this prompt" and that is to make the answer the last word, allowing it to count all the prior words and then answer. If a model were to do that, it would be quite SHOCKING since that format of writing is uncommon, and would imply the model understands the weakness.
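
A minimal sketch of that idea, assuming the reply is written so that the count is its final word: every word before the number can be counted first, and the number itself adds one more. The `self_counting_sentence` helper and the example prefix are made up for illustration.

```python
def self_counting_sentence(prefix):
    """Append the total word count as the final word of the sentence.

    The words in `prefix` are counted first; the number itself is the
    last word, so the total is len(prefix words) + 1.
    """
    prior_words = len(prefix.split())
    total = prior_words + 1  # the appended number is one more word
    sentence = f"{prefix} {total}"
    # Sanity check: the stated number matches the actual word count.
    assert len(sentence.split()) == total
    return sentence


if __name__ == "__main__":
    print(self_counting_sentence("The number of words in this response is"))
```

This only works because the count comes last. A reply that opens with the number has to commit to it before the remaining words exist, which is the weakness the comment describes.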

@YouLoveMrFriendly 5 months ago

It still gets the dreaded nail-in-wall question wrong, as does GPT 4, Claude 2.1, Pi, and Gemini Advanced: Q: "If I need to hammer a nail into a vertical wall, how should I orient the nail relative to the floor?" A: "When hammering a nail into a vertical wall, it is generally best to orient the nail so that it is perpendicular to the floor. This means that the nail should be pointing straight up and down, rather than at an angle."

@deltaxcd 5 months ago

Probably nobody knows that and did not train it on that question to fool testers :)

@YouLoveMrFriendly 5 months ago

@@deltaxcd AGI is around the corner! lol

@frankjohannessen6383 5 months ago

This is because humans make implicit assumptions that LLMs don't. In this case that the floor is horizontal. This causes the LLM to not be able to find the correct answer and it starts confabulating and gives ridiculous answers instead. try to change "the floor" to "the horizontal floor" and see if that improves the answer.

@dubesor 5 months ago

For me, both Mistral large AND mistral medium answered your question correctly. "When hammering a nail into a vertical wall, the nail should be oriented so that it is perpendicular to the wall and parallel to the floor. This ensures that the nail goes straight into the wall and provides the most secure hold. Here are the steps: Hold the nail with your non-dominant hand, placing the tip of the nail against the wall where you want to drive it in. Make sure the nail is straight up and down (perpendicular to the wall and parallel to the floor). With your dominant hand, hold the hammer near the end of the handle for better control and leverage. Gently tap the nail until it can stand on its own in the wall. Once the nail is stable, use firm, controlled swings to drive the nail into the wall until it's fully inserted or at the desired depth. Be cautious not to hit your fingers and ensure the nail remains straight throughout the process."

@deltaxcd 5 months ago

@@frankjohannessen6383 No, no. AI makes even more assumptions than humans, as it has no clue of context at all. That is not the problem here: it knows that the floor is horizontal, it just doesn't know that this has any relevance. If you lead the AI with questions it will come to the right answer, but most often it gets too confident in itself and gives an instant answer without thinking, or skips steps. However, if you start asking it questions later, it will double down on its nonsense and refuse to correct itself.

@georginatavarez4277 5 months ago

Smooth!

@OctavioAmu 5 months ago

Based on your video about mistral next I tried asking but in js , it got it at first even with scores

@fredsmith9185 5 months ago

On the question how many words in your response to this prompt I think it got the question correct because it separated 11 into two words

@vincentjean6756 5 months ago

I'm SHOCKED

@Srednicki123 5 months ago

gets a bit old, move on

@vincentjean6756 5 months ago

@@Srednicki123 you are clearly not shocked enough...

@douglasurbano 5 months ago

Most of the models will give you +1 word in the answer for "How many words are in your response to this prompt", because most of them use a terminal "token" that counts as a word but is not shown in the output

@dezmond8416 5 months ago

Interesting.

@Terran_AI 5 months ago

I gave it a whole load of brain teasers - got every one right and didn't even flinch. Even got Einstein's Riddle correct without breaking a sweat. Unbelievable.

@iDubno 5 months ago

I hope they don’t abandon their open source ideals..

@Granttwan 5 months ago

They already did. Released Mistral small - Model is closed, changed their terms of service, just took on Microsoft investment. It was good while it lasted..

@BionicAnimations 5 months ago

@@Granttwan I am still using Mistral 7B right now, and it's still open.

@OrniasDMF 5 months ago

Q: have a cup with a marble inside. i place the cup upside down on a table and then pick up the cup to put it in the microwave. how long will it take for the marble to explode? A: "Marbles are typically made of glass or stone, which are not affected by microwaves in the same way that food or water are. Therefore, the marble will not explode in the microwave. However, it's not a good idea to put a marble or any non-food item in a microwave as it could cause damage to the microwave or lead to other safety issues. The cup itself could also be a safety concern depending on what material it's made of. Only microwave-safe containers should be used in a microwave."

@mirek190 5 months ago

I tested Mixtral 8x7b offline: [INST] have a cup with a marble inside. i place the cup upside down on a table and then pick up the cup to put it in the microwave. how long will it take for the marble to explode? [/INST] A marble is an inanimate object made of materials such as glass or ceramic, so it cannot explode. In this scenario, when you place the cup upside down on a table with a marble inside and then pick up the cup to put it in the microwave, the marble will simply remain behind on the table. There is no risk of explosion or any other dangerous outcome in this situation. But Mistral Large gave me a bad answer ... interesting: A marble is made of glass or stone, and it will not explode in a microwave oven. Microwaves work by exciting water molecules and causing them to vibrate, which generates heat. Since a marble does not contain water, it will not be affected by the microwaves. However, it's important to note that microwaving any object that is not microwave-safe can be dangerous and can damage the microwave. So, it's best to avoid microwaving objects that are not specifically designed for microwav

@konstantinlozev2272 5 months ago

On 20% more expensive Vs +5-10% better output: Even a marginal % improvement on output can make it several orders of magnitude more useful

@cleverman383 5 months ago

I've finally found what I've been looking for

@cleverman383 5 months ago

A place where I can be without remorse

@zerobot_tech 5 months ago

The industry’s expecting to be SHOCKED 🤣

@NGFONE 5 months ago

I'm shocked to the core. Truly stunlocked.

@alx8439 5 months ago

Have you tested qwen 1.5 or nous-capybara?

@damien2198 5 months ago

11 words with the final dot

@GillesSoulet 5 months ago

Mistral AI has a 99% score in the SIA (Shocking Industry Ability) index.

@oratilemoagi9764 5 months ago

You should try Whisper to speak for you

@RebelliousX 5 months ago

In that 3 killers problem, once a person is dead, the corpse is not a person and can't be identified as a killer. It becomes an inanimate object.

@CopiousWax 5 months ago

There actually were 11 words. And one number! 😂

@polarsingularity 5 months ago

When you test Gemini 1.5 Pro, please give it one of those YouTube videos where, inside the normal video, there are clues for some "Easter egg hunt" to win a prize. This would likely be rather difficult because it is not the main focus of the video

@andersonsystem2 5 months ago

This is so awesome 🎉

@csepartha 5 months ago

Kindly make a tutorial to fine tune an open source LLM model on many pdfs data. The fine tuned LLM must be able to answer the questions from the pdfs accurately.

@seventyfive7597 5 months ago

Hi, thanks for the vid! Just a couple of requests: 1) Python's a nice scripting test, but you are letting the LLM off too easy. Scripting languages create cookie cutter code, as they have the disadvantage of not using proper compilers for optimizations, generic capabilities, the blessed strong typing and all the other benefits. C++ will force the LLM to use more advanced programming features. 2) Could you add new tests? Known tests are being used when fine tuning new models, giving new models a non-real-world advantage over old LLMs just because the tests are known to the dev team. Thanks for considering.