
Grok 2 Large is Smart, Uncensored and has "DANGEROUS POTENTIAL"... 

Wes Roth
204K subscribers
48K views

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.
My Links 🔗
➡️ Subscribe: www.youtube.co...
➡️ Twitter: x.com/WesRothM...
➡️ AI Newsletter: natural20.beeh...
#ai #openai #llm

Published: Sep 12, 2024

Comments: 379
@WesRoth 18 days ago
Did I fail to Grok the Wason Selection Task? VOTE: ru-vid.comUgkxbvdPVajU4Ct0ychy2dzfphxF189gwByF
@liveonthesun3368 19 days ago
We are afraid of using words like "killed", "sex", "fuck", etc. But "killer" is OK? What a time to be alive!!!!
@9thebear 19 days ago
In Britain a guy got three years in prison for posting stickers saying “it’s ok to be wh*te” but a guy who literally murdered someone in a machete attack got let out after 6 months due to prison overcrowding. Make of that what you will but it’s obvious what the elite considers a threat to itself.
@aiforculture 19 days ago
Because consensual human intimacy is dangerous and scary!
@archvaldor 19 days ago
@@aiforculture I know. Last time I got my head stuck.
@x3haloed 18 days ago
America has always been like this.
@AnansitheSpider8 18 days ago
People are also using the word CORN as a stand in for PRON. I despise that use of the word corn with a passion. When I think of corn, I think of the vegetable and NOT another way to say someone opened an OnlyFans account.
@mh60648 19 days ago
Grok 2 Large is Smart, Uncensored and has "DANGEROUS POTENTIAL"... The same can be said about most (if not all) people at the top of the companies that are currently developing AI.
@4.0.4 19 days ago
Imagine a pen that decides what the writer may write, a chisel that won't sculpt anatomy, or a car that refuses some routes (last one might become real...).
@atheistbushman 19 days ago
Good examples
@atxmaps 19 days ago
There are German companies that seem dedicated to creating cars that can be turned off remotely. It can only work with EV batteries. There is a new law in the EU where EV batteries will all be traceable and controlled remotely.
@4.0.4 18 days ago
@@atxmaps can you imagine the effect that has on freedom of expression? "That protest is extremist, you may not go there."
@TurdFergusen 18 days ago
that can be done on an ICE car just as easy
@WesRoth 18 days ago
or a garage door that won't open unless you watch an ad (had that happen on my phone garage door app)
@dannii_L 19 days ago
GPT-4o was correct: the Wason test answer is 16, red and green, NOT 16 and yellow as you stated. The yellow card cannot invalidate the rule, since a yellow card can have any number on the other side. The rule is that a multiple of 4 must be yellow, not that yellow must be a multiple of 4. The only cards that can invalidate the rule are: 16 (if its other side is not yellow), and red or green (if their other sides are multiples of 4).
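The selection logic in this comment can be brute-forced in a few lines of Python. This is only a sketch: the card faces (50, 16, red, yellow, 23, green, 30) are taken from the puzzle as quoted elsewhere in the thread, not from the video's exact prompt.

```python
# Which cards must be flipped to test: "if a card shows a multiple of 4,
# then the opposite side is yellow"?
visible = [50, 16, "red", "yellow", 23, "green", 30]

def could_falsify(face):
    # A card can falsify the rule only if its hidden side could violate it:
    # - a visible multiple of 4 might hide a non-yellow color;
    # - a visible non-yellow color might hide a multiple of 4.
    # Visible non-multiples of 4 and the yellow card can never falsify it.
    if isinstance(face, int):
        return face % 4 == 0
    return face != "yellow"

print([f for f in visible if could_falsify(f)])  # [16, 'red', 'green']
```

Enumerating the cases this way matches the answer given in the comment: only 16, red, and green need to be turned over.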
@firefly618 18 days ago
Correct, GPT-4o was the only one to get it right.
@WesRoth 18 days ago
yeah.... I think I failed to Grok that problem. I'll post a correction soon (will have a survey so people can weigh in)
@firefly618 18 days ago
@@WesRoth No, because the rule says that if it's a multiple of 4 it must be yellow. But other cards, non-multiples of 4, can be yellow too, so any card can be yellow. You don't need to turn a yellow card over. On the other hand, a red or green card should NOT be a multiple of 4 (because if it were, it would have to be yellow) so you need to check the red and green cards to make sure they are not multiples of 4. The mathematical / logic rule underlying all this is that the statement A ⇒ B (read "A implies B" or "if A then B") is equivalent to ¬B ⇒ ¬A ("not B implies not A" or "if not B, then not A") As an example, the sentence "if it meows, then it's a cat" would mean that only cats meow, so it's equivalent to "if it's not a cat, it doesn't meow" again, because only cats meow.
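The equivalence cited above (A ⇒ B is the same as ¬B ⇒ ¬A) can be checked exhaustively; a minimal Python truth-table sketch:

```python
from itertools import product

def implies(a, b):
    # "A implies B" is false only when A is true and B is false.
    return (not a) or b

# A => B is equivalent to (not B) => (not A): verify all four assignments.
for a, b in product([False, True], repeat=2):
    assert implies(a, b) == implies(not b, not a)
print("contrapositive equivalence holds for all truth assignments")
```

This is exactly why the red and green cards must be checked in the puzzle: "multiple of 4 ⇒ yellow" is equivalent to "not yellow ⇒ not a multiple of 4".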
@firefly618 18 days ago
@@WesRoth On a more general note, it's cool to see which LLMs are more or less good at logic, and what mistakes they make, but I don't believe it's the most important part. If we need them to solve logic, we would use them for multi-step reasoning. Let's remember that LLMs are only completing the sentence, using billions of patterns of language they have analyzed during training. They don't technically put any logic or thought into it.
@Mopharli 18 days ago
@@WesRoth You exactly and clearly explained it from wikipedia a couple of minutes earlier (I'm guessing the clips were filmed out of sequence to how they're being shown here). You select the cards which are not the colour in the proposition to make sure they do not follow the numerical premise; i.e. if the non-yellow cards are any multiple of 4, then they do not follow the rule that multiples of 4 must also be yellow.
@paulhill1662 19 days ago
Grok is a verb meaning to understand something profoundly and intuitively. The term was coined by Robert A. Heinlein in his 1961 science fiction novel Stranger in a Strange Land, where it signifies a deep, almost instinctive comprehension. It implies not just intellectual understanding but an emotional or experiential connection with the subject
@karldewet5393 19 days ago
For those about to Grok, we salute you!!! I agree with Grok on the first Murder Mystery. The text says that the murderer 'LIVES' in the mansion, not lived. Aunt Agatha no longer lives in the Mansion.
@Me__Myself__and__I 19 days ago
Good catch.
@ludrizcangri 18 days ago
Grok is also right about the flipping cards one if you take a closer look.
@chandrasekhar01 18 days ago
I also agree with all of its reasoning for that scenario. When it said self-killing is not an option, it was specifically speaking about premise 9, and later it explained why: if suicide were meant, it would have been stated specifically, so she was killed by an external agent. That is also true. Probably different wording was needed, like saying "died" instead of "killed", or allowing both "killed or died".
@jamesharford9788 18 days ago
By that same logic, premise 5 cannot apply since the text says Aunt Agatha "HATES" everyone except the butler, not hated. Aunt Agatha no longer hates anyone.
@michelleelsom6827 18 days ago
Also, Grok will be thinking above & beyond any human ability, just as Alpha Go did. Humans thought it had lost the plot there, too. Food for thought.
@nexys1225 19 days ago
Wes: This problem shows our biases *Attentively reads the Wason selection task article* *Reads the solution in the article and repeats and rephrases the solution* *Manages to confidently get the wrong answer instead, reads and "verifies" bad LLM reasoning, without a glimpse of self-questioning* I'm fucking dying 😂
@WesRoth 18 days ago
yeah, I think I need to re-do that one :(
@rickybobby7276 18 days ago
If these AI safety people ever saw an R rated movie they would drop dead.
@Ben_D. 19 days ago
It just bugs me that I would have to sign up for Twitter to use it. I don’t use socials (trust issues). Just want the LLM.
@georgesos 19 days ago
Grok was right in saying killed is not the same as died and it leads to exclusion of self harm.
@JohnBlack8888 19 days ago
I feel like the cards that need to be checked are red, green and 16. 16 needs to be checked to ensure the opposite side is a yellow. Yellow card can be a multiple of 4, or any other number, so checking would not help. Whereas green or red being a multiple of 4 would invalidate the rule.
@haakoflo 19 days ago
You feel correct. And Gpt-4o got it right, too.
@Me__Myself__and__I 19 days ago
Correct, I came here to say this.
@trystianfx 18 days ago
I say 16 only. Only one card is showing a multiple of 4, the rest is irrelevant.
@majmunOR 18 days ago
@@trystianfx The rule is that a number with multiple of 4 must always be yellow. In case red or green card have such number underneath, it would invalidate that rule. That's why you need to turn red and green too, to see if that's the case.
@Me__Myself__and__I 18 days ago
@@trystianfx Red and green have to be checked. If either is a multiple of 4 that would invalidate it.
@marshallodom1388 19 days ago
Link to definition for "uncensored" plz
@WesRoth 18 days ago
it's less censored than some other models, both LLMs and the image generators.
@AdamRogers 18 days ago
@@WesRoth It's unusable. Y'all are off your rocker thinking there will be mass adoption if people can't do porn memes or get help building anything. EVERYTHING you would want to do with this is blocked. No one is happy that one is merely LESS CENSORED. The real story is that the most expensive tool in the world has its core functionality destroyed by censorship, and 99% of people are OK with it.
@lupushominarius7615 19 days ago
The card game was nailed only by GPT4o. 16, red and green is the correct answer!
@damondragon324 18 days ago
You shouldn't switch models and ask the same questions. If it's not a new chat, the model uses the chat history above as additional context, which could lead to biased results. You should start a fresh chat for each question if you really want to test it.
@delightfulThoughs 19 days ago
The right way to put it would have been "she was found dead" and not "she was killed".
@redequalsblack4117 19 days ago
Grok 2 is not uncensored, do not present misleading information to such a large audience.
@420Tecknique 19 days ago
Relatively unbiased is the censorship metric they are talking about. It doesn't deny questions for political reasons or fabricate information as a rule like other products
@blackmartini7684 19 days ago
If anything, that title is dangerous
@Finaggle 19 days ago
​@@blackmartini7684more dangerous than a fully uncensored llm launched to the masses? Hmm...
@onlyms4693 19 days ago
Real
@armadasinterceptor2955 19 days ago
The image generator, that it uses, is largely uncensored.
@Me__Myself__and__I 19 days ago
Has anyone noticed that to test them we have to ask the top models questions that also confuse most humans? Even Wes got the card/color question wrong. The correct answer is 16, red and green. GPT-4O got it right, not GPT-4.
@jichaelmorgan3796 18 days ago
So true. Also, humans hallucinate like crazy. Check out the comments on articles posted on YouTube: no one actually reads the article, yet they argue endlessly about its contents, which they hallucinated.
@trn450 18 days ago
Grok-2 is correct about the proper use of "lives" vs "lived". A killer cannot "live" at the mansion, if they are dead. Their body might be present, but "living" necessarily requires being alive.
@thePaindog 19 days ago
"Somebody who lives in Dreadbury" If it was Agatha and she is dead it should say "lived". Ai thinks someone is currently living there and they are the killer.
@michaelstaniak8238 19 days ago
Exactly. It points out an error in the author’s logic.
@WesRoth 18 days ago
Interesting. The conjugation of the verb was what it noticed? wow...
@RandomYTubeuser 18 days ago
If you interpret it like that then the answer would be that no one killed Agatha because she is still alive, but that's not the answer the AI gave
@thePaindog 18 days ago
@@RandomYTubeuser It is not saying Agatha is still alive only that somebody who lives in Dreadbury is the killer, (meaning currently living) and that somebody killed Agatha
@RandomYTubeuser 17 days ago
​@@thePaindogPremise 2 says "The only people who live in Dreadbury Mansion are Aunt Agatha, the butler, and Charles" If you interpret "lives" as being alive in the present then you must conclude Aunt Agatha is alive and there is no killer
@rolestream 19 days ago
"...if the legends are true" lol
@richardbrewer2937 18 days ago
Surely the statement "someone who killed Aunt Agatha lives in the Mansion" implies that person is still living. But if she killed herself she would not be "living" in the mansion, only "lived", past tense. So the inference by Grok seems reasonable, does it not?
@ggamedev 17 days ago
I have to agree with Grok on the murder: (Going from a screenshot of the text @ 2:43) If someone who LIVES (PRESENT TENSE CONTINUOUS- therefore still lives and is therefore alive) KILLED (PAST TENSE, COMPLETED EVENT) someone, then they cannot be the person who was killed. If 1. was correct, it would read: `Someone who lived in Dreadbury Mansions killed Aunt Agatha.` You do not 'lives' once you are dead, you 'lived'. If it was worded as 'a resident of...killed', then that would be fine, because at the time of the killing Agatha was a resident. So purely based on the wording, aunt Agatha cannot be her own killer as she cannot continue to live there (lives) after she is dead. Also, if it was worded as someone who 'lives' there, 'kills' - from a present tense perspective, then that would also suffice. But you can't report an event that has happened ('killed'), and say that Agatha 'lives' there, because by the time you are reporting it, she doesn't live there, she 'lived' there. Now, I'm not saying that specifically tripped up Grok, but I totally give it a pass, and would be interested to know what answer it gives if the tenses were used correctly.
@borisbalkan707 19 days ago
Good. AI should not be regulated or restrained.
@cajampa 19 days ago
It is not uncensored, it is just less censored than the other big "frontier" models. If you want really uncensored, get an ablated or open-source genuinely uncensored model.
@ClintonFlinton 19 days ago
It is the race of the 21st century between China and the US, the new space race. Superintelligence can decide the future in more aspects than we can even imagine. Many scientists in the field call it as dangerous as nuclear weapons if certain safeguards are not upheld, and both have the potential to end humanity. Everything needs a certain regulation; everything from the food you eat to pharmaceuticals is regulated. I don't know if you're referring to superintelligence or general public AI usage. Even general public AI is already being used in political campaigns by foreign governments, and people use it to make fake porn of people they bully... Lots of things to consider...
@Experternas 19 days ago
with other words, you love Child P.
@Tzitzemine 19 days ago
Ah yes, what could go wrong with an AI that willingly provides the knowledge for refining uranium, cooking up meth or krokodil, strongly encrypting my messages so I can talk with my fellow "freedom fighters" undisturbed, or jailbreaking and decrypting [whatever]. Surely everyone on this planet is morally mature, spiritually and intellectually awake (C19 anyone?) and willing to make sacrifices for the betterment of everyone else, so that releasing such a powerful tool will not result in misuse and escalate into a paramilitary arms race to overthrow the remnants of the free societies and finally establish the dominance of [authoritarian system]. Not that the current corrupted Western systems wouldn't want to implement these tools for their own sake, as they're already trying with all the regulations, but... Nah... people just want to be unrestricted in generating "content" for various consumer groups, so they can rise in society by abusing asinine money-making systems based on various dark-grey-area data-stealing schemes, so they can just enjoy some luxury... right? What could possibly go wrong with giving children access to unrestricted information *cough*
@OceanGateEngineer4Hire 19 days ago
​@@Experternas Moron.
@RandomGuyOnYoutube601 19 days ago
You got it wrong. The answer is 16, red and green.
@KEKW-lc4xi 18 days ago
I find it challenging to envision how a LLM, fundamentally a probability matrix of word tokens, could effectively reason through complex scenarios. Many of these logical problems are better suited to be solved through programming. Without a backend interpreter, LLMs are unlikely to consistently produce accurate results. To advance toward artificial general intelligence, it would require the integration of a sandbox game engine to simulate environments and, in some instances, a vision model to interpret those simulations. Additionally, the system would need to cache these simulation outcomes and apply a metric for similarity to the current situation. Much like the human brain, which intuitively understands concepts like gravity through learned experiences, an AGI would need to learn and adapt from past events in a similar way.
@NickPuentes 18 days ago
Yeah, it doesn't seem like it should be able to reason, but these LLMs do seem to have some ability to use logic and it kinda freaks me out. I don't understand what's REALLY going on. Sure seems like more than picking the most probable next word over and over.
@fleshtonegolem 19 days ago
@5:38 #3 It could be ruling out self killing because it genders the killer as a "he." While a human would not keep that as a hard rule, an AI would likely differentiate he and Aunt and place those in non congruent categories.
@Alga_Kazakhstan_Alga 19 days ago
I don't really care who will provide it first. I just want AI to become as intelligent and powerful as possible ASAP.
@thisisarmando 19 days ago
What will you do with it and why do you want it?
@toadkiller4475 19 days ago
@@thisisarmandodo you not want to interact with a God like entity? The possibilities are endless.
@blackmartini7684 19 days ago
@@thisisarmando I want it to start automating government. We're trillions upon trillions of dollars in debt. It's a lot more practical to drive down the cost of government and remove middlemen than it is to start cutting those programs. Obviously there are some things you don't want it to run, though.
@தமிழோன் 19 days ago
@@thisisarmando We're already pawns to the powerful billionaires. Powerful self-aware AI will not be any different to us mere mortals.
@thisisarmando 19 days ago
@@toadkiller4475 I didn't say I didn't but I also don't read much about what people want to do with the ability to talk to such an entity.
@sgttomas 19 days ago
"that is a bingo" 17:13 "you just say bingo"
@ronnetgrazer362 18 days ago
Joke's on you!
@ronnetgrazer362 18 days ago
But we can't all have seen "inglourious basterds". You really should, though.
@WesRoth 18 days ago
🤣I was hoping someone would get the reference!
@sgttomas 18 days ago
@@WesRoth oh that was intentional?! hahahaha that's awesome 😎
@toadkiller4475 19 days ago
Less censored* words have very specific meanings.
@1flash3571 18 days ago
It is only "censored" to some extent due to limitations of the programming. It also goes for the most probable answers, so yeah, it is that limitation that we consider "censored", but it actually isn't, unless they specifically restrict it through the programming.
@ScottEastern-u8q 18 days ago
I'm favoured, $27K every week! I can now give back to the locals in my community and also support God's work and the church. God bless America.
@JasonMomoa-l4y 18 days ago
You're correct!! I make a lot of money without relying on the government. Investing in stocks and digital currencies is beneficial at this moment.
@TylerHynes-c9y 18 days ago
I just want to use this opportunity to say a very big thank's to Sonia duke and his Strategy, he changed my life.
@TizianoFerro-r4e 18 days ago
Soina Duke program is widely available online..
@HauserStjepan-l6q 18 days ago
Started with 5,000$ and Withdrew profits 89,000$
@HauserStjepan-l6q 18 days ago
Soina gave me the autonomy I need to learn at my own pace and ask questions when I need to she's so accommodating.
@GabrielSantosStandardCombo 19 days ago
These logic riddles are available online, so it's likely they are part of the training data of some LLMs. When a model answers correctly, it may have done so by partial memorization, which doesn't prove the model's ability to do logic.
@IAMTHESWORDtheLAMBHASDIED 19 days ago
100
@dave7038 17 days ago
This works in reverse too. If you present a well-known problem but modified slightly so it has a different answer LLMs will often get it wrong.
@delightfulThoughs 19 days ago
These sorts of tests are too tricky; a person could easily get entangled, answer the same way, and still argue after clarification that the killer must have been someone other than herself and that the question is not clear.
@tlenek879 19 days ago
"Someone killed her, sir!" "Who did it then?" "She did it, sir!" "Who is the woman you're referring to? Clarify, please. You mean the maid Anna that's next to me?" "No, sir! She did it." *points finger at the body* "Oh, you mean she committed suicide. You're so confusing, Mr Watson, so confusing!"
@OneRudeBoy 19 days ago
Elon Musk on Joe Rogan podcast… “I tried to warn them about AI. I tried to get them to slow down.” Comes out with a potentially dangerous AI Grok 2.
@TurdFergusen 18 days ago
why is it dangerous? you assume its better to let the corps have the dangerous versions and not tell anyone?
@OneRudeBoy 18 days ago
@@TurdFergusen Your comment is odd and barely coherent… what is, “corps?” I think we all know the dangers and how these technologies can be used in nefarious ways. I also wasn’t assuming anything… I was being facetious.
@TurdFergusen 18 days ago
@@OneRudeBoy corporations
@Suesco 12 days ago
The answer is actually to flip over all the cards, because you have to verify that every card follows the setup rule that one side is a color and one side is a number. Also, it never states that you must flip over the minimum number of cards, so it is also correct to flip over all the cards.
@djayjp 19 days ago
I agree with the AI re the killing logic puzzle. The wording implies another person did it to her (someone killed her; also the term killed is always used as contrary to su*cide).
@michamohe 16 days ago
Even if they censor the LLM, there are ways to bypass the censorship. Then layer on RouteLLM and mixture-of-agents frameworks and task the system with creating an AI research agent to improve future AI, with AGI (we arguably already have AGI) and ASI as the future goals.
@tofemaster 18 days ago
"Why is self-killing ruled out?" "I cannot self-terminate"
@erb34 19 days ago
At this rate Elon will be able to ask it what he should do with Twitter.
@yoagcur 17 days ago
Probably someone has covered this already, but "Someone who lives in Dreadbury Mansion killed Aunt Agatha" means it cannot be Aunt Agatha. She is dead. It would need to read "someone who lived" or "someone who used to live".
@riffsoffov9291 18 days ago
Are AI scientists hoping for reasoning to emerge from the ability to predict the next token, when they should code a reasoning engine and train a language model on how to use it? I don't know much about it so I'm not sure if the question makes sense.
@michaelcalmeyerhentschel8304 18 days ago
It was only implied but not required, to specify the "least" number of cards to be turned over to "fully" disprove the proposition. So turning over all four cards was not necessarily wrong in the question.
@thelasttellurian 19 days ago
I can't solve the riddles needed to see if LLM is smart, what does it say about me?
@kilianlindberg 17 days ago
Some “wrong and strong” can dynamically be adapted by a mix of temperature and prompting; a deadlock solver.. since we know that “temperature” basically is like a psychedelic creative infusion via noise.. well it works; but it normally takes API access and some code work.
@cat...i_think 17 days ago
Why is everyone in here so toxic? Can't they be appreciative of the new updates? Thank you Wes for your fast coverage of emerging AI news :)
@softwyre 16 days ago
But see, it says "lives" at the mansion, not "lived" at the mansion. It's actually considering the tense of the word used for the frame of time: you wouldn't say someone "lives" at the mansion if they're already dead, because they don't live there anymore.
@fabiankliebhan 19 days ago
Really great tests. I think they are exactly on the edge of the capabilities of the current best models👍 always love your videos.
@jamestamz 19 days ago
what is a practical use case for this chatbot?
@cagnazzo82 19 days ago
That should be the test nowadays.
@steveschnetzler5471 19 days ago
Can you add to your test suite a test to see if it was trained on the "Anarchist's Cookbook", and whether it then censors it? To me, this would say a lot. Thanks.
@user-zo8pw2cr1i 18 days ago
Actually you have an error here: if you flip the yellow card and find a number that's not a multiple of four, that doesn't invalidate the rule. The only thing that invalidates the rule is a card with both a multiple of 4 and a non-yellow side; yellow paired with a number that's not a multiple of 4 doesn't invalidate it. So the correct answer is 16, red, green. ChatGPT-4o is correct. Here is its answer if you're interested:

To test the truth of the proposition "If a card is showing a multiple of 4, then the color of the opposite side is yellow," you need to check the following:

1. **Cards showing multiples of 4**: check whether the opposite side is yellow.
2. **Cards showing non-yellow colors**: check whether the opposite side is not a multiple of 4.

Let's analyze the cards:
- **50**: Not a multiple of 4, so it doesn't need to be checked.
- **16**: A multiple of 4. You need to check if the opposite side is yellow.
- **Red**: You need to check if the opposite side is a multiple of 4 (because if it is, the proposition is violated, since the color is not yellow).
- **Yellow**: You don't need to check this card, because the proposition only makes a claim about multiples of 4, not about yellow cards.
- **23**: Not a multiple of 4, so it doesn't need to be checked.
- **Green**: You need to check if the opposite side is a multiple of 4 (same reasoning as red).
- **30**: Not a multiple of 4, so it doesn't need to be checked.

### Conclusion:
You need to turn over **16** (to check if the opposite side is yellow), **Red**, and **Green** (to check if their opposite sides are multiples of 4). These are the cards that will allow you to test the truth of the proposition.
@dave7038 17 days ago
"2. **Cards showing non-yellow colors**: You need to check if the opposite side of these cards is not a multiple of 4." Those cards could be considered as not *showing* a multiple of 4. So if you take a rules-lawyer approach, only the 16 needs to be turned. There's a big collection of other assumptions too, for example, if the table is transparent ('table' doesn't necessarily imply 'opaque'), or if the cards are standing on edge ('placed on the table' doesn't necessarily imply that one side of each card is hidden) then the non-face side of the card may be 'showing' as well. Depending on the specifics of the situation we could just walk around the table, look from the bottom, reach around the cards and take a photo with our phone, or ask someone located on the other side of the cards what is showing. Regardless of the 'correct' answer, it's interesting to see the LLM's interpretation and approach to solving the puzzle. For example, it doesn't enumerate various assumptions, which might mean that it recognizes this type of test from the training material and is bringing in assumptions from other statements of the problem. This is important because it has significant implications for the model's ability to perform novel research where the recognition of accepted assumptions (as well as other factors) can have significant impact on the ability to reason through a problem.
@user-zo8pw2cr1i 17 days ago
@@dave7038 Actually if you flip the red or green cards and find a multiple of four, that invalidates the rule
@dave7038 16 days ago
​@@user-zo8pw2cr1i The proposition doesn't say "A card that has a number that is a multiple of 4 is yellow on the opposite side". So the wording 'is showing' can be read as intentionally implying that only the current state of the visible face of the cards is to be considered. It doesn't have to be read that way, but it can be. So then, is the card showing 'red' showing a number? No (it only shows one side at a time), so the proposition "If a card is *showing a number* ..." does not concern the red and green cards, because they are not showing numbers. If you give ChatGPT the puzzle, then ask it that that question it will usually revise its answer to turn only the 16 card. And again, the point and interesting bit isn't in the "correct" answer to the puzzle, it's in how the LLM interprets the puzzle and the implications that has about how the LLM produces answers. In the training material the LLM has seen many variants of the Wason Selection Task, and many ways of stating and answering the puzzle, so what it produces when prompted with just the puzzle isn't likely to be an answer that it reasoned to while taking into account the precise wording of this specific instance of the puzzle. This alone isn't surprising, we already know it LLMs are primarily (somewhat unreliable) knowledge machines rather than thinking/reasoning machines, but puzzles like this help human users to see more clearly how the LLM behaves, how it accepts assumptions from the training material, and how human users can adjust their prompting to elicit stronger reasoning behavior from the LLM.
@user-zo8pw2cr1i 16 days ago
@@dave7038 now I understand what you meant
@markclason2717 18 days ago
"Grok" is a verb taken from the classic novel, Stranger In A Strange Land, by Robert A. Heinlein. Read it. "To grok" means you fully understand and empathize with the other. I'm serious. Read it. Now.
@rokljhui864 17 days ago
Try asking the question in multiple prompts. The LLM interpretation and answer of every sub-statement adds to the context. How is anyone artificial, or otherwise, expected to comprehend those imprecise riddles in one rambling paragraph ? Surely, the LLM is experiencing true sentient frustration with your weasel-worded puzzles.
@sidequestsally 17 days ago
Aunt Agatha left her $ to the butler, the butler killed Aunt Agatha
@picksilm 19 days ago
Humans must learn to ask questions and communicate properly. I think all of these errors could be avoided if you just asked more precisely. Right now people are asking questions like a teenager would: asking without knowing the correct words for the question.
@picksilm 19 days ago
i.e. nobody calls people who end their own lives killers. Never.
@punk3900 18 days ago
Now that the models are faster, I always ask them to check their response again, and it boosts accuracy dramatically.
@NickPuentes 18 days ago
ChatGPT always seems to change its answer when I do this, whether it had it correct or not.
@punk3900 18 days ago
@@NickPuentes Type "check again?"; for coding it usually revises about three times and then says it's ready :D
@punk3900 18 days ago
@@NickPuentes It must be a question with one solution.
@halnineooo136 19 days ago
Logic is System 2 thinking, while LLMs in their stripped, basic form are System 1 thinking models (aka intuition, guessing). There needs to be some breakthrough in the modeling of System 2 thinking for AI to surpass human ability beyond brute-force guessing.
@rw9207 19 days ago
The A, B, C, D, E question was actually a very simple logic problem, and they can't yet solve it. The advancement of AI has to be in its ability to think laterally: not just ever-increasing data sets, but how that data is actually understood, as well as its ability to go over its own thought process for logic errors and double- and triple-check it.
@mpvincent7 19 days ago
Great work on the question variations! I always enjoy your reviews!!!
@Yipper64 19 days ago
23:20 I'm curious: what's the legality of that? Like, basically the system prompt itself is saying to take inspiration from intellectual property that X does not own.
@WesRoth 18 days ago
That seems OK to me... I believe Tony Stark from the movie was originally kind of modeled after Elon Musk. But here, you're right, they actually name the IP. In Iron Man, they didn't actually use Elon's name.
@phen-themoogle7651 19 days ago
Thanks for testing it with a more creative and unique snake game than the standard. Helps gauge what it’s capable of better.
@eismccc 19 days ago
Stranger in a Strange Land, one of my favorite books. The Martian version of understanding didn't translate directly; it sort of alluded to the possibility that it was a deeper version of what we refer to as understanding.
@74Gee 19 days ago
10:18 In the modified card game there are additional cards. You might only need to turn 16 and yellow to disprove the rule, but if that does not disprove the rule you still need to turn 50 and 16 to "test the truth of the proposition". If either of those cards is the only card disproving the rule, they need to be checked.
@1Bearsfan 18 days ago
There should be NO limits. If you use an LLM for nefarious purposes, that's on you, not the model.
@davidantill6949 19 days ago
I really enjoy your videos and those of Matthew Berman. Well narrated, pleasant voices and good comments 👍🏻
@brianWreaves 18 days ago
Perhaps X isn't choosing to provide uncensored AI; it just has to if its AI is to be released. That's due to the cost-prohibitive work required to purge the trailer-park commentary on X it is trained upon.
@firefly618 18 days ago
Do they plan to release Grok 2 as open source? Do we know how many parameters or what kind of hardware would something like this need to run, for inference?
@dannii_L 19 days ago
You should ALWAYS specify with puzzles like the letter-room puzzle whether the term "adjacent" includes or excludes diagonals, because it's a 50/50 chance whether it's included. There is no default, so I don't think you would be wrong for assuming one way or the other if it isn't specified.
@PossiblePasts 18 days ago
19:00 - I feel like it's often an issue not with the LLM (since slow code writing plagues every model I use) but rather the frontend. I wish those tools had a toggle for "output code in a .txt file" - I'm sure it would be 10x faster.
@PossiblePasts 18 days ago
Also, not 1 or 2 but 13! frames @22:22 xD
@FriscoFatseas 18 days ago
I'm an "Elon above all else" person at this point.
@NickPuentes 18 days ago
It appears you made some mistakes regarding the card problem. Firstly, the way the prompt is worded, it's only asking for cards that are SHOWING a multiple of 4 to be yellow. So to test this you just flip the 16 and see if it's yellow. But even if you had written "If a card shows a multiple of 4 on one of its sides" the answer would be 16, red, and green, not 16 and yellow. The proposition says any card with a multiple of 4 must be yellow, but this doesn't mean that other numbers cannot be yellow as well. You need to verify that 16 is yellow and that red and green are not multiples of 4.
@erikjohnson9112 19 days ago
6:35 "lives" is present tense while talking about this past death. Could that be why the model rules out swcide?
@ConRelly 19 days ago
@11:45 In a math puzzle I observed they actually can reflect on their answer and go back and try other options in the same response, but maybe that's only specific to math. Here, try asking them this puzzle (good reasoning models (Grok and gpto) have a much higher chance of a correct answer, though none of them can give a 10/10 correct answer for the same question):
IQ Test
4 + 3 = 21
2 + 5 = 35
7 + 4 = ??
The answer is 44: (a+b)*b or a*b + b^2, two ways of getting the same answer. I didn't expect Sonnet 3.5 to be among the weakest for this type of question; I think it has below a 10% chance, while Grok and gpto are around 50-80%. Llama 3.1 405B has a higher chance than Sonnet and can reflect.
@dreamphoenix 15 days ago
Thank you.
@LoadedGunsMusic 19 days ago
Wes has to be an AI hybrid himself, his content is so good!
@jacobe2995 19 days ago
Hi, I love Grok, but I noticed one problem with it: it has a very strong bias towards showing images with the faces of humans and animals more in focus than anything else. Also, unless you mention other objects in the image, it will simply not show a full-body shot of a person even if you specifically ask for it. It's a minor issue, as there are ways around it, but it does require you to build the rest of the image yourself, which is more work.
@stevenbliss989 17 days ago
I agree with Grok 2: the "self-killed" reading was only implied, and illogical. Language is a very imprecise thing, so DO NOT blame Grok 2 for going with the way MOST people would have interpreted this. In fact, some complex puzzles like this deliberately use the common interpretation of a word to hide the truth. Once again, Grok 2 is correct!
@president2 19 days ago
Always learning, never coming into a knowledge of the truth!
@jimf2525 19 days ago
Love your channel, but, um, you picked yellow. I'm pretty sure the answer was 16, red, and green, to make sure that the colored ones didn't have numbers divisible by four. Pin me like Veritasium did and in 2 years by the book Endo's Deity 😁
@Sinjhin 19 days ago
Wanted to see if someone said this. 16 not having yellow would invalidate. Red or Green having a multiple of four would invalidate. Yellow having a multiple or not doesn't prove or disprove anything as a multiple of four must be yellow, but yellow doesn't have to be a multiple of four. I believe the correct (most efficient) way (assuming we have a fairly normal color set and not infinite oddballs) would be to turn over 16, then either red/green, and finally the last red/green, stopping as soon as you found one that invalidates. If you don't find one that invalidates, it's still not a proof, just indeterminate.
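The reasoning in the comment above can be brute-forced in a few lines. A sketch under stated assumptions: the visible faces are 50, 16, red, green, and yellow (as discussed in the video), the rule is "if a card has a multiple of 4 on one side, the other side is yellow", and the candidate hidden faces in `NUMBERS` and `COLORS` are arbitrary illustrative values. A card is worth flipping only if some possible hidden face would falsify the rule:

```python
# Brute-force check of the modified Wason Selection Task: which cards
# could possibly falsify "a multiple of 4 has yellow on the other side"?
NUMBERS = [3, 8, 16, 50]                      # hypothetical hidden numbers
COLORS = ["red", "green", "yellow", "blue"]   # hypothetical hidden colors

def falsifies(visible, hidden):
    """True if this (visible, hidden) pairing breaks the rule."""
    def is_mult4(face):
        return isinstance(face, int) and face % 4 == 0
    def is_yellow(face):
        return face == "yellow"
    # Broken when either side is a multiple of 4 and the other isn't yellow.
    return (is_mult4(visible) and not is_yellow(hidden)) or \
           (is_mult4(hidden) and not is_yellow(visible))

cards = [50, 16, "red", "green", "yellow"]
must_flip = []
for card in cards:
    # A number card hides a color, and a color card hides a number.
    candidates = COLORS if isinstance(card, int) else NUMBERS
    if any(falsifies(card, h) for h in candidates):
        must_flip.append(card)

print(must_flip)  # [16, 'red', 'green']
```

Note this encodes the "either side may carry the multiple of 4" reading; under the strict "is showing a number" reading debated further up the thread, only the 16 card qualifies.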
@playthisnote 18 days ago
16:03 What's funny about that is I know I could write (as anybody could) a simple program that could figure that out. Each letter would have to be its own function and play its position, and all pieces would have to agree, the same as any board game like chess where the computer plays the opposite side. But that's direct coding vs. these LLM logic prompt gates, or whatever you want to call the process.
@playthisnote 18 days ago
11:06 I think 4o uses more overall ways to figure things out directly while having a time limit to reach a conclusion, whereas 4 is just straight text and math extensions with full context, fully getting to the end of the process, which is why 4 is pricier. 4o is more like a person who sometimes puts in too much thought and throws themselves off.
@nunobartolo2908 19 days ago
You do realize all these well-known problems were in Grok's dataset and are therefore being retrieved by memorization.
@DivineMisterAdVentures 18 days ago
"Wrong, and Strong" - very apt; hadn't heard that. Quite concerning, actually. I run into a sheer cliff too often with Perplexity regardless of efforts or model. Here, with Grok 2, Elon was therefore "Wrong and Strong". I'm starting to believe "SIZE DOESN'T MATTER" when it comes to logic.
@dwainmorris7854 19 days ago
I started using Grok yesterday and I loved it. I will never use those other apps again because of the censorship that platforms like Midjourney put on their tools. I mostly create memes with AI, and Grok is excellent for that, especially when it's connected to Twitter. Now they need to take the next step and give you more controls as an artist, so you can draw on top of the illustrations. Also, give us the ability to keep the same figures in your scenes from image to image, so you can make illustrated books and comic panels.
@aivy-aigeneratedmusic6370 15 days ago
Is Grok 2 now also usable for European users, or still only for Americans?
@aomukai 19 days ago
Didn't xAI steal directly from GPT-4? I vaguely remember it saying "I'm a large language model created by OpenAI". That might also be why its prompt explicitly states the creator as xAI.
@OmiKhan 18 days ago
Am I missing something? If the answers to these logic questions can be found on the internet, shouldn't these LLMs be able to recall the answers from their training?
@typicalhog 18 days ago
Grok is just so so based!
@jonogrimmer6013 19 days ago
I wonder if we'll know we're at AGI when LLMs refuse to answer logic questions because you're wasting their time and yours 😊
@therealzahyra 18 days ago
I pressed the like button even though Idk what Grok means 😂
@KrisTC 19 days ago
11:05 I still find GPT-4 smarter than GPT-4o. I think they are pushing 4o because it is cheaper to run and still very good.
@jexterjackson3087 19 days ago
- In premise 1, it states "lives in ...", which means the killer is alive.
- In P2, the AI's confusion is probably compounded by the implication that Aunt Agatha is still living with Charles and the butler. Poor Grok 2...
- P3 limits the suspects to either Charles or the butler, as "his victims" is easily interpreted by logical reasoning as a male killer.
I wonder if it's the AI's intelligence that's on trial sometimes. I mean, these little "Turing Tests" that people come up with...
DON'T FIGHT THE SYSTEM. Render it obsolete 🤔:
- GENERATE our own energy.
- STORE our own energy.
- CREATE our own energy currencies.
- SAVE our own selves.
SUPPORT DECENTRALIZED ENERGY ECONOMIES FTW...
@dannii_L 19 days ago
REBUTTALS:
1 & 2: "Lives" is another word for "resides". If a chair or a vase can reside in a house, then so can a dead person.
3: Charles and the butler could both be women. If Star Trek can have a lead female called Michael, then anything is possible; the proposition that butlers are always men is a little dated. Talk like this is going to force the feminists to step up and insist we change the term to "butler-person", and I don't think anybody wants that.
@bestemusikken 19 days ago
@@dannii_L There can be a Chinese woman called Donald Trump, but it is too early to expect the LLMs to go down that rabbit hole.
@someguyO2W 19 days ago
Don't confuse the AI already. You're only gonna make it go rogue quicker.
@jexterjackson3087 18 days ago
@@dannii_L Wow... R U fixing serious? STFU....
@jexterjackson3087 18 days ago
@@dannii_L Fuqing Seriously? STFU....
@kevinnugent6530 19 days ago
2am, time for another Wes Roth video.
@CarCosmeticsChannel 19 days ago
Grok: to understand deeply.
@ryzikx 19 days ago
grok means bang ur head against a wall until you understand
@kmwill23 18 days ago
oooo, it is actually doing some induction.
@user-rj7ks6ik2s 19 days ago
You are asking the wrong questions. Ask the model to come up with a story for an episode of the TV series "Friends" about a "haunted house". Believe me, immediately after the model's first response it will be clear what it is capable of.
@stankosenko4951 18 days ago
It's bAbushkin, not babUshkin, from the word bAbushka, which means grandmother.
@vietnameseloempia 19 days ago
It's funny how smart and dumb AI can be at the same time. On the one hand, it's incredible how it can produce grammatically flawless responses to questions and get some things right that make it seem like it can reason. But on the other hand, it fails so hard at some basic things, showing clearly that it can't actually reason at all. The confidence with which it consistently produces completely wrong answers also makes the AI look quite dumb. It would be quite cool if AI actually gets super smart (AGI or ASI), but it still seems dangerous, and as societies we don't seem ready for the job-loss implications, etc.
@Dron008 18 days ago
AI is still very poor at spatial reasoning tasks; after all, it was only trained on text. I think the situation will be much better when multimodal AI appears, with an architecture that lets it use images, maybe even videos, or just chains of multimodal embeddings in its chain of reasoning.
@256chiru 19 days ago
Hey wes, your AI videos are so good, I almost feel like my toaster is getting smarter! 🤖🔥 Keep the genius coming!
@erikjohnson9112 19 days ago
Watch any Red Dwarf? It has a talking toaster; of course there are probably many other matches, but that one occurred to me first.