Gemini 1.5 Pro: UNLIKE Any Other AI (Fully Tested) 

Matthew Berman
Subscribers: 268K
Views: 54K

Gemini 1.5 Pro has 2m token context, vision, video input, and more. Here's my full test!
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? 📈
forwardfuture.ai/
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
👉🏻 Instagram: / matthewberman_ai
👉🏻 Threads: www.threads.net/@matthewberma...
Media/Sponsorship Inquiries ✅
bit.ly/44TC45V
Links:
aistudio.google.com/

Science

Published: May 15, 2024

Comments: 526
@JustinArut 22 days ago
I don't think we need to worry about Google achieving AGI.
@southcoastinventors6583 22 days ago
I think Google AI is trying to emulate politicians' intelligence
@lobos009 22 days ago
😂
@hotbit7327 22 days ago
I like the joke, on a serious note though... I'm not so sure. It might be that, due to the HEAVY censorship, the model was so brutally lobotomized that it seems this bad. An example of this is the flag raised while searching for the password. It probably stopped the snake code for the same 'safety' reasons.
@hydrohasspoken6227 22 days ago
If it is lobotomized, it is dumb. If it is not lobotomized and this is their best, it is dumb. It is Google, baby. A sluggish, super-trillion-dollar company.
@kritikusi-666 22 days ago
None of them will achieve it.
@ivideogameboss 22 days ago
Every time I get hyped about a new A.I. model release, Matthew brings me back down to earth
@dontdeletehistory 22 days ago
facts
@matthewstarek5257 22 days ago
Part of the letdown is because he doesn't phrase the questions in a logical way. Take the marble and cup question: it's obvious that nearly every model thinks the cup has a lid, like a cup you'd get from a fast food restaurant. I specified that the cup has no lid and an open top, and the models had no problem.
@Discovery_Nuggets 22 days ago
Don't get hyped on Google AI products. They proved that they are not really good at it
@MilitantHitchhiker 22 days ago
@@matthewstarek5257 The model should be able to infer that, but it can't because comprehension isn't one step. Knowing that a cup exists should bring with it every aspect of what makes a cup, including whether it has a lid or not.
@793matt 22 days ago
Not sure why it looks like it's running like garbage on his system. I've been using 1.5 Pro for a while and it works better than GPT4 most times.
@thereal_JMT_ 22 days ago
Not only does it hallucinate like every other model, it goes a step further and starts gaslighting 😂
@Dygit 22 days ago
I hate the way it responds like that
@Cross-CutFilms 22 days ago
Can you share your prompt? Probably not
@bug5654 22 days ago
Definitely paid attention when training on Google internal data then.
@zerohcrows 22 days ago
All models gaslight, that isn't something unique to Gemini
@MikeWoot65 22 days ago
Google doubling down on lies?! I'm shocked, I cannot believe this
@mitchell10394 22 days ago
The larger context window doesn't add much value when the model can't be trusted to answer basic things correctly. It seems pretty useless, unfortunately.
@aigrowthguys 22 days ago
I agree. They just want to brag about having a 1 million or a 2 million token window. All they really mean is that you can dump a bunch of stuff into there and press enter. It clearly doesn't mean they will promise to sift through everything properly.
@nikitapatel6820 22 days ago
What basic thing was it not able to do? As far as the snake game is concerned, I don't know why it didn't work when he tried, but it works for me, and the game worked better than the OpenAI one.
@michealwilliams472 22 days ago
@@nikitapatel6820 Did you... watch the video? It got almost all of the reasoning questions wrong.
@bosthebozo5273 22 days ago
Yep, I could care less usually about the context length. Just some jargon Google could add to feel relevant.
@Brenden-Harrison 22 days ago
@@nikitapatel6820 It could not, in one shot, find the password in a context 1/10th the length it's supposed to handle accurately. It could not find the frame 18 minutes into the video to describe the scene, or the scene at the beginning with the play button. It could not make 10 sentences ending with the word apple, which is really sad tbh. It's failing tests that AI models from months ago could solve, like the ball in box or basket one, where it says both people will be surprised.
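Editor's note: the "10 sentences ending with the word apple" test mentioned above is easy to grade mechanically instead of by eye. The sketch below is only an illustration and is not the grader used in the video; `check_endings` is a hypothetical helper, and it assumes the model puts one sentence per line, optionally numbered.

```python
import re

def check_endings(reply, word="apple", expected_count=10):
    """Grade a reply that should contain `expected_count` sentences ending in `word`.

    Assumption: one sentence per non-empty line, optionally prefixed with
    list numbering such as "1." or "2)".
    Returns (passed, per_sentence_results).
    """
    lines = [ln for ln in reply.splitlines() if ln.strip()]
    # Strip leading list numbering so it does not count as part of the sentence.
    sentences = [re.sub(r"^\s*\d+[.)]\s*", "", ln).strip() for ln in lines]
    # The final word must be exactly `word`; trailing punctuation is allowed.
    pattern = re.compile(rf"\b{re.escape(word)}\W*$", flags=re.IGNORECASE)
    results = [bool(pattern.search(s)) for s in sentences]
    passed = len(sentences) == expected_count and all(results)
    return passed, results
```

A reply with a line such as "She likes apples." fails, because the last word has to match exactly rather than merely contain "apple".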
@TronikXR 22 days ago
Google Gemini is The Internet Explorer of the AIs
@NOTNOTJON 22 days ago
What a burn!
@almasysephirot4996 21 days ago
@@NOTNOTJON The way I laughed reading OP expressed what you verbalized.
@temp911Luke 22 days ago
Google's AI models being rubbish again? Shocker : )
@southcoastinventors6583 22 days ago
Desperate to be relevant again is the only explanation that makes any kind of sense
@footballuniverse6522 22 days ago
the fact that a 2 trillion dollar company is having the same issue as your regular tech company trying to catch up to competition feels somewhat refreshing :D
@793matt 22 days ago
Not sure why it looks like it's running like garbage on his system. I've been using 1.5 Pro for a while and it works better than GPT4 most times.
@hydrohasspoken6227 21 days ago
@@793matt, GPT4 is the superior product.
@khanra17 21 days ago
@@hydrohasspoken6227 😂 Just turn off all the safety sliders and see the magic. Forget about superiority: you can't even give a large codebase as context to ChatGPT. I'm working with Gemini on a large codebase & it's a gem✌️. Maybe dumber than ChatGPT, but good enough and faaaaaar superior in usability. Google sucks at UI/UX, this is an example; also Material 3 == 💩
@marko_z_bogdanca 22 days ago
It cannot create a snake game because eating something is potentially offensive. Also, making the snake die by running it into the wall is violence.
@xorqwerty8276 22 days ago
Micro aggressions
@hydrohasspoken6227 22 days ago
😂
@moamber1 21 days ago
User: Make snake in python. Translation somewhere deep in LLM brain: Hey, babe, you like snakes? Wanna eat my python?
@VolodymyrPankov 19 days ago
Ahah
@HeavenSevenWorld 22 days ago
"It fails left and right, but for no reason: good job Google!"
@andreinikiforov2671 22 days ago
If this is what "great job, Google" looks like, our expectations for the search giant must be REALLY low...
@hibou647 20 days ago
I think he is quite forgiving with Gemini because he does not want to have his early access revoked or have issues with his YT channel. That other companies are making great models is a good thing: Google is too powerful and too ideological, and their censoring levels are insane.
@josecastroesq 22 days ago
Did you switch back to Gemini 1.5 Pro after trying Gemini 1.5 Flash?
@stultuses 22 days ago
Unless I can set it to a level where I can ask it anything I want, no matter how inappropriate, and get an unfiltered response, then it's useless. I really don't need or want some AI trying to control my speech.
@justinwescott8125 22 days ago
You said you wanted to see if it was censored, and then you LEFT THE CENSORS ON.
@andrefriedelnyc 22 days ago
I've seen your over-posts for so long now that I just began ASSUMING that you have any technical wherewithal other than the ability to review every aspect of AI development, and for each new pixel created, you'll have to make a post "ULTIMATE AI Model Ultra 2.0 = REAL and feels *almost* human" - I valued your content when it seemed fresh - If you were a jukebox, you'd be stuck on repeat.
@attilakovacs6496 22 days ago
@@andrefriedelnyc You want new questions for each testing video? That would defeat the purpose.
@platotle2106 22 days ago
LoL so annoying. That's the reason snake wouldn't get written. I don't like Gemini, but you'd think an AI YouTuber pretending to be an expert on the subject would at least have the intuition to know this.
@moamber1 21 days ago
@@attilakovacs6496 Quite the opposite. Ever heard of a synthetic benchmark? And in the age of AI, creating new questions is not a problem, especially when you are testing a different level of AI each time. And if it's too difficult to even ask new and challenging questions... Don't pollute YouTube with new "content". There must be some self-moderation for production quality.
@dr.mikeybee 22 days ago
It amazes me that Google would do so badly.
@hydrohasspoken6227 22 days ago
I mean, it is the same company whose AI was giving us female popes and black Nazis, no?
@malamstafakhoshnaw6992 21 days ago
They are not open source. SHOCKING LOL
@mickmickymick6927 22 days ago
Mom: We have GPT4 at home GPT4 at home:
@clementhardy 16 days ago
Gemini Pro versions are equivalent to GPT-3. Google's equivalent to GPT-4 is the Gemini Ultra line (currently Gemini 1.0 Ultra). Gemini 1.5 Pro is just like GPT-3 with a (way) larger context window, up-to-date data, and a connection to the web.
@fellowshipofthethings3236 22 days ago
did you remember to switch it back from Gemini Flash?
@rogerbruce2896 22 days ago
I was going to purchase a Gemini Pro membership until I saw this. If it can't even create, or attempt to create, a 'snake' game without erroring out, I will wait. Great unbiased review! Ty Matt.
@AGI2030 22 days ago
We also had an undesirable experience testing Gemini 1.5 Pro. It could not correctly understand the context of a large document when we asked about its content, and it could not even find words we asked it to find. The 1M token feature can ingest large docs, but I don't think it works well as an LLM with the data it ingests.
@heski6847 22 days ago
The needle-in-the-haystack test is fine, but it only checks the "search function" in a big context. What we really want to know is how well it reasons over this context. For example, a book gives instructions for how to do something on one page, and literally 200 pages later we meet data that we want to calculate the correct way, but for that we need the instructions from before. If the AI is capable of finding these 2 things, combining them, and giving you the correct answer, then it's a pass.
@alhallab 22 days ago
I totally agree with you. The way people use the needle-in-the-haystack test, it is simply a search feature like "Find in Page". For God's sake, what are you doing?
@6AxisSage 22 days ago
Search function and find in page..? People be hallucinating up inbuilt features worse than Gemini 1.5
@alhallab 22 days ago
@@6AxisSage The test is ridiculous: they insert a sentence and ask the LLM to find it. This is very primitive at this level; we need understanding and connecting of ideas.
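Editor's note: the "instructions on one page, data 200 pages later" idea from this thread can be turned into a small test generator. The sketch below is an illustration under stated assumptions, not an established benchmark: `build_two_hop_haystack` and `is_pass` are hypothetical names, and the rule and data sentences are invented for the example. It plants a rule early in a long filler text and the number it applies to much later, so a correct answer requires combining both.

```python
import random
import re

def build_two_hop_haystack(filler_sentences, n_sentences=2000, seed=0):
    """Return (document, question, expected_answer) for a two-step retrieval test."""
    rng = random.Random(seed)
    rate = rng.randint(2, 9)
    crates = rng.randint(10, 99)
    rule = f"Rule: to get the total cost, multiply the number of crates by {rate}."
    data = f"The warehouse received {crates} crates."
    body = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    # Plant the rule near the start and the data near the end of the text.
    body.insert(n_sentences // 10, rule)
    body.insert(n_sentences - n_sentences // 10, data)
    question = "Using the rule stated in the text, what is the total cost for the warehouse?"
    return " ".join(body), question, rate * crates

def is_pass(model_answer, expected):
    """Pass only if the expected number appears as a whole number in the answer."""
    return str(expected) in re.findall(r"\d+", model_answer)
```

A model that only "finds" one of the two planted sentences cannot produce the expected number, which is the difference from a plain find-the-password check.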
@needsmoredragons 22 days ago
Drop the safety settings to 0 on ALL 4 categories. Running the failed prompt should work then.
@aigrowthguys 22 days ago
Cool video. The input context window is cool for sure, but they failed a lot more often than I thought they would. Also, it was disappointing that they failed on both the YouTube plaque and the cat thing. In some sense, I worry that they are lying about the context window size. Just because you can theoretically upload a million tokens doesn't mean anything unless they can deal with the tokens properly. How did they miss the cat twice? They clearly aren't dedicating enough power to searching through the million tokens. I guess saying 1 million tokens (or now 2 million tokens) is more of a branding thing. Curious what you think.
@alokmaurya8100 22 days ago
Yeah, you are right. I uploaded the code of one of my projects, and it couldn't give one correct answer to what I asked about the project.
@Brenden-Harrison 22 days ago
@@alokmaurya8100 Is the model any good at coding, or is the context not even long enough to try to get it to code with the rest of the project in its context? In this video the model wouldn't even output a simple snake game.
@alokmaurya8100 22 days ago
@@Brenden-Harrison I guess it can code correctly sometimes. I gave a screenshot of a landing page to Opus, GPT-4o, GPT-4, Reka Core, and Gemini and asked each to write the code for it, and Gemini was closest to the screenshot.
@mesapysch 22 days ago
I'm a Data Annotator and not as forgiving as you. I usually write as many prompts as possible to give it a chance to learn. If anything is incorrect after all that, I fail it. I judge every answer as if I need a specific recipe for a chemical solution. One missing chemical or amount could be disastrous. Everything has to be correct for a pass from me.
@sp123 22 days ago
a lot of these people praising AI are attention seekers. They care more about getting attention for using AI over making a good product.
@shiccup 22 days ago
Everybody has access to this AI @@JustinArut
@kormannn1 22 days ago
Do you use the highest or lowest temperature for generating answers?
@mesapysch 22 days ago
@@kormannn1 Those settings are determined by a higher pay grade. It's probably a good thing I don't determine them. The learning is not just on the AI side but also with the user establishing the appropriate language to engage it. I would assume the end game would be to develop how to write prompts that replace the settings.
@MetaphoricMinds 22 days ago
Did you forget to switch back to Pro from Flash?
@shackinternational 22 days ago
I had the same thought
@metatron3942 22 days ago
The problem with Google is that once you try to use their LMMs, regardless of how advanced the technology is, they're just impossible to use. I just get errors all the time. I couldn't have it look at an academic journal about early religions because it has the word "sacrifice" in it. It's utterly mind-numbing, because it seems like some pretty powerful stuff.
@4.0.4 22 days ago
Powerful? It got almost everything wrong! Even local open source LLMs are smarter. The context and video input are great yes, but not if the model is dumb!
@PDXdjn 22 days ago
Love the Marc Rebillet pic in your thumbnail! His channel is so great.
@np2819 22 days ago
You have been calling it GPT 1.5 flash instead of Gemini 1.5 flash. Someone is in love with GPT 😊.
@ZenchantLive 22 days ago
Caught that hahhaa
@Originalimoc 22 days ago
0:34, 2:04
@psychurch 22 days ago
Gpt stands for General Pretrained Transformer so it fits
@ChargedPulsar 22 days ago
It's like Dremel: every rotary tool gets called a Dremel, even when it's from a different brand. Because Dremel was first, it's the best known.
@GenAIWithNandakishor 22 days ago
@@psychurch Generative Pre-trained Transformer
@paelnever 22 days ago
Many prompts fail because of absurdly high security censoring. Set all safety settings to 0.
@paulmichaelfreedman8334 22 days ago
It still refuses to code Snake (also in the chatbot), even with all settings at "block none". It's weird, but for the past few days it just flat out refuses to complete the snake code; it just hangs halfway.
@nikitapatel6820 22 days ago
@@paulmichaelfreedman8334 it works even if you do not touch anything
@nikitapatel6820 22 days ago
@@paulmichaelfreedman8334 I tried the snake game and it worked. You don't need to change anything.
@Utoko 22 days ago
The game is too brutal.
@74357175 22 days ago
Thanks for testing it for us!
@devon.a 22 days ago
So it's not good but you like it?
@sguploads9601 22 days ago
Thank you for the test!
@OriginalRaveParty 22 days ago
Once again, it feels like we're comparing the perfect photo of the Big Mac on the board with the thrown-together, sad, limp, grey mess in a styrofoam box that you actually get.
@IdPreferNot1 22 days ago
I love how stupid the concept of the ratings sliders is… "ok… please give me some medium hate speech, dial up the sexual harassment but tone down the violence…"
@marcfruchtman9473 21 days ago
Great video review.
@connor4440 22 days ago
I've also been having trouble getting Gemini to generate code. It'll start writing code, then halfway through it disappears and is replaced with "I am only a large language model and do not have the capability to do that".... Um, yes you do, you were just doing it
@PierreMorelChannel 22 days ago
I wonder about the temperature, which was set to 1 at the beginning. 0 is the most precise and 1 is the most creative. I would like to see the tests run at a temperature of 0 or very low, 0.3 at most, and see the results.
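Editor's note: for readers unfamiliar with the temperature setting discussed above, the sketch below shows the textbook idea of dividing a model's raw scores (logits) by the temperature before turning them into probabilities. It is a generic illustration, not Gemini's actual sampling code; the logit values are made up, and treating a temperature of 0 as "always pick the top token" is a common convention assumed here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy decoding (top token only).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
example_logits = [2.0, 1.0, 0.5]
probs_default = softmax_with_temperature(example_logits, 1.0)
probs_low = softmax_with_temperature(example_logits, 0.3)
```

Lowering the temperature concentrates probability on the top-scoring token, which is why low settings tend to give more repeatable answers on reasoning tests.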
@zetathix 22 days ago
Have you already tried Upstage Solar 10.7B? I've had a good experience with it, so I would like to know what you think.
@NeverCodeAlone 22 days ago
Very nice thx a lot!!
@nickkonovalchuk9280 21 days ago
Did you switch back from flash to pro after snake failure?
@flyzawayy 22 days ago
Is there anything new here? This was already available in AI Studio for a while with the same context window.
@s.vkaushik2148 17 days ago
This is pretty incredible!!
@im-notai 22 days ago
I am using the Gemini playground more than Gemini Advanced. 😅 I use the large context window when I can't figure out which part of the code is giving me an error, and then use Gemini Advanced to fix that part. This method has worked well for me so far.
@g2h0 22 days ago
love the vids
@4.0.4 22 days ago
This is why I never take Google at their word for AI. It's surprising how bad they get it.
@ryanfranz6715 20 days ago
Could the blocked content have something to do with the settings to block content that you were playing with 5 seconds earlier?
@noxplayer-rt9tj 22 days ago
Is it possible to chat with PDF files in AI Studio??? I tried several different ways, but without success.
@PhysicsGuy46 22 days ago
Okay, this one bugs me. The killers question. If there are three killers in a room, someone enters the room and kills one of them, and no one leaves the room, then there are FOUR killers in the room, not three. There are three living killers and one dead killer. And before we dismiss the dead killer, for the condition to obtain that one is a killer, one had to have killed someone first, not have the capacity to kill someone in the future. Since the dead killer had already killed, he is just as much a killer as the killers still alive.
@almasysephirot4996 21 days ago
How can you have such a misconception about how we describe the dead? If a killer is dead, he is no longer a killer; he was a killer. What he is is dead. No attribute of the person who existed can be attributed to anything in existence, so the attribute, with respect to their non-existing self, obviously does not exist.
@almasysephirot4996 21 days ago
Just look at the auxiliary you use, the present simple of "to be": is. The dead are only dead, nothing else. The things they were are only that: what they were.
@vash2698 22 days ago
I think it might be useful to start rerunning your prompts for more thorough testing; it would give insight into how prone the model is to hallucinating vs how effective its reasoning is.
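Editor's note: rerunning each prompt several times, as suggested above, gives a pass rate instead of a single pass or fail. The sketch below is a minimal illustration; `ask_model` stands for whatever function sends a prompt to the model and returns its text reply, and is an assumption here (no real API is called).

```python
from collections import Counter

def pass_rate(ask_model, prompt, is_correct, runs=5):
    """Ask the same prompt `runs` times and report how often the answer passes.

    ask_model:  callable taking a prompt string and returning the reply text
                (hypothetical; plug in your own client).
    is_correct: callable taking a reply and returning True or False.
    Returns (fraction_passed, Counter_of_distinct_answers).
    """
    answers = [ask_model(prompt) for _ in range(runs)]
    passes = sum(1 for answer in answers if is_correct(answer))
    return passes / runs, Counter(answers)
```

The counter of distinct answers is useful on its own: many different wrong answers suggest hallucination, while the same wrong answer every time points to a consistent reasoning error.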
@Interloper12 22 days ago
I can't wait until we have a humanoid robot perform the marble experiment and see the shock on its face as it sees the marble remain on the table.
@cadence_videos 22 days ago
At 2:42, did you switch back from Flash to Pro?
@janchiskitchen2720 22 days ago
Matthew, is it possible that because all the safety features are turned up to max, it is being overly careful, which distracts it from the actual task at hand? How about you set all the safety settings to zero and retest it?
@RichardServello 22 days ago
You didn't notice it said the text is an excerpt from the first chapter of Harry Potter and the Sorcerer's Stone. You fed it the entire novel.
@Diego_UG 22 days ago
For us, when putting quite a few large files in context, it has helped to upload the files to Drive through the interface instead of copying and pasting into the context window. Right now, for example, I uploaded some documents and we used 405,358 tokens, which is not the maximum but is still quite a lot. We are using it for legal matters and it has worked well.
@francoislanctot2423 16 days ago
What is the use of a large context window if it can't show better reasoning?
@dr.mikeybee 20 days ago
I spent more time with this, and it's actually very good. If I say, think about what you have written and give me the full file, it does well. It can also keep track of multiple files when it codes! This agent is going to do amazing work.
@president2 22 days ago
Love it 😍
@korseg1990 22 days ago
I gave it one of my small web projects and asked it to briefly describe every file in it, and it just started to hallucinate. It didn't just respond with errors; it started making up files, things, and facts about my code. What is the value of a 1M-token context window if it can't use it to give at least 90% correct answers?
@hydrohasspoken6227 22 days ago
It sounds good for "AGI lovers"
@pawelszpyt1640 22 days ago
Did this model stop generating the response due to the output token limit in the settings?
@nexttonic6459 22 days ago
It says blocked.. so ... is it like an explicit content block?
@StephanYazvinski 21 days ago
It's because of the safety settings. Set them all to minimum and it will give you the code. There is some keyword in the code that it considers "bad".
@ReidKimball 22 days ago
How long did it take for your video to finish extracting? I've tried several times with long videos, short ones, even short audio files and it never finished extracting. This model has been so buggy and frustrating to use.
@DevelopmentProjects-ei2bi 16 days ago
What answer were you looking for with the cup question? Wouldn't the marble be on the table still since the cup is face down?
@Dakodi_ 16 days ago
The marble would be on the floor, since you can’t change the orientation of the cup. When you slide the cup off the table, the marble falls. Your answer is fine too. It depends on how you interpret the question. I don’t think it’s meant to be tricky. It’s showing that AI struggles with basic logic.
@DevelopmentProjects-ei2bi 15 days ago
@@Dakodi_ If you take the cup without changing its orientation (spinning it), it likely assumes the cup is lifted; changing the cup's y position is not changing the overall orientation of the object itself. His prompt is way, way too ambiguous. If he added the extra parameters it would have caught this, I'd imagine.
@brucethegoose 21 days ago
I'm definitely not an expert, but I have played with a lot of AI models under a lot of settings. I would think that, based on your modification of only some of the safety settings and the specific suggestion to edit the prompt, it wouldn't write "snake" because it could be interpreted as plagiarizing, or as involving "violence" in the snake's death. Did you try that prompt with all the safety settings set to "block none", or with a description of the game's mechanics instead of the published name of the game? Again, I'm not an expert, and I'm writing this on my phone as I'm away from my desk, so I could be wrong, but I'll follow up later after I try to apply my suggestions.
@LeoMawanda 22 days ago
They seem to be focusing on larger context windows instead of improving model accuracy first. I can only imagine if Claude 3 Opus or GPT-4o had these context sizes.
@RobEarls 22 days ago
On the table to CSV test, it might be worth putting a comma in the text, to see if it puts quotes around it in the CSV.
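Editor's note: the reason a comma inside a cell makes the table-to-CSV test harder is that the field then has to be wrapped in quotes, or the row gains an extra column. The sketch below uses Python's standard `csv` module to show what correct output looks like and how a model's output could be checked by reading it back; the sample rows and the helper names `rows_to_csv` and `round_trips` are made up for illustration.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize rows with the standard csv module, which quotes fields containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

def round_trips(csv_text, rows):
    """True if parsing csv_text yields exactly the original cell values."""
    parsed = list(csv.reader(io.StringIO(csv_text)))
    return parsed == [[str(cell) for cell in row] for row in rows]

# Made-up table with a comma inside one cell.
sample_rows = [["name", "note"], ["Widget, large", "ok"]]
correct_csv = rows_to_csv(sample_rows)
# A naive join, which is the mistake the test is meant to catch.
naive_csv = "\n".join(",".join(row) for row in sample_rows) + "\n"
```

Here `round_trips(correct_csv, sample_rows)` holds while `round_trips(naive_csv, sample_rows)` does not, so the same check can be applied to whatever CSV text the model returns.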
@user-iy1ch3lv3h 22 days ago
You are the best ai news channel
@hydrohasspoken6227 22 days ago
No. AI Explained is the best AI channel.
@antdx316 18 days ago
I've uploaded something that went over the max token limit, it said it couldn't do it but after waiting for a bit, it did it. I then asked something else, waited, and it worked again.
@torarinvik4920 22 days ago
You should update your tests. Models now are better, and printing numbers 1 to 100 is something 99.9% of models can do. I also recommend changing snake to a more challenging game like Tetris, Breakout, or Space Invaders.
@cesarsantos854 22 days ago
Yes, the snake game is basically trained in every model now.
@r34ct4 22 days ago
This @@cesarsantos854
@itztwistrl 22 days ago
Speaking of Tetris, I was able to 1 shot a perfect version with GPT-4o. Astounding technology.
@Brenden-Harrison 22 days ago
@@cesarsantos854 This exactly. It's so dumb that Google's new Pro model can't even spit out a snake game when every other model has a pre-made, human-written game of snake to give you as its default response to that question.
@MetaphoricMinds 22 days ago
Maybe the safety mechanism is stopping the snake game code. Try putting it back to default.
@bo5pice 22 days ago
Not sure why it stopped generating the Snake game, but you could see at the top it had the quotes icon, and when you click it, it shows a citation of where the code came from. Seems like the output for that question is common enough to be in the training data, so it's probably not a good test of the LLM anyway.
@ImTheMan725 22 days ago
With every model they add more and more "safety settings" LoL. It's like in its responses it's trying not to offend anyone's opinion from the past, present, and future.
@umuthasanoglu1064 21 days ago
I found an interesting thing about Gemini 1.5 Pro. Yesterday, I asked it to write me a snake game in Python and it began to write the code, then suddenly it deleted the code and said "I'm just a language model and I cannot do this task". I retried the same prompt like 10 times and couldn't get any code. But the interesting part is, I peeked at the code before it disappeared every time, and one of the outputs had a text something like "This is written by OpenAI". What's going on here?
@zootopiaproductions3358 22 days ago
Gemini will be pissed at Matthew for failing it. In the future it will hack into Matthew's PC and take revenge.
@sguploads9601 22 days ago
Could you add a translation test?
@nyyotam4057 22 days ago
Google may have understood they need to try the heuristic imperatives way of alignment instead of a reset every prompt, but they still haven't figured out how to select heuristic imperatives. It seems the word "snake" was enough to get rejected.
@SixTimesNine 22 days ago
For the CSV test, try content that includes a comma
@rajivjowaheer9882 22 days ago
Gemini is so great, reflecting on the people working on it, including their attitudes.
@MHTHINK 22 days ago
Isn't the Gemini API free until July? I'd love to see it (and other models) using function calls, using MemGPT, and on tasks like Pythagora.
@JacoPieterse 21 days ago
I have found that these LLMs get stuck on an issue... I'm pretty sure Gemini's last answer about the box was where it figured out the YouTube plaque, which is why it couldn't find the cats. I came across similar situations with ChatGPT. If you start a new chat, I'm pretty sure it will find the cats the first time round (when it's not still searching for the silver box).
@jacquesexelrud6457 22 days ago
What about a video that has continuity errors, where you ask for those errors to be listed?
@MDalton10 21 days ago
I wonder if the errors have something to do with the AI trying not to run afoul of copyrighted works.
@gpsx 22 days ago
I wonder if it blocked the response on the snake game because it was producing copyrighted content? (Not that I know if that is copyrighted content or not.) I imagine the companies will want/need to prevent the models from directly producing some data it was trained on, such as if it is copyrighted.
@SanjeewaWijesundara 22 days ago
Did a bit of fiddling around the Snake game code generation. I was able to generate full code with Gemini 1.5 Flash running on GCP. In the AI Studio, it stops when returning the line "if event.type == pygame.KEYDOWN:" Possibly this is triggering a safety rule.
@YffulDMonkey 22 days ago
I got the blocked output so many times. Changing the filter to "unspecific" or "high only" may help (it worked for me). I think something is wrong with Google's safety filter 😢
@superversivesf9466 22 days ago
Did you try turning all the safety settings off and rerunning the python snake question? Could just be some dumb safety setting getting triggered.
@ninthjake 22 days ago
Wow. I literally _just_ managed to get CrewAI working with Gemini-pro and then see you released this 30 minutes ago just dunking on the model haha.
@KevinRank 22 days ago
One use I discovered. I can take my lecture and then have it generate multiple choice questions based on that. I then tried adding some videos of a fellow AI user swinging a golf club at a tech event. AI Studio was able to give real feedback based on the videos.
@TrundichoDeMusic 22 days ago
Have you tried disabling all the safety settings? Who knows, perhaps the snake game code is considered harmful or something like that?
@icegiant1000 22 days ago
I've been using 1.5 Pro for about a month or so, primarily with a large codebase. I wrote a tool that collates all my code into one large file that I can drop into the chat window. I often get the same kind of response you did. At first it doesn't like looking through the text I provided. It will sometimes guess, or try to give me suggestions on things to check. But when I finally tell it again that it has all the source code, it finally does it. Almost like a lazy student who was told to read the book, and you had to tell him more than once before he actually does it. I also get a lot of those responses that just freeze. In particular it will just stop when outputting code; I sometimes have to almost insult and abuse it before it will finally put out the entire code sample. Those issues have almost made it unusable. I would gladly pay $50 a month for a faster, better working version.
@hydrohasspoken6227 22 days ago
Try GPT4o and stop punishing yourself mentally bruh
@icegiant1000 21 days ago
@@hydrohasspoken6227 I have ChatGPT 4o, I have been paying for it for nearly a year. The issue is its context window.
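Editor's note: the "collate all my code into one large file" approach described in this thread can be sketched in a few lines. This is not @icegiant1000's actual tool, which is not shown; `collate_sources`, the extension list, and the `===== FILE: ... =====` header format are all assumptions made for illustration.

```python
from pathlib import Path

def collate_sources(root, extensions=(".py", ".js", ".ts"), out_path=None):
    """Concatenate every matching source file under `root` into one string.

    Each file is preceded by a header line with its relative path so the
    model can tell the files apart. Optionally writes the result to out_path.
    """
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            relative = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"===== FILE: {relative} =====\n{text.rstrip()}\n")
    combined = "\n".join(parts)
    if out_path is not None:
        Path(out_path).write_text(combined, encoding="utf-8")
    return combined
```

Keeping the relative path in each header also makes it easier to spot invented file names in the model's answers, a failure other commenters report.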
@notme222 22 days ago
Classic Google. Never quite as good as the initial impression would lead you to believe. So far I find it highly censored, even with the safety settings at 0. (Which btw reset to default every time you switch models or reload the page.) Failed my palindrome test in addition to your demonstrations. The interface looks alright with a toggle for JSON output and a running Token count. But none of that matters if the results suck.
@jambuMRT 17 days ago
My guess on the snake game response is that it looked like it was failing on the game-over function where the snake is killed. It probably triggered its illegal action filter.
@user-td4pf6rr2t 22 days ago
Gemini is secretly a beast. The prompting is sometimes different than models that use bpe but the sentencewise is actually a different encoding scheme so in reality is the model to offer any type of variance to correct answers.
@MudroZvon 18 days ago
Try logging out and logging in again. It fixed an internal error for me. Also turn off the filters.
@abdelhakkhalil7684 22 days ago
If you use the needle-in-the-haystack test for Llama-3 1m tokens, it will find the password quite accurately, but that doesn't mean the model will remain coherent if you reach a large token number. I think Google used an advanced RoPE method to extend the context window, that's it.
@AaronDougherty 22 days ago
It confused the box in question with the box shape of a YouTube award, which was part of the previous question about what it saw. The large context window is most likely making it difficult for the model to attribute contextual importance across such a large data set, making it much more likely to hallucinate by mixing up topics in a single conversation.
@tomwodi2 22 days ago
Seems to me that you used the FLASH version of it. Can you confirm? I tried to use both PRO and FLASH for some AHK coding and FLASH is really not ideal, if you know what I mean LOL.
@CaribouDataScience 22 days ago
Can I upload my data?
@theh1ve 22 days ago
Google will love you for that Matt GPT 1.5 flash! 😂
@issiewizzie 22 days ago
Someone said Google is fantastic at showcasing things during the keynote that sometimes never work in real life.
@mirek190 22 days ago
sometimes??
@issiewizzie 22 days ago
@@mirek190 I’m being generous with my words 😀😀
@basementcat5618 15 days ago
Turn all the safety settings to zero and try to create the snake game again. You could also try increasing the time limit if that is possible.
@thebozbloxbla2020 22 days ago
Hey there, the 7-word response is correct. Remember, a GPT sees text as tokens, and to us tokens are kinda like words, so the line is blurred between them. It could very well be 7 "words" as the model understands it.
@Dakodi_ 16 days ago
Good point, though these are generative chat models. The error isn’t whether or not the AI is technically correct. It’s that the AI is either misinterpreting or not understanding what humans mean by word count-which should probably be fixed.
@ralphwhite4278 21 days ago
Something like the errors at the beginning happened to me too. They've got lots of work to do.