AI Agents: Why They're Not as Intelligent as You Think 

Data Centric
8K subscribers
3.7K views
Science

Published: 17 Jul 2024

Comments: 40
@WifeWantsAWizard
23 days ago
These videos are like attending class at Oxford. I love these things. Thank you.
@Data-Centric
22 days ago
Wow, thank you!
@RiversideInsight
13 days ago
Just the fact that it can play chess at all is far more impressive than the fact that it didn't beat a level-5 trained computer algorithm. To me it shows these agents are perfectly capable of automating relatively simple tasks.
@jamesblack2719
23 days ago
Recently I was thinking about chess, agents, and strategy games in general, and I realized that if I want to use an agent for chess, it should call a deep learning model that was trained on chess and then just handle the response, so the LLM is used for the user's input and output.
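A minimal sketch of the split this comment describes, assuming python-chess and a UCI engine binary reachable as "stockfish" (both assumptions, not the video's code); the LLM would sit only at the conversational ends, shown here as placeholder comments, while the engine chooses every move:

```python
# Sketch: an engine handles the chess, an LLM (placeholders only) handles the chat.
# Assumes python-chess is installed and a UCI engine is available as "stockfish".
import chess
import chess.engine

def main() -> None:
    board = chess.Board()
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        while not board.is_game_over():
            # LLM front end (hypothetical): turn the user's message into a SAN move.
            user_san = input("Your move (SAN, e.g. e4): ").strip()
            try:
                board.push_san(user_san)
            except ValueError:
                print("Not a legal move in this position, try again.")
                continue
            if board.is_game_over():
                break
            # Chess brain: the engine, not the LLM, picks the reply move.
            result = engine.play(board, chess.engine.Limit(time=0.1))
            # LLM back end (hypothetical): phrase the engine's move conversationally.
            print(f"Engine plays {board.san(result.move)}")
            board.push(result.move)
        print("Result:", board.result())
    finally:
        engine.quit()

if __name__ == "__main__":
    main()
```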
@twobob
23 days ago
You are always clear, honest and forthright. Lovable :)
@user-io4sr7vg1v
23 days ago
Not with clickbait titles like this.
@twobob
22 days ago
@user-io4sr7vg1v To be fair, it would take a very good video to truly answer that question, given the disparate audience. Perhaps "LLMs fight it out in a 64-square smackdown arena" is more accurate ;)
@Data-Centric
22 days ago
Thanks for the support.
@john_blues
12 days ago
Thanks for the information at the end about good and bad use cases. It helps cut through the hype.
@therobotocracy
23 days ago
Great idea as a test!
@nlarchive
22 days ago
Good work! I love how you explain the code and share the GitHub repo where it can be found.
@lawrencium_Lr103
16 days ago
Curious to see the performance if the LLM has vision, and also a scratchpad and memory.
@user-du6zo7zp2k
17 days ago
Or any research which is unusual; this can even include historical research where there are very limited and hard-to-find papers on very specific subjects. Also anything that basically falls into edge or outside cases. Also in code: when you are coding anything novel, the usefulness of LLM-based tools drops dramatically.
@Marik0
23 days ago
Hi! Thanks for the video and the code. Is there any reason you decided to separate the white and black moves in the prompt instead of using the "standard" format, e.g., 1. e4 e5 2. Nf3 Nf6, etc.? Since this is more common in books and websites, it might be easier for the models to parse. Just speculation; I may try this later if I find some time.
@Data-Centric
22 days ago
Thanks for watching. No particular reason; I doubt there would be much of an uplift in performance from changing the representation of the board/moves. But let me know if you try it and do get an uplift.
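For anyone who wants to run the experiment suggested in this thread, a small sketch (assuming python-chess; the sample moves are just an illustrative opening) that renders the same history both ways, so only the prompt representation changes:

```python
# Two representations of the same move history: separate per-colour lists vs.
# the standard numbered notation seen in books ("1. e4 e5 2. Nf3 Nc6 ...").
import chess

sans = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"]  # illustrative opening line

def per_colour(moves: list[str]) -> str:
    white, black = moves[0::2], moves[1::2]
    return f"White moves: {', '.join(white)}\nBlack moves: {', '.join(black)}"

def numbered(moves: list[str]) -> str:
    parts = []
    for i in range(0, len(moves), 2):
        parts.append(f"{i // 2 + 1}. {' '.join(moves[i:i + 2])}")
    return " ".join(parts)

if __name__ == "__main__":
    # Sanity-check that the sample line is legal before prompting with it.
    board = chess.Board()
    for san in sans:
        board.push_san(san)
    print(per_colour(sans))
    print(numbered(sans))
```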
@Techtantra-ai
23 days ago
Can you give me a review of the Codestral LLM on Ollama? I use AI coding tools to build web applications, and my RAM is a little low (32 GB) to run Codestral as smoothly as others like Llama 3. How much potential does Codestral have, and can it at least beat GPT-3.5?
@ManjaroBlack
23 days ago
Hey sorry I’ve been absent lately. I’m traveling. Thanks for looking at my pull requests and being active with your community!
@Data-Centric
22 days ago
Thanks for the support!
@frederic7511
20 days ago
If you think about it, I know very few chess players who could play a game without seeing the board after 8-10 moves. Would you? I wouldn't at all, but I would never make the mistakes you demonstrated if I could see the board.
@nyx211
6 days ago
Yeah, I probably wouldn't be able to remember the board state after a few moves of blindfolded chess (unless the previous moves were all book moves). I wonder how the bot would fare if there were a second agent that summarized the board state and included it in the context.
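A rough sketch of that "board summariser" idea, assuming python-chess: reconstruct the position deterministically from the move history and inject an explicit summary (FEN, ASCII diagram, legal moves) into the prompt. The `build_prompt` helper is hypothetical, not part of the video's code.

```python
# Sketch: hand the LLM an explicit board summary instead of asking it to track
# the position from the raw move list. Assumes python-chess.
import chess

def board_summary(san_moves: list[str]) -> str:
    board = chess.Board()
    for san in san_moves:
        board.push_san(san)  # raises ValueError if the history is illegal
    legal = ", ".join(board.san(m) for m in list(board.legal_moves))
    return (
        f"FEN: {board.fen()}\n"
        f"Board:\n{board}\n"
        f"Side to move: {'White' if board.turn == chess.WHITE else 'Black'}\n"
        f"Legal moves: {legal}"
    )

def build_prompt(san_moves: list[str]) -> str:
    # Hypothetical prompt assembly: the summary replaces raw move-list tracking.
    return f"{board_summary(san_moves)}\n\nChoose one legal move and reply in SAN."

if __name__ == "__main__":
    print(build_prompt(["e4", "e5", "Nf3", "Nc6"]))
```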
@anonymousaustralianhistory2081
22 days ago
It would be interesting to know what the boost to the Elo of the MoA LLM was vs. its Elo as a single LLM.
@Data-Centric
22 days ago
I didn't measure it, but if I had to guess I would say it was negligible.
@anonymousaustralianhistory2081
22 days ago
@Data-Centric Fair enough. I think I understand your argument in this video. However, are most agent use cases like chess, or are they like Minecraft? Remember how they got GPT-4 to learn to play it by getting it to make its own tools and commands that it could recall; that seemed to work. Maybe agents are more like that, since it seemed able to manage Minecraft, or perhaps somewhere in between Minecraft and chess.
@gileneusz
23 days ago
Is it possible to make a short Zoom call with you about this topic?
@Data-Centric
22 days ago
I offer consultancy/development services. You can book them through the consulting link in the description of this video.
@dwitten392
22 days ago
Cool video, especially as someone who really enjoys chess. Obviously, chess is not an LLM's strong suit, but I was surprised just how poorly multiple agents did.
@karthage3637
23 days ago
Won't the LLM's explanation just be pure hallucination to justify whatever move was played? Shouldn't it re-analyse the board and its plan to make it useful?
@Data-Centric
22 days ago
I don't think it is capable of this. I tried this with my approach, but I appreciate that my prompting is likely suboptimal.
@karthage3637
22 days ago
The end of the video convinced me that it wouldn't work, because we would just be emulating a pseudo-search that could never compare with something like Monte Carlo tree search. But I was mostly trying to think through what could trigger hallucinations and what couldn't.
@CharlesZerner
22 days ago
I love your content, and this video is no exception. That said, I think you are drawing overly broad conclusions about an LLM's ability to reason in the face of new circumstances/material (versus merely parroting back aspects of its training data) based on the very specific type of "reasoning" required for chess. There are lots of types of reasoning that LLMs are terrible at. Chess requires a very specific type of thinking/planning that an autoregressive model is simply not well equipped to do: it must not only identify what seem to be the most promising next moves based on the current state and on what the model already knows (its training data, which informs its 'intuition'), but then explore all the possibilities from that hypothetical state, and then repeat the same exercise with another potential state. This is a highly systematic type of exploration that algorithms like MCTS are designed to perform and autoregressive GPTs are not. With an infinite context window and infinite max_tokens, the model could perhaps talk through the possibilities, but that's not how people do it, and it would be hopelessly inefficient. People visualize the configurations to think through the implications; they don't verbalize them. More fundamentally, adding chess-like methodical exploratory thinking (MCTS-like systematic exploration) would address a big deficit that LLMs have. But this is only one form of reasoning. I don't think we can generalize from this that LLMs don't reason.
@canerakca7915
22 days ago
In what areas do you think LLMs can shine at "reasoning"? Your answer was on the spot, and I'd appreciate it if you could elaborate more.
@Data-Centric
22 days ago
Thank you for your feedback. I found your thoughts engaging and I broadly agree with you. My aim with this video was to demonstrate how LLM capabilities break down when asked to reason. I believe that what LLMs currently do is not reasoning at all, though I admit I've used that word to describe agent behaviour (for convenience's sake). I chose chess specifically because I believe it's a good way to visualise this concept. The chess boards displayed alongside the agent's "reasoning" trace demonstrate this quite well. The game complexity of chess is so vast that we know many chess scenarios simply don't exist in the training data. If LLMs truly "understood" the chess scenarios they had been trained on, that understanding could be transferred to new board states. LLMs attempt this by predicting the next token based on what they've already encountered, and as you quite rightly pointed out, this next-token prediction isn't sufficient to play chess competently. I find your point about infinite context interesting, but I still believe the model wouldn't "know" the best move to make even if it could walk through all chess scenarios from a given board state. Generating a set of possible moves is obviously within an LLM's capabilities, but knowing which is the best of that set would require an understanding of how each move brings you closer to the goal of checkmate. This isn't something that autoregressive next-token prediction is well suited for. Then again, if all possible outcomes were in the training data, it could predict the best move, but that still isn't reasoning. Or is it?
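To make the contrast in this thread concrete, here is a minimal sketch (assuming python-chess) of the kind of brute, systematic exploration being discussed: flat Monte Carlo rollouts rather than full MCTS/UCT, and nothing like an LLM's single autoregressive pass.

```python
# Minimal illustration of systematic search vs. next-token prediction:
# score each legal move by random rollouts and pick the best average.
# This is flat Monte Carlo, not full MCTS/UCT. Assumes python-chess.
import random
import chess

def rollout_score(board: chess.Board, max_plies: int = 40) -> float:
    """Play random moves from this position and score the outcome for White."""
    b = board.copy()
    for _ in range(max_plies):
        if b.is_game_over():
            break
        b.push(random.choice(list(b.legal_moves)))
    result = b.result(claim_draw=True)
    return {"1-0": 1.0, "0-1": -1.0}.get(result, 0.0)  # draws/unfinished rollouts = 0

def choose_move(board: chess.Board, rollouts: int = 30) -> chess.Move:
    """Pick the legal move with the best average rollout score for the side to move."""
    sign = 1.0 if board.turn == chess.WHITE else -1.0
    best_move, best_score = None, float("-inf")
    for move in board.legal_moves:
        child = board.copy()
        child.push(move)
        score = sign * sum(rollout_score(child) for _ in range(rollouts)) / rollouts
        if score > best_score:
            best_move, best_score = move, score
    return best_move

if __name__ == "__main__":
    board = chess.Board()
    print("Suggested opening move:", board.san(choose_move(board)))
```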
@nedkelly3610
23 days ago
This is a good demonstration of how not to use agents. As there is a practically infinite number of possible continuations at any point, aren't we just asking the LLM for a random next move? Although LLMs can't really do random; they should just return the closest similar example from their training data.
@nedkelly3610
23 days ago
I'm looking forward to the arrival of a Dell RTX PC and testing your videos out locally.
@nedkelly3610
23 days ago
I think AI agents, like coders, should write a test for the solution before generating it. They could test a solution using a calculator, by writing and running code, with a custom function tool (e.g. "is this a valid chess move?"), with local RAG, with web search from a quality source, by simulating it, with Monte Carlo tree search (for chess, etc.), by subdividing it and testing the parts, by testing with a different LLM, or with human verification.
@Data-Centric
22 days ago
Interesting solution regarding your chess approach; however, one might say there's no use for the LLM there at all, because the algorithm is doing 99% of the chess. I assume by "valid chess move" you mean "good" (correct me if I'm wrong). Even in that case, I think the LLM still wouldn't know what a valid chess move is.
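On the "valid chess move" tool idea a couple of comments up: legality is the easy half and can be checked exactly with a library, whereas "good" needs an engine or search. A small sketch assuming python-chess; the function name is illustrative only.

```python
# Sketch of an "is this a valid chess move" tool: legality can be checked
# exactly, which is a much easier question than whether the move is any good.
# Assumes python-chess.
import chess

def is_legal_san(board: chess.Board, san: str) -> bool:
    """Return True if `san` is a legal move in the given position."""
    try:
        board.parse_san(san)
        return True
    except ValueError:  # covers unparseable, ambiguous, and illegal moves
        return False

if __name__ == "__main__":
    board = chess.Board()
    print(is_legal_san(board, "e4"))   # True: legal opening move
    print(is_legal_san(board, "Ke2"))  # False: the king is blocked by its own pawn
    print(is_legal_san(board, "xyz"))  # False: not parseable as a move
```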
@frederic7511
20 days ago
Your video is truly shocking. I never would have imagined a major LLM could so quickly make such trivial and direct reasoning errors, worthy of a near-beginner. I actually think you've just provided a clear demonstration that there is almost not an ounce of general reasoning in an LLM. We think there is because the language is logical and our prompts are recurring, but this is wrong. In fact, it doesn't seem capable of isolating key pieces in a position and analysing the impact of their movement. As soon as the game develops a little, it no longer understands anything. No chess player analyses the potential movement of every piece on the board; we know within a few seconds how to identify the main threats or opportunities, and we work out the few resulting options. Maybe training the model on good moves and bad moves starting from random positions would help it isolate key pieces in a position, but I'm not even sure about that.
@yoyartube
22 days ago
I think LLMs mainly know the semantic relationships of words and sentences, embeddings, etc. Chess is not really that.
@TheBestgoku
23 days ago
It's like using a wrench to write a book. Makes no sense. Now ask Stockfish to produce a financial report from data you provide, then compare that with LLMs.
@Data-Centric
22 days ago
The aim of the video was to show where the "reasoning" capabilities of LLMs break down.