
Stanford "Octopus v2" SUPER AGENT beats GPT-4 | Runs on Google Tech | Tiny Agent Function Calls 

Wes Roth
195K subscribers
48K views

Learn AI With Me:
www.skool.com/natural20/about
Join my community and classroom to learn AI and get ready for the new world.
Octopus v2: On-device language model for super agent:
arxiv.org/html/2404.01744v1
#ai #openai #llm
BUSINESS, MEDIA & SPONSORSHIPS:
Wes Roth Business @ Gmail . com
wesrothbusiness@gmail.com
Just shoot me an email to the above address.

Published: Jun 30, 2024

Comments: 155
@vincearo1768 2 months ago
I just realized the connection between the word 'agent' and the Matrix movies. In the movie, the agents are the ones who keep track of everything and keep the AI system working, i.e. Agent Smith.
@glamdrag 2 months ago
I thought that's the reason it's called an agent... or?
@user-jc2ts8ol8l 2 months ago
@glamdrag Computers and programming came before The Matrix.
@b.b6656 2 months ago
The word 'agent' in general describes something or someone that causes a change or exerts power (autonomously) in some way.
@geobot9k 2 months ago
When I worked in a call center they didn't treat us like people and called us "agents". Always creeped me out. That experience, and the weird way language is used around AI, makes me think that when AI becomes unquestionably sentient, the owner class will do everything in their power to deny it so they can build a system of techno-slavery.
@lLenn2 2 months ago
Coincidence.
@friendlybetty 2 months ago
OpenAI needs to release GPT-5 😂
@DJVARAO 2 months ago
They already have GPT-7. But I guess it's MS who will call the release dates.
@hqcart1 2 months ago
Do you really think Octopus v2 is better than GPT-4? This video was made with zero research. If that were the case, you'd see it as the #1 model on Hugging Face and everyone talking about it, but it's like the rest of the garbage models: buried alive.
@kira6353 2 months ago
@DJVARAO How the hell would they have GPT-7 by now? They're probably just done training GPT-5 (they're probably testing it now), and GPT-6 and 7 would require much, much more computational power and a lot more time to train. It's crazy to say they have it already.
@Alex_1729 2 months ago
They will. I presume they're waiting, and then they'll drop it.
@jameslynch8738 2 months ago
@hqcart1 If it went through Stanford's research, I have little doubt about it myself. I've seen some excellent and brilliant work from them.
@seakyle8320 2 months ago
plural of octopus -> octopussies
@pjtren1588 2 months ago
Achtually it's OctoPLUSes, because you're adding to them.
@mtdfs5147 2 months ago
@pjtren1588 No, it's octopussies.
@2BluntsLater 2 months ago
Octopi
@mtdfs5147 2 months ago
@2BluntsLater No, it's octopussies.
@centuraxaum5951 2 months ago
Octapussies is good. One for each day of the week and one in standby mode.
@hqcart1 2 months ago
Why is no one getting STUNNED and SHOCKED???
@arturoarturo2570 2 months ago
Not a funny comment anymore, just shut up.
@kingrara5758 2 months ago
Too many things..
@OscarTheStrategist 2 months ago
This is very nice, although I'm surprised at the discrepancy in the latency numbers, because I would imagine Stanford researchers would at least try to compare one-to-one. Llama 2 is not inherently slow (not to that degree); it all depends on the hardware running it. But of course, that's not the point of the paper. Specialized tiny models do really well on their niche tasks; we've seen that. The reason we need both tiny models and bigger, more robust, generally more intelligent models is orchestration, with the smaller, more specialized models handling execution. That reduces costs in a major way. I run some agentic workflows for software development, and sometimes you can spend $150 (GPT-4) in a day and still not achieve what was needed. That may not sound like a lot, but I'm only running tiny experiments. For large operations, reducing costs by 1,000x by using smaller models for execution is a necessity.
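
The orchestration pattern this comment describes, a big model planning and tiny models executing, can be sketched roughly as below. This is only an illustration: `call_model`, the model names, and the one-step-per-line plan format are assumptions, not any real API.

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for whatever inference backend is in use
    (an HTTP endpoint, llama.cpp, etc.) -- hypothetical."""
    raise NotImplementedError

def orchestrate(task: str) -> list[str]:
    # The expensive generalist model produces a plan, one step per line.
    plan = call_model("big-planner", f"Break this task into steps:\n{task}")
    results = []
    for step in plan.splitlines():
        if step.strip():
            # Each step goes to a cheap specialized model for execution,
            # which is where the large cost reduction would come from.
            results.append(call_model("tiny-executor", step.strip()))
    return results
```
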
@paulmuriithi9195 2 months ago
Function-calling workflows really should rely on MoE stacks. This reduces costs significantly for anybody using commercial LLMs. Your post shows the value of small specialized models.
@DJVARAO 2 months ago
Awesome breakdown!
@xalgo7318 2 months ago
Companies will always take the "more, more, more" approach; that's kind of the main idea of companies, especially in big tech. It's nice to know that smaller things like this are on the horizon, producing great results with less effort. I can't wait to see what smaller models can do, as well as whatever brand-new AI news we get from your channel.
@BlimeyMCOC 2 months ago
Coming right after OAI invested in a new AI device; interesting.
@willbrand77 2 months ago
just imagine what a massive data center running millions of these tiny models could achieve
@monkeybird69 2 months ago
This is what I've always been saying: it's all about the algorithm. Once they get that right, compute size, speed, and power can all be reduced.
@alteredalley 2 months ago
Helpful video ❤
@dreamphoenix 2 months ago
Thank you.
@vaendryl 2 months ago
> all-round performance of a model goes up as you scale up
> specialized performance can go up as you scale down
Hmm. Sounds like the human brain, with all its small localized and specialized regions combined with a big-ass frontal cortex. Maybe AGI can be achieved by having many different models work together very closely.
@patricklanquetin9373 2 months ago
Didn't this paper dramatically reduce the semantic field to just the small semantic area of function calls? If so, why is outperforming larger models anything new?
@EffortlessEthan 2 months ago
soooo epic!
@justtiredthings 2 months ago
Shocking that a 2B parameter model is fast. Lol.
@elshadshirinov1633 2 months ago
Bigger and more general or smaller and more specialized. This has always been the trade-off.
@yahiiia9269 2 months ago
All you gotta do is attach them all: a generalist communicator that speaks to hive specialists. In fact, you could train a model on what not to do and negate it as well, like I did with an image LoRA.
@stevokebabo 2 months ago
A small model is essentially what Tesla is doing with FSD. The software has to run locally in the car, on a 13-volt architecture. Maybe 48 volts on the Cybertruck and the robotaxi vehicle to be announced in August.
@mattgscox 2 months ago
This hints at a route to AGI that can be achieved by more than a single company with $Bns: an LLM which is good at reasoning and coordinating agents, still "big" and maybe a few Tn parameters, with an *extensible* plug-in architecture of agents with function calling to access APIs, where each agent is highly attuned to a *single* task: very small but highly proficient. There's no reason the agents need to run in the same datacentre; they could even be federated, like Folding@Home. Imagine a world where every website/server has an Agent Interface, as opposed to an API.
@H1kari_1 2 months ago
Finally a video with a duration I can watch without rescheduling my day :D
@evgenyminkevich6587 2 months ago
Hahaha. Well said.
@cacogenicist 2 months ago
If the subject matter warrants it, long videos are good.
@lasagnadipalude8939 2 months ago
Imagine when we have actual neuromorphic chips/analog AI chips as a normal thing on the market, say 10 years in the future: some sort of multi-Octopus v96 in every electronic device doing GPT-5-level calculations while consuming 2 watts, with multimodal models constantly checking and improving each other's generations hundreds of times before giving a perfect answer in a second. Even in areas where safety is fundamental.
@cagycee5296 2 months ago
If Apple releases a Siri with agent abilities and scores above GPT-4, OpenAI will be in trouble. You've got to think about how many people have iPhones, and the same goes for Google with Android. OpenAI doesn't have influence like those two, because everyone has either an iPhone or an Android.
@rchgmer863 2 months ago
Bro, now Stanford is beating AI, sheesh. The race is heating up.
@jameslynch8738 2 months ago
That news about beating the Go AI came from them.
@fire17102 2 months ago
How did they control the phone though?
@SedriqMiers 2 months ago
Ah the vampiric squid.
@nerlind 2 months ago
Recently saw an article by a comp-sci professor saying AI is a waste of time. Energy consumption was one of the points. I thought to myself... can he not see the trend? Sigh.
@gunnarehn7066 2 months ago
Exactly. As decentralized AI approaches human capacity, so will its energy consumption. We hope.
@nerlind 2 months ago
@gunnarehn7066 I think it's really just a matter of when, not if. Look at supercomputers back in the day: their energy consumption versus operational capability was very different from what it is today. It's just strange to me that a professor in the field doesn't understand that. But maybe there's some clout in naysaying.
@BennySalto 2 months ago
What's also bound to happen: all these models are pretty rough in terms of how efficiently they run. Once optimised, and perhaps moved away from Python, there's double-digit performance to be gained on pretty much every device (perhaps even more on non-CUDA hardware). These could all very quickly become small and way, way faster.
@ByGodsGrace99 2 months ago
One thing I'm excited about with AI is how it's going to change gaming. Imagine talking to an NPC in GTA: a character walking by says "F*** you", you respond, and he responds based on how you respond; he could literally say anything, nothing pre-scripted. It would be insane.
@Valentin_Teslov 2 months ago
That has already been done. There are demos doing exactly this.
@footube3 2 months ago
If the key to making smaller models reliable at function calling is training them on a thousand examples of each function API they need to support, would I be right to think that we likely need separate fine-tunes of Mixtral (the best OS LLM at coding for its size) for SWE-Agent, Open Devin, and Devika, where each fine-tune contains a thousand examples per API that those agents support? If so, given just a few examples per function API, a model like Claude Opus could automatically generate the thousand unique examples for us, making this an easier task to perform. Thoughts?
@footube3 2 months ago
Or, perhaps even better, auto-generate the data by using an LLM that can reliably perform function calling against any API without having to be fine-tuned on it (notably GPT-4). For example, run SWE-Agent/GPT-4 against half of SWE-bench, and use the function-call data that generates as training data for a SWE-Mixtral fine-tune. Then test the result by running SWE-Agent/SWE-Mixtral against the other half of SWE-bench?
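
The bootstrapping idea in these two comments, using a strong function-calling model to expand a few seed examples into a fine-tuning set, might look something like the sketch below. The `strong_model` callable and the JSONL schema are assumptions for illustration, not the paper's pipeline or any specific vendor API.

```python
import json
from typing import Callable

def generate_examples(api_spec: str, seeds: list[str], n: int,
                      strong_model: Callable[[str], str]) -> list[dict]:
    """Ask a strong function-calling model (e.g. GPT-4 or Claude Opus)
    to expand a few seed examples into n training pairs."""
    examples = []
    for i in range(n):
        prompt = (
            f"API definition:\n{api_spec}\n\n"
            "Seed examples:\n" + "\n".join(seeds) + "\n\n"
            "Write ONE new user request and the exact function call it "
            "maps to, as JSON with keys 'request' and 'call'. "
            f"This is variation #{i}; make it distinct from earlier ones."
        )
        examples.append(json.loads(strong_model(prompt)))
    return examples

def write_finetune_jsonl(examples: list[dict], path: str) -> None:
    # One (user request -> function call) pair per line, ready for a
    # standard supervised fine-tune of a model like Mixtral.
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps({"prompt": ex["request"],
                                "completion": ex["call"]}) + "\n")
```
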
@CodeCraftTube 2 months ago
Multion doesn't work anymore... but it was so good.
@juancarlospizarromendez3954 2 months ago
Training on a lot of maths data does not make a good mathematician; the problem is that how the brain calculates maths is still undiscovered.
@roccovergoglini7670 2 months ago
Is anyone here STUNNED or SHOCKED? Or BOMBSHELLED? Wes must be slipping with those AI-generated video titles.
@EnricoRos 2 months ago
A few issues: this paper is hyperbole, poorly written, with clear differences between the parts written by GPT and by the authors, and the main innovation is something that practitioners do every day (adding tokens). Not sure endorsing this sort of work does the community any good.
@Charles-Darwin 2 months ago
What's a little more garbage on the dump going to hurt? I'm done with this piss-poor 'infotainment'; it's just click junk.
@NeoKailthas 2 months ago
OpenAI think they can decide how much progress humanity should have. They need to be humbled.
@prasiu12 2 months ago
So that's where Marvel's Hydra came from ^^.
@mithallinho 2 months ago
Wes, do you ever take a break? It's Sunday :)
@mithallinho 2 months ago
You are increasing FOMO with this
@inout3394 2 months ago
Make a video about Groq and its custom chips.
@Datdus92 2 months ago
Neat, I'm early.
@Juttutin 2 months ago
We need to backronym AGENT stat!
@Juttutin 2 months ago
Artificial Generative Expert Neural Technology is lame. You can do better! And perhaps Quill or his siblings can come up with some ideas.
@BlackMita 2 months ago
ChatGPT-5 is gonna be shockingly underwhelming. I can feel it.
@lyndonsimpson1056 2 months ago
Yeah, if you think about it, the gap between GPT and ChatGPT (3) was huge. The difference between ChatGPT-3 and ChatGPT-4 was good, but not as huge. If the pattern continues, we'll see a better ChatGPT-5 but nothing revolutionary. This could be wrong, but I think progress went up super fast, and now the better it gets, the slower it will be. It's like how we're getting closer and closer to CGI-level graphics in games (Unreal 5 is pretty damn close to movie-CGI level), so of course progress slows down. Could it be the same here? It's honestly hard to tell.
@u007james 2 months ago
What is the context token size?
@GerardSans 2 months ago
I read "super agent" in a paper title and my instinct is to block the authors and forget them.
@eizoone1276 2 months ago
It's all about function calling, not about performance.
@fire17102 2 months ago
Does anyone know what they used to control the phone (after the LLM made a decision)? Is it a VM? Is it an accessibility app? Anyone know how to do this?
@jameslynch8738 2 months ago
I'll see if I can find it.
@fire17102 2 months ago
@jameslynch8738 Found it?
@testales 2 months ago
So they wrote a whole paper about having fine-tuned a tiny model that is a liiiittttle bit more accurate at function calling than GPT-4, which was already at over 98%, and about that tiny model having fast response times (which is to be expected)?! I don't get where the big advancement is, and I find it absurd and attention-seeking to compare these small models with GPT-4 and claim they beat it at something. Despite tiny models being very lightweight and hence fast, you can only do so much with a tiny bit more function-calling accuracy, and if the model itself is dumb, it still can't solve more complex tasks, like the often-used "write me a snake game" task. You can have a swarm of 1,000 dumb agents and they most likely won't be able to get it done except by pulling the solution from the internet. I have yet to see a 2B model with about the same reasoning performance as just a decent 7B model like Open Hermes 2.5. Also, to get anything done that's more complex than creating a new calendar entry or yet another search interface for Google, you want a big context size, so decreasing it is not an advantage but a disadvantage! Maybe I'm just missing something here.
@pokerandphilosophy8328 2 months ago
This also puzzled me, so I downloaded the paper. Here is what they say: "In our approach, incorporating function information directly into the context is unnecessary, as the Octopus model has already learned to map functional tokens to corresponding function descriptions, thereby conserving a significant number of tokens for processing. Given its compact size and the brevity of the context required, the Octopus model demonstrates a reduced latency of 0.38 seconds." So what they mean is that you can make the model perform function calling without needing to describe the function-calling method in detail during inference. That makes inference less costly and faster.
@testales 2 months ago
@pokerandphilosophy8328 So the new thing (?) is that they didn't train a general approach to using functions, but trained it on how to use a specific API? So basically, what one would usually put into the system prompt is now "hardcoded"?
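
That reading matches the paper's functional-token idea: the function schemas move from prompt text into trained weights. Below is a schematic contrast; the prompts are invented for illustration, and the token names only mirror the paper's <nexa_i> convention rather than coming from released code.

```python
# Conventional function calling: every schema rides along in the prompt,
# costing context tokens on every request.
prompt_conventional = """You can call these functions:
def set_alarm(time: str): ...   # wake the user at a given time
def send_text(to: str, body: str): ...
User: wake me at 7am
Assistant:"""

# Octopus-style functional tokens: during fine-tuning, each function is
# bound to one new vocabulary token (e.g. <nexa_0> for set_alarm), so
# the descriptions live in the weights, not the prompt. Inference only
# needs the bare query:
prompt_functional = "User: wake me at 7am\nAssistant:"

# ...and the model is trained to emit something like
#   <nexa_0>('7:00 am')
# which the host app parses and dispatches to the real set_alarm().
```
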
@ismaelplaca244 2 months ago
LOL, my grandma beats GPT-4 nowadays.
@yukoncorneliusthelegendhim9751 2 months ago
Dude, you're burning me out with all of these tiny, insignificant updates. We get it: there are companies one-upping each other daily in a race to be first with this tech. Maybe you should wait a week or two until there's actually some notable news...
@gerardoricor 2 months ago
Maybe you shouldn't watch all the videos, or maybe you should unsubscribe from this YouTube channel and gtfo.
@masteronepiece6559 2 months ago
WTF is with their intro citations? They should've used IEEE style.
@TT-hl3sm 2 months ago
“Octopus” comes from Greek, so the correct plural would be “octopodes.”
@zyxwvutsrqponmlkh 2 months ago
Language is not defined by a rigid set of prescriptions; it's a wide-scale living meme, and its use is determined by its users. The correct plural is "octapussies" because language is memetic, and octapussies makes for better memes.
@StabbyMcStabStab 2 months ago
🐙🐈‍⬛
@njtdfi 2 months ago
@zyxwvutsrqponmlkh "Memetic": opinion instantly nullified. People who use that word are so certain they know what they're talking about that they're essentially speedrunning getting stuck in minima. It's adorable.
@zyxwvutsrqponmlkh 2 months ago
@njtdfi I'm a simple memesmith tending my memes, and you come and crap all over them; little do you know it only acts as fertilizer.
@somebody-anonymous 2 months ago
It's not necessarily octopodes; there's a really nice YouTube video about it.
@Charles-Darwin 2 months ago
A 'super agent' that's a master of 1-10 things, compared to GPT, arguably a master of hundreds to thousands? Such a comparison should lead some to see this is a hype train.
@bagude05 2 months ago
Wes, was this video generated with AI?
@1More_Dreamer 2 months ago
We should stop giving AI systems villain names, 'cause what's next, Destroyer of GPTs?
@aurora.radial 2 months ago
Finally; isn't it?
@Fonzleberry 2 months ago
'Octopodes'
@fullmentalalchemist3922 2 months ago
Not a fan of the two-part videos, but I'm not going to stop watching you. Thanks for what you do. I get that your audience is picky AF.
@cccc7006 2 months ago
Oh, this is not STUNNING or SHOCKING? Then I shall skip this one.
@user-zm8us6tc1b 2 months ago
First
@space_ghost2809 2 months ago
damn. I almost got it
@user-zm8us6tc1b 2 months ago
@space_ghost2809 Sorry, man, but I'm speed. The fastest commenter alive.
@thecooler69 2 months ago
@user-zm8us6tc1b Actually, my octopus agent got here first, but I programmed it to pity the human race, and so it gave you the W.
@user-zm8us6tc1b 2 months ago
@thecooler69 So humans are still ahead of machines. Glad to know.
@SarahKchannel 2 months ago
Your mouse cursor is annoying! I can't focus on the text with something buzzing around the screen. Get a fidget spinner if you can't keep your hand still, please!
@neilmcd123 2 months ago
These small models are cute and effective and all, but they will never achieve ASI. That's why the chip race is so important right now. ASI beats everything.
@thecooler69 2 months ago
The trade-off is that you're back to a narrow AI that has to be propped up by traditional programming. A true intelligence wouldn't need to "def" anything; it would just open YouTube for you. Period.
@OscarTheStrategist 2 months ago
Your brain with no limbs wouldn't be able to "just grab a cup of water" or "just sit in a chair". If you had no biological limbs but had robotic limbs, you would use function calling to interact with them. I hope that makes sense. Function calling is how a cognitive entity interacts with the environment it's in at the time. What we're working toward is a standard method for an AI to figure out what it needs to get a task done and do it. This is probably coming out publicly soon, as in later this year.
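
As a concrete picture of what "function calling" means in this exchange, here is a minimal, hypothetical dispatch loop: the model only emits a structured intent string, and ordinary host code does the acting. The action names and the space-separated output format are invented for illustration.

```python
import webbrowser

# Map of action names the model is allowed to emit to real effects.
ACTIONS = {
    "open_url": lambda url: webbrowser.open(url),
}

def dispatch(model_output: str) -> None:
    # Expect model output like: "open_url https://youtube.com"
    name, _, arg = model_output.partition(" ")
    handler = ACTIONS.get(name)
    if handler is None:
        print(f"Unknown action: {name!r}")  # model asked for something unsupported
        return
    handler(arg)

# The "def" lives in host code; the model just picks and parameterizes it.
dispatch("open_url https://youtube.com")
```
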
@raoultesla2292 2 months ago
Dr. Robert Sapolsky; John A. and Cynthia Fry Gunn Professor, Professor of Biology, of Neurology and of Neurosurgery - Stanford.
Dr. Garry Nolan; Rachford and Carlota A. Harris Professor in the Department of Pathology, Teal Innovator Award (2012) from the Department of Defense, cloning/characterization of NF-κB p65/RelA and the development of rapid retroviral production systems - Stanford.
Dr. Andrew D. Huberman; Associate Professor of Neurobiology and, by courtesy, of Psychiatry and Behavioral Sciences - Stanford.
arxiv.org, "Octopus v2: On-device language model for super agent" - Stanford.
I see a pattern.
@weakmindedidiot 2 months ago
This has long been possible with small models; I'm not sure why this research paper is news. You can Q* and LoRA a small 7B model to outperform GPT-4/Claude at very specific tasks. It's full comprehension of the task that ends up falling apart without a larger model. 7B models are large enough to have a reasonable amount of broad knowledge. For things like art this gets kind of weird, because Claude will continually outperform a small model on creative stuff. It seems the only way to get more creativity is larger models with some allowance for hallucination.