
RouteLLM Tutorial - GPT4o Quality but 80% CHEAPER (More Important Than Anyone Realizes) 

Matthew Berman
318K subscribers
30K views

Full tutorial for how to use RouteLLM.
Subscribe to my newsletter for your chance to win the Asus Vivobook Copilot+ PC: gleam.io/H4TdG...
(North America only)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewber...
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
👉🏻 Instagram: / matthewberman_ai
👉🏻 Threads: www.threads.ne...
👉🏻 LinkedIn: / forward-future-ai
Need AI Consulting? 📈
forwardfuture.ai/
Media/Sponsorship Inquiries ✅
bit.ly/44TC45V
Links:
gist.github.co...
github.com/lm-...

Published: Sep 8, 2024

Comments: 139
@matthew_berman · 1 month ago
I'm putting the final touches on my business idea presentation that I'm going to give away, which is partially inspired by RouteLLM. Can you guess what it is? 🤔 Subscribe to my newsletter for your chance to win the Asus Vivobook Copilot+ PC: gleam.io/H4TdG/asus-vivobook-copilot-pc (North America only)
@sugaith · 1 month ago
I've actually tried this for a production use case and it did not work as expected. Simple questions such as "What is the name of our galaxy?" were routed to GPT-4o, so the "only when absolutely necessary" claim is absolutely not true. There is also a latency problem: it takes a while for the RouteLLM router model to run and decide the route.
@louislouisius8261 · 1 month ago
Do the models have context of each other?
@sugaith · 1 month ago
@louislouisius8261 not by default, but you can add that
@negozumbi · 1 month ago
Could you modify your initial prompt before the actual query? Do you think you would get a better outcome?
@ernestuz · 1 month ago
@louislouisius8261 you can transfer the conversation pairs, or a subset if you wish, between models.
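Editor's note: for readers who want to reproduce sugaith's test or ernestuz's context handoff above, here is a minimal sketch based on the usage pattern in the RouteLLM README (github.com/lm-sys/RouteLLM). The model names are assumptions, and the float baked into the router name (0.11593, the README's example value) is the calibrated routing threshold, not a price: lower it to send more traffic to the strong model.

```python
# Minimal sketch, assuming `pip install "routellm[serve,eval]"`,
# OPENAI_API_KEY set, and Ollama serving llama3 locally.
from routellm.controller import Controller

client = Controller(
    routers=["mf"],                  # matrix-factorization router
    strong_model="gpt-4o",
    weak_model="ollama_chat/llama3",
)

# Prior turns are simply carried in `messages`, so whichever model wins
# the route for this turn still sees the earlier exchange.
history = [
    {"role": "user", "content": "Where is Portugal?"},
    {"role": "assistant", "content": "Portugal is in southwestern Europe."},
    {"role": "user", "content": "What is good to eat there?"},
]

response = client.chat.completions.create(
    model="router-mf-0.11593",       # "mf" router at threshold 0.11593
    messages=history,
)
print(response.choices[0].message.content)
```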
@DihelsonMendonca · 1 month ago
💥 Matthew Berman is so clear and concise. He has a natural talent for explaining things in a way that everybody understands, with emphasis on every phrase, clear diction, and intonation that hooks the listener. People like David Shapiro, Matt Wolf, and Matthew Berman say only what's necessary and make every phrase count. This is great. 🎉❤❤❤
@bigglyguy8429 · 1 month ago
Thanks GPT
@negozumbi · 1 month ago
WOW. This is crazy. I had a similar idea last week and mentioned it to my team this morning at 10 AM UK time. I've just seen this video, and it blows my mind that our thoughts were so aligned. I'm not sure people have really put it together yet: an agentic on-device system plus LLM-routing technology is the future. I believe models will become smaller and smaller. It's crazy how fast this part of tech is advancing; it is as scary as it is exciting. I learn a lot from your videos, which force me to read papers and try some of the tech for myself. I really appreciate your content.
@DeeDoo-o8o · 1 month ago
This has happened to me for almost a year; I think they collect ideas from our prompts and use them. None of the ones I kept locally have been built yet lol
@sinukus · 1 month ago
Is the context preserved across models for subsequent queries? E.g. "Where is country X?" goes to the weak model; "What is good to eat there?" goes to the strong model. Does the strong model have the context of country X?
@rodrigogarcia9273 · 27 days ago
That is a very good point. I guess you can now use whatever model you want to help you answer this important question. Maybe let us know?
@jaysonp9426 · 1 month ago
Awesome video! With GPT-4o mini, though, idk how much I'll be routing lol
@cluelesssoldier · 1 month ago
Came here to say the same thing lol. These techniques are only barely outpacing foundation-model improvements: costs are dropping like a rock and latency continues to decrease. It's wild!
@magnusbrzenk447 · 1 month ago
Matt: you need to discuss how the router logic works. I'm nervous that it will not be smart enough for my niche business use cases.
@longboardfella5306 · 28 days ago
I would also want a choice of models based on use case rather than just weak or strong, which seems very limited. This is an area to be detailed for sure.
@JaredWoodruff · 1 month ago
I can see this being useful in scenarios where you have local Phi-3 models, then escalate to GPT-4o mini. If there were some way of declaring what each SLM is best suited for, then RouteLLM could kick the query to the expert that knows best. Great video, as always, Matthew! Cheers from Australia!
@rodrigogarcia9273 · 27 days ago
Thank you, sir, for your videos, your clear explanations, and the number of tools you give us visibility into. Please don't stop. God bless!!
@brianrowe1152 · 1 month ago
We were already working on the business idea you mention; I assume more people will be now too. It just makes sense. Many clients can't afford to pay $20/user/month for everyone in their enterprise when they don't yet know whether there will be value (ROI). The challenge with RouteLLM is that it doesn't quite do everything you suggest, because you only get weak and strong. So local... or 4o... What if I want local for really private data, then 4o-mini for simple tasks that Llama 3.1 might struggle with? Or I want a poem, and Anthropic is better. The two-model limit is the challenge: it's a great solution to start saving $$, but right now 4o and 4o-mini are the only logical choices for an enterprise, and I'd suggest a full solution needs slightly more subtle routing options. Great stuff.
@netherportals · 1 month ago
I did not expect llamas to be so important in my life; a person I know has a llama as a pet too
@drogokhal3058 · 1 month ago
Hi, thanks for the great video. I love watching them. The idea behind RouteLLM is great, but I would say it lacks control. The control needs to be reliable and avoid hallucinations, which is a huge problem with LLMs. The criteria are not clear to me: on what basis does it route? For production, I think it is better to develop an agent whose task is to understand the requirements and, with some instructions or even by fine-tuning an LLM, actually perform the routing. LLMs are not ready for general-purpose use. If you need an AI agent to perform a specific job, make sure you have a very good specification, good prompt engineering, use function calling, and do a lot of testing. I am using the LlamaIndex framework, and I want to do the same LLM-routing thing, but using my own agent or agents, which should give me more control over verification and decision making. I built some agents with LlamaIndex and it is working OK so far. I can mix models, and for output verification I always use a top-tier LLM.
@TheRubi10 · 1 month ago
Sam Altman already mentioned that, in addition to trying to improve the reasoning ability of the models, another aspect they were working on was a mechanism capable of discerning the complexity of a task to allocate more or fewer resources accordingly. Currently, LLMs always use all resources regardless of the task, which is wasteful.
@jtjames79 · 1 month ago
Sounds like consciousness might be an economic problem. Interesting.
@sauxybanana2332 · 1 month ago
@jtjames79 now that you mention it, right on
@jtjames79 · 1 month ago
@sauxybanana2332 Hypothesis: consciousness (or a form of it) is a market of ideas. The currency is attention. Food for thought.
@jaysonp9426 · 1 month ago
Great video! With GPT-4o mini, idk how much I'll be routing though lol
@paul1979uk2000 · 1 month ago
This could work really well with locally run models, especially specialised models for given tasks: multiple models, each good in its own respective area and much lighter on the hardware because none of them has to be a jack of all trades. It could be a game changer for running much stronger AI on local hardware, provided it does a good job of picking the right model for the task you ask of it. Storage is a lot cheaper than VRAM, so if models can be swapped in and out of memory on the fly, keeping a lot of models on your hard drive, each good in its own area, isn't a big deal, but it could boost the quality of local AI massively without needing a crazy amount of VRAM. Mind you, this only works if models can be swapped in and out quickly enough that, to the end user, it all feels like one model, and only if you have a master general model that delegates well to the specialised ones. After all, a few hundred GB of models is nothing with how cheap storage is, and it would be a lot cheaper and faster to run if they can be switched on the fly.
@modolief · 1 month ago
Matthew, can you do a deep dive on how to choose hardware for home use? Can we use Linux? Also, what hardware choice might we make if we wait? Should we wait? Should we buy now to avoid a signed GPU? Etc.
@clint9344 · 1 month ago
I agree, it is getting confusing for the beginner... it's like learning a whole new language again lol. Great vids, keep up the great work. Be in peace, Godspeed.
@aiplaygrounds · 1 month ago
That is a task, not a prompt 😮. It is useful for complex tasks: setting up a system prompt and classifying the task level. Great video 👏
@richardkuhne5054 · 1 month ago
I was actually thinking of using it the other way around: use RouteLLM to pair a strong cheap model with a weak even-cheaper model, then add it to MoA together with strong frontier models to aggregate the responses, so you could potentially build something slightly better than GPT-4o while still reducing cost a bit. Another step would be to chain it with MemGPT for long-term memory, and then use the endpoint in Aider as the ultimate coding assistant 😅
@toadlguy · 1 month ago
Matt, do you know how this (or your "business idea") will deal with context? Most chat uses of LLMs involve a conversation. I suspect that with this sort of simplistic routing the only way to deal with context is to keep using the model the conversation started with, although you could keep a separate running context that you pass back to whichever model is being used; I just don't know how well that would work. BTW, your "business idea" is what many people are currently working on (or something similar); some LLM providers may even be doing this behind the scenes. There is a distinct lag and additional token cost with more complex queries on GPT-4o mini that may also suggest a similar approach.
@dimii27 · 1 month ago
but you do it locally with minimal cost instead of paying the bill to OpenAI
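Editor's note: a sketch of the "separate running context" idea toadlguy describes above. Nothing here is RouteLLM-specific; it assumes two OpenAI-compatible endpoints (one local via Ollama, one cloud) and simply replays the shared history to whichever model is routed each turn.

```python
from openai import OpenAI

weak = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # local
strong = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

history = []  # one conversation, shared by both models

def ask(client: OpenAI, model: str, user_text: str) -> str:
    """Send the WHOLE history to whichever model was routed this turn."""
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(model=model, messages=history)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # keep it shared
    return answer

# Turn 1 routed weak, turn 2 routed strong; the strong model still knows
# what "there" refers to because it receives the full history.
print(ask(weak, "llama3", "Where is Portugal?"))
print(ask(strong, "gpt-4o", "What is good to eat there?"))
```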
@jean-paulespinosa4994 · 1 month ago
Did you know that GPT-4o and mini can run Python code? I have tried it and it worked, but only with Python. Hopefully other LLMs will soon follow with this ability, and maybe add more programming languages that can be run directly in the prompt.
@sauxybanana2332 · 1 month ago
You don't need to run llama3 with Ollama in a terminal; the Ollama app already listens on port 11434. As long as you invoke local models with the matching model name, Ollama will do the work.
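Editor's note: a minimal sketch of what the comment above describes, assuming the Ollama app is running and the "llama3" model has been pulled. Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1, so the standard openai client can invoke local models by name with no terminal session.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # the Ollama app's built-in server
    api_key="ollama",                      # any non-empty string; it is ignored
)

resp = client.chat.completions.create(
    model="llama3",  # must match the name of a pulled model
    messages=[{"role": "user", "content": "What is the name of our galaxy?"}],
)
print(resp.choices[0].message.content)
```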
@LuckyWabbitDesign · 1 month ago
If Llama 3.1 is looking at "9.11" > "9.9" as a String data type and not as a Float data type, 9.11 is larger because it contains more characters.
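Editor's note: a quick check of the claim above. String comparison in Python (and most languages) is lexicographic, character by character, not by length, so "9.11" actually sorts before "9.9". The broader point stands (treating the numbers as strings changes the question), but string length is not why.

```python
# Lexicographic comparison: '9' == '9', '.' == '.', then '1' < '9',
# so "9.11" < "9.9" regardless of string length.
print("9.11" > "9.9")  # False
print("9.11" < "9.9")  # True
print(9.11 > 9.9)      # False: as floats, 9.11 < 9.9 as well
```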
@michaelslattery3050 · 1 month ago
Another excellent video. I wonder if this can judge between multiple models, and if not, when they might add that. You might want to choose between a smart expensive model, a fast model, a cheap model, and several specialized fine-tuned models. I've tried to do my own LLM routing in my agents, but mine is naive compared to this: it just asks the weaker LLM which LLM should handle a task prompt, and it's often wrong, even when I supply a bunch of multi-shot examples.
@cristian15154 · 1 month ago
🙌 How nice it would be if GPT-4 had the speed of Groq. Thanks for the video, Matthew
@j-rune · 1 month ago
Thank you, I'll try this out soon! ❤
@asipbajrami · 1 month ago
hoping LangChain implements this... because if frameworks (for agents) don't support it, it will be hard for developers to adopt
@zhengzhou2076 · 1 month ago
Thank you, Matthew! I like your videos very much.
@laiscott7702 · 1 month ago
Another game-changing project; excited for the next video
@mohdjibly6184 · 1 month ago
Wow, amazing... thank you Matthew :)
@leonwinkel6084 · 1 month ago
How does it decide which model to go for? Knowing the mechanism would be useful. Also, having more than 2 models would be great. For example, if I tailor a sales message, it's possible to do it with a 7B model, but in my experience a 70B model is much better. I guess everyone needs to test it for their use cases. Overall, if the selection mechanism works correctly, it's great tech and highly relevant. (I don't need GPT-4o, for example, to tell me that rstrip("/") removes the trailing slash; I'm sure Mixtral can do that.) If it goes wrong even a few times in production, where people are interacting with the bot, it would not be worth it, since the quality of the product cannot be guaranteed. Anyway, it's all in development. Thanks for the video!!
@punk3900 · 1 month ago
That's the way to go! Great and thanks for sharing!
@punk3900 · 1 month ago
@Matthew_bberman wow, did I win something?
@jamesyoungerdds7901 · 1 month ago
Really great stuff, Matthew, thank you! I noticed the prompt was set to keep the cost of the call to the MF router API down to 11.5¢. Does this mean the router LLM charges per token, or does it run locally?
@jeffg4686 · 1 month ago
Matt, you need to get with Groq to improve their process. They MUST get rid of the just-make-things-up behavior when it doesn't have data. That is REALLY annoying. I have to ask it now: did you make that up?
@flavb83music · 1 month ago
Great getting-started guide, thanks. But how do you instruct RouteLLM on the conditions for choosing the strong or weak model?
@crazyKurious · 1 month ago
By the way, why do this at all? I simply did ICL with llama3:8b, and it gives me a {key: value} response saying whether the query needs a complex or a lite workload, and then I case-switch. Simple. Now I have swapped llama3:8b for gpt-4o-mini. We need to get over this "increased security" stuff; all these APIs provided by large companies are safe, and you have a written guarantee.
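Editor's note: a sketch of the classify-and-case-switch approach described above, with an assumed local Ollama endpoint and a made-up label scheme; the few-shot examples in the system prompt are the "ICL" (in-context learning) part.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

CLASSIFIER_PROMPT = """Classify the user's query as "complex" or "lite".
Reply with JSON only, e.g. {"complexity": "lite"}.
Examples:
Q: What is the name of our galaxy?       -> {"complexity": "lite"}
Q: Write a snake game in Python with AI. -> {"complexity": "complex"}"""

def route(query: str) -> str:
    # Ask the small model for a {key: value} verdict, then case-switch.
    resp = client.chat.completions.create(
        model="llama3:8b",
        messages=[
            {"role": "system", "content": CLASSIFIER_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    # A production version would validate and retry on malformed JSON.
    verdict = json.loads(resp.choices[0].message.content)
    return "gpt-4o" if verdict["complexity"] == "complex" else "llama3:8b"

print(route("What is the name of our galaxy?"))  # expected: llama3:8b
```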
@vincentnestler1805 · 1 month ago
Thanks!
@matthew_berman · 1 month ago
Thanks so much!
@AhmedMagdy-ly3ng · 1 month ago
Thanks a lot, Matthew 😮❤
@Jeremy-Ai · 1 month ago
OK, Matt... I can sense your excitement and recognize your potential. I am grateful for your work, so I will return the favour: "Don't let your hard work become a fundamental lesson in business foolishness." Take a breath before speaking openly. Seek wise and trustworthy leadership and counsel. I could go on; I don't have to. Good luck. Jeremy
@gaylenwoof · 1 month ago
I tried using GPT-4o to solve a seemingly simple problem, but it failed. I'm not the greatest prompt engineer, so the failure might be on my part, but I spent several hours trying to refine the prompt and never solved the problem. 👉Question: if GPT-4o can't solve it, is there much point in spending hours going from one AI to another trying to find one that can? Or is it more like "If GPT-4o can't, no one can"? The problem, in case you are interested: translate the shape of an object (e.g., a board-game miniature) into a string of numbers that represents the shape, then characterize other properties of the miniature (e.g., height, width, distribution of colors). Procedure: I upload a photo of a miniature lying on graph paper. Each cell of the graph paper is numbered. The AI's job is to determine which cells are covered by the object, then list those cell numbers; that list is the numerical representation of the shape. GPT-4o cannot consistently give correct answers for different miniatures. Perhaps this problem requires too much genuine understanding of visual data, and I may need to wait for something closer to actual AGI? Or is there some AI that handles the "meaning" of visual data better than GPT-4o?
@SvenReinck · 1 month ago
There are times when I find 3.5's answer better than 4o's, because 4o sometimes tries too hard to be helpful and it gets annoying. Also, it seems 4o can't give just part of an answer; it has to repeat the complete answer.
@wardehaj · 1 month ago
Great video, and I love your business idea. Thanks a lot! Does RouteLLM support visual input or only text?
@matthew_berman · 1 month ago
I believe only text right now
@clint9344 · 1 month ago
Good question... would this be where agents come into play, say a visual agent?
@crazyKurious · 1 month ago
No, you made a mistake: Apple won't route it to ChatGPT; they will route it to a bigger version of the same model running on Apple silicon in the cloud. ChatGPT is only used if you explicitly choose to use it.
@ernestuz · 1 month ago
I've been doing something similar with a 7B model as a front end: if it can't answer the question, it forwards it to a few bigger models and then cooks a response from all the answers. I call it TPMFMOEX because it's catchy (the poor man's fake mixture of experts). Thanks for your videos!
@hqcart1 · 1 month ago
how do you know if it can't answer?
@ernestuz · 1 month ago
@hqcart1 It decides, not me: "If you are not sure you can answer the query...". There are also a few tens of question/answer example pairs injected into the context at the start, showing what it can and can't answer. A poor man's solution.
@hqcart1 · 1 month ago
@ernestuz I'm not sure such a prompt would work for every LLM; you will definitely get answers where the LLM has no clue. You should test it on 1000+ prompts to make sure the LLM follows the instructions.
@ernestuz · 1 month ago
@hqcart1 Well, the model I am using tends to be correct, and the one before it was as well. At the moment I am giving around 30-something pairs for context injection, plus the prompts (there is more to it, because it has to use tools), and some simple support from the tool that forwards the query if necessary (basically a list of what/where). Because I am using agents, if the task fails the model can be told and retry with some instructions. It's really a poor man's solution, nothing fancy. EDIT: It looks like I am keeping the models secret; not really. The old one was Mistral 7B v0.2 and v0.3, then Llama 3 8B, and since last week it's actually Codestral (not for coding; it just turns out to be great for this project in particular). Think of every pair you inject as a prompt of its own, and add a "cost" to forwarding queries (the tools can answer back). I also go to ChatGPT and Claude to ask their thoughts on my prompts and ask them for examples; AI produces excellent prompts for AI. You can also inject context midway through the task, between completions, if the model isn't going in the direction you want.
@ernestuz · 1 month ago
@hqcart1 I wrote a long answer that seems to have been dropped, grrrrr. In short: I am using 34 pairs to prime the context, plus the prompt. There is also a "cost" added to the query; the model sees the upstream models as tools with an associated "cost". I've been using Mistral 7B, though I tried Phi and Gemma too (that's another story), then Llama 3 8B, and Codestral is next. It's not a code-related task, but that model is very good at it.
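Editor's note: a bare-bones sketch of ernestuz's "answer or forward" pattern under assumed endpoints and model names. His real version also injects ~30 example pairs, attaches a "cost" to the forwarding tool, and aggregates several upstream answers; all of that is omitted here.

```python
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
cloud = OpenAI()  # assumes OPENAI_API_KEY is set

FRONT_PROMPT = (
    "Answer the user's question. If you are not sure you can answer it "
    "correctly, reply with exactly the single word FORWARD."
)

def answer(query: str) -> str:
    # Let the cheap front-end model try first...
    resp = local.chat.completions.create(
        model="llama3:8b",
        messages=[
            {"role": "system", "content": FRONT_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    draft = resp.choices[0].message.content.strip()
    if draft != "FORWARD":
        return draft
    # ...and escalate to the bigger model only when it opts out.
    resp = cloud.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return resp.choices[0].message.content
```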
@smokewulf · 1 month ago
My new LLM stack: AIOS, LLM Router, Mixture of Agents, Agency Swarm, Tool Chest, Open Agents. I am working on improvements: a reason-and-logic engine, a long-term memory engine, and a continuous-learning engine.
@moontraveler2940 · 11 days ago
Is it possible to use LM Studio instead of Ollama? It would be nice to see a tutorial on how to set it up with Cursor.
@togetherdotai · 1 month ago
Great video!
@sugaith · 1 month ago
I've actually tried this for a production use case and it did not work as expected. Simple questions such as "What is the name of our galaxy?" were routed to GPT-4o. There is also a latency problem: it takes a while for the RouteLLM router model to run and decide the route.
@MingInspiration · 1 month ago
That's a shame. I was hoping I could have 10 different models behind the scenes and let it decide which one to pick for the job: for example, if it knows which one writes the best SQL and which one writes the best Python, it would just pick the right one. I think the idea is absolutely phenomenal, especially given that the number of models is exploding. It's like googling for the best model to answer your queries.
@sugaith · 1 month ago
@MingInspiration yes, it would be great if we could easily attach our own trained model to decide the route
@hqcart1 · 1 month ago
It won't work, of course! What the hell were you thinking??? It's a bad idea and will fail. The best way is that YOU should know how to route your SPECIFIC prompts to which LLM, based on intensive testing, not a random RouteLLM deciding for you...
@mdubbau · 1 month ago
How can I use this with an AI coding assistant in VS Code? Specifically, so the assistant uses a local LLM for most tasks and a cloud LLM for higher-difficulty ones.
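Editor's note: the usual way to do this is to run RouteLLM's OpenAI-compatible server and point the coding assistant's "OpenAI base URL" setting at it (many VS Code assistants expose one). A hedged sketch: the server flags and port come from my reading of the RouteLLM README and may differ in your version.

```python
# First launch the router as an OpenAI-compatible server (shell, per the
# RouteLLM README; exact flags are an assumption):
#   python -m routellm.openai_server --routers mf \
#       --strong-model gpt-4o --weak-model ollama_chat/llama3
#
# Then any OpenAI-compatible tool can target it. Check the server's
# startup log for the real port; 6060 here is a guess.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:6060/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Refactor this function to be pure."}],
)
print(resp.choices[0].message.content)
```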
@donkeroo1 · 1 month ago
A model that feeds a model that feeds a model that feeds yet another model. What could go wrong?
@donkeroo1 · 1 month ago
@Matthew_bberman BTW, love the content.
@totempow · 1 month ago
Suppose I was using something like Perplexica with OAI and a local LLM like Llama 3: which API would it call at that point? Hypothetically speaking, of course. Also, cool delivery.
@restrollar8548 · 1 month ago
Not sure how useful this really is. When you're writing production agents, you need to send appropriate prompts to the weak/strong models to make sure you get consistency.
@hqcart1 · 1 month ago
it's useless; you are the one who should route your specific prompts, based on testing, testing & testing.
@LiquidAIWater · 1 month ago
If AI LLMs are like workers, why would a company hire a PhD in "everything" for some simple task? When you think about it, this is how human organizations work.
@bm830810 · 1 month ago
Okay, this is fine for single questions, but what about context?
@julienduchesneau9747 · 1 month ago
I don't have a high-end CPU or GPU, so I'm not sure going local is doable for me. I'm always confused about what I need to go local; it would be nice to have a video clarifying the minimum gear needed to run an acceptable AI. What should people aim for in a PC to enjoy local AI? I know things change fast; can we hope models will one day be so small they run on old 2010 gear???
@firatguven6592 · 1 month ago
I want the best, not the cheapest, so it would be better to show how to integrate GPT-4o (mini) or Claude into Mixture of Agents as the main LLM. MoA is really a fantastic framework. I'm sorry, but I don't see any benefit in RouteLLM for me. If I can use Groq or good open-source models, I don't really want to save cost there; we need the best of the best. Improvements to MoA would be nice, like a custom system prompt or the ability to use top frontier models.
@Omobilo · 1 month ago
Mate, great content. Is there any LLM or platform I can point at a website to analyze its content, with some specifics mentioned in the prompt, instead of me first compiling the whole website copy into a document (tedious) as a prompt?
@nasimobeid2945 · 1 month ago
Does this support declaring which LLMs are experts at what? If not, I can only see it being useful if I had a very expensive model and a very cheap one.
@alessandrorossi1294 · 1 month ago
Any idea why they use Python 3.11 and not the latest version (Python 3.12)?
@kasinadhsarma · 1 month ago
Thank you
@mirek190 · 1 month ago
Have you seen the leaked tests? It seems Llama 3.1 70B is better than GPT-4o, and the small Llama 3.1 8B is smarter than the "old" Llama 3 70B! Insane.
@limebulls · 1 month ago
How does it compare to GPT-4o mini?
@rbc2511 · 1 month ago
Does this work within agent workflows, where the general instruction set is extensive even though the specific instance of the task may not be complex?
@modolief · 1 month ago
Thanks!!!
@knowit999 · 1 month ago
The best use case for me would be running it purely locally, with a small model like Llama 3 for most tasks plus a vision LLM, so it needs to know which to choose. Can this be used for that? Thanks
@dani43321 · 1 month ago
Do a video on Mistral NeMo
@rafaeldelrey9239 · 1 month ago
Would it work for RAG use cases, where the prompt can be way larger?
@mikestaub · 1 month ago
This seems like a hack. The LLM itself should be able to do this internally eventually. Perhaps that is what Project Strawberry is about.
@eightrice · 1 month ago
but what LLM is the router using?
@kbb8030 · 1 month ago
Have you played with Fabric by Daniel Miessler?
@hqcart1 · 1 month ago
It's a bad idea and will fail. The best way is that YOU should know how to route your SPECIFIC prompts to which LLM, based on intensive testing, not a random RouteLLM deciding for you...
@OMGitsProvidence · 1 month ago
It's not a bad concept as a module or tool in a more comprehensive system, but I've always thought a shortcoming of many models is token waste.
@hqcart1 · 1 month ago
@OMGitsProvidence you can't use it for a serious production product.
@jtjames79 · 1 month ago
How much is your time worth? That's a level of micromanaging that's generally not worth it to me.
@BunnyOfThunder · 1 month ago
Yeah, this isn't meant for one person typing prompts at a keyboard. This is for large-scale production systems sending thousands of prompts per second. You need something automatic, fast, and cheap, i.e., not a human.
@DihelsonMendonca · 1 month ago
It's a bad idea for one person using a laptop to chat in their free time. Think about big businesses, corporations that deal with millions of tokens! 👍💥💥💥
@actorjohanmatsfredkarlsson2293 · 1 month ago
GPT-4o mini support for RouteLLM? API-based LLMs are still too expensive. What are the hardware requirements for running a serious weak model with local inference?
@natsirhasan2288 · 1 month ago
Assuming you want to run an 8B model, you'd need a 6-8 GB GPU to run it smoothly.
@actorjohanmatsfredkarlsson2293 · 1 month ago
@natsirhasan2288 My MacBook would handle that, but it won't handle the 70B. I would like a weak model that's actually on the level of GPT-4o mini (or at least the bigger Groq models). I'm guessing this would require a larger graphics card?
@natsirhasan2288 · 1 month ago
@actorjohanmatsfredkarlsson2293 A 70B model even in Q4 needs something like 40-50 GB of GPU memory to run... Have you tried Gemma 2 27B? It beats Llama 3 70B; it's worth trying.
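Editor's note: those numbers line up with a simple back-of-envelope rule: VRAM ≈ parameter count × bits per weight / 8, plus some headroom for the KV cache and activations (the 20% overhead below is a crude assumption).

```python
def vram_gb(params_billion: float, bits: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: quantized weights plus ~20% for KV cache etc."""
    return params_billion * bits / 8 * overhead

print(round(vram_gb(8, 4), 1))   # ~4.8 GB  -> an 8B model in Q4 fits a 6-8 GB GPU
print(round(vram_gb(70, 4), 1))  # ~42.0 GB -> a 70B model in Q4 needs ~40-50 GB
print(round(vram_gb(27, 4), 1))  # ~16.2 GB -> Gemma 2 27B in Q4 wants a 16-24 GB card
```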
1 month ago
Does it still make sense after 4o mini?
@JustinsOffGridAdventures · 1 month ago
If the snake game didn't get verified, then you have no idea whether you got a correct answer, right? So let's say it failed the snake code: then the prompt's answer wasn't very good, was it? Also, during the apple-sentence test, always give the AI a chance to correct itself. If AI is working properly, it should learn from its faults.
@Robert-z9x · 17 days ago
Loudermilk
@VastCNC · 1 month ago
Can it work 3-tier? If I had a Phi model locally, GPT-4o mini, and Claude 3.5, I think it would make for an awesome dirt-cheap setup.
@toadlguy · 1 month ago
How would a router determine whether GPT-4o mini or Claude 3.5 (I assume Sonnet) was more appropriate? They generally produce similar results.
@VastCNC · 1 month ago
@toadlguy Better context length, and code-specific tasks. Phi-3 128k could max out to GPT-4o mini, and there could be a case for routing large outputs with additional context to Claude. In an agentic workflow, Claude could also be the "fixer" when there are errors in GPT-4o mini's outputs.
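Editor's note: RouteLLM itself is binary (weak/strong), but the 3-tier idea above can be approximated by cascading two binary controllers: local vs. cheap cloud first, then cheap cloud vs. premium cloud for the hard cases. A sketch with assumed model names; the thresholds are uncalibrated placeholders, and checking which model served the response via `r.model` is also an assumption. Note the naive cost: an escalated query pays for a tier-1 completion it discards.

```python
from routellm.controller import Controller

tier1 = Controller(                 # local Phi-3 vs. cheap cloud
    routers=["mf"],
    strong_model="gpt-4o-mini",
    weak_model="ollama_chat/phi3",
)
tier2 = Controller(                 # cheap cloud vs. premium cloud
    routers=["mf"],
    strong_model="claude-3-5-sonnet-20240620",
    weak_model="gpt-4o-mini",
)

def ask(query: str):
    msgs = [{"role": "user", "content": query}]
    r = tier1.chat.completions.create(model="router-mf-0.3", messages=msgs)
    if "phi3" in r.model:           # assumed: response echoes the serving model
        return r                    # easy prompt: the local model handled it
    # Hard prompt: second vote between 4o-mini and Claude 3.5 Sonnet.
    return tier2.chat.completions.create(model="router-mf-0.2", messages=msgs)
```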
@OmerNesher · 1 month ago
OK, this is massive! How can this be integrated into OUI? Make it a pseudo-LLM model?
@OmerNesher · 1 month ago
@Matthew_bberman I am what? the victor? victor of what?
@OmerNesher · 1 month ago
@Matthew_bberman it wasn't even in my mindset 😁 that's awesome. What's the next step? Thanks!
@OmerNesher · 1 month ago
?
@gani2an1 · 1 month ago
Isn't this what Aider does? My Aider uses gpt-4o-mini as the weak model.
@pacobelso · 1 month ago
Without a shared context window, this RouteLLM is useless 🤷🏼
@throwaway6380 · 1 month ago
Why do you dress like that...
@Legorreta.M.D · 1 month ago
Because it makes him look like Skittles in the process of being digested. He likes it. 🍬🍭
@garic4 · 1 month ago
I had high hopes for Gemini Advanced, but it was such a letdown. Unintuitive, glitchy, and the results were just awful. Don't waste your time. #disappointed
@blackswann9555 · 1 month ago
AI should be almost free, or free with ads, soon.
@8bitsadventures · 1 month ago
Ok. 3rd here
@khanra17 · 1 month ago
OpenRouter Auto is faaaar better. In fact, OpenRouter is awesome!!!