This is great work! It seems like this approach limits us to one or the other, and from some testing the routing appears to be greedy. How would we have the option to use either static or dynamic routing?
We can include both static and dynamic routes in the same route layer - it will decide not to use a route if none are similar enough (i.e. it returns Route(None)). We do want to add route-specific thresholds to improve the flexibility here, and down the line some auto-optimization seems reasonable too.
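A minimal sketch of the idea above - one layer holding both a static and a dynamic route, returning no route when nothing scores high enough. This is not the semantic-router API: `ToyRoute`/`ToyLayer` are hypothetical names, and bag-of-words cosine similarity stands in for real embeddings.

```python
from collections import Counter
from math import sqrt

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts (stand-in for an embedding model)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

class ToyRoute:
    def __init__(self, name, utterances, function_schema=None):
        self.name = name
        self.utterances = utterances
        self.function_schema = function_schema  # present -> dynamic route

class ToyLayer:
    def __init__(self, routes, threshold=0.3):
        self.routes = routes
        self.threshold = threshold

    def __call__(self, query):
        # Score every route by its best-matching utterance.
        scored = [(max(similarity(query, u) for u in r.utterances), r) for r in self.routes]
        score, best = max(scored, key=lambda t: t[0])
        # Below the threshold the layer chooses no route at all.
        return best.name if score >= self.threshold else None

chitchat = ToyRoute("chitchat", ["how are you", "what's up"])
get_time = ToyRoute("get_time", ["what time is it in new york"],
                    function_schema={"name": "get_time"})
layer = ToyLayer([chitchat, get_time])
print(layer("what time is it in london"))       # -> get_time
print(layer("explain quantum entanglement"))    # -> None (no route similar enough)
```

A per-route threshold, as mentioned above, would just replace the single `threshold` with one value per `ToyRoute`.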
@jamesbriggs True that we can mix route types in the same route layer, but what I mean is that having a static get-time method that modifies the query is no longer possible once you add the function schema - you end up with some sort of TypeError or mapping error. Basically, I want to be able to ask "What time is it in Littleton Colorado" and "What time should I eat lunch", both of which would hit a time-like route but require different logic. And the route layer appears to be greedy in the sense that if I make a route layer with routes ["get_time_static", "get_time_dynamic"], everything seems to route to get_time_static because the utterances are so similar and it was placed first. I hope my concern makes sense.
@SaulRamirez-x6e Okay, yes, I understand now. It should actually work if you set up the routes with enough utterances to separate them both. What I would recommend is to test with a ton of example questions, and where you see the wrong route being chosen, add the query to the correct Route.utterances list - doing this with enough examples should result in the correct routes being identified between them. It has been a while since the logic for route selection was written, so I may need to revisit it, but the choice should be based on which route scores highest and should be independent of the route order.
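The tuning loop described above can be sketched in a few lines. This is a hypothetical stand-in, not the semantic-router internals: word-overlap cosine similarity replaces real embeddings, and `choose` is a pure argmax, so route order never matters - only scores do.

```python
from collections import Counter
from math import sqrt

def score(query: str, utterances: list) -> float:
    """Best cosine similarity (over word counts) against any utterance."""
    def sim(a, b):
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(ca[w] * cb[w] for w in ca)
        norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
        return dot / norm if norm else 0.0
    return max(sim(query, u) for u in utterances)

routes = {
    "get_time_static": ["what time is it"],
    "get_time_dynamic": ["what time is it in new york"],
}

def choose(query):
    # Pure argmax over scores - independent of the order routes were added.
    return max(routes, key=lambda name: score(query, routes[name]))

query = "what time is it in littleton colorado"
before = choose(query)  # misroutes to get_time_static with so few utterances
# Fix: add the misrouted query to the intended route's utterance list.
routes["get_time_dynamic"].append(query)
after = choose(query)
print(before, after)  # get_time_static get_time_dynamic
```

With only one utterance per route the Littleton query overlaps the short static utterance more strongly; one added example flips the argmax, which is the same effect as growing `Route.utterances` in the real library.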
yeah, I've yet to build a full agent using this, but I think we could build a fundamentally different type of agent with this idea - I'm sure there would be both pros and cons but I would love to see it done :)
Thanks for the video, I really liked the concept. I wanted to ask: does dynamic routing need just the OpenAI encoding model for finding the dynamic routes? And does dynamic routing support triggering multiple routes if the query contains info about multiple routes?
I'm also looking for simultaneous function calling possibilities. I realize this gets more and more indeterminate with each layer of abstraction though, so I'm kinda half-expecting that kind of thing not to exist yet.
Hi James, what would be the main difference between a dynamic router and doing RAG? Let's say I have multiple functions and I don't want to pass all of them in my prompt. I could write a few example queries for each function and use a retriever to get the function that I need. What would be the advantage of the dynamic router vs. the simple retriever? Thanks for your great content!
Hey James, I'm more than sure that OpenAI is doing the same semantic search when choosing a function call. Did you try 10 different functions for dynamic routes, and is there also an option for bypassing the function?
This is a more efficient and deterministic way of doing what OpenAI does with function calling. Rather than providing a description of when to use each tool (which is fed into the LLM call, costing tokens), we define a set of queries that should trigger a route (which does not get fed into the LLM, saving tokens and time). Because we don't feed descriptions into the LLM for decision making, we can also scale the number of tools being used to hundreds, thousands, or more tools (i.e. routes).
Good optimization. I'm currently working on a research problem and will be done within this month; after that, my plan is to contribute to your repo. Also, why not use a small LLM (phi) to generate code which you can then run in a code interpreter? I suggest a small LLM because it generates quickly.
that'd be awesome, it would be great to have you contribute! We can integrate OSS LLMs now too, see github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb
it is essentially an extreme classification problem, and I have seen that specific use case applied to thousands of classes - so at least that many, as far as I can tell. I will test this out soon though.
less token input/output, because we don't need a full agent description with many different tool descriptions etc. - instead we use a semantic route to choose the tool and pass it directly to the LLM, which generates the response. The second reason for the faster speed is that not all tools require dynamic (i.e. generated) routes; some can be static, trigger a function to retrieve data (for example), and return directly to the final LLM call - with that you're skipping one of the LLM calls.
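The saved LLM call on the static path can be sketched as below. Everything here is illustrative: `call_llm` is a counting stub standing in for a real LLM API, and the route names are just the ones from this thread.

```python
llm_calls = 0

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; counts invocations."""
    global llm_calls
    llm_calls += 1
    return f"<llm answer to: {prompt}>"

def get_time() -> str:
    return "12:00"  # pretend data retrieval

def handle(query: str, route: str) -> str:
    if route == "get_time_static":
        # Static route: run the function directly, no extra LLM call,
        # then pass the result straight into the final answer call.
        data = get_time()
        return call_llm(f"{query} (context: {data})")
    # Dynamic route: one LLM call generates the function arguments,
    # a second produces the final answer - one more call than static.
    args = call_llm(f"extract arguments for get_time from: {query}")
    data = get_time()  # a real implementation would use `args` here
    return call_llm(f"{query} (context: {data})")

handle("what time is it", "get_time_static")
print(llm_calls)  # 1: the static path needed a single LLM call
handle("what time is it in new york", "get_time_dynamic")
print(llm_calls)  # 3 total: the dynamic path needed 2
```

The route choice itself costs no LLM tokens in either case - only the dynamic path pays for the extra argument-generation call.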
The LLM only creates the function-call output; it does not have to generate a "Thought". That step is done by the semantic routing, which is less versatile but much faster and more stable (in the scenarios the routes account for).