
Build Open Source "Perplexity" agent with Llama3 70b & Runpod - Works with Any Hugging Face LLM! 

Data Centric
10K subscribers
6K views

Published: 29 Aug 2024

Comments: 36
@donconkey1 · 2 months ago
Thank you for the excellent video. I appreciate all the detailed steps for setting up vLLM (the open-source LLM inference and serving engine) on RunPod. It's a cost-effective alternative to purchasing an expensive PC, which could break the bank.
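For anyone following along: once the vLLM server from the video is up on a pod, querying it is standard OpenAI-client code. A minimal sketch; the pod ID, port, and model name below are placeholders, not values from the video:

import openai  # vLLM exposes an OpenAI-compatible HTTP API

# RunPod proxies exposed HTTP ports as <pod-id>-<port>.proxy.runpod.net
client = openai.OpenAI(
    base_url="https://YOUR_POD_ID-8000.proxy.runpod.net/v1",  # placeholder pod ID
    api_key="EMPTY",  # vLLM ignores the key unless you configure one
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # the model served in the video
    messages=[{"role": "user", "content": "What's new in open-source LLMs?"}],
)
print(response.choices[0].message.content)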
@wadejohnson4542 · 3 months ago
WORKED as advertised! Well done, John. Thank you.
@jeffreypaarhuis8169 · 3 months ago
Great video again. Can't wait for you to try and run the coding models.
@dierbeats · 3 months ago
Good stuff as always, thank you very much.
@emko6892 · 3 months ago
Nice video!! But it seems most of your videos on AI agents are about web search.
@gileneusz · 2 months ago
Please check "Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models".
@SeattleShelby · 3 months ago
Wow - big difference between the 8b and 70b models. Do you think the 70b models are good enough for agents?
@supercurioTube · 3 months ago
Nice result here with Llama3 70b fp16. The whole time I was thinking "what about Groq?", though, since inference for the same model appears to be free there.
@alchemication · 3 months ago
Groq has very low rate limits atm. But yeah, the speed is amazing.
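Since Groq's API is OpenAI-compatible, swapping it in for a self-hosted endpoint is mostly a base-URL change. A sketch; the API key is a placeholder, and the model ID reflects Groq's naming for Llama3 70b at the time:

import openai

client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
    api_key="YOUR_GROQ_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="llama3-70b-8192",  # Llama3 70b as hosted by Groq
    messages=[{"role": "user", "content": "What's new in open-source LLMs?"}],
)
print(response.choices[0].message.content)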
@aimademerich · 3 months ago
Phenomenal
@Schlemma002 · 3 months ago
Hey, amazing content! I was just wondering: you deploy the pods "On-Demand"; does that mean you only pay for the GPU time you actually use? Or does it cost you as long as the pod is running, because the GPU is reserved for you?
@Data-Centric · 3 months ago
Thank you! Regarding your question, the example in this tutorial charges hourly. However, they do also provide a serverless deployment. Going the serverless route means you pay nothing when the GPU is idle. Here's the doc: www.runpod.io/serverless-gpu
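To make the difference concrete: a serverless endpoint is invoked per request instead of billed while a pod sits running. A minimal sketch against RunPod's serverless HTTP API; the endpoint ID, API key, and input payload shape are placeholders that depend on your worker:

import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder: created in the RunPod serverless console
API_KEY = "YOUR_RUNPOD_API_KEY"   # placeholder

# /runsync blocks until the worker finishes; you are billed only for worker runtime
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "What's new in open-source LLMs?"}},
    timeout=300,
)
print(resp.json())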
@MyrLin8 · 3 months ago
Excellent :) Thanks. How much GPU do you actually need to run this yourself, rather than through a service?
@publicsectordirect982 · 3 months ago
Same question
@mikew2883 · 3 months ago
Good stuff! 👍 So would this be considered just as secure as hosting on Azure? I mean, would your company data be sequestered in its own virtual machine environment?
@Data-Centric · 3 months ago
Great question. At its core, RunPod is a platform that orchestrates GPU resources. The GPUs themselves are provided by third-party data centers. This is what I pulled from their compliance doc: "End-to-end Encryption: Data in transit and at rest is encrypted using industry-leading protocols. This ensures that your AI workloads and associated data remain confidential and tamper-proof." "Compliance Adherence: Different data centers might have varying compliance certifications. While we ensure that all our partners uphold stringent standards, the specifics of each compliance are directly managed by the respective data center." Here's the doc if you want to read further: www.runpod.io/compliance
@mikew2883 · 3 months ago
Awesome, thanks for the info! 👍 I just found what I was looking for with your link. Here is the list of compliances regarding data security, from their "List of Certifications": "It's vital to understand that while RunPod does not directly hold certifications like SOC 2, ISO 27001, or GDPR, many of our partner data centers do. Here's a quick snapshot of many of the certifications our data centers hold: ISO 27001, ISO 20000-1, ISO 22301, ISO 14001, HIPAA, NIST, PCI, SOC 1 Type 2, SOC 2 Type 2, SOC 3, HITRUST, GDPR compliant."
@jarad4621 · 3 months ago
Great vid, thanks. Please test the new Microsoft Phi 3 medium etc. as agents; they might work well, since it's much better than Llama 8b.
@Data-Centric · 3 months ago
I'll be doing a series of tests for a variety of open-source models. Phi will be on the list.
@jarad4621 · 3 months ago
@Data-Centric Awesome, thanks. In a recent video I saw something interesting: the presenter mentioned that the Mistral 7B model makes a great agent for reasons like its architecture and native function calling, I think. I see a new one was just released; apparently as an agent it works better than other popular local Ollama models, though obviously not at the 70b level.
@RayWrightRayrite · 3 months ago
Thanks for the video! How do you find out the compute time/cost of the queries you run?
@Data-Centric · 3 months ago
For this deployment pattern, pods run 24/7 until you stop them, so your compute cost is charged hourly (the rate is quoted when you deploy a pod!). You could probably work out a cost per query from the hourly cost.
@RayWrightRayrite · 3 months ago
@Data-Centric So basically it's time-based: you start the pod at the beginning of the request and stop it when the request has completed, and it goes off the elapsed time between start and stop?
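A rough back-of-envelope version of that cost-per-query calculation; the hourly rate and throughput below are hypothetical:

# hypothetical figures: a $2.00/hr pod handling ~120 queries per hour
hourly_rate = 2.00        # USD per hour while the pod is running
queries_per_hour = 120    # measured throughput for your workload

cost_per_query = hourly_rate / queries_per_hour
print(f"~${cost_per_query:.4f} per query")  # ~$0.0167 per query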
@NoCodeFilmmaker · 3 months ago
I'm curious, bro, why you chose RunPod over Lightning AI?
@Data-Centric · 3 months ago
No reason other than I haven't used Lightning AI.
@figs3284 · 3 months ago
Is there any way you can add a config option in your GitHub repo for using RunPod serverless? It seems like it could be more cost-effective for inference.
@Data-Centric · 3 months ago
I'll look into this!
@kreddy8621 · 3 months ago
Thank you very much, great content
@Data-Centric · 3 months ago
You're very welcome!
@octadion3274 · 3 months ago
Why can't I connect to HTTP port 8000?
@jarad4621 · 3 months ago
I was wondering if this is really going to be cheaper than, say, using OpenRouter or Together AI Llama 70b at 80c per million tokens. I've been running thousands of API calls on quite a bit of data and used less than a dollar on the API, so I'm wondering if $2 per hour is going to be cheaper. I guess if you're running agents continuously for hours, they can do unlimited work in that time, so the GPU rental may be best and the per-token API will cost more, right? I think the only way to know is to test and compare. Also, it's possible the APIs are quantized more than the RunPod version, so you'd get better results from the on-demand rental. "On-demand" means only when you use it, right? So you have to turn it off when you're done? I'm unsure how these rentals work. I've been looking at Vast; you can rent and run vLLM there as well and it's apparently the cheapest, but when I checked, prices for your setup were only about 20c cheaper per hour, not a huge difference, and I think Vast has reliability concerns. Have you looked at it?
@robxmccarthy · 3 months ago
Honestly, it's pretty hard to compete with the API costs unless you are saturating the GPUs. GPU rental like RunPod is great for defined tasks (like summarizing 10,000 papers or something like that).
@jarad4621 · 3 months ago
@robxmccarthy Awesome, thanks for confirming!
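For a defined batch task like the paper-summarization example above, vLLM's offline API keeps the GPU saturated without running a server. A sketch; the model and prompts are placeholders:

from vllm import LLM, SamplingParams

# load the model once; vLLM batches prompts internally to keep the GPU busy
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.2, max_tokens=256)

# placeholder corpus standing in for "10,000 papers"
prompts = [f"Summarize the following paper abstract: {abstract}"
           for abstract in ["...", "...", "..."]]

outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)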
@watchdog163 · 3 months ago
Cost?