
ZERO Cost AI Agents: Are ELMs ready for your prompts? (Llama3, Ollama, Promptfoo, BUN) 

IndyDevDan
Subscribe · 17K subscribers
6K views

Published: Aug 28, 2024

Comments: 31
@kenchang3456 · 4 months ago
And now I know why I subscribed with alerts on.
@thunken · 4 months ago
would be cool if you had finger puppets :)
@drlordbasil · 4 months ago
second.
@miikalewandowski7765 · 4 months ago
😂😂
@indydevdan · 3 months ago
lemao
@alew3tube · 4 months ago
I would add to your list: tool/function calling as a fundamental capability for an LLM
@indydevdan · 3 months ago
Yeah, great call out. Definitely a fundamental requirement for LLMs, especially for agentic workflows.
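To make the "tool calling as a fundamental" point concrete, here is a minimal sketch of the request payload Ollama's `/api/chat` endpoint accepts for tool calling. The `get_weather` tool and its schema are hypothetical examples, and actually sending the request assumes a local Ollama server at the default port:

```python
import json

# Sketch of a tool/function-calling request for Ollama's /api/chat endpoint.
# The "get_weather" tool is a made-up example; a real agent would register
# whatever functions its workflow needs.
payload = {
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,
}

# Serialized body for a POST to http://localhost:11434/api/chat
body = json.dumps(payload)
```

A model that supports tool calling responds with a `tool_calls` entry naming the function and its arguments, which the agent then executes and feeds back.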
@AGI-Bingo · 4 months ago
I currently enjoy the Groq free era, and I don't mind using it for development, but for production I wouldn't want my or others' private data going to any corp, so going local is definitely the way to go
@bogdantanasa1374 · 29 days ago
Thank you very much for sharing this. I hit a few bumps in the road getting it to work, but managed it, so thank you for the details. Interestingly, when asked to choose from a list of options, models sometimes answer in sentence case instead of the exact lowercase instructed; no big deal, I'd think, since some people would do the same when answering yes or no :) In more advanced tests I found the answers were ACCEPTABLE (though the asserts would not be easy to describe :( ). Besides obviously checking myself, I got a second opinion from a more expensive, more advanced model, and some of those answers were found acceptable because the reasoning made sense - for example the SQL NQL test. It would be interesting if promptfoo included a 'get a second opinion' step with a different agent; after all, we're trying to automate everything, so why not the test evaluations themselves? :)
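The 'second opinion' idea maps closely onto promptfoo's model-graded assertions, where a different (usually stronger) model grades the answer against a rubric. A minimal config sketch, assuming the provider ids and rubric wording below rather than anything from the video:

```yaml
providers:
  - ollama:chat:llama3

prompts:
  - "Answer yes or no, lowercase only: {{question}}"

tests:
  - vars:
      question: "Is SQL a declarative language?"
    assert:
      # case-insensitive match tolerates "Yes" vs the instructed "yes"
      - type: icontains
        value: "yes"
      # "second opinion": graded by a different, stronger model
      - type: llm-rubric
        value: "The answer is correct and any reasoning given is sound"
        provider: openai:gpt-4o
```

The `icontains` assertion handles the casing drift described above, while `llm-rubric` covers the fuzzy "acceptable but hard to assert" cases.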
@reagansenoron6763 · 3 months ago
Hi Dan, thanks a lot for sharing your knowledge. With around 700K open-source LLMs out there, it's really hard to pick a decent one. Usually we sort by most downloaded or most liked, but that's not enough; this benchmarking will really help. BTW, I followed the readme and running `bun elm` throws: error: Script not found "elm"
@wellbishop · 4 months ago
Pretty smart guy you are. Thanks for sharing your divinity with us, poor mortals.
@larsfaye292 · 4 months ago
In my opinion, the LPU (such as what Groq is developing) is going to be built into future PCs, dedicated for the sole task of running local models.
@indydevdan · 3 months ago
I'm betting on Apple doing this with the M4 / M5 devices. Not an LPU exactly, but the 'Apple LPU' equivalent.
@fontenbleau · 2 months ago
Any model needs huge amounts of memory; the chip barely matters since it only affects speed, but without hundreds of GBs of SSD and RAM it won't even start. In all of Apple's history, memory has played the smallest role; again they put all the attention on CPUs rather than on the important and really expensive part. Apple is stuck in its doctrine; it's hopeless.
@acllhes · 4 months ago
Good stuff
@mylesholloway9223 · 3 months ago
Left a comment under the GitHub repo; you may have forgotten to include the package.json. When running `bun i` I get an error because there are no dependencies in a package.json
@kterry697 · 3 months ago
Yes... the package.json was forgotten. When running `bun i` I get an error because there are no dependencies in a package.json
@indydevdan · 3 months ago
wow massive noob mistake. fixed. thank you.
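For anyone who hit the `Script not found "elm"` and `bun i` errors above, a minimal package.json sketch shows the shape Bun expects; the script name, entry path, and the promptfoo dependency here are hypothetical stand-ins for whatever the repo actually uses:

```json
{
  "name": "elm-prompt-tests",
  "scripts": {
    "elm": "bun run src/elm.ts"
  },
  "dependencies": {
    "promptfoo": "latest"
  }
}
```

With a `scripts.elm` entry present, `bun elm` (Bun's shorthand for `bun run elm`) resolves, and `bun i` has dependencies to install.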
@EternalKernel · 4 months ago
You should test Claude 3 Haiku
@6lack5ushi · 4 months ago
I love your videos and posts, but even with ELMs the biggest issue I find (unless it's a nasty bug inherent to my system) is INSTRUCTION FOLLOWING! I would rather have legacy GPT-4 than any 4-turbo model, because it follows commands WAY BETTER! I have a terrible feeling MMLU and other benchmarks are hiding the fact that models may get more capable but less reliable, or "lazy". I thought it was bloated initial prompts (and human moderation creating illogical gaps where it just omits things), but I think it's more sinister: we are optimizing for the benchmarks, but we do not benchmark instruction following in said benchmarks!
@robertmazurowski5974 · 4 months ago
This is not a psychological phenomenon. I've used GPT-4 since the beginning, and I could see when they were dumbing down, maybe quantizing, their model. Something literally broke all my automations last weekend. I changed to the new GPT-4 Turbo model, which is supposed to be better than the previous ones according to benchmarks. Unfortunately it sucks; it cannot catch instructions like the previous one used to.
@6lack5ushi · 4 months ago
@robertmazurowski5974 Same issue. I use the "GPT-4" endpoint that points to GPT-4-0613 (I think), currently the best, from before the super-massive context, but 100% the same thing happened to me. Try Llama 3; I had some success, but nowhere near legacy GPT-4
@BangaloreYoutube · 4 months ago
Now I'm sad my workflows didn't break, are they not complex enough 😅 I switched a few to Ollama through Groq and nothing seems broken yet!!
@robertmazurowski5974 · 4 months ago
@BangaloreYoutube In my case it worked before swapping to the new GPT Turbo model. The model doesn't catch instructions properly. Before last weekend, GPT-4 Turbo was able to process 3-4 function calls based on a prompt, and then answer with another 3-4 function calls if needed. It cannot do it any more.
@indydevdan · 3 months ago
"models may get more capable but less reliable" - this is a great call out and observation. I agree with you, instruction following is ULTRA important especially as models improve. If they can't follow your instructions, the capabilities they have are essentially useless. Another interesting finding with MMLU and other benchmarks is that model providers have started TRAINING ON THE BENCHMARKs which is you've trained a model before, you know is a HUGE problem (model contamination). Both of these call outs highlight the importance of what we discuss in this video: having your own domain specific prompt tests to validate the 'true' value of the LLM for your use cases.
@fraugdib3834 · 1 month ago
Effin' righteous, man... Have a metric --> use it often --> know exactly where you stand in reference to an ever-expanding whirlwind of clickbait and noise.
@fontenbleau · 2 months ago
The Apple models didn't impress me at all; maybe they deliberately published only the smallest, least useful ones. A normal-quality LLM like Llama 3 70B at best quality (8-bit GGUF) needs ~90GB of RAM just to start. None of these hardware makers provide that; everyone shows off powerful CPUs that will be wasted in those laptops, since Microsoft required just 16GB of RAM. Using an SSD here is impossible; it wears out fast. 128GB of DDR4 cost me exactly $400, which is half of a decent GPU or of all these fancy laptops.
@xyster7 · 3 months ago
Just remove that echo and invest in a better mic, and you will be better than other channels of this type