IndyDevDan
On this channel we think, plan, and build.

Right now on the channel, we're on the path to evolve into Agentic engineers.
Engineers that build software that works for them while they sleep.

Here are the principles of my engineering philosophy and the ideologies this channel holds as facts.

- Avoid hype, focus on real valuable tools, technology and products.
- Build real products. There are enough news, hype, and trend channels. Let's make something real.
- Listen to feedback but always think for yourself.
- Use the best technology for the job. Period. Full stop.
- Keep learning, forever.
- Do what you share. I don't share anything on this channel I'm not betting on with my time, energy and money.
- Cancel out the noise and focus on the signal of value creation.
- Great things happen in the flow. Look for it every day.

I'm not the perfect programmer, designer, or creator, but to succeed you don't have to be perfect. You just have to try, over and over, in success or failure.
Comments
@pubfixture 22 hours ago
A 4-way tie for the gold medal among 7 contestants means you need harder questions at the top end to separate them out.
@shockwavemasta 1 day ago
Thanks for continuing this series - it's been super helpful
@davidpower3102 3 days ago
I found it hard to understand how you benched the models. Was this mostly down to personal opinion? Maybe you might explain your tests before discussing the results. Your test tooling looks really nice!
@indydevdan 2 days ago
100% personal opinion and vibes. I use promptfoo for more hands-on, assertion-based testing. This notebook is more about understanding what the models can do at a high level.
@CheekoVids 4 days ago
I know you don't do much model training on this channel. But have you considered training some of the local models on your good test results and then seeing how the refined models perform?
@wedding_photography 4 days ago
12:55 you completely missed that llama3.2:1b failed at SQL. It's missing authed=TRUE.
@indydevdan 2 days ago
nice catch
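The miss discussed in this thread is the kind of thing an assertion can catch automatically. Here is a minimal sketch, assuming the expected query has to filter on `authed = TRUE`. The `users` table and the `age` filter in the sample queries are hypothetical, and `has_condition` is an illustrative helper, not part of promptfoo or the notebook from the video:

```python
import re


def has_condition(sql: str, column: str, value: str) -> bool:
    """Return True if the SQL text contains `column = value`.

    Matching is case-insensitive and tolerates whitespace around `=`.
    This is a text-level check only; it does not parse the SQL.
    """
    pattern = rf"\b{re.escape(column)}\s*=\s*{re.escape(value)}\b"
    return re.search(pattern, sql, flags=re.IGNORECASE) is not None


# Hypothetical model outputs: one keeps the required filter, one drops it.
good_sql = "SELECT * FROM users WHERE age > 21 AND authed = TRUE;"
bad_sql = "SELECT * FROM users WHERE age > 21;"

print(has_condition(good_sql, "authed", "TRUE"))
print(has_condition(bad_sql, "authed", "TRUE"))
```

A regex like this only confirms that the condition appears in the text. Running the generated query against a small fixture database and comparing rows would be a stronger check.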
@wedding_photography 4 days ago
"ping" is the dumbest test I have seen. Go tell random people "ping" and see what they respond with.
@DanielBowne 4 days ago
Hands down the best local model I have seen for function/tool calling.
@husanaaulia4717 4 days ago
Doesn't qwen2.5 have a 3B parameter model?
@Jason-ju7df 4 days ago
I wish you would put the model parameter sizes in the video description. It makes it easier to weigh your comparisons when you're comparing a 1B model to a 7B model.
@johnkintree763 4 days ago
Thanks for including generation of SQL queries among the tested tasks. The ability of models to interface with databases is crucial.
@ariramkilowan8051 4 days ago
Would be cool to test image understanding. Basic OCR to start with, then counting objects and doing reasoning over the images. LLM providers often tell us what their models can't do, or can't do well. Using that info as a signal of improvement would be very useful IMHO. Best of all, you can use code to check exactly how correct each model is; this can be harder when dealing with text, where you need a human judge or an LLM as a judge (which then needs to be aligned with a human anyway). Also thanks for the video, I check in every Monday. Keep on keeping on. 👍
@zkiyyeller3525 4 days ago
THANK YOU! I really appreciate your honest testing and taking us along with you on this journey!
@peciHilux 4 days ago
Wow, nice. What I am missing is technical metrics for comparison, like response time and the memory used to run the model...
@NLPprompter 4 days ago
I'm curious: do you use a 5k context with the default Ollama model?
@aerotheory 5 days ago
Lots of subs to be had in the SLM area, so many edge cases. Try 70b_q4 compared to 8b models.
@matthewjfoster1 5 days ago
good video, thanks!
@Canna_Science_and_Technology 5 days ago
Llama 3.2 hallucinates really badly.
@zakkyang6476 5 days ago
Interesting project. Since I am a lazy person, I will use another LLM to score the output each time rather than doing it manually.
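The idea in this comment, scoring each model's output with a second LLM instead of by hand, could be sketched roughly as follows. Everything here is an assumption for illustration: the prompt template, the 1 to 5 scale, and the `judge` callable, which stands in for whatever call sends a prompt to a judge model and returns its text reply. A stub judge is used so the sketch runs without any model:

```python
import re
from typing import Callable, Dict

# Hypothetical grading prompt; the wording and the 1-5 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are grading a model's answer.\n"
    "TASK:\n{task}\n"
    "OUTPUT:\n{output}\n"
    "Reply with a single integer score from 1 (bad) to 5 (perfect)."
)


def parse_score(reply: str) -> int:
    """Extract the first standalone digit 1-5 from the judge's reply."""
    match = re.search(r"\b([1-5])\b", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return int(match.group(1))


def score_outputs(
    task: str, outputs: Dict[str, str], judge: Callable[[str], str]
) -> Dict[str, int]:
    """Ask the judge to grade every model's output for one task."""
    scores = {}
    for model, output in outputs.items():
        prompt = JUDGE_TEMPLATE.format(task=task, output=output)
        scores[model] = parse_score(judge(prompt))
    return scores


def stub_judge(prompt: str) -> str:
    """Stand-in for a real judge model: rewards an output that is exactly 'OK'."""
    output = prompt.split("OUTPUT:\n", 1)[1].split("\nReply with", 1)[0]
    return "Score: 5" if output.strip() == "OK" else "Score: 2"


scores = score_outputs(
    "Reply with only the word OK.",
    {"model-a": "OK", "model-b": "Sure! Here is my answer: OK"},
    stub_judge,
)
print(scores)
```

Swapping `stub_judge` for a function that calls a real model keeps the rest unchanged. As another commenter notes, a judge model still needs to be checked against human ratings.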
@DanInVirtualReality 5 days ago
Awesome! I'd love to see some FIM prompt tests on FIM-purposed models like deepseek coder. I had that as my 'auto complete copilot' in Twinny on VSCode before I moved to Cursor, and I was impressed with how often it nailed it for a really small model (no point in auto complete if it can't offer a completion quicker than my own brain and fingers! And I only have a lowly 1060 6GB 😅). It strikes me that FIM code completion could be a way to leverage those models' strengths in code reasoning, which could outperform natural language reasoning in instruct-tuned models of a similar size, e.g. a logical setup and a logical next step presented as code, with a FIM request on the intermediate action... I'm thinking of tool choice in particular for my own use case. All the assistant demo scripts I've seen show picking between, like, five tools max, which is not a realistically sized toolbox for an assistant. Keyword-based context stuffing would help of course, or RAG techniques on the tools and their descriptions, but latency may prevent that in practice. I can imagine code with concisely named and typed functions declared, and a comment describing the purpose of the next step. That should perform well with these, I suspect. I just haven't got to the experiment yet 😄 The main benefit is it would address your particular displeasure with extraneous explanation: if present at all, it would be commented-out code or, at worst, print/log statements.
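The tool-choice idea in this comment could be framed as a fill-in-the-middle prompt along these lines. This is only a sketch under stated assumptions: the sentinel strings below are placeholders (each FIM-capable model documents its own special tokens and ordering), the prefix-suffix-middle layout is one common arrangement, and the three tool functions are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    # Placeholder sentinels, NOT real model tokens; replace them with the
    # exact strings documented for the model you are testing.
    prefix: str = "<FIM_PREFIX>"
    suffix: str = "<FIM_SUFFIX>"
    middle: str = "<FIM_MIDDLE>"


def build_fim_prompt(before: str, after: str, tokens: FimTokens = FimTokens()) -> str:
    """Lay out a prompt as prefix, suffix, then the slot the model should fill."""
    return f"{tokens.prefix}{before}{tokens.suffix}{after}{tokens.middle}"


# Hypothetical toolbox, declared as concise typed signatures.
TOOLS = '''\
def search_docs(query: str) -> list[str]: ...
def send_email(to: str, subject: str, body: str) -> None: ...
def create_ticket(title: str, priority: int) -> str: ...
'''

# The comment states the intent; the model is asked to complete the call.
before = TOOLS + "\n# Next step: file a high-priority bug about the login outage\nresult = "
after = "\nprint(result)\n"

prompt = build_fim_prompt(before, after)
print(prompt)
```

Because the completion slot sits in the middle of code, anything the model adds beyond the call itself would have to be valid code or a comment, which is the benefit the comment describes.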
@samsara2024 5 days ago
Thanks for the video! Could you make a tutorial in which a local installation of Llama learns from the chats you have with the AI? I mean you just talk and somehow it stores this information internally and doesn't lose it when you shut down the computer.
@prozacsf84 5 days ago
Bro, it's useless to compare without o1-preview. It is many times better.
@indydevdan 2 days ago
This was a local model focused test. o1-preview would score 100% on these tests, nothing to learn there.
@prozacsf84 2 days ago
@indydevdan gpt-4o is local?
@billybob9247 5 days ago
What quantization sizes were you using for the models? Love your channel! Keep it coming!!!
@DARKSXIDE 5 days ago
Maybe see how they perform with Anthropic's new contextual RAG. Then we can download devdocs and make even the SLMs smarter for coding.
@amitkot 5 days ago
Great comparison, thanks for making this! I'm off to compare qwen2.5:latest with qwen2.5-coder:latest.
@indydevdan 2 days ago
Thank you! Qwen2.5 was the real shocker here. When qwen 3 hits - it's prime time for on device models.
@billydoughty7243 5 days ago
@IndyDevDan - you da man, dan. experienced engineers can appreciate your methodology and the value of your content and the tools you create. inexperienced engineers can learn the value of a methodical, structured approach to software development, which includes analyzing, comparing, and building tools to maximize your productivity. great videos. keep 'em coming.
@techfren 5 days ago
Thank you for continuing to post great content
@DARKSXIDE 5 days ago
u too techfren you guys both rock! big fan of both channels !
@vitalis 5 days ago
Check out Molmo then
@acllhes 5 days ago
What happened to your ai personal assistant?
@indydevdan 2 days ago
We've been waiting for the realtime_api 🚀
@albertwang5974 5 days ago
What an inspiring video!
@amitkot 6 days ago
Thanks for sharing this video!
@adamviaja 7 days ago
I'm pretty new to coding and I'm def a little lost in this video but I'll have to come back as I learn more!
@BA-ve7xp 10 days ago
Is it possible to do this without Cursor? I think I saw a video about VSCode plugins for Aider, Continue, and ClaudeDev. I'm new to all of this, so I appreciate any pointers.
@JannisSchulze-xz7um 10 days ago
HOW HARD IS IT TO COPY PASTE A PROMPT INTO THE VIDEO DESCRIPTION :(
@saaaashaaaaa 10 days ago
Just pivot it so it can be pushed to and hosted on Vercel as a working frontend with styling ability.
@---xu8kc 10 days ago
@IndyDevDan what is so special about duck database?
@user-pt1kj5uw3b 10 days ago
That Mr. Beast production document is literal gold. Tens of thousands of hours of expert advice perfectly distilled. He comes off like a dick but sometimes you need to be a dick to do something great.
@indydevdan 8 days ago
I 100% agree.
@blarvinius 10 days ago
I can't see, it's all too small. Please remember mobile viewing for your videos.
@RainerSt0ff 10 days ago
You're missing a point here: more lines don't equal better software. The performance of AI code tends to degrade quickly with length and complexity, especially if you want to build applications that are more advanced than a demo. Still, it's quite amazing how good LLMs have gotten, but the metric we measure shouldn't be the length of the code produced.
@BigFattyNat 10 days ago
Is this whole thing kinda not really but kinda really just built around what if your app was based on a slider?
@hamzaessahbaoui7053 11 days ago
It would be great if each response is saved for the prompt.
@indydevdan 8 days ago
Definitely coming in v2.
@ctcsys 11 days ago
But could it work as an Aider frontend/IDE too? Yes, I bet.
@ctcsys 11 days ago
Sounds great, thanks
@paul310paul 11 days ago
Fantastic tool! Thank you so much!
@user-pt1kj5uw3b 12 days ago
This is pretty clean and really powerful, I could really see this catching on
@ginocote 12 days ago
This guy is so overexcited and goes so fast... how many times on average do you click pause in each video? >10, >25, >50, >100, or more 😂
@AgenticAI 11 days ago
You can adjust the speed to your liking.
@GiovanneAfonso 12 days ago
you made my day
@uhtexercises 12 days ago
Best stuff as always. Thank you so much for sharing
@egparker5 12 days ago
Mathematica does the same reactive style reference updates with Dynamic[] expressions. It also has lots of widgets etc., and the killer feature is true symbolic computation.
@dinoscheidt 12 days ago
Uff… another toy that pushes data scientists away from writing unit tests 😮‍💨 we made so much progress getting them into VSCode and its Notebooks feature with pytest right there
@akshayagrawal6755 12 days ago
marimo notebooks are stored as pure Python files. Cells can be named and used in other Python files, and in this way they can be readily tested with pytest. More features to help with this are on the roadmap.
@jindrichsirucek 12 days ago
Great content!! I was racking my brain over how to structure instructions, especially for meta prompting. At first I was thinking about JSON because of its unlimited nesting. Then I realized that XML might be better because of the problem of closing brackets. And then I realized the reason XML is the best format is that LLMs are trained on websites - tudum tudum tudum tada - XML-formatted content :D I kind of realized all those things on my own and was wondering why nobody was talking about it, and then 2 days later, boom - this video :D Thanks for the references - I'll study what others came up with, since I kinda reinvented the wheel on my own :D Thx