Right now on this channel, we're on the path to becoming agentic engineers: engineers who build software that works for them while they sleep.
Here are the principles of my engineering philosophy, the ideas this channel holds as facts:
- Avoid hype; focus on genuinely valuable tools, technology, and products.
- Build real products. There are enough news, hype, and trend channels. Let's make something real.
- Listen to feedback, but always think for yourself.
- Use the best technology for the job. Period. Full stop.
- Keep learning, forever.
- Do what you share. I don't share anything on this channel that I'm not betting on with my time, energy, and money.
- Cancel out the noise and focus on the signal of value creation.
- Great things happen in the flow. Look for it every day.
I'm not the perfect programmer, designer, or creator, but to succeed you don't have to be perfect. You just have to try, over and over, in success or failure.
I found it hard to understand how you benchmarked the models. Was this mostly down to personal opinion? Maybe you could explain your tests before discussing the results. Your test tooling looks really nice!
100% personal opinion and vibes. I use promptfoo for more hands-on, assertion-based testing. This notebook is more about understanding what the models can do at a high level.
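To make the difference between vibe checks and assertion-based testing concrete, here is a minimal sketch in plain Python. It is not promptfoo's API (promptfoo has its own configuration format); the `Check` helpers and the two sample outputs are hypothetical. Each model output runs through a list of deterministic assertions, and the pass rate gives a number you can compare across models.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named, deterministic assertion over a model's text output."""
    name: str
    predicate: Callable[[str], bool]


def contains(text: str) -> Check:
    return Check(f"contains {text!r}", lambda out: text in out)


def not_contains(text: str) -> Check:
    return Check(f"not contains {text!r}", lambda out: text not in out)


def matches(pattern: str) -> Check:
    return Check(f"matches /{pattern}/", lambda out: re.search(pattern, out) is not None)


def run_checks(output: str, checks: list[Check]) -> tuple[float, list[str]]:
    """Return the pass rate and the names of the checks that failed."""
    failed = [c.name for c in checks if not c.predicate(output)]
    return 1 - len(failed) / len(checks), failed


# Hypothetical outputs for the prompt "Return only a Python function named add".
outputs = {
    "model_a": "def add(a, b):\n    return a + b",
    "model_b": "Sure! Here is the function:\ndef add(a, b):\n    return a + b",
}
checks = [matches(r"^def add\("), contains("return"), not_contains("Sure!")]

for model, out in outputs.items():
    score, failed = run_checks(out, checks)
    print(model, score, failed)
```

Because every check is code, a chatty preamble like model_b's is caught mechanically instead of by eye.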
I know you don't do much model training on this channel, but have you considered training some of the local models on your good test results and then seeing how the refined models perform?
I wish you'd put the model parameter sizes in the video description. It makes it easier to weigh your comparisons when you're comparing a 1B model to a 7B model.
It would be cool to test image understanding: basic OCR to start with, then counting objects and reasoning over images. LLM providers often tell us what their models can't do, or can't do well; using that info as a signal of improvement would be very useful, IMHO. Better still, you can use code to check exactly how correct each model is. That is harder with free text, where you need a human judge or an LLM as a judge (which then needs to be aligned with a human anyway). Also, thanks for the video; I check in every Monday. Keep on keeping on. 👍
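As a sketch of what "use code to check exactly how correct each model is" could look like for the two tasks mentioned: character error rate (CER, edit distance divided by reference length) for OCR, and exact match for object counting. The ground-truth strings, counts, and model answers are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed per reference character (0.0 is perfect)."""
    if not reference:
        return float(len(hypothesis) > 0)
    return levenshtein(reference, hypothesis) / len(reference)


def count_accuracy(expected: list[int], answers: list[str]) -> float:
    """Fraction of counting questions where the answer parses to exactly the expected integer."""
    correct = 0
    for want, got in zip(expected, answers):
        try:
            correct += int(got.strip()) == want
        except ValueError:
            pass  # a non-numeric answer counts as wrong
    return correct / len(expected)


# Hypothetical ground truth and model answers.
print(cer("Invoice #1042", "Invoice #1O42"))  # one substitution (0 -> O) over 13 characters
print(count_accuracy([3, 5, 2], ["3", " 5 ", "two"]))
```

Strict parsing is a deliberate choice here: an answer of "two" is marked wrong, which also penalizes models that ignore a "reply with a number" instruction.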
Awesome! I'd love to see some FIM (fill-in-the-middle) prompt tests on FIM-purposed models like DeepSeek Coder. I had it as my autocomplete copilot in Twinny on VS Code before I moved to Cursor, and I was impressed with how often it nailed it for such a small model (there's no point in autocomplete if it can't offer a completion quicker than my own brain and fingers! And I only have a lowly 1060 6GB 😅). It strikes me that FIM code completion could be a way to leverage these models' strengths in code reasoning, which could outperform the natural-language reasoning of instruct-tuned models of a similar size: for example, a logical setup and a logical next step presented as code, with a FIM request on the intermediate action. I'm thinking of tool choice in particular for my own use case. All the assistant demo scripts I've seen show picking between, like, five tools max, which is not a realistically sized toolbox for an assistant. Keyword-based context stuffing would help, of course, as would RAG over the tools and their descriptions, but the added latency may rule that out in practice. I can imagine code with concise, named, typed function declarations and a comment describing the purpose of the next step; I suspect that would perform well with these models. I just haven't gotten to the experiment yet 😄 The main benefit is that it would address your particular displeasure with extraneous explanation: if present at all, it would be commented-out code or, at worst, print/log statements.
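A rough sketch of the tool-choice idea above: declare the toolbox as typed Python stubs, put the goal in a comment, and leave a FIM hole where the call should go. The sentinel strings `<PRE>`, `<SUF>`, `<MID>` are placeholders arranged in prefix-suffix-middle order, not DeepSeek Coder's actual special tokens; substitute whatever your model's documentation specifies. The tool names and helper functions are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Placeholder sentinels (assumption): replace with the FIM tokens your model actually uses.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<PRE>", "<SUF>", "<MID>"


@dataclass
class Tool:
    name: str
    signature: str  # e.g. "(city: str) -> dict"
    doc: str


def build_fim_tool_prompt(tools: list[Tool], goal: str, result_var: str = "result") -> str:
    """Lay out tools as typed stubs, then leave a hole where the next call should go."""
    stubs = "\n\n".join(
        f"def {t.name}{t.signature}:\n    \"\"\"{t.doc}\"\"\"\n    ..." for t in tools
    )
    prefix = f"{stubs}\n\n\n# Next step: {goal}\n{result_var} = "
    suffix = f"\nprint({result_var})\n"
    # Prefix-suffix-middle order: the model generates the text between prefix and suffix.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def parse_tool_choice(completion: str, tools: list[Tool]) -> Optional[str]:
    """Return the declared tool the completion starts by calling, or None."""
    match = re.match(r"\s*([A-Za-z_]\w*)\s*\(", completion)
    names = {t.name for t in tools}
    if match and match.group(1) in names:
        return match.group(1)
    return None


# Hypothetical toolbox; a real assistant would have many more entries.
tools = [
    Tool("get_weather", "(city: str) -> dict", "Current weather for a city."),
    Tool("send_email", "(to: str, subject: str, body: str) -> None", "Send an email."),
    Tool("search_docs", "(query: str, limit: int = 5) -> list[str]", "Search internal docs."),
]
prompt = build_fim_tool_prompt(tools, "find out whether it will rain in Oslo")
print(prompt)
```

A completion such as `get_weather("Oslo")` can then be mapped back to a tool with `parse_tool_choice`; anything that isn't a call to a declared tool comes back as `None`, so extraneous explanation simply fails to parse.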
Thanks for the video! Could you make a tutorial in which a local installation of Llama learns from the chats you have with the AI? I mean, you just talk, and somehow it stores this information internally and doesn't lose it when you close the computer.
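Strictly speaking, a local Llama model doesn't learn from chats unless it is retrained; a simpler approach is to save the conversation to disk and feed recent turns back in as context at the start of the next session. The sketch below covers only that persistence layer, using the standard library; the file name `chat_memory.json` and the helper names are hypothetical, and the call to the local model itself is left out.

```python
import json
import tempfile
from pathlib import Path

MEMORY_FILE = Path("chat_memory.json")  # hypothetical location


def load_history(path: Path = MEMORY_FILE) -> list[dict]:
    """Load previous messages, or start fresh if nothing was saved yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []


def save_history(history: list[dict], path: Path = MEMORY_FILE) -> None:
    path.write_text(json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8")


def remember(role: str, content: str, path: Path = MEMORY_FILE) -> list[dict]:
    """Append one message and persist immediately, so closing the computer loses nothing."""
    history = load_history(path)
    history.append({"role": role, "content": content})
    save_history(history, path)
    return history


def context_window(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the most recent messages so the prompt fits the model's context length."""
    return history[-max_messages:]


if __name__ == "__main__":
    # Demo in a throwaway directory; a real app would use MEMORY_FILE.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "memory.json"
        remember("user", "My name is Ana.", demo)
        print(context_window(load_history(demo)))
```

On startup, `context_window(load_history())` gives the messages to prepend to the next prompt. Once the history outgrows the context window, summarizing older turns or retrieving only the relevant ones are common next steps.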
@IndyDevDan, you da man, Dan. Experienced engineers can appreciate your methodology, the value of your content, and the tools you create. Inexperienced engineers can learn the value of a methodical, structured approach to software development, which includes analyzing, comparing, and building tools to maximize your productivity. Great videos. Keep 'em coming.
Is it possible to do this without Cursor? I think I saw a video about VS Code plugins for Aider, Continue, and Claude Dev. I'm new to all of this, so I'd appreciate any pointers.
That MrBeast production document is literal gold: tens of thousands of hours of expert advice, perfectly distilled. He comes off like a dick, but sometimes you need to be a dick to do something great.
You're missing a point here: more lines don't equal better software. The quality of AI-generated code tends to degrade quickly with length and complexity, especially if you want to build applications more advanced than a demo. It's still quite amazing how good LLMs have gotten, but the metric we measure shouldn't be the length of the code produced.
Mathematica does the same reactive-style reference updates with Dynamic[] expressions. It also has lots of widgets, and its killer feature is true symbolic computation.
Uff… another toy that pushes data scientists away from writing unit tests 😮‍💨 We made so much progress getting them into VS Code and its notebooks feature, with pytest right there.
marimo notebooks are stored as pure Python files. Cells can be named and reused in other Python files, so they can be readily tested with pytest. More features to help with this are on the roadmap.
Great content!! I was racking my brain over how to structure instructions, especially for meta prompting. At first I was thinking about JSON because of its unlimited nesting. Then I realized that XML might be better because of the problem with closing brackets... and then I realized the reason XML is the best format is that LLMs are trained on websites (tudum tudum tudum tada): XML-formatted content :D I kind of figured all those things out on my own and was wondering why nobody was talking about it, and then two days later, boom, this video :D Thanks for the references. I'll study what others came up with, since I kind of reinvented the wheel on my own :D Thanks!
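For anyone who wants to try XML-structured meta prompts, here is a small sketch that builds the prompt with Python's standard `xml.etree.ElementTree`, so every tag is closed and special characters in user input are escaped. The tag names (`purpose`, `instructions`, `examples`, `user-input`) and the sample content are arbitrary choices, not a required schema.

```python
import xml.etree.ElementTree as ET


def build_prompt(purpose: str, instructions: list[str], examples: list[str], user_input: str) -> str:
    """Nest prompt sections as XML; ElementTree closes every tag and escapes &, < and > in text."""
    root = ET.Element("prompt")
    ET.SubElement(root, "purpose").text = purpose
    inst = ET.SubElement(root, "instructions")
    for line in instructions:
        ET.SubElement(inst, "instruction").text = line
    ex = ET.SubElement(root, "examples")
    for sample in examples:
        ET.SubElement(ex, "example").text = sample
    ET.SubElement(root, "user-input").text = user_input
    ET.indent(root)  # pretty-print with nesting; available since Python 3.9
    return ET.tostring(root, encoding="unicode")


prompt = build_prompt(
    purpose="Generate a concise commit message.",
    instructions=["Use the imperative mood.", "Keep the subject under 50 characters."],
    examples=["Add retry logic to HTTP client"],
    user_input="diff: if a < b and c > d: ...",
)
print(prompt)
```

Generating the markup rather than hand-typing it means a stray `<` in pasted code can't break the structure the model sees.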