Great video. I'm pretty unfamiliar with cloud; I just want to make sure that I can get an LLM to serve multiple endpoints for multiple users. If so, how do I find out how many users can be served?
Why is it that everyone skips the most important part of the AWS side of automation, which is how to create the Lambda code! Is there a resource about how to write the Lambda context/code?
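For anyone looking for a concrete starting point, a minimal sketch of a Lambda handler that forwards a request to a SageMaker endpoint might look like the snippet below. The endpoint name, environment variable, and payload shape ({"inputs": ...}) are assumptions; adjust them to whatever container you deployed.

```python
import json
import os

import boto3

# Create the client once so Lambda can reuse it across warm invocations.
runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name, read from an environment variable.
ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT", "my-llm-endpoint")


def lambda_handler(event, context):
    # Expect the prompt in the request body, e.g. {"inputs": "Hello"}.
    body = json.loads(event.get("body", "{}"))

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": body.get("inputs", "")}),
    )

    # The endpoint returns JSON; pass it straight back to the caller.
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}
```

Typically something like API Gateway (or a function URL) sits in front of this handler so users have an HTTP endpoint to hit.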
I would rather use prompt engineering than DSPy. The beauty of LLMs is generating content/code with natural language; now DSPy asks people to use a programming language again. There is also a steep learning curve for DSPy.
In my experience the challenge is the metric function. Examples always seem to use exact match; getting a qualitative metric working is non-trivial. Try using DSPy to optimize prompting for summarizing video transcripts, for example; you'll probably spend more time trying to get the metric working than you would have just coming up with a decent prompt. You also need a metric function that will discriminate between prompts of different quality, which is also not as trivial as it might seem.
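For context, a DSPy metric is just a Python function with the signature (example, pred, trace=None). Below is a minimal sketch of an exact-match metric next to a rough LLM-as-judge alternative; the field names, judge signature string, and 0.7 threshold are illustrative assumptions, not anything prescribed by DSPy.

```python
import dspy

# Assumes dspy.settings.configure(lm=...) has already been called.

# Exact match: cheap, but brittle for free-form outputs like summaries.
def exact_match_metric(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()


# Rough qualitative alternative: ask a judge model to grade the summary.
judge = dspy.Predict("transcript, summary -> score")

def summary_quality_metric(example, pred, trace=None):
    graded = judge(transcript=example.transcript, summary=pred.summary)
    try:
        return float(graded.score) >= 0.7  # arbitrary cut-off
    except ValueError:
        return False
```

The judge metric illustrates the commenter's point: you now have a second prompt (the judge) whose quality you also have to trust.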
Yes, this was useful. Thank you. But after watching the video I'm still not sure if DSPy lives up to the hype.

What I would want to see is a series of benchmark tests of "Before Teleprompter" and "After Teleprompter" results on a determinative set of tasks that cover a range of concerns, such as math questions, reasoning questions, and code generation questions, categorized into Easy, Medium, and Difficult. This should be done with a series of models, starting of course with GPT-4, but including Claude, Groq, Gemini, and a set of HuggingFace open-source models such as DeepseekCoder-Instruct-33B, laser-dolphin-mixtral-2x7b, etc. I would want to see this done with the variety of DSPy compile options, such as OneShot, FewShot, using ReAct, and using LLM-as-Judge, etc., where the concerns are appropriate. In other words, a formal set of tests and benchmarks based on the various, but not infinite, configuration options for each set of concerns.

This would give us much better information and be truly valuable to those who are embarking on their DSPy journey. Right now it is very unclear whether compiling with the teleprompter actually provides more accurate results, and under what circumstances (configurations). I have seen more than one demo where the teleprompter actually produced worse results, and the comment was "well, sometimes it works better than others."

My proposed information set, laid out in a coherent format, would be tremendously useful to the community and would go a long way towards answering the question you posed: does DSPy live up to the hype? Because we don't have this information, the jury is still out; your video poses the right question but doesn't quite answer it, tbh. The benchmarking tests I am proposing would, along with a thoughtful discussion of the discoveries. That said, I did learn a few useful things here, and so thanks again!
OK, I did not understand one thing. Why did you include ANSWERNOTFOUND in the context? This seems to defeat the whole purpose of getting a correct answer. How would I know if the context is relevant to the question before the question is asked? Is it not similar to data leakage? The true test would be to just remove ANSWERNOTFOUND from the context, because we don't know what question might be asked, or we could even create negative examples like we do in word2vec and use them to train the ANSWERNOTFOUND case. Let me know if I make sense.
You can think of this as similar to supplying labeled data for training ML models. In this example you are training a prompt to extract answers in a particular format (ANSWERNOTFOUND is the label when no answer can be extracted from the other parts of the context).
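For example, a tiny training set with both kinds of labels might look like the sketch below (assuming dspy.Example; the contexts, questions, and field names are made up for illustration):

```python
import dspy

# One "positive" example where the answer is in the context,
# and one "negative" example labeled ANSWERNOTFOUND.
trainset = [
    dspy.Example(
        context="The Eiffel Tower is in Paris.",
        question="Where is the Eiffel Tower?",
        answer="Paris",
    ).with_inputs("context", "question"),
    dspy.Example(
        context="The Eiffel Tower is in Paris.",
        question="Who designed the Colosseum?",
        answer="ANSWERNOTFOUND",
    ).with_inputs("context", "question"),
]
```

So ANSWERNOTFOUND is a label on the training examples, not information leaked into the test-time context.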
Good review! My issue with these frameworks is that their limitations become more apparent with large and more complex use cases: the boilerplate code ends up being technical debt that needs circumventing, and the next iteration of the GPTs or Mistrals intrinsically solves some of the limitations that the previous models couldn't solve for.
You haven't explained the code. What is the preprocessing code doing, and how does it work? It's a request to make a new video with a proper explanation of the code.
I'm having a bit of trouble understanding the following: I believe you're saying the Lambda function is calling the SageMaker endpoint (the place where we stored the LLM). But then who calls the Lambda function? When is that function triggered? Does it need another endpoint?
It depends on the type of quantization. A rule of thumb is that with 8-bit quantization the memory in GB roughly matches the parameter count, i.e. a 30B-parameter model at 8-bit would need about 30 GB of RAM (preferably GPU memory).
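As a rough back-of-the-envelope sketch (the 1.2x overhead factor for activations/KV cache is an assumption; the real figure depends on context length and batch size):

```python
def estimate_vram_gb(num_params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough memory estimate for holding the weights plus some runtime overhead."""
    bytes_per_weight = bits_per_weight / 8
    return num_params_billion * bytes_per_weight * overhead


print(estimate_vram_gb(30, 8))   # ~36 GB for a 30B model at 8-bit
print(estimate_vram_gb(30, 4))   # ~18 GB at 4-bit
```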
@Derick99 You would probably need to figure out if LM Studio can communicate with multiple GPUs. I know packages like Hugging Face Accelerate can handle multi-GPU configurations quite seamlessly.
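For example, with transformers plus accelerate installed, device_map="auto" shards the model across whatever GPUs are visible (the model ID below is just an example, swap in your own):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate spread the weights across all visible GPUs
# (and spill to CPU RAM if they don't fit).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```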
Amazon SageMaker is pretty complex and the UI is horrible; are there any other ways to deploy? The compute model tried to charge me like 1000 dollars for what was supposed to be free usage, because it spins up something like 5 instances. Instances that don't show up in the console directly; you have to open the separate SageMaker instance viewer.
Yes, you can deploy quantized models locally using desktop apps (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-BPomfQYi9js.html&ab_channel=DataScienceInEverydayLife) or look at other third-party solutions like Lambda Labs.
I haven't played around with Oobabooga, but it looks like it offers similar functionality (although I didn't see a .exe installer for Oobabooga). In my experience with LM Studio vs other similar offerings, LM Studio was the best by far: book.premai.io/state-of-open-source-ai/desktop-apps/