DeepSeek-V2.5 : The Best Opensource Model GOT BETTER! (Beats Claude, GPT-4O?) 

AICodeKing
31K subscribers
9K views

Published: 29 Sep 2024

Comments: 45
@jimlynch9390 21 days ago
I haven't seen a single model get the hex problem right. That was the best butterfly so far, of the models I've seen you test, that is. Thanks for all the reviews and keep up the good work.
@BadreddineMoon 21 days ago
Also the landing page is one of the best
@tgtussockyb7037 21 days ago
I think gpt-4o mini has something like a 50% chance of getting the math problem right when I try it on lmsys, and perhaps a slightly higher chance with a bit of performance prompting.
@hernansanson4921 21 days ago
Yes, the butterfly was beautiful; it should have counted as 2x!
@saabirmohamed636 21 days ago
DeepSeek is cheap, and it works extremely well in aider with large context and large outputs. To me, aider --deep is the best option other than the paid ones, and the reasoning quality is very good, sometimes better than Sonnet. I never see that rate-limit stuff. On the web it takes 300 to 500 lines of context with no problem; try pasting and see. It can generate whole ideas and long answers, and it very rarely cuts the output or prints half a file and stops.
@jaradaty88 21 days ago
I wish it could handle images and documents... that would be awesome
@TomaszStochmal 21 days ago
Aider can quickly increase your API costs, so DeepSeek is a nice alternative
@DeanRie 21 days ago
Oh thanks man 👍
@MacS7n 21 days ago
DeepSeek is the best to me, but since they are from China, EU and US companies don't like mentioning them or using their models. Once they release a multimodal model, it's going to have an impact. DeepSeek is made for coding, and I think that's the best approach: instead of making a big model that's good at everything, you just focus on making your model good at one thing, and that's what they're doing with DeepSeek. It's a coding model first. Yes, I sound like a fanboy, but I think you should also improve your prompt, because the more detail you give, the better the result. How about not doing only zero-shot benchmarks? They also mentioned in the tweet at 0:55 that you should update the system prompt and temperature, and this is something that nobody does when testing models. A 200B+ model shouldn't fail the first 2 questions in your benchmark. It's like using a text-to-image model such as Midjourney or Stable Diffusion without changing the sref or seed values. You should adapt the system prompt and temperature to the question type. Take question 11, about generating the SVG code for a butterfly: it sounds like a coding question, but it's an artistic question first and needs a different system prompt. I saw a huge improvement with my local models when using the Claude 3.5 Sonnet system prompt.
@Ryu-gs8hq 16 days ago
@MacS7n Can you share your temperature and system prompt settings?
@MeinDeutschkurs 21 days ago
I'm confused. I cannot understand your relatively positive rating for a model of that size with so many fails. Could you please compare the results of the current model to those of the former models? It all feels so random. Btw: a single zero-shot run does not say anything about consistency.
@AICodeKing 21 days ago
It's not underwhelming at all. The first 2 questions that it fails are language questions, which are not a strong suit of DeepSeek because it's a Chinese model and it's mostly trained on Chinese, not English. It excels in coding, apart from the Game of Life, which could be fixed with some simple extra prompting. It's better than Llama-3.1 405B in my mind. When models go above 30B parameters, almost no one can run them locally, and in that case you have to consider the inference pricing instead of the parameter count and stuff like that. On inference pricing, it's worth the cost.
@MeinDeutschkurs 21 days ago
@AICodeKing OK, so it's not native in English. Maybe we should try translating the prompt with the help of qwen2:72b (or the Google Translate API) first, before feeding it to DeepSeek V2.5? Sounds strange, but maybe this is better. The back translation could be a challenge, but why not? For more than 1.5 years, I translated image prompts from n to English to get better results in image generation.
@utvikler-no 21 days ago
@AICodeKing I think you should mention that, because I am also somewhat confused by the thumbnail and your positivity 😀
@hottab.clubber 21 days ago
DeepSeek is an excellent coder in Python. Excellent understanding of my Russian, too :)
@SudeeptoDutta 21 days ago
I bought API usage since it was the cheapest and best option for me. Its programming capabilities are really good. Good to know it was upgraded.
@mjkht 21 days ago
If it can draw a butterfly in SVG, it could also do it in Blender as a 3D object. It can also compose music via MIDI notes :>
@yuyutsurao 21 days ago
Why don't you use Coder 2.5 for coding questions?
@paulyflynn 21 days ago
Good point. I tried it with the "game of life" question and it got a PASS
@jackflash6377 21 days ago
I agree, we have plenty of models that will answer all the fluff questions. How about digging into some serious coding and making a benchmark for that? I use AI for a few minor things besides coding, but coding is the majority of my use. Also, one-shot code answers are cool and all, but not real world. How well can it do if you give it multiple shots? Can it make efficient and working code?
@hottab.clubber 21 days ago
@jackflash6377 DeepSeek writes working Python code for me with 1% errors. But the main logic of the scripts is a human's job!
@lydedreamoz 21 days ago
My go-to model for aider. Best value for money by far. It even supports caching, which is insane because the price is already so low.
@minhhieple6483 7 days ago
Chat with DeepSeek works well, but autocomplete is very slow; I don't know why.
@pudochu 21 days ago
Are the results of this test also available for Claude 3.5 Sonnet and GPT-4o? Thanks.
@paulyflynn 21 days ago
Yes, see the earlier videos. Claude is/was quite impressive; it also got the CSS butterfly question.
@mrfresshness 14 days ago
Can DeepSeek V2.5 be used with VS Code for free through inference with Hugging Face's Transformers?
@mrfresshness 13 days ago
Is this possible?
@TheBuzzati 21 days ago
I love how cheap Deepseek is, and it's very impressive for open source, but man is it slow.
@sinapxiagency 21 days ago
If we write in Chinese, do you think it could pass the language task?
@marcusmayer1055 12 days ago
Thanks 👍
@mal-avcisi9783 21 days ago
This is cool, really cool!
@DouhaveaBugatti 21 days ago
Finally 😂 You noticed it
@Tomosw-xx 21 days ago
7:42 So DeepSeek V2 Coder is better than DeepSeek V2.5?
@AICodeKing 21 days ago
This combination is a little degraded in some scenarios but the degradation is about
@hottab.clubber 21 days ago
No, 2.5 is more accurate in Python for me.
@TomaszStochmal 21 days ago
I find DeepSeek's response time to be the slowest among LLMs, but I like its cheap price
@jaradaty88 21 days ago
V2 is better -_-
@cerilza_kiyowo 21 days ago
No
@mallardlane8965 21 days ago
It doesn't seem to work in Claude Dev. Can you make a tutorial for that? Thanks
@BACA01 21 days ago
Select the OpenAI-compatible option.
@TawnyE 20 days ago
E
@johngoad 20 days ago
I actually have four boxes of pencils by the way
@AICodeKing 20 days ago
Good for you, John. Let me know if you get more. I'll increase the number in the question then.
@johngoad 20 days ago
@AICodeKing Thank you, you are very cool