I haven't seen a single model get the hex problem right. That was the best butterfly so far of the models I've seen you test, though. Thanks for all the reviews and keep up the good work.
I think gpt-4o mini has like a 50% chance of getting the math problem when I try it on lmsys, and perhaps slightly higher with a bit of performance prompting.
DeepSeek is cheap, and it works extremely well in aider, with large context and large outputs. To me, aider --deepseek is the best option other than the paid ones, and the reasoning quality is very good, sometimes better than Sonnet. I never see that rate-limit stuff. And on the web it takes 300-500 lines of context no problem; try pasting some in and see. It can generate whole, long answers, and very rarely cuts the output or prints half a file and stops.
DeepSeek to me is the best, but since they're from China, EU and US companies don't like mentioning them or using their models. Once they release a multimodal model, it's going to have an impact. DeepSeek is made for coding, and I think that's the best approach: instead of making a big model that's good at everything, you focus on making your model good at one thing, and that's what they're doing with DeepSeek. It's a coding model first.

Yes, I sound like a fanboy, but I think you should also improve your prompt, because the more detail you give, the better the result. How about not doing only zero-shot benchmarks? They also mentioned in the tweet at 0:55 that you should update the system prompt and temperature, and this is something nobody does when testing models. A 200B+ model shouldn't fail at the first 2 questions in your benchmark. It's like using a text-to-image model like Midjourney or Stable Diffusion without changing the sref or seed values. You should adapt the system prompt and temperature based on the question type. Question 11 about generating the SVG code for a butterfly, for example, sounds like a coding question, but it's an artistic question first and needs a different system prompt. I saw a huge improvement with my local models when using the Claude 3.5 Sonnet system prompt.
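To make that concrete, here's a minimal sketch of per-question-type settings against DeepSeek's OpenAI-compatible API. The base URL, model name, preset prompts, and temperature values are my assumptions for illustration, not anything from the video or the tweet:

```python
# Minimal sketch: adapt system prompt and temperature per question type.
# The base_url, model name, and the preset values below are assumptions
# for illustration, not settings from the video.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

# Hypothetical presets: creative tasks get a different persona and a
# higher temperature than strict coding tasks.
PRESETS = {
    "coding": ("You are a careful programmer. Output only working code.", 0.0),
    "artistic": ("You are a visual artist who writes expressive SVG.", 1.0),
}

def ask(question: str, kind: str) -> str:
    system_prompt, temperature = PRESETS[kind]
    resp = client.chat.completions.create(
        model="deepseek-chat",
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# e.g. the butterfly question would go through the "artistic" preset:
print(ask("Generate SVG code for a butterfly.", "artistic"))
```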
I'm confused. I cannot understand your relatively positive rating for a model of that size with so many fails. Could you please compare the results of the current model to the former models? It all feels so random. Btw: a single zero-shot run doesn't say anything about consistency.
It's not underwhelming at all. The first 2 questions it fails are language questions, which are not DeepSeek's strong suit, because it's a Chinese model mostly trained on Chinese, not English. It excels in coding, apart from the Game of Life, which could be fixed with some simple extra prompting. It's better than Llama-3.1 405B in my mind, because once models go above ~30B parameters almost no one can run them locally, and at that point you have to consider inference pricing instead of parameter count and such, and on inference pricing it's worth the cost.
@AICodeKing Ok, so it's not native in English. Maybe we should first translate the prompt with the help of qwen2:72b (or the Google Translate API) before feeding it to DeepSeek V2.5? Sounds strange, but maybe it works better. The back-translation could be a challenge, but why not? For more than 1.5 years, I translated image prompts from my native language to English to get better results at image generation.
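Something like this could work as a rough sketch of that pipeline, assuming qwen2:72b is served locally via Ollama's OpenAI-compatible endpoint and DeepSeek through its own API; the URLs, model names, and prompts are assumptions, not a tested setup:

```python
# Sketch of translate -> ask -> back-translate: if DeepSeek is really
# strongest in Chinese, translate the English prompt into Chinese with
# qwen2:72b, query DeepSeek, then translate the answer back to English.
# Endpoints and model names are assumptions for illustration.
from openai import OpenAI

translator = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
deepseek = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

def chat(client, model, prompt):
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def ask_in_chinese(english_prompt):
    # 1. English -> Chinese with qwen2:72b.
    zh_prompt = chat(translator, "qwen2:72b",
                     "Translate to Chinese, output only the translation:\n" + english_prompt)
    # 2. Ask DeepSeek in Chinese.
    zh_answer = chat(deepseek, "deepseek-chat", zh_prompt)
    # 3. Back-translate the answer (the step flagged above as the hard part).
    return chat(translator, "qwen2:72b",
                "Translate to English, output only the translation:\n" + zh_answer)
```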
I agree, we have plenty of models that will answer all the fluff questions. How about digging into some serious coding and making a benchmark for that? I use AI for a few minor things besides coding, but coding is the majority of my use. Also, one-shot code answers are cool and all, but not real world. How well can it do if you give it multiple shots? Can it make efficient, working code?
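As a sketch of what a multi-shot benchmark loop could look like: generate a script, run it, and feed any traceback back to the model for another attempt. The endpoint, model name, and the crude extraction/execution harness below are assumptions for illustration, not a real benchmark:

```python
# Minimal sketch of a multi-shot coding loop: generate code, run it, and if
# it errors out, feed the traceback back to the model for another attempt.
# The endpoint, model name, and this crude harness are illustrative only.
import subprocess, sys, tempfile
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

def generate(messages):
    resp = client.chat.completions.create(model="deepseek-chat", messages=messages)
    return resp.choices[0].message.content

def run(code):
    # Write the candidate script to a temp file and execute it.
    # (A real harness would sandbox this and catch timeouts.)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    return subprocess.run([sys.executable, f.name],
                          capture_output=True, text=True, timeout=30)

def solve(task, max_shots=3):
    messages = [{"role": "user", "content": task + "\nReply with only a Python script."}]
    for _ in range(max_shots):
        # Crude fence stripping in case the model wraps the code in markdown.
        code = generate(messages).strip().removeprefix("```python").removesuffix("```")
        result = run(code)
        if result.returncode == 0:
            return code  # a real benchmark would also verify the output is correct
        # Feed the failure back for the next shot.
        messages += [{"role": "assistant", "content": code},
                     {"role": "user", "content": "That failed with:\n" + result.stderr + "\nFix it."}]
    return None
```

A real benchmark would also check the program's output against expected results and track how many shots each model needs to converge.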