
InternLM-2.5 (7b) : This NEW Model BEATS Qwen-2 & Llama-3 in Benchmarks! (Fully Tested) 

AICodeKing
21K subscribers
4K views

In this video, I'll be telling you about the newly released InternLM-2.5 7B model. This new model comes with a 1M-token context limit, which is really amazing. It claims to beat Qwen-2, Llama-3, Claude, DeepSeek, and other open-source LLMs, and to beat Qwen-2, DeepSeek-Coder, and Codestral in all kinds of coding tasks. I'll be testing it out in this video. Watch the video to find out more about this new model.
------
Key Takeaways:
🌟 InternLM 2.5 Launch: Just launched, InternLM 2.5 is the latest AI model, outperforming Llama 3 and Gemma 2 9B in practical scenarios.
🚀 7 Billion Parameters: With 7 billion parameters, InternLM 2.5 offers outstanding reasoning capabilities and a long context window, perfect for complex AI tasks.
🏆 Benchmark Dominance: InternLM 2.5 excels in MMLU, CMMLU, BBH, and MATH benchmarks, showcasing superior performance against larger models.
🔧 Tool Usage: InternLM 2.5 excels at tool usage, making it ideal for applications that involve web search and other integrated tools.
📊 Real-World Performance: Despite benchmark success, real-world performance is where InternLM 2.5 shines, particularly in coding tasks with its 1M-token context window.
💻 Available on Major Platforms: Now accessible on Ollama, HuggingFace, and more, making it easy to test and integrate InternLM 2.5 into your projects.
🤖 Hands-On Testing: Watch as we put InternLM 2.5 through various language and coding tasks, highlighting its strengths and weaknesses.
------
Timestamps:
00:00 - Introduction
00:07 - About InternLM-2.5 (7B with 1M-token context)
01:16 - Benchmarks
03:03 - Testing
07:53 - Conclusion

Published: 17 Aug 2024

Comments: 30
@user-no4nv7io3r · 1 month ago
They train their models on benchmarks, claim to beat everyone else, and it turns out to be trash in most cases. What a crazy world we live in.
@superakaike · 1 month ago
They also train their model on ChatGPT answers...
@wolraikoc · 1 month ago
A copilot video with this model and neovim would be awesome!
@Link-channel · 1 month ago
I wonder how to integrate autocompletion in vim... no wait, I wonder how to use vim
@nahuelpiguillem2949 · 1 month ago
Thank you for doing an honest review; it's rare to find someone saying "I tested it and it's not worth it". Sometimes the latest thing isn't the best.
@BadreddineMoon · 1 month ago
I'm addicted to your videos, keep up the good work ❤
@user-no4nv7io3r · 1 month ago
@BadreddineMoon Me too, especially his voice, tone, and critiques. That's magical.
@waveboardoli2 · 1 month ago
Can you show how to use claude-engineer with open-source models?
@sammcj2000 · 1 month ago
I'd be interested in you trying it for coding with a number of different parameters (top-p/k, temperature, repetition penalty, etc.)
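Those sampling knobs can be exercised against a local Ollama server, which exposes them through the `options` field of its `/api/generate` endpoint. A minimal sketch of building that request body — the `internlm2` model tag and the default values here are assumptions, not tested settings:

```python
import json

def build_generate_payload(prompt, temperature=0.7, top_p=0.9,
                           top_k=40, repeat_penalty=1.1):
    """Build a request body for Ollama's /api/generate endpoint
    with explicit sampling options."""
    return {
        "model": "internlm2",  # hypothetical tag; check `ollama list` locally
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
            "repeat_penalty": repeat_penalty,
        },
    }

# A low-temperature configuration for deterministic coding answers:
payload = build_generate_payload(
    "Write a Python function that reverses a string.", temperature=0.2)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to `http://localhost:11434/api/generate`; sweeping temperature and repeat_penalty across a fixed prompt set is one way to run the comparison this comment suggests.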
@Revontur · 1 month ago
As always, a great video... thanks for your effort. Is there any site where you publish your tests? It would be really great to compare new models with previously tested models.
@nahuelpiguillem2949 · 1 month ago
Sameeee
@RedOkamiDev · 1 month ago
Thanks Mr. AiKing, you are my daily source of AI news :)
@jaysonp9426 · 1 month ago
You didn't test needle-in-a-haystack or what it does with 1M tokens?
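The needle-in-a-haystack test this comment asks about is easy to sketch: bury a unique fact at a chosen depth in filler text, send the result to the model with a retrieval question, and check whether the answer recovers the fact. A minimal sketch of the haystack construction — the filler sentence, needle, and depth are arbitrary illustrative choices, and the model call itself is left out:

```python
def build_haystack(needle, filler, n_sentences, depth=0.5):
    """Bury a unique 'needle' sentence at a relative depth
    (0.0 = start, 1.0 = end) inside repetitive filler text."""
    sentences = [filler] * n_sentences
    idx = int(depth * n_sentences)
    sentences.insert(idx, needle)
    return " ".join(sentences), idx

needle = "The secret passphrase is 'blue-falcon-42'."
haystack, idx = build_haystack(
    needle, "The sky was clear that day.", 1000, depth=0.75)

# The haystack plus "What is the secret passphrase?" would then be sent to
# the model; retrieval succeeds if the reply contains 'blue-falcon-42'.
print("needle placed at sentence", idx, "of", 1001)
```

A real 1M-token run would repeat this over a grid of context lengths and depths and report the retrieval rate per cell.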
@tianjin8208 · 1 month ago
The Intern series always trains its models on eval datasets; it's their style. They need to surpass others quickly, so this is the fast way.
@pudochu · 1 month ago
6:47 How can I find the tests used here? It would also be great if they had answers.
@paulyflynn · 1 month ago
What size codebase will a 1M-token context support? Is there a LOC-to-token formula?
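There is no exact LOC-to-token formula, since the ratio depends on the tokenizer and the programming language, but a back-of-envelope estimate is possible. The ~40 characters per line and ~3.5 characters per token used below are rough assumptions, not measured values for InternLM's tokenizer:

```python
def estimate_tokens(loc, chars_per_line=40, chars_per_token=3.5):
    """Rough estimate: tokens ≈ LOC * avg chars/line / avg chars/token."""
    return int(loc * chars_per_line / chars_per_token)

# A 10k-LOC codebase under these assumptions:
print(estimate_tokens(10_000))     # 114285 tokens

# And the LOC a 1M-token budget could hold:
print(int(1_000_000 * 3.5 / 40))   # 87500 lines
```

So under these assumptions a 1M-token context would fit a codebase on the order of 80-90k lines, with the real figure shifting by the language's verbosity and the tokenizer's vocabulary.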
@elchippe · 1 month ago
Draw a butterfly in SVG? That task would be hard for a large LLM like Claude, and even more so for a 7B LLM. The transformer architecture's biggest drawback is its inability to rethink backwards; that's why these models mostly fail at these puzzles.
@AICodeKing · 1 month ago
I generally do that test to check whether the LLM can create something similar. Claude & GPT can do this. Also, I don't do different tests for smaller models; the tests are the same whether it's 7B or 300B.
@SpikyRoss · 1 month ago
Hey, it would be great if you could add links to the model in the description. 👍
@EladBarness · 1 month ago
Hype for nothing, wouldn't count on it for anything... thank you for the video!
@john_blues · 1 month ago
If it can't build a basic Python script, why would I want it chatting with my codebase? Anyway, thanks for the video and the actual testing on this.
@LucasMiranda2711 · 1 month ago
Which one was the best tested so far? Is there any place or anyone keeping score?
@AICodeKing · 1 month ago
Currently, Qwen-2 tops my list for general tasks, and DeepSeek-Coder-V2 for coding.
@MeinDeutschkurs · 1 month ago
The model seems to be horrendous! Thanks for saving my time.
@Lemure_Noah · 1 month ago
This model is good in benchmarks, but it doesn't seem better than other modern models like Llama-3, Phi-3, or even Mistral 7B, at least in my internal review of summarization and other language tasks. If someone has a real-world example where it performs better than other models in the same class, please share it ;)
@LazarMateev · 1 month ago
Merge Maestro with Claude Engineer and Aider into one. Make it an open-source model orchestrator that recalls the initial prompt with access to RAG, and you would be the king of kings 😊 Locally hosted web apps look like a very cool niche.
@Andres-8o-u8z · 1 month ago
Which one do you consider to be the best model for general tasks nowadays?
@AICodeKing · 1 month ago
Qwen2
@aryindra2931 · 1 month ago
Please make 2 a day ❤❤, I like the videos
@hollidaycursive · 1 month ago
Pre-watch comment