
InternLM-2.5 (7b) : This NEW Model BEATS Qwen-2 & Llama-3 in Benchmarks! (Fully Tested) 

AICodeKing
21K subscribers
4K views

In this video, I'll be telling you about the newly released InternLM-2.5 7B model. This new model comes with a 1M-token context limit, which is really amazing. It claims to beat Qwen-2, Llama-3, Claude, DeepSeek, and other open-source LLMs, and to beat Qwen-2, DeepSeek-Coder, and Codestral in all kinds of coding tasks. I'll be testing it out in this video. Watch the video to find out more about this new model.
------
Key Takeaways:
🌟 InternLM 2.5 Launch: Just launched, InternLM 2.5 is the latest AI model, outperforming Llama 3 and Gemma 2 9B in practical scenarios.
🚀 7 Billion Parameters: With 7 billion parameters, InternLM 2.5 offers outstanding reasoning capabilities and a long context window, perfect for complex AI tasks.
🏆 Benchmark Dominance: InternLM 2.5 excels in MMLU, CMMLU, BBH, and MATH benchmarks, showcasing superior performance against larger models.
🔧 Tool Usage: InternLM 2.5 excels at tool usage, making it ideal for applications that involve web search and other integrated tools.
📊 Real-World Performance: Despite benchmark success, real-world performance is where InternLM 2.5 shines, particularly in coding tasks with its 1M-token context window.
💻 Available on Major Platforms: Now accessible on Ollama, HuggingFace, and more, making it easy to test and integrate InternLM 2.5 into your projects.
🤖 Hands-On Testing: Watch as we put InternLM 2.5 through various language and coding tasks, highlighting its strengths and weaknesses.
------
Timestamps:
00:00 - Introduction
00:07 - About InternLM-2.5 (7B with 1M-token context)
01:16 - Benchmarks
03:03 - Testing
07:53 - Conclusion

Published: 17 Aug 2024

Comments: 30
@user-no4nv7io3r · 1 month ago
They train their models on benchmarks, claim to beat everyone else, and it turns out to be trash in most cases. What a crazy world we live in.
@superakaike · 1 month ago
They also train their model on ChatGPT answers...
@wolraikoc · 1 month ago
A copilot video with this model and neovim would be awesome!
@Link-channel · 1 month ago
I wonder how to integrate autocompletion in vim... no wait, I wonder how to use vim
@nahuelpiguillem2949 · 1 month ago
Thank you for doing an honest review; it's rare to find someone saying "I tested it and it's not worth it". Sometimes the latest thing isn't the best.
@BadreddineMoon · 1 month ago
I'm addicted to your videos, keep up the good work ❤
@user-no4nv7io3r · 1 month ago
@BadreddineMoon Me too, especially his voice, tone, and critiques. That's magical.
@waveboardoli2 · 1 month ago
Can you show how to use claude-engineer with open-source models?
@sammcj2000 · 1 month ago
I'd be interested in you trying it for coding with a number of different parameters (top-p/k, temperature, repetition penalty, etc.)
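Those sampling knobs can be exercised against a local Ollama server, which exposes them through the `options` field of its `/api/generate` endpoint. A minimal sketch of building that request body — the `internlm2` model tag and the default values here are assumptions, not tested settings:

```python
import json

def build_generate_payload(prompt, temperature=0.7, top_p=0.9,
                           top_k=40, repeat_penalty=1.1):
    """Build a request body for Ollama's /api/generate endpoint
    with explicit sampling options."""
    return {
        "model": "internlm2",  # hypothetical tag; check `ollama list` locally
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
            "repeat_penalty": repeat_penalty,
        },
    }

# A low-temperature configuration for deterministic coding answers:
payload = build_generate_payload(
    "Write a Python function that reverses a string.", temperature=0.2)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to `http://localhost:11434/api/generate`; sweeping temperature and repeat_penalty across a fixed prompt set is one way to run the comparison this comment suggests.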
@Revontur · 1 month ago
As always, a great video... thanks for your effort. Is there any site where you publish your tests? It would be really great to compare new models with previously tested models.
@nahuelpiguillem2949 · 1 month ago
Sameeee
@RedOkamiDev · 1 month ago
Thanks Mr. AiKing, you are my daily source of AI news :)
@jaysonp9426 · 1 month ago
You didn't test needle-in-a-haystack or what it does with 1M tokens?
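The needle-in-a-haystack test this comment asks about is easy to sketch: bury a unique fact at a chosen depth in filler text, send the result to the model with a retrieval question, and check whether the answer recovers the fact. A minimal sketch of the haystack construction — the filler sentence, needle, and depth are arbitrary illustrative choices, and the model call itself is left out:

```python
def build_haystack(needle, filler, n_sentences, depth=0.5):
    """Bury a unique 'needle' sentence at a relative depth
    (0.0 = start, 1.0 = end) inside repetitive filler text."""
    sentences = [filler] * n_sentences
    idx = int(depth * n_sentences)
    sentences.insert(idx, needle)
    return " ".join(sentences), idx

needle = "The secret passphrase is 'blue-falcon-42'."
haystack, idx = build_haystack(
    needle, "The sky was clear that day.", 1000, depth=0.75)

# The haystack plus "What is the secret passphrase?" would then be sent to
# the model; retrieval succeeds if the reply contains 'blue-falcon-42'.
print("needle placed at sentence", idx, "of", 1001)
```

A real 1M-token run would repeat this over a grid of context lengths and depths and report the retrieval rate per cell.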
@tianjin8208 · 1 month ago
The Intern series always trains its models on eval datasets; it's their style. They need to surpass others quickly, so this is the fast way.
@pudochu · 1 month ago
6:47 How can I find the tests used here? It would also be great if they had answers.
@paulyflynn · 1 month ago
What size codebase will a 1M-token context support? Is there a LOC-to-token formula?
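There is no exact LOC-to-token formula, since the ratio depends on the tokenizer and the programming language, but a back-of-envelope estimate is possible. The ~40 characters per line and ~3.5 characters per token used below are rough assumptions, not measured values for InternLM's tokenizer:

```python
def estimate_tokens(loc, chars_per_line=40, chars_per_token=3.5):
    """Rough estimate: tokens ≈ LOC * avg chars/line / avg chars/token."""
    return int(loc * chars_per_line / chars_per_token)

# A 10k-LOC codebase under these assumptions:
print(estimate_tokens(10_000))     # 114285 tokens

# And the LOC a 1M-token budget could hold:
print(int(1_000_000 * 3.5 / 40))   # 87500 lines
```

So under these assumptions a 1M-token context would fit a codebase on the order of 80-90k lines, with the real figure shifting by the language's verbosity and the tokenizer's vocabulary.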
@elchippe · 1 month ago
Draw a butterfly in SVG? That task would be hard for a large LLM like Claude, and even more so for a 7B LLM. The transformer architecture's biggest drawback is its inability to rethink backwards; that's why these models mostly fail at these puzzles.
@AICodeKing · 1 month ago
I generally do that test to check whether the LLM can create something similar. Claude & GPT can do this. Also, I don't do different tests for smaller models; the tests are the same whether it's 7B or 300B.
@SpikyRoss · 1 month ago
Hey, it would be great if you could add links to the model in the description. 👍
@EladBarness · 1 month ago
Hype for nothing, wouldn't count on it for anything... thank you for the video!
@john_blues · 1 month ago
If it can't build a basic Python script, why would I want it chatting with my codebase? Anyway, thanks for the video and the actual testing on this.
@LucasMiranda2711 · 1 month ago
Which one was the best tested so far? Is there any place or anyone keeping score?
@AICodeKing · 1 month ago
Currently, Qwen-2 tops my list for general tasks, and DeepSeek-Coder-V2 for coding.
@MeinDeutschkurs · 1 month ago
The model seems to be horrendous! Thanks for saving my time.
@Lemure_Noah · 1 month ago
This model is good in benchmarks, but it doesn't seem better than other modern models like Llama-3, Phi-3, or even Mistral 7B, at least in my internal review of summarization and other language tasks. If someone has a real-world example where it performs better than other models in the same class, please share it ;)
@LazarMateev · 1 month ago
Merge Maestro with Claude Engineer and Aider into one. Make it an open-source model orchestrator that recalls the initial prompt with access to RAG, and you would be the king of kings 😊 Locally hosted web apps look like a very cool niche.
@Andres-8o-u8z · 1 month ago
Which one do you consider to be the best model for general tasks nowadays?
@AICodeKing · 1 month ago
Qwen2
@aryindra2931 · 1 month ago
Please make 2 a day ❤❤, I like the videos
@hollidaycursive · 1 month ago
Pre-watch comment