Merge LLMs to Make Best Performing AI Model


38K views


This video is about mergekit and how to choose and blend models. It's non-technical, but links to technical papers are included. You need to know how to navigate the terminal, but no programming is required.
🤖 Join my Discord community: / discord
📰 My tutorials on Medium: / mayaakim
🐦 My twitter profile: / maya_akim
To rent a GPU from Massed Compute (mergekit preinstalled) follow the link ⤵️
bit.ly/maya-akim
Code for 50% discount: MayaAkim
All links:
mergekit:
github.com/arcee-ai/mergekit
Open LLM Leaderboard
huggingface.co/spaces/Hugging...
my huggingface profile (with model configs you can copy):
huggingface.co/mayacinka
git installation:
gitforwindows.org/
lfs installation:
docs.github.com/en/repositori...
supported architecture for mergekit:
github.com/arcee-ai/mergekit/...
best blog about mergekit:
/ merge-large-language-m...
other really good blog about mergekit:
/ merge-large-language-m...
Charles Goddard’s blog: (author of mergekit)
goddard.blog/about/
Mona lisa with Mohawk
www.designboom.com/technology...
What is YAML:
www.techtarget.com/searchitop...
What is Data Contamination:
bdtechtalks.com/2023/07/17/ll...
Goodhart's law
www.cna.org/reports/2022/09/g...
LazyMergekit:
colab.research.google.com/dri...
Auto evaluation: (requires runpod profile)
colab.research.google.com/dri...
configuration with 14 models merged:
huggingface.co/EmbeddedLLM/Mi...
MoE instructions:
github.com/arcee-ai/mergekit/...
higher density - better results
github.com/arcee-ai/mergekit/...
Model family tree: (visualization)
colab.research.google.com/dri...
huggingface.co/spaces/mlabonn...
cost of training mistral:
www.ft.com/content/387eeeab-1...
Leaderboard is disgusting:
/ open_llm_leaderboard_i...
Merging models with different architectures:
arxiv.org/pdf/2401.10491.pdf
merging models different arch:
github.com/18907305772/FuseLLM
Blending is all you need:
arxiv.org/pdf/2401.02994.pdf
Model soups
arxiv.org/pdf/2203.05482.pdf
Ties-merging research paper:
arxiv.org/pdf/2306.01708.pdf
Dare merge research paper:
arxiv.org/pdf/2311.03099.pdf
Task arithmetic:
arxiv.org/pdf/2212.04089.pdf
Benchmarks
Arc benchmarks
deepgram.com/learn/arc-llm-be...
arxiv.org/pdf/1803.05457.pdf
HellaSwag
arxiv.org/pdf/1905.07830.pdf
MMLU
arxiv.org/pdf/2009.03300.pdf
TruthfulQA
arxiv.org/abs/2109.07958
WinoGrande
arxiv.org/pdf/1907.10641.pdf
GSM8K
arxiv.org/pdf/2110.14168.pdf
overfitting problem Ann Lotz:
arstechnica.com/tech-policy/2...
Benchmarks are a problem screenshots:
analyticsindiamag.com/the-pro...
/ llm_benchmarks_are_bro...
/ llm_benchmarks_are_bul...
Attributions:
commons.wikimedia.org/wiki/Fi...
Timecodes:
0:00 - 1:47 - blending intro
1:48 - 3:36 - promise of blending
3:37 - 4:22 - blending steps and requirements
4:23 - 5:05 - all you need is hardware
5:06 - 5:30 - mergekit installation
5:31 - 9:23 - merge methods
10:48 - 13:31 - configurations and yaml
13:32 - 14:38 - how to run merge
14:39 - 14:42 - upload merged model
14:43 - 16:27 - best merge method
16:28 - 20:16 - benchmark problems, overfitting and contamination
#mergekit #llm #localmodels

Category: Science

Published: 24 May 2024


Comments: 99

@maya-akim 2 months ago

i hope you find the video useful and don't forget to show (and brag about) your blended models!

@TheEarlVix 2 months ago

Thank you. Found this from your post on X.

@chrisBruner 2 months ago

Your videos are always very good and cutting edge.

@qiqqaqwerty1713 2 months ago

Thanks for the very informative video. Cheers from "Down Under"!

@user-ds2sc9tn4x 2 months ago

videos are so great! i will stay humble and learn as much as i can!

@Ludecan 2 months ago

Awesome video!! Been following your series on building AI agents and they're very good! Thanks for sharing!

@sergiofigueiredo1987 2 months ago

This is destined to evolve into a meticulously curated, go-to channel of human reliability for years to come. Thank you very much for the exceptional quality you provide!

@Santiino 2 months ago

It's mind-blowing to me how good your videos are, yet you are still so unknown. Keep it up!

@Nifty-Stuff 1 month ago

Blending LLMs is a fascinating idea. The idea left me wondering: Why hasn't anybody developed a system/app that takes the APIs of the top LLMs, creates agents for each, and then has these agents all work together to brainstorm, debate, review, and solve problems? I often get 4 different answers from 4 LLMs, so why not have them all set up as agents "in one room" working together to come up with the "best" solution. I can't find anybody that's tried this... why not? Wouldn't having the "top minds" (LLMs) working together produce better results?

@bamh1re318 1 day ago

they could become worse, or give you 4 different answers, or could not stop talking among themselves

@SebastianKreutzberger 2 months ago

Fantastic video, so well prepared, fool-proof explained, and a really cutting-edge topic. Best AI YouTuber out there - thank you 🙏

@Rob_Steele 2 months ago

Great video Maya! Keep em coming! 😎

@ivandukic 2 months ago

Wow, what an incredible explanation of merge methods. Thank you.

@seanhynes9516 25 days ago

Awesome, thanks for the great video. Very well explained, great diagrams! :)

@blim420 2 months ago

Excellent walkthrough, thanks!

@mysticaltech 1 month ago

Maya, you are good at this stuff. You are averaging my internal mind vectors to make AI easy. Keep doing so!

@minae1423 2 months ago

well articulated and educational video, thank you Maya!🙏🏼

@mayorc 2 months ago

Great video Maya. Keep it up ❕❕❕

@robinmordasiewicz 2 months ago

wow, most sophisticated YouTuber ever. New favorite channel.

@GetzAI 2 months ago

Thanks Maya!

@overcuriousity 1 month ago

interesting, easy to follow, well researched and critically scrutinized the results. like your content!

@ajay--yadav 2 months ago

lot of information about so many topics presented nicely.

@rein436 2 months ago

Very insightful 👍

@gerykis 2 months ago

Very good explanation. I'm looking for a similarly easy-to-understand video on how to fine-tune a model locally.

@scienceandmind3065 2 months ago

Great video and exactly what I need at the moment. Having a lot of specialized models for science, translation, coding, finance etc but no good way of combining them.

@maya-akim 2 months ago

best of luck! and share with us your results if you want :)

@MiguelLopez-mu1ss 2 months ago

Thank you for the insights

@ulrichbeutenmuller8101 2 months ago

thanks, great video!

@doomstertech8305 2 months ago

great video, loved the explanation of all the technical stuff. Would love to know your process on how you read and understand these topics in-depth?

@jeremybristol4374 2 months ago

Love the props with storytelling! Great instructional video!

@johnefan 2 months ago

Great Video👏🏻

@qiqqaqwerty1713 2 months ago

🎯 Key takeaways for quick navigation. This summary is no substitute for watching the complete video for a more in-depth understanding.
Main ideas:
- 🌍 Model blending is an innovative approach to surpass the performance of high-cost models with limited resources.
- 🤖 Non-experts can effectively blend models, demonstrating the technique's accessibility.
- 💡 The blend allows for specialized functionality, combining models tuned for diverse tasks into a powerhouse model.
- 🛠 The merging process involves selecting compatible models, defining parameters, and executing the blend with basic command line knowledge.
- 🔄 Various blending methods like task vector arithmetic and SLERP offer unique advantages for custom model creation.
- 📚 Proper selection and preparation of models are crucial, with a focus on architecture compatibility and avoiding common pitfalls.
- 🏆 Blended models can achieve top rankings on leaderboards, though their position may fluctuate.
- 🤔 The effectiveness of benchmarks in evaluating model intelligence is questioned, highlighting the issue of data contamination.
Takeaways:
00:00 *🤖 Introduction to Model Blending*
- Introduction to the concepts of model blending, showcasing the power of combining models to overcome resource limitations and improve performance.
- Highlights two models, Mixol and Ramonda, emphasizing the potential of model blending even with limited resources.
01:24 *📘 Basics of Model Blending*
- Detailed explanation of model blending, its significance, and the methodology behind efficient blending.
- Discusses the blending process, the importance of model selection, and the steps involved in creating a blended model.
02:05 *💡 The Promise of Blending*
- Explores the potential of blending models to create top-performing LLMs without the need for extensive resources.
- Focus on the accessibility of fine-tuning and blending for personalized model development.
03:33 *🛠️ How to Blend Models*
- Provides a practical guide on blending models using MergeKit, including setup and execution steps.
- Emphasizes the ease of blending models with basic knowledge and the right tools, offering an approachable method for enthusiasts and professionals alike.
05:33 *🧪 Detailed Blending Methods*
- Deep dive into various blending techniques such as task vector arithmetic, SLERP, TIES, and DARE, explaining their unique applications and benefits.
- Discusses the technical aspects of model blending, offering insights into choosing the right method for specific goals.
08:17 *🖥️ Preparing for Blending*
- Guidelines on selecting compatible models for blending, emphasizing the importance of architecture and layer compatibility.
- Instructions for downloading models from Hugging Face and preparing for the blending process.
10:33 *📝 Configuring YAML for Blending*
- Step-by-step instructions on setting up YAML files for blending, highlighting the importance of specifying base models, merge methods, and parameters.
- Offers practical tips for configuring blending parameters to optimize the blending process.
13:42 *🚀 Executing the Blend and Evaluation*
- Detailed walkthrough of the blending execution using MergeKit and subsequent evaluation through a text generation interface.
- Encourages testing and fine-tuning of the blended model before submission to benchmarks or public use.
15:45 *📊 Performance Testing and Data Contamination*
- Discusses the significance of performance testing on open LLM leaderboards and addresses the issue of data contamination in model training.
- Highlights the importance of careful model selection and blending strategy to avoid overfitting and ensure genuine improvements in model performance.
I hope this helps everybody!
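
The summary above mentions YAML files that specify base models, merge methods, and parameters. As a rough illustration, here is a minimal Python sketch that renders a SLERP-style config in the layout used by the examples in the mergekit README. The model names `example-org/model-a-7b` and `example-org/model-b-7b` are hypothetical placeholders, and the layer count of 32 and `t: 0.5` are assumptions for a 7B-class model, not values from the video.

```python
def render_slerp_config(model_a: str, model_b: str, num_layers: int, t: float = 0.5) -> str:
    """Return YAML text for a SLERP merge of two same-architecture models."""
    lines = [
        "slices:",
        "  - sources:",
        f"      - model: {model_a}",
        f"        layer_range: [0, {num_layers}]",
        f"      - model: {model_b}",
        f"        layer_range: [0, {num_layers}]",
        "merge_method: slerp",
        f"base_model: {model_a}",  # one of the two parents acts as the base
        "parameters:",
        f"  t: {t}",  # interpolation factor: 0 keeps model_a, 1 keeps model_b
        "dtype: bfloat16",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical model names, used only for illustration.
config_text = render_slerp_config(
    "example-org/model-a-7b", "example-org/model-b-7b", num_layers=32
)
print(config_text)
```

Saving the printed text as `config.yml` and running `mergekit-yaml config.yml ./merged-model` is the usual next step; the output directory name here is arbitrary.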

@EduGuti9000 2 months ago

Thank you!

@synchro-dentally1965 2 months ago

Excellent video! The development outlook seems open to so many possibilities. I'm curious if anyone will find advantages in networks built via diffusion (similar to image generation) or if there will be more real time dynamics implemented as the model responds to a query.

@DemiGoodUA 2 months ago

Nice Video! Do we have the ability to fine tune the model on our own codebase?

@noblewarrior4776 2 months ago

You are amazing… thank you

@xspydazx 2 months ago

Very good lesson and explanation! So far the best on this subject. The main problem I had was running the models afterwards. I could not find a definitive method that works... Despite one of the models scoring high, it could not run in the HF Inference plugin on the model card.

@lokeshart3340 2 months ago

Can we blend multimodal models like llava and mistral and gemini vision? Can u make a video on it pls..❤❤

@maya-akim 2 months ago

oh that's interesting, I got to say I didn't try but I'm curious myself! I'll see how it goes and either I'll make a video or I'll let you know somehow

@lokeshart3340 2 months ago

@@maya-akim sure.

@tiberiumihairezus417 2 months ago

Great content.

@Alf-Dee 2 months ago

Amazing video! I didn’t know it could be done. I am definitely going to make my own uncensored blended model for coding. I am tired of openai telling me that I should not modify/hack code without owner permission even if I am the owner, and I am trying to test how solid the code is…

@leumas_tai 2 months ago

Great video. How does this differ from the Mixture of Experts (MoE)?

@maya-akim 2 months ago

that's an excellent question! first of all, I noticed that the community doesn't consider MoE to be merged models, even though you can use mergekit to create MoE yourself (instructions in the description box). My understanding is that blended models become "fixed" when it comes to their capabilities. MoE capabilities change dynamically thanks to a gating mechanism that decides how much of each expert's advice to follow for a given input. You specify prompts (or simple strings with mergekit) that activate a specific expert. For example, here's a configuration that I used for MoE: huggingface.co/mayacinka/West-Ramen-7Bx4 as you can see, positive and negative prompts will "guide" the model.
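
To make the gating idea in the reply above concrete, here is a toy sketch in plain Python. It is a simplification under stated assumptions: the two "experts" are hypothetical stand-in functions, the `expert_keys` vectors play roughly the role that positive prompts play in steering inputs towards an expert, and every expert is blended by its weight, whereas production MoE models typically route each token to only a few top-scoring experts.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(input_vec, expert_keys):
    """Score each expert by its dot product with the input, then normalise."""
    scores = [sum(x * k for x, k in zip(input_vec, key)) for key in expert_keys]
    return softmax(scores)


def moe_output(input_vec, expert_keys, experts):
    """Blend the expert outputs using the gate weights for this input."""
    weights = gate(input_vec, expert_keys)
    outputs = [expert(input_vec) for expert in experts]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(input_vec))
    ]


# Hypothetical experts: one doubles its input, the other negates it.
experts = [lambda v: [2 * x for x in v], lambda v: [-x for x in v]]
expert_keys = [[1.0, 0.0], [0.0, 1.0]]

x = [3.0, 0.0]  # this input lines up with the first expert's key
weights = gate(x, expert_keys)
mixed = moe_output(x, expert_keys, experts)
print(weights, mixed)
```

Because the weights are recomputed for every input, a different `x` shifts the blend towards a different expert, which is the "dynamic" behaviour that a fixed merged model does not have.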

@leumas_tai 2 months ago

@@maya-akim interesting. thanks for sharing your thoughts, I'll check it out.

@geekyprogrammer4831 2 months ago

Very underrated channel!! This is enlightening. How a person can be so smart and beautiful too at the same time 😭😭

@mikect05 2 months ago

The combination of spending time messing w ai along with your videos is inspiring me to build my own workstation. Not sure if that's smart considering I don't know how to code. So far I have ordered: super micro x12dai mobo, 2 platinum 8352s, 2 rtx 3090s, 2 sata 12 tb, 2 optane nvmes for os and quick retrieval stuff, 128 gb lrdimm ddr4, E-ATX case, cables & ps. Do you do any consulting work via zoom? I may need some direction soon.

@chuchel3156 2 months ago

Nice video

@hand-eye4517 2 months ago

We thank you for all the amazing content. You are a great content creator and I don't wanna sound nitpicky, but since you are already attracting and leaning towards the DIY crowd, you may as well be using the open source tools too (VS Codium, etc.). Just a small critique because I love the content.

@maya-akim 2 months ago

hey thanks for support and feedback 🙏🏻 I'm not sure I totally follow. Do you suggest that I switch to Codium? Honestly, before your comment I assumed that VScode is open source, but after googling a bit I realized that the product itself actually isn't. But it looks like Codium is open source, so you think that that's a better fit for the channel?

@Cloudvenus666 2 months ago

What happens if you merge two models of the same family but they each have different context lengths? Does the model with the larger token window take precedence?

@maya-akim 2 months ago

it will depend on the "base model". But in cases that don't require defining a base model (like passthrough), or in this hacky case here: huggingface.co/mayacinka/chatty-djinn-14B, where I merged models with 32K and 8K context windows, the 32K models overpowered the 8K open chat model.

@Cloudvenus666 2 months ago

@@maya-akim thank you

@bgNinjaart 2 months ago

Genius

@_codegod 2 months ago

Thanks! What software are you running for loading and inferencing your merged LLM using localhost in browser?

@maya-akim 2 months ago

that's oobabooga's text generation UI. It allows you to run any model, whether it's saved locally, or on huggingface's hub

@_codegod 2 months ago

thanks @@maya-akim

@johntdavies 2 months ago

Maya, a great video, thank you. Quick question, where are you based? The reason I ask is I'm looking for an AI speaker in the UK, you came to mind so was just wondering. Again, excellent video, amazing depth.

@maya-akim 2 months ago

hey John, thanks a lot for the support! I live in Austin, TX, so I'm afraid I won't be of any help :/

@johntdavies 2 months ago

@@maya-akim Damn, that's a long way away! Never mind, keep up the great work and thanks for getting back 🙂

@abdallamosa8836 1 month ago

Is combining tools like SWE-Agent, Crew AI, and OS-Copilot into a cohesive agentic workflow possible?

@axe863 1 month ago

Stacked and cascading ensembling have been around for a while

@Linguisticsfreak 15 days ago

Since we don't have access to the training data, it is simply impossible/unfeasible to choose models based on whether they have or don't have contaminated data.

@JoelSiby-ju5pf 1 month ago

after that i could use my customized model from hugging face or locally in my apps?

@JoelSiby-ju5pf 1 month ago

also now that i have decided to use this model for creating my gen-ai apps, how would i load it? llm = ??? # provide me the syntax for this

@amandamate9117 2 months ago

can you test agent frameworks like CrewAi with Claude 3 opus?

@PRColacino 2 months ago

Maya.. you are the girl!!!

@PaulSchwarzer-ou9sw 2 months ago

🎉

@SinanAkkoyun 2 months ago

Bach wtc 1 prelude 21 😍

@nimesh.akalanka 28 days ago

How can I fine-tune the LLAMA 3 8B model for free on my local hardware, specifically a ThinkStation P620 Tower Workstation with an AMD Ryzen Threadripper PRO 5945WX processor, 128 GB DDR4 RAM, and two NVIDIA RTX A4000 16GB GPUs in SLI? I am new to this and have prepared a dataset for training. Is this feasible?

@Dhirajkumar-ls1ws 2 months ago

👍

@GuidedBreathing 2 months ago

3:40 and now add robots 🤖 cheers🥂

@justinwhite2725 2 months ago

LLMs catching up to something Stable Diffusion users have been doing for a while. Open source is the way.

@inout3394 2 months ago

LLM: Tokenization vs MAMBA, please make video about this

@BrandonFurtwangler 2 months ago

Why does Slerp only support two models? Can’t you just slerp between pairs, then slerp the slerps, etc until you have 1?

@maya-akim 2 months ago

yep, you absolutely can slerp the slerps of the previously slerped slerps. That's what a lot of people do.
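
The "slerp the slerps" idea from this exchange can be sketched with a small pure-Python version of spherical linear interpolation. This is an illustration on short lists of numbers, assuming non-zero vectors; mergekit applies the operation to whole weight tensors, and the three "models" below are hypothetical toy vectors rather than real checkpoints.

```python
import math


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two non-zero vectors."""
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm(v0) * norm(v1))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    if math.sin(theta) < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# Three toy "models", each reduced to one unit-length weight vector.
model_a = [1.0, 0.0, 0.0]
model_b = [0.0, 1.0, 0.0]
model_c = [0.0, 0.0, 1.0]

ab = slerp(0.5, model_a, model_b)   # first merge two parents
abc = slerp(0.5, ab, model_c)       # then slerp the result with a third
print(ab, abc)
```

Note that the pairing order and the `t` values decide how much each parent contributes: in this example `model_c` ends up with a larger share than `model_a` or `model_b`.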

@user-ml9ph9tf1b 2 months ago

My only question while watching was: why should I make a model? I figure there is going to be an infinite number of models being created by people, and soon AI models created by AI models. So my question is, what is the point of making a custom model aside from fine-tuning on data? I use autogen; would creating a model like you're doing empower a local model to, let's say, chat on my data and be good at function calling? Maybe this would be an experimental way to possibly make my own model specifically for autogen? Like, I know someone out there is already working on that specifically, and even you showed those models specifically used for function calling in one of your other vids.

@maya-akim 2 months ago

oh that's a great question! here's how I would use it: 1. Find a model that scores highly on the MMLU benchmark (which means that it has diverse knowledge). Blend it with a model that you like because of its "vibe". For me that would be openchat because I like how conversational it is. The blended model would perform better than the two "parent" models. 2. I'm actually working on this one. I'm trying to fine tune one model to specifically be good at crafting youtube titles. And another one to write good youtube scripts. Then, I'll try to blend those two.
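
One way the second idea (blending two fine-tunes of the same base) can work is task arithmetic, the method from the paper linked in the description: subtract the base weights from each fine-tune to get a "task vector", then add the weighted task vectors back onto the base. The sketch below is a toy version on plain Python lists; the parameter name `layer.weight`, the tiny weight values, and the `titles`/`scripts` models are made up for illustration, and it assumes all models share exactly the same parameter names and shapes.

```python
def task_vector(finetuned, base):
    """Difference between a fine-tuned model and its base, per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def task_arithmetic_merge(base, finetuned_models, weights):
    """Add each weighted task vector onto a copy of the base parameters."""
    merged = {name: list(values) for name, values in base.items()}
    for model, weight in zip(finetuned_models, weights):
        delta = task_vector(model, base)
        for name in merged:
            merged[name] = [
                m + weight * d for m, d in zip(merged[name], delta[name])
            ]
    return merged


# Hypothetical toy weights standing in for real model parameters.
base = {"layer.weight": [1.0, 2.0]}
titles = {"layer.weight": [1.5, 2.0]}   # fine-tune that changed the first value
scripts = {"layer.weight": [1.0, 3.0]}  # fine-tune that changed the second value

merged = task_arithmetic_merge(base, [titles, scripts], weights=[1.0, 1.0])
print(merged)
```

Here the two fine-tunes changed different parameters, so their changes combine cleanly. When fine-tunes pull the same parameters in opposite directions they interfere, which is the problem methods such as TIES and DARE try to reduce.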

@NickDoddTV 2 months ago

Good soup

@Maisonier 1 month ago

So it's like mixing colors, back in kindergarten, you'd always blend everything together hoping to create this amazing hue, but it always just ended up this muddy, ugly brown

@oryxchannel 22 days ago

Wanna get “addicted”.

@florentflote 2 months ago

@zippytechnologies 1 month ago

At first - I was excited to see a new video with useful info - but when it got to that crime scene mapping thing you do - well... sorta creepy, no? What is that method called? Conspiracy mapping? Good visuals but wow... I lost track of what was going on with it... maybe it was more of a "Why are you putting holes in your walls?" Some poor guy is gonna be like "...where's the spackle and putty knife? Some tenant/wife/daughter/kid poked a bunch of holes in my wall"... I never understood how so many holes got poked into my daughter's walls or even our living room walls (ahem... the wife) but maybe this is just something that is fun to do? Now, do a video on how to patch all those little holes and get a paint roller with medium nap to repaint and cover everything up - but don't just paint a small area... no.. gonna probably have to paint the whole wall so there's no more streaks and visible coverups.. or at least learn how to feather out the edges so they blend better with the existing paint on the walls.. ugh... can't plug those holes and paint with an ai agent (yet)... so at least some skills are still worth knowing and learning... go get a guy or gal with some handy work skills - mechanical skills or something useful that AI can't do well and never will (likely for a long time) and you at least know your guy/gal will be useful given that AI will be putting lots of other people out of work (and is already doing so). I need to hire some people to help me get this working for our company - but I can't afford to keep paying drywall contractors every time we get a new idea... lol

@yellowboat8773 1 month ago

Wow, you have too much time on your hands

@MichaelDomer 1 month ago

Stop saying it's so simple... yes, for you it is.

@DC-xt1ry 1 month ago

Monolithic LLMs < MultiAgents

@rinokpp1692 1 month ago

Can I use an agent on my mobile device?

@gareththomas3234 1 month ago

why not just use autogen?

@free_thinker4958 1 month ago

Autogen is full of crap

@LukasSmith827 2 months ago

your timing is scary

@maya-akim 2 months ago

what do you mean?

@PazLeBon 2 months ago

so a claude haha

@ServerGamingTop100 2 months ago

It's not about collecting links with information and adding below the video... The important information is: which models are compatible and how to write the configuration file, which you barely mention! I can find all these links myself.

@JINGWA64 24 days ago

problem with making vids that require prior knowledge and experience, is those who would find the information most useful, cannot make use of that information due to requiring that prior knowledge and experience, yet at the same time the information provided in the vid is at the level to service a novice who had no prior interest, so who is the audience being catered to?

@MichaelDomer 1 month ago

Too many videos of AI nerds on YouTube... for AI nerds, hardly anyone makes videos for the average John and Jane, resulting in a large group of people detached from AI.

@maya-akim 1 month ago

what types of videos would appeal to average John and Jane?