This video is about mergekit: how to choose and blend models. It's non-technical, but links to technical papers are included. You need to know how to navigate the terminal, but no programming is required.
🤖 Join my Discord community: / discord
📰 My tutorials on Medium: / mayaakim
🐦 My twitter profile: / maya_akim
To rent a GPU from Massed Compute (mergekit preinstalled) follow the link ⤵️
bit.ly/maya-akim
Code for 50% discount: MayaAkim
All links:
mergekit:
github.com/arcee-ai/mergekit
Open LLM Leaderboard
huggingface.co/spaces/Hugging...
my huggingface profile (with model configs you can copy):
huggingface.co/mayacinka
git installation:
gitforwindows.org/
lfs installation:
docs.github.com/en/repositori...
supported architecture for mergekit:
github.com/arcee-ai/mergekit/...
best blog about mergekit:
/ merge-large-language-m...
other really good blog about mergekit:
/ merge-large-language-m...
Charles Goddard's blog (author of mergekit):
goddard.blog/about/
Mona Lisa with mohawk:
www.designboom.com/technology...
What is YAML:
www.techtarget.com/searchitop...
What is Data Contamination:
bdtechtalks.com/2023/07/17/ll...
Goodhart's law:
www.cna.org/reports/2022/09/g...
LazyMergekit:
colab.research.google.com/dri...
Auto evaluation (requires RunPod profile):
colab.research.google.com/dri...
configuration with 14 models merged:
huggingface.co/EmbeddedLLM/Mi...
MoE instructions:
github.com/arcee-ai/mergekit/...
higher density - better results
github.com/arcee-ai/mergekit/...
Model family tree (visualization):
colab.research.google.com/dri...
huggingface.co/spaces/mlabonn...
cost of training mistral:
www.ft.com/content/387eeeab-1...
Leaderboard is disgusting:
/ open_llm_leaderboard_i...
Merging models with different architectures:
arxiv.org/pdf/2401.10491.pdf
Merging models with different architectures (FuseLLM code):
github.com/18907305772/FuseLLM
Blending is all you need:
arxiv.org/pdf/2401.02994.pdf
Model soups
arxiv.org/pdf/2203.05482.pdf
TIES-merging research paper:
arxiv.org/pdf/2306.01708.pdf
DARE merge research paper:
arxiv.org/pdf/2311.03099.pdf
Task arithmetic:
arxiv.org/pdf/2212.04089.pdf
Benchmarks
ARC benchmarks
deepgram.com/learn/arc-llm-be...
arxiv.org/pdf/1803.05457.pdf
HellaSwag
arxiv.org/pdf/1905.07830.pdf
MMLU
arxiv.org/pdf/2009.03300.pdf
TruthfulQA
arxiv.org/abs/2109.07958
WinoGrande
arxiv.org/pdf/1907.10641.pdf
GSM8K
arxiv.org/pdf/2110.14168.pdf
overfitting problem Ann Lotz:
arstechnica.com/tech-policy/2...
Benchmarks are a problem screenshots:
analyticsindiamag.com/the-pro...
/ llm_benchmarks_are_bro...
/ llm_benchmarks_are_bul...
Attributions:
commons.wikimedia.org/wiki/Fi...
Timecodes:
0:00 - 1:47 - blending intro
1:48 - 3:36 - promise of blending
3:37 - 4:22 - blending steps and requirements
4:23 - 5:05 - all you need is hardware
5:06 - 5:30 - mergekit installation
5:31 - 9:23 - merge methods
10:48 - 13:31 - configurations and yaml
13:32 - 14:38 - how to run merge
14:39 - 14:42 - upload merged model
14:43 - 16:27 - best merge method
16:28 - 20:16 - benchmark problems, overfitting and contamination
#mergekit #llm #localmodels
May 24, 2024