This is one of those videos where I can say, "out of all the videos on this subject, you just straight-up provide the real documentation and know-how" — from the modelfile to the prompt template to the explanation. Thank you.
Thanks a lot! I love your carefully prepared, very quick and succinct, yet complete style! This one was a bit over-paced compared to your other two videos so far, but just a tiny bit; also, it would have been nice to see the rename at the end. Keeping it succinct and to the point as you do is the big value in your videos.
Thank you for watching my videos, as well as for your feedback! You’re the second commenter to mention this was a bit too fast, so I will do my best to correct that in the next one :)
Your video is amazing. I never thought converting these big models to GGUF was this simple. You just unlocked a lot of possibilities. Thank you so very much! Sadly you don't have many videos posted. Hope you do more videos. I wonder if Docker is the only way to convert models to GGUF.
Unfortunately the docker run command to quantize your own model fails. I've had a heck of a time getting anything ollama convert / quantize-related to work :(
Great content, thanks — you definitely deserve more subscribers! Can you show us how to give the models access to local data and learn from it in a future video?
Hey man!!! Thanks! You're the man. You're the one who wakes the rooster up. You don't wear a watch, you decide what time it is. When you misspell a word, the dictionary updates. You install Windows, and Microsoft agrees to your terms. When you found the lamp, you gave the genie three wishes. When you were born, you slapped the doctor. The revolver sleeps under your pillow. You ask the police for their documents. When you turned 18, your parents moved out. Ghosts gather around a campfire to tell stories about you. Hugs from Brazil!
@@decoder-sh No need to try hard — you already saved my life from an Indian villain who was holding me for more than 6 hours in a suicidal tutorial. When you come to Brazil, you already have a house to stay in.
Why are LLM models so big (25GB)? For example, isn't the model (BLOOM, Meta's Llama 2, Guanaco 65B and 33B, dolphin-2.5-mixtral-8x7b, etc.) just the algorithm that is used to learn from your data? And if the training data is another 25GB, what is the resulting size if you wanted to run your new AI offline on a new PC? 50GB? And what do the 33B and 8x7b mean? For example, everyone says that GPT-4 has 220 billion parameters and is a 16-way mixture model with eight sets of weights?
So a model, from a zoomed-out perspective, has two components: the model architecture (llama, mistral, mixtral...), which describes the steps and connections that transform an input into an output, and the weights, which are the result of training the model. Another way to think about this is that the architecture is like a blueprint that tells us which parts of a building go where, how many doors there are, and what the plumbing looks like. A blueprint itself takes up no space and weighs nothing, but the building materials — the weights in our model — are what physically occupy the space. Here's a more literal explanation of weights: datascience.stackexchange.com/questions/120764/how-does-an-llm-parameter-relate-to-a-weight-in-a-neural-network For fast math on how much disk space a model uses, try this calculation: # of parameters * (4 bits if quantized, 32 if not) / (8 bits in one byte). So the Phi model has 2.7B parameters and is about 1.6GB. Math: 2.7 * 1e9 * 4 (all of ollama's models are quantized afaik) / 8 = 1.35GB. Then every model uses some extra space for config files etc.
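That rule of thumb can be sketched in a few lines of Python (the 4-bit vs. 32-bit widths are the same assumptions as above, and real files add a bit of overhead for config and metadata):

```python
def model_size_gb(num_params, bits_per_param=4):
    """Rough disk footprint: parameters * bits per parameter / 8 bits per byte."""
    return num_params * bits_per_param / 8 / 1e9

# Phi: 2.7B parameters at 4-bit quantization -> ~1.35 GB on disk
print(model_size_gb(2.7e9))
# The same model unquantized at 32 bits per parameter -> ~10.8 GB
print(model_size_gb(2.7e9, bits_per_param=32))
```
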
When I want to import an embeddings model, is the modelfile different from the one for chat LLM models? And if a model doesn't come with information about its supported prompt templates & parameters, where do I get those?
Very interesting and useful. I'm interested in the GGUF format, so maybe you could describe that in more detail. I wish Ollama were available for Windows.
How do I run this on Windows, where the files are safetensors? Where do I create the modelfile? I have multiple models in different directories of oobabooga/text-generation-webui, and I want to use them in ollama.
Thank you for this video. I have one question: to generate a GGUF, do I need any special hardware, or can I just generate it from Google Colab? Thanks again for this video ❤
💥That's wonderful. I'm not a programmer and don't know Python, but I could install Open WebUI, and it only has Ollama models — and I love those Hugging Face GGUF models. So I need a way to run them on Open WebUI. Thanks! ❤❤❤
I wasn't able to find one - ollama is llama.cpp under the hood, and the closest thing I was able to find was their list of supported models. Anything that's a finetune of these models should work! github.com/ggerganov/llama.cpp?tab=readme-ov-file#description
Great explanation and video format. Do you know how to use models pulled with ollama (i.e. $ ollama pull dolphin-mixtral) as GGUF files? Is there a way to convert those to .gguf? Thanks!
After poking around the ollama repo, it does appear that models are stored as ggufs github.com/ollama/ollama/blob/main/server/images.go#L696 github.com/ollama/ollama/blob/main/server/images.go#L401
I would like to ingest several of my own documents and perhaps add them into an existing GGUF. I'm not sure of the best way to add documents to make them searchable while using the Windows version of Ollama and Docker — any tips would be great, thanks. I want to avoid the one-at-a-time approach and the need to use the local interface; ideally I could just dump the files into a directory and run the ingester.
I think he was referring to the "showing" part — when we see the actions in the terminal. I did have to back up so I could look for more than the 1/10th of a second one part was on screen. :) All said, great video, and helpful too!
@@ejh237 Noted for my next video! I think I'm going to start doing pop-outs of any commands that I run that stick around until the next command. That way you can see the command even while you're watching the output of that command go by.
Tbh I haven’t tried it yet! One of the videos I’d like to do in the near future is comparing different ways of running local models, with or without a UI
Sorry for bothering you again. I'm using the ollama API in Python to create a chat request with one message, but I found that if I create another request, the context from the first request appears to have changed. I'm trying to parse the output from the first request, make some decisions on it, then ask another question, but in the context of the first message. I tried using generate instead of chat, but it seems it doesn't support the images list parameter.
What do you mean by the context? For the chat endpoint, you'll need to append the LLM's response to the list of messages you're sending in your second request; see here for more info: github.com/ollama/ollama/blob/main/docs/api.md#chat-request-with-history
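A minimal sketch of that pattern in Python, in case it helps (the model name and server URL are just Ollama's usual defaults, not something specific to your setup):

```python
import json
import urllib.request

def with_user_turn(history, user_message):
    """Return the message list for the next /api/chat call: the whole
    running history plus the new user turn appended at the end."""
    return history + [{"role": "user", "content": user_message}]

def chat(history, user_message, model="llama2",
         url="http://localhost:11434/api/chat"):
    """One round trip: send the history plus the new turn, then keep the
    assistant's reply so the next call sees the full conversation."""
    messages = with_user_turn(history, user_message)
    payload = json.dumps({"model": model, "messages": messages,
                          "stream": False}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["message"]
    return messages + [reply]  # this becomes `history` for the next call

# history = []
# history = chat(history, "My name is Sam.")
# history = chat(history, "What's my name?")  # model sees both earlier turns
```

The key point is that the server is stateless here — the context only persists because you resend the accumulated message list every time.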
I'm fiddling with the llama2 model and I find it impossible to get it to produce short descriptions from big ones :/ It keeps shoving out huge chunks of text no matter what I tell it. Is there a hack to reduce the word output count somehow?
Maybe try modifying the system prompt to include something like "Your responses should be as concise as possible, no longer than 2 sentences." I've also found that adding examples to the system prompt helps a lot, eg "here's an example exchange: 'user: count to 3; assistant: 1 2 3' "
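A minimal Modelfile sketch along those lines, in case it helps (the base model name is just a placeholder, and num_predict is Ollama's parameter for hard-capping the number of generated tokens):

```
FROM llama2
PARAMETER num_predict 128
SYSTEM """You are a concise assistant. Your responses should be as short as possible, no longer than 2 sentences. Here's an example exchange: 'user: count to 3; assistant: 1 2 3'"""
```

Then `ollama create concise-llama2 -f Modelfile` and chat with it as usual.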
Thanks a lot! I only have one question about creating the ollama model based on the GGUF: it worked perfectly with the suggested template, but the second option does not. Why is that? And can you provide the modelfile used for the second method, please?
Modelfile.txt:
FROM "CapybaraHermes-2.5-Mistral-7B"
PARAMETER stop ""
PARAMETER stop ""
TEMPLATE """
system {{ .System }}
user {{ .Prompt }}
assistant
"""
I'm confused: how do I write a modelfile for each LLM I import into ollama? I need a tutorial on the various parameters, the template, and the other things in a modelfile.
GPT-4 is currently the gold standard for LLMs by quality. In fact, a lot of models are trained on data generated by GPT-4 — that should tell us how good people think it is. But while GPT-4 is very good at most things, we can train small models that we're able to run locally to be good at specific things. I'll be doing a video on this process, called fine-tuning, in the near future.
I've noticed this sometimes happens if an unexpected character appears in the modelfile. For example, my text editor sometimes converts " into ”, which is a different character. If that happens, then I get the same issue as you.
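One quick way to hunt for those lookalike characters is to scan the modelfile for anything outside plain ASCII. A small Python sketch (the filename is just an example):

```python
def find_non_ascii(path):
    """Report every character outside plain ASCII with its line and column,
    so curly quotes like ” are easy to spot in a modelfile."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) > 127:
                    hits.append((lineno, col, ch))
    return hits

# for lineno, col, ch in find_non_ascii("Modelfile"):
#     print(f"line {lineno}, col {col}: {ch!r}")
```
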
I'm a newbie, and it was hard to grasp what you did. I believe only an expert in this field could follow along; while watching this video I had to work hard to fill in the intermediate steps between the ones shown. The video is interesting, but not useful to me.
Hey, thanks for your comment. I'd like to make my content friendly for beginners who have a basic ability to use the terminal. Which concepts in particular gave you trouble? I hope to use your feedback to improve my future videos.
@@decoder-sh Thank you for your reply. At 3:07, I didn't understand what the modelfile is, what its extension is, where to create it and where not to, whether copying the GGUF file to any folder is okay, and whether putting the modelfile in any location is acceptable. There were so many questions at that point that I stopped watching 😅😅
@@AI-PhotographyGeek Oh, I see! I have another video that goes into much more detail about model files — please let me know if this clarifies things for you: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-xa8pTD16SnM.html
@@decoder-sh Definitely, I will refer to that video. But in the future, please capture such steps: there will be a lot of new visitors to your channel watching your videos for the first time, and if they feel they need to watch your other videos just to understand a particular one, it will be very hard for them to follow you. I hope you grow more on this journey! 😊 I'm not expecting you to explain everything again in detail — simply showing it in the video would help a lot.
@@AI-PhotographyGeek That's a very good idea! I'll be more explicit about prerequisite knowledge and where to find it. Thank you again for the feedback 🤝
Great video, man! Keep up the good work! I have a question: my Docker isn't working on Windows, due to some WSL issue I think, but I've got Ollama running without Docker and was wondering if it's still possible to quantize a model with Ollama?