This is one of those videos where I can say, "out of all the videos on this subject, you just straight-up provide the real documentation and know-how" — from the modelfile to the prompt template to the explanation. Thank you.
Thanks a lot! I love your carefully prepared, very quick and succinct, yet complete style! This one was a bit over-paced compared to your other two videos so far, but just a tiny bit; also, it would have been nice to see the rename at the end. Keeping it succinct and to the point as you do is the big value in your videos.
Thank you for watching my videos, as well as for your feedback! You’re the second commenter to mention this was a bit too fast, so I will do my best to correct that in the next one :)
Your video is amazing. I never thought converting these big models to GGUF was this simple. You just unlocked a lot of possibilities. Thank you so very much! Sadly you don't have many videos posted. Hope you do more videos. I wonder if Docker is the only way to convert models to GGUF.
Unfortunately the docker run command to quantize your own model fails. I've had a heck of a time getting anything ollama convert / quantize-related to work :(
Great content, thanks — you definitely deserve more subscribers! Can you show us how to give the models access to local data and learn from it in a future video?
Hey man!!! Thanks! You're the man. You're the one who wakes the rooster up. You don't wear a watch, you decide what time it is. When you misspell a word, the dictionary updates. You install Windows, and Microsoft agrees to your terms. When you found the lamp, you gave the genie three wishes. When you were born, you slapped the doctor. The revolver sleeps under your pillow. You ask the police for their documents. When you turned 18, your parents moved out. Ghosts gather around a campfire to tell stories about you. Hugs from Brazil!
@@decoder-sh No need to try hard — you already saved my life from an Indian villain who was holding me for more than 6 hours in a suicidal tutorial. When you come to Brazil, you already have a house to stay in.
Why are LLM models so big (25GB)? For example, isn't the model (BLOOM, Meta's Llama 2, Guanaco 65B and 33B, dolphin-2.5-mixtral-8x7b, etc.) just the algorithm that is used to learn from your data? And if the training data is another 25GB, what is the resulting size if you wanted to run your new AI offline on a new PC? 50GB? And what do the 33B and 8x7b mean? For example, everyone says that GPT-4 has 220 billion parameters and is a 16-way mixture model with eight sets of weights?
So a model, from a zoomed-out perspective, has two components: the model architecture (llama, mistral, mixtral...), which describes the steps and connections that transform an input into an output, and the weights, which are the result of training the model. Another way to think about this is that the architecture is like a blueprint that tells us which parts of a building go where, how many doors there are, and what the plumbing looks like. A blueprint itself takes up no space and weighs nothing, but the building materials — the weights in our model — are what physically occupy the space. Here's a more literal explanation of weights: datascience.stackexchange.com/questions/120764/how-does-an-llm-parameter-relate-to-a-weight-in-a-neural-network For fast math on how much disk space a model uses, try this calculation: # of parameters * (4 bits if quantized, 32 if not) / (8 bits in one byte). So the Phi model has 2.7B parameters and is about 1.6GB. Math: 2.7 * 1e9 * 4 (all of ollama's models are quantized afaik) / 8 = 1.35GB. Then every model uses some extra space for config files etc.
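That rule of thumb can be sketched in a few lines of Python (the 4-bit vs. 32-bit widths are the same assumptions as above, and real files add a bit of overhead for config and metadata):

```python
def model_size_gb(num_params, bits_per_param=4):
    """Rough disk footprint: parameters * bits per parameter / 8 bits per byte."""
    return num_params * bits_per_param / 8 / 1e9

# Phi: 2.7B parameters at 4-bit quantization -> ~1.35 GB on disk
print(model_size_gb(2.7e9))
# The same model unquantized at 32 bits per parameter -> ~10.8 GB
print(model_size_gb(2.7e9, bits_per_param=32))
```
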
When I want to import an embeddings model, is the modelfile different from the one for chat LLM models? And if a model doesn't come with information about its supported prompt templates & parameters, where do I get those?
Very interesting and useful. I'm interested in the GGUF format, so maybe you could describe that in more detail. I wish Ollama were available for Windows.
How do I run this on Windows, where the files are safetensors? Where do I create the modelfile? I have multiple models in different directories of oobabooga/text-generation-webui, and I want to use them in ollama.
Thank you for this video. I have one question: to generate a GGUF, do I need any special hardware, or can I just generate it from Google Colab? Thanks again for this video ❤
💥That's wonderful. I'm not a programmer and don't know Python, but I could install Open WebUI, and it only has Ollama models — and I love those Hugging Face GGUF models. So I need a way to run them on Open WebUI. Thanks! ❤❤❤
I wasn't able to find one - ollama is llama.cpp under the hood, and the closest thing I was able to find was their list of supported models. Anything that's a finetune of these models should work! github.com/ggerganov/llama.cpp?tab=readme-ov-file#description
Great explanation and video format. Do you know how to use models pulled with ollama (i.e. $ ollama pull dolphin-mixtral) as GGUF files? Is there a way to convert those to .gguf? Thanks!
After poking around the ollama repo, it does appear that models are stored as ggufs github.com/ollama/ollama/blob/main/server/images.go#L696 github.com/ollama/ollama/blob/main/server/images.go#L401
I would like to ingest several of my own documents and perhaps add them into an existing GGUF. I'm not sure of the best way to add documents to make them searchable while using the Windows version of Ollama and Docker — any tips would be great, thanks. I want to avoid the one-at-a-time approach and the need to use the local interface; ideally I could just dump the files into a directory and run the ingester.
I think he was referring to the "showing" part — when we see the actions in the terminal. I did have to back up so I could look for more than the 1/10th of a second one part was on screen. :) All said, great video, and helpful too!
@@ejh237 Noted for my next video! I think I'm going to start doing pop-outs of any commands that I run that stick around until the next command. That way you can see the command even while you're watching the output of that command go by.
Tbh I haven’t tried it yet! One of the videos I’d like to do in the near future is comparing different ways of running local models, with or without a UI
Sorry for bothering you again. I'm using the ollama API in Python to create a chat request with one message, but I found that if I create another request, the context from the first request appears to have changed. I'm trying to parse the output from the first request, make some decisions on it, then ask another question, but in the context of the first message. I tried using generate instead of chat, but it seems it doesn't support the images list parameter.
What do you mean by the context? For the chat endpoint, you'll need to append the LLM's response to the list of messages you're sending in your second request; see here for more info: github.com/ollama/ollama/blob/main/docs/api.md#chat-request-with-history
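A minimal sketch of that pattern in Python, in case it helps (the model name and server URL are just Ollama's usual defaults, not something specific to your setup):

```python
import json
import urllib.request

def with_user_turn(history, user_message):
    """Return the message list for the next /api/chat call: the whole
    running history plus the new user turn appended at the end."""
    return history + [{"role": "user", "content": user_message}]

def chat(history, user_message, model="llama2",
         url="http://localhost:11434/api/chat"):
    """One round trip: send the history plus the new turn, then keep the
    assistant's reply so the next call sees the full conversation."""
    messages = with_user_turn(history, user_message)
    payload = json.dumps({"model": model, "messages": messages,
                          "stream": False}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["message"]
    return messages + [reply]  # this becomes `history` for the next call

# history = []
# history = chat(history, "My name is Sam.")
# history = chat(history, "What's my name?")  # model sees both earlier turns
```

The key point is that the server is stateless here — the context only persists because you resend the accumulated message list every time.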
I'm fiddling with the llama2 model and I find it impossible to get it to produce short descriptions from big ones :/ It keeps shoving out huge chunks of text no matter what I tell it. Is there a hack to reduce the word output count somehow?
Maybe try modifying the system prompt to include something like "Your responses should be as concise as possible, no longer than 2 sentences." I've also found that adding examples to the system prompt helps a lot, eg "here's an example exchange: 'user: count to 3; assistant: 1 2 3' "
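A minimal Modelfile sketch along those lines, in case it helps (the base model name is just a placeholder, and num_predict is Ollama's parameter for hard-capping the number of generated tokens):

```
FROM llama2
PARAMETER num_predict 128
SYSTEM """You are a concise assistant. Your responses should be as short as possible, no longer than 2 sentences. Here's an example exchange: 'user: count to 3; assistant: 1 2 3'"""
```

Then `ollama create concise-llama2 -f Modelfile` and chat with it as usual.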
Thanks a lot! I only have one question about creating the ollama model based on the GGUF: it worked perfectly with the suggested template, but the second option does not. Why is that? And can you provide the modelfile used for the second method, please?
Modelfile.txt:
FROM "CapybaraHermes-2.5-Mistral-7B"
PARAMETER stop ""
PARAMETER stop ""
TEMPLATE """
system {{ .System }}
user {{ .Prompt }}
assistant
"""
I'm confused: how do I write a modelfile for each LLM I import into ollama? I need a tutorial on the various parameters, the template, and the other things in a modelfile.
GPT-4 is currently the gold standard for LLMs by quality. In fact, a lot of models are trained on data generated by GPT-4 — that should tell us how good people think it is. But while GPT-4 is very good at most things, we can train small models that we're able to run locally to be good at specific things. I'll be doing a video on this process, called fine-tuning, in the near future.
I've noticed this sometimes happens if an unexpected character appears in the modelfile. For example, my text editor sometimes converts " into ”, which is a different character. If that happens, then I get the same issue as you.
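One quick way to hunt for those lookalike characters is to scan the modelfile for anything outside plain ASCII. A small Python sketch (the filename is just an example):

```python
def find_non_ascii(path):
    """Report every character outside plain ASCII with its line and column,
    so curly quotes like ” are easy to spot in a modelfile."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) > 127:
                    hits.append((lineno, col, ch))
    return hits

# for lineno, col, ch in find_non_ascii("Modelfile"):
#     print(f"line {lineno}, col {col}: {ch!r}")
```
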
I'm a newbie, and it was hard to grasp what you did. I believe only an expert in this field could follow along; while watching this video I had to work hard to fill in the intermediate steps between the ones shown. The video is interesting, but not useful to me.
Hey, thanks for your comment. I'd like to make my content friendly for beginners who have a basic ability to use the terminal. Which concepts in particular gave you trouble? I hope to use your feedback to improve my future videos.
@@decoder-sh Thank you for your reply. At 3:07, I didn't understand what the modelfile is, what its extension is, where to create it and where not to, whether copying the GGUF file to any folder is okay, and whether putting the modelfile in any location is acceptable. There were so many questions at that point that I stopped watching 😅😅
@@AI-PhotographyGeek Oh, I see! I have another video that goes into much more detail about model files — please let me know if this clarifies things for you: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-xa8pTD16SnM.html
@@decoder-sh Definitely, I will refer to that video. But in the future, please capture such steps: there will be a lot of new visitors to your channel watching your videos for the first time, and if they feel they need to watch your other videos just to understand a particular one, it will be very hard for them to follow you. I hope you grow more on this journey! 😊 I'm not expecting you to explain everything again in detail — simply showing it in the video would help a lot.
@@AI-PhotographyGeek That's a very good idea! I'll be more explicit about prerequisite knowledge and where to find it. Thank you again for the feedback 🤝
Great video, man! Keep up the good work! I have a question: my Docker isn't working on Windows, due to some WSL issue I think, but I've got Ollama running without Docker and was wondering if it's still possible to quantize a model with Ollama?