Your videos are a gem. Thank you. Just a suggestion for a topic: resource management. I don't understand how, in a multi-tier system with dedicated servers, there is such a difference in the memory Ollama allocates when it's driven from curl, Open WebUI, or MemGPT/Letta. How can I tune what a client reserves on the Ollama server?
Hi, thanks for all the great videos! I have an unusual issue and hope you can help. I use Ollama for both daily tasks and larger projects. For the bigger models, I’ve moved the files to an external drive due to their size and set the environment path (on Windows), which works well. However, for my daily tasks, it’s inconvenient to always have the external drive connected, especially for using basic models like LLaMA 3.2. Is there a way to set up two model locations so it can read from both when available, or default to the laptop when the external drive isn’t connected? Thanks in advance! 🥛
Thank you so much, Matt! As always, everything is relevant, clear, and interesting! I have a couple of questions for you: 1. How do I know what information a model was trained on? What skills does it contain? I have a weak computer, so I use small models; if I knew what information went into a model, I would understand whether it suits my purposes. 2. Is there any way to remove unnecessary information from a model, so that I can then train it on my own? I am grateful in advance for your professional answers. From the bottom of my heart I wish you a speedy 1,000,000 subscribers, success, and prosperity. You are the best!!!
Usually the model card says where the data it was trained on comes from. Removing information from a model is very hard and computationally very expensive.
Matt, thanks for the great content. I have a 2016 i7 laptop with 32GB RAM and a 6GB 1070 Ti, and I can run 13b and even 27b models easily. It's a great platform! Please do a crash course on templates.
Thanks, Matt. This seems strangely related to the questions I was asking on Discord about what counts as "a model" versus what the GGUF file is, because it can be somewhat confusing to see the model catalog on Ollama and then a separate catalog of models on Hugging Face. I'm trying to grasp the notion of a model, which makes it look like it's code even though it's not, and how it relates to the model template, which is not the same as the system prompt. I understand that the template is somehow used when creating a model, but is the template language itself a standard that tools besides Ollama can understand?
All models require a template to be usable. The general structure is the same everywhere, but most tools express it with Jinja templates, while Ollama uses Go templates.
Thanks a lot for the course, Matt. I have a 2020 iMac with an AMD Radeon, which doesn't work with CUDA. In your experience, is there a way to use an external graphics card that works with Ollama?
@technovangelist OK, thanks a lot for taking the time, sir. Time to change my Mac. Would you share which specs matter for working with Ollama? Thank you!
I think any Apple silicon Mac is amazing. Getting the most memory and disk you can afford is important. With 64GB of RAM I can run up to a 70b model, though I rarely do. Depending on your workload I would pick at least 1TB of storage; I have 4TB and it's great, though I spend a lot of time offloading stuff. The new Macs should be out soon, but a used M1 or M2 is great too.
I wish there were a model move command. The internal model folder can eat up my SSD's free space, and some of my models are huge and I don't want to re-download them. It would be nice to have a model move command to offload models I'm not currently using onto an external SSD, and to conveniently copy them back to the internal model folder when needed again. I made a Python script that just moves the "big" files, but it's not 100% reliable, and a few times I was forced to re-download a model to get it working again. An official Ollama model-mover tool would be most useful: it could keep all the dependency files organized and working for the target model when moving between volumes.
A better lesson is the swap file! The bigger your swap file, the bigger the models you can load; even on a little RPi 5 I can run massive models. And yes, it's slower, but free :D This is for Linux, maybe Mac as well; I don't have Apple stuff!

function swap() {
    # Set the default size to 20 GB
    local default_swap_size=20

    # Check if an argument is supplied
    if [ -z "$1" ]; then
        read -p "Enter Swap Size (GB) [default: $default_swap_size]: " sizeofswap
        # If no input is provided, use the default size
        sizeofswap=${sizeofswap:-$default_swap_size}
        echo "Setting New Swap Size To $sizeofswap GB"
    else
        echo "Setting New Swap Size To $1 GB"
        sizeofswap=$1
    fi

    sudo swapoff /swapfile
    sudo rm /swapfile
    sudo fallocate -l "${sizeofswap}G" /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    echo "New Swap Size Of $sizeofswap GB"
    free -h
}
For most viewers, this is the first video of mine they have seen. Why should they listen to me to learn about Ollama? It's about credibility. And I can see that watch time has generally gone up since adding that.