From the home page of your web UI (localhost:3000) in your browser, click your account name in the lower left, then click Settings, then "Models". You can pull llama3.1 by typing it in the "pull" box and clicking the download button. When it completes, close the web UI and reopen it. After that I had the option to select 3.1 8B from the models list.
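If you'd rather skip the clicking, here is a rough sketch of the same pull done against Ollama's HTTP API. It assumes your web UI is backed by an Ollama server on its default address (http://localhost:11434) and that your Ollama version accepts the "model" field on /api/pull. The helper names (build_pull_request, pull_model, list_models) are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def build_pull_request(model, base_url=OLLAMA_URL):
    """Build (but do not send) the POST request that asks Ollama to pull a model."""
    body = json.dumps({"model": model, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pull_model(model, base_url=OLLAMA_URL):
    """Send the pull request and return the JSON status object Ollama replies with."""
    with urllib.request.urlopen(build_pull_request(model, base_url)) as resp:
        return json.load(resp)


def list_models(base_url=OLLAMA_URL):
    """Return the names of the models Ollama currently has available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return [m["name"] for m in json.load(resp)["models"]]
```

With Ollama running, calling pull_model("llama3.1") and then list_models() should show whether the download landed. Nothing here touches the network until you call those two functions yourself.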
Note for 405B: We are releasing multiple versions of the 405B model to accommodate its large size and facilitate multiple deployment options. MP16 (Model Parallel 16) is the full version of the BF16 weights. These weights can only be served on multiple nodes using pipelined parallel inference. At minimum it would need 2 nodes of 8 GPUs to serve. MP8 (Model Parallel 8) is also the full version of the BF16 weights, but can be served on a single node with 8 GPUs by using dynamic FP8 (Floating Point 8) quantization. We are providing reference code for it. You can download these weights and experiment with different quantization techniques outside of what we are providing. FP8 (Floating Point 8) is a quantized version of the weights. These weights can be served on a single node with 8 GPUs by using static FP8 quantization. We have provided reference code for it as well. The 405B model requires significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing.
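To see where those node counts come from, here is a back-of-the-envelope sketch of the weight sizes. It assumes exactly 405e9 parameters (the nominal figure from the model name), 2 bytes per parameter for BF16 and 1 byte for FP8, and it counts weights only, so KV cache and activations are ignored. The function names are made up for this example.

```python
# Assumption: "405B" means a nominal 405e9 parameters.
PARAMS = 405e9

# Storage per parameter for each format (BF16 is 16 bits, FP8 is 8 bits).
BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}


def weight_gb(params, dtype):
    """Total weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def weight_gib(params, dtype):
    """Total weight size in binary gibibytes (1 GiB = 2**30 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


def per_gpu_gb(params, dtype, gpus):
    """Weight share per GPU if the weights are split evenly across GPUs."""
    return weight_gb(params, dtype) / gpus


# Layouts named in the note: (label, format held in GPU memory, GPU count).
layouts = [
    ("BF16 on 2 nodes x 8 GPUs (MP16)", "BF16", 16),
    ("BF16 on 1 node x 8 GPUs, unquantized", "BF16", 8),
    ("FP8 on 1 node x 8 GPUs", "FP8", 8),
]

for label, dtype, gpus in layouts:
    print(f"{label}: {weight_gb(PARAMS, dtype):.0f} GB total, "
          f"{per_gpu_gb(PARAMS, dtype, gpus):.1f} GB per GPU")

print(f"BF16 weights in GiB: {weight_gib(PARAMS, 'BF16'):.0f}")
```

Under those assumptions the BF16 weights come to 810 GB, which is about 754 GiB and so roughly in line with the 750GB disk figure if that figure is binary units (my guess, not stated in the note). Split over 16 GPUs that is about 50.6 GB each, while 8 GPUs would need about 101 GB each for unquantized BF16, more than a typical 80 GB card holds. FP8 halves the total, which brings 8 GPUs back to about 50.6 GB each.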
Finally set up Open WebUI thanks to you. I'd approached it, seen "docker", and left it on my todo list for weeks/months. I'm running gemma2 2b on my GTX 1060 with 6GB VRAM. Any suggestions on good models for my size?
Hello. After installing Open WebUI, I am unable to find Ollama under 'Select a Model'. Is this due to a specific configuration? For information, my system is running Ubuntu 24.04.