
Deploy Open LLMs with LLAMA-CPP Server 

Prompt Engineering
173K subscribers
9K views
Science
Published: Oct 3, 2024

Comments: 12
@engineerprompt 3 months ago
If you want to build robust RAG applications based on your own datasets, this is for you: prompt-s-site.thinkific.com/courses/rag
@unclecode 3 months ago
👏 I'm glad to see you're focusing on DevOps options for AI apps. In my opinion, LlamaCpp will remain the best way to launch a production LLM server. One notable feature is its support for hardware-level concurrency. Using the `-np 4` (or `--parallel 4`) flag runs 4 slots in parallel, where 4 can be any number of concurrent requests you want. One thing to remember: the context window is divided across the slots accordingly. For example, if you pass `-c 4096` with `-np 4`, each slot gets a context size of 1024. Adding the `--n-gpu-layers` (`-ngl 99`) flag will offload the model layers to your GPU, providing the best performance. So, a command like `-c 4096 -np 4 -ngl 99` will offer excellent concurrency on a machine with a 4090 GPU.
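For illustration, here is a minimal sketch of a launch command combining those flags; the model path, host, and port are placeholders, not taken from the video:

```
# Sketch only: model path and port are assumptions, not from the video.
# -c 4096 : total context window, divided evenly across slots
# -np 4   : 4 parallel slots, so each slot gets 1024 tokens of context
# -ngl 99 : offload all model layers to the GPU
llama-server \
  -m ./models/model-q4_k_m.gguf \
  -c 4096 -np 4 -ngl 99 \
  --host 0.0.0.0 --port 8080
```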
@Nihilvs 3 months ago
Amazing, thanks!
@johnkost2514 3 months ago
Mozilla's Llamafile format is very flexible for deploying LLM(s) across operating systems. NIM has the advantage of bundling other types of models like audio or video.
@thecodingchallengeshow 1 month ago
Can we fine-tune it using LoRA? I need it to be about AI, so I have downloaded data about AI and I want to add it to this model.
@marcaodd 3 months ago
Which server specs did you use?
@engineerprompt 3 months ago
It's running on an A6000 with 48GB of VRAM. Hope that helps.
@andreawijayakusuma6008 3 months ago
Bro, I wanna ask: do I need a GPU to run this?
@sadsagftrwre 3 months ago
No, llama-cpp specifically enables LLMs on CPUs. It's just going to be a bit slow, mate.
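For illustration, a minimal CPU-only sketch (the model path and settings are placeholders): leaving out `-ngl` keeps every layer on the CPU.

```
# Sketch only: no -ngl flag, so no layers are offloaded to a GPU.
llama-server -m ./models/model-q4_k_m.gguf -c 2048 --port 8080
```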
@andreawijayakusuma6008 2 months ago
@sadsagftrwre OK, thanks for the answer. I just wanted to try it but was afraid it wouldn't work without a GPU.
@sadsagftrwre 2 months ago
@andreawijayakusuma6008 I tried it on the CPU and it worked.