When specifying the model in the `ollama.embeddings()` call and in the `OllamaEmbeddings` class, what goes on behind the scenes, and how is that model used in each case? Are there advantages to specifying different models for the embedding process?
Very insightful question. I'm assuming the embeddings are extracted from the linear layer at the end of the Llama architecture (this is an assumption, of course). Regarding the advantages: it depends on your use case, but using an LLM's embeddings can be an additional experiment. It could also be useful to try dedicated embedding models for comparison.
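For reference, a minimal sketch of the two call sites being discussed, assuming a local Ollama server with llama3 pulled (the model name and the `langchain_community` import path are my assumptions, not something confirmed above):

```python
import ollama
from langchain_community.embeddings import OllamaEmbeddings

# Direct client call: the server runs the prompt through the named model
# and returns that model's embedding vector for the text.
response = ollama.embeddings(model="llama3", prompt="Hello, world!")
vector = response["embedding"]

# Wrapper class: the model name is set once and reused for every
# embed_query / embed_documents call, e.g. inside a vector-store pipeline.
embedder = OllamaEmbeddings(model="llama3")
query_vector = embedder.embed_query("Hello, world!")
```

In both cases the name is just forwarded to the same local Ollama server, which loads that model and computes the embedding, so swapping the name swaps which model's representation you get.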
Thanks for sharing :) During the initial package installation I also had to run `pip install llama-index-embeddings-ollama` in order to run `from llama_index.embeddings.ollama import OllamaEmbedding`.
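In case it saves someone a lookup, a minimal sketch of that import in use, assuming a local Ollama server (`base_url` below is Ollama's default endpoint; the model name is whatever you pulled):

```python
# pip install llama-index-embeddings-ollama
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(
    model_name="llama3",                # any model pulled into Ollama
    base_url="http://localhost:11434",  # Ollama's default endpoint
)
vector = embed_model.get_text_embedding("Hello, world!")
print(len(vector))
```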
Hi, thank you for this video. I am using a server where we are not allowed to install anything on the baseline, so everything has to go in a Docker container. I installed Ollama in Docker with `docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama`, and I am able to run llama3 with `docker exec -it ollama ollama run llama3`. Could you please tell me how I can follow your approach from there? I want to use llama3 from Ollama as the embedding model like you did in the video.
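Not the author, but a hedged sketch: because your `docker run` publishes port 11434 on the host (`-p 11434:11434`), the client snippets from the video should work unchanged from outside the container; you only need to point them at that port (host and port below are taken from your command):

```python
from ollama import Client

# The container exposes the same HTTP API as a native install,
# so the client just needs the published host/port.
client = Client(host="http://localhost:11434")
response = client.embeddings(model="llama3", prompt="some document text")
print(len(response["embedding"]))
```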
@@moslehmahamud You mentioned you are using an M1 MacBook Pro. Can you run Ollama on the GPU on a MacBook? From what I know it only uses the CPU. Please give some insight; it would help.
Hi, thank you for this video. But I am still confused: why can Ollama use llama3, which is an LLM, to embed? I have only used embedding models like Jina and bge before, where the input is natural language and the output is a vector. I thought both the input and output of an LLM were natural language, so how does Ollama get vectors? Looking forward to your reply.
"I thought the input and output of LLM were both natural languages, how do ollama get vectors?" — the text fed to an LLM is first converted into embeddings, i.e. meaningful numeric representations of the words. Each LLM learns these embeddings during training and has its own embedding space afterwards; each word (token) is represented as a vector.
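Put differently, Ollama's embeddings endpoint returns that internal numeric representation directly instead of decoding it back into text. A minimal sketch (the vector length depends on the model's hidden size; 4096 for the llama3 8B variant is my assumption):

```python
import ollama

# Instead of generating text, the embeddings endpoint returns the
# model's numeric representation of the whole input.
response = ollama.embeddings(model="llama3", prompt="Cats are small felines.")
vector = response["embedding"]
print(len(vector))  # a list of floats, e.g. 4096 for llama3-8B
```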
Just to be sure: do the three methods shown always produce the same results, with only the speed differing? Please explain the differences between the methods.
Hi, this video shows how to use llama3 with different packages, so that people can use and experiment with them in their LLM-based applications. Hope it helps!
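To make that concrete, a sketch that feeds the same text through all three packages; since each is just a client for the same local Ollama server and model, the returned vectors should match and only the wrapper ergonomics and overhead differ (this equivalence is my reading, not a claim from the video):

```python
import ollama
from langchain_community.embeddings import OllamaEmbeddings
from llama_index.embeddings.ollama import OllamaEmbedding

text = "Hello, world!"

v1 = ollama.embeddings(model="llama3", prompt=text)["embedding"]
v2 = OllamaEmbeddings(model="llama3").embed_query(text)
v3 = OllamaEmbedding(model_name="llama3").get_text_embedding(text)

# Same server, same model, same text -> the vectors should line up.
print(v1[:3], v2[:3], v3[:3])
```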
@@zoedaemon4940 I think the reason is that your CPU was used instead. A 4050 should be capable of this. Maybe it switched to the CPU because of insufficient VRAM.
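One way to confirm which device is in use, assuming a reasonably recent Ollama build: `ollama ps` (or `docker exec -it ollama ollama ps` for the Docker setup mentioned earlier) reports how much of a loaded model sits in VRAM, and the Python client mirrors it (the field names below are my assumption, based on the `/api/ps` response):

```python
import ollama

# ps() mirrors the `ollama ps` CLI: it lists currently loaded models.
# size_vram == 0 would mean the model fell back to pure CPU inference.
for m in ollama.ps().models:
    print(m.model, m.size, m.size_vram)
```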