When specifying the model in the `ollama.embeddings()` call and in the `OllamaEmbeddings` class, what goes on behind the scenes, and how is that model used in each case? Are there advantages to specifying different models for the embedding process?
Very insightful question. I'm assuming the embeddings are extracted from the linear layer at the end of the Llama architecture (this is an assumption, of course). Regarding the advantages: it depends on your use case, but using an LLM's embeddings can be an additional experiment. It could also be useful to try dedicated embedding models for comparison.
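For reference, a minimal sketch of the two call sites being discussed, assuming a local Ollama server with llama3 pulled (the model name and the `langchain_community` import path are my assumptions, not something confirmed above):

```python
import ollama
from langchain_community.embeddings import OllamaEmbeddings

# Direct client call: the server runs the prompt through the named model
# and returns that model's embedding vector for the text.
response = ollama.embeddings(model="llama3", prompt="Hello, world!")
vector = response["embedding"]

# Wrapper class: the model name is set once and reused for every
# embed_query / embed_documents call, e.g. inside a vector-store pipeline.
embedder = OllamaEmbeddings(model="llama3")
query_vector = embedder.embed_query("Hello, world!")
```

In both cases the name is just forwarded to the same local Ollama server, which loads that model and computes the embedding, so swapping the name swaps which model's representation you get.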
Thanks for sharing :) During the initial package installation I also had to run `pip install llama-index-embeddings-ollama` in order to run `from llama_index.embeddings.ollama import OllamaEmbedding`.
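In case it saves someone a lookup, a minimal sketch of that import in use, assuming a local Ollama server (`base_url` below is Ollama's default endpoint; the model name is whatever you pulled):

```python
# pip install llama-index-embeddings-ollama
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(
    model_name="llama3",                # any model pulled into Ollama
    base_url="http://localhost:11434",  # Ollama's default endpoint
)
vector = embed_model.get_text_embedding("Hello, world!")
print(len(vector))
```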
Hi, thank you for this video. I am using a server where we are not allowed to install anything on the baseline, so everything has to go in a Docker container. I installed Ollama in Docker with `docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama`, and I am able to run llama3 with `docker exec -it ollama ollama run llama3`. Could you please tell me how I can follow your approach from there? I want to use llama3 from Ollama as the embedding model like you did in the video.
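Not the author, but a hedged sketch: because your `docker run` publishes port 11434 on the host (`-p 11434:11434`), the client snippets from the video should work unchanged from outside the container; you only need to point them at that port (host and port below are taken from your command):

```python
from ollama import Client

# The container exposes the same HTTP API as a native install,
# so the client just needs the published host/port.
client = Client(host="http://localhost:11434")
response = client.embeddings(model="llama3", prompt="some document text")
print(len(response["embedding"]))
```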
@@moslehmahamud You mentioned you are using an M1 MacBook Pro. Can you run Ollama on the GPU on a MacBook? From what I know it only uses the CPU. Please give some insight; it would help.
Hi, thank you for this video. But I am still confused: why can Ollama use llama3, which is an LLM, to embed? I have only used embedding models like Jina and bge before, where the input is natural language and the output is a vector. I thought both the input and output of an LLM were natural language, so how does Ollama get vectors? Looking forward to your reply.
"I thought the input and output of LLM were both natural languages, how do ollama get vectors?" — the text fed to an LLM is first converted into embeddings, i.e. meaningful numeric representations of the words. Each LLM learns these embeddings during training and has its own embedding space afterwards; each word (token) is represented as a vector.
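Put differently, Ollama's embeddings endpoint returns that internal numeric representation directly instead of decoding it back into text. A minimal sketch (the vector length depends on the model's hidden size; 4096 for the llama3 8B variant is my assumption):

```python
import ollama

# Instead of generating text, the embeddings endpoint returns the
# model's numeric representation of the whole input.
response = ollama.embeddings(model="llama3", prompt="Cats are small felines.")
vector = response["embedding"]
print(len(vector))  # a list of floats, e.g. 4096 for llama3-8B
```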
Just to be sure: do the three methods shown always produce the same results, with only the speed differing? Please explain the differences between the methods.
Hi, this video shows how to use llama3 with different packages, so that people can use and experiment with them in their LLM-based applications. Hope it helps!
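To make that concrete, a sketch that feeds the same text through all three packages; since each is just a client for the same local Ollama server and model, the returned vectors should match and only the wrapper ergonomics and overhead differ (this equivalence is my reading, not a claim from the video):

```python
import ollama
from langchain_community.embeddings import OllamaEmbeddings
from llama_index.embeddings.ollama import OllamaEmbedding

text = "Hello, world!"

v1 = ollama.embeddings(model="llama3", prompt=text)["embedding"]
v2 = OllamaEmbeddings(model="llama3").embed_query(text)
v3 = OllamaEmbedding(model_name="llama3").get_text_embedding(text)

# Same server, same model, same text -> the vectors should line up.
print(v1[:3], v2[:3], v3[:3])
```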
@@zoedaemon4940 I think the reason is that your CPU was used instead. A 4050 should be capable of this. Maybe it switched to the CPU because of insufficient VRAM.
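One way to confirm which device is in use, assuming a reasonably recent Ollama build: `ollama ps` (or `docker exec -it ollama ollama ps` for the Docker setup mentioned earlier) reports how much of a loaded model sits in VRAM, and the Python client mirrors it (the field names below are my assumption, based on the `/api/ps` response):

```python
import ollama

# ps() mirrors the `ollama ps` CLI: it lists currently loaded models.
# size_vram == 0 would mean the model fell back to pure CPU inference.
for m in ollama.ps().models:
    print(m.model, m.size, m.size_vram)
```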