
LocalGPT & Llama-2: Adding Chat History & Custom Prompt Templates 

Prompt Engineering
168K subscribers
29K views

Published: 29 Aug 2024

Comments: 62
@engineerprompt • 1 year ago
What are you going to create with this tool?
💼 Consulting: calendly.com/engineerprompt/consulting-call
🦾 Discord: discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Join Patreon: Patreon.com/PromptEngineering
▶ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1
@gambiarran419 • 11 months ago
I have listened to this 4 times because I think you have presented the information fantastically. Unfortunately, however, this means I had to endure 32 adverts, including some really long ones of 3 minutes or more. I am sure this is earning you good money, and I am letting them run to help with that, but if you could change the settings so people like me can support you without having to wait 10 minutes every time we watch your video, it would be much appreciated.
@yuzual9506 • 8 months ago
You're a god... From France: I know we have some gods too, but you're the best teacher. Thanks!
@nikow1060 • 1 year ago
This history feature is great. However, it would be great to have a chat history like the ChatGPT user interface, so that one can retrieve any of the former chats I had with localGPT. Just an idea :)
@nikow1060 • 1 year ago
I just had an idea: what about saving the session chat as a PDF with a title (for example, "Session 64 about something"), then uploading this document in a new session and telling localGPT to use the PDF "Session 64 about something" as chat history, so that I can continue the chat with that history as context?
@pedroavex • 1 year ago
Hello brother! Thanks again for the video. I have noticed one thing; I don't know if you have an explanation. I ingested a 700-page medical book and ran the program on a rented 48 GB VRAM CUDA machine using the Llama-2 13B chat GPTQ model. When I ask a question about a disease that I know is explained in the book, the first answer always starts with "According to the provided text, ..." and is always correct. However, the subsequent answers about the same disease (say, what the symptoms are) seem to come from the LLM's own knowledge: they no longer say "According to the provided text", and while they are correct (they don't seem to be hallucinations), some of the information is not in the book, so I am sure it is not from the book. Do you have any explanation for that?
@Data_Core_ • 1 year ago
Try with a well-known disease with symptoms everyone is familiar with, cross-reference and try to trap it, and you'll eventually figure out if it's a hallucination.
@pedroavex • 1 year ago
@Data_Core_ Yes, I have already tried that and cross-checked against the book. The answers are coherent but not the same as what is written in the book. I mean, looking on Google I see that those answers are correct, but some parts are not present in the book, which means they are coming from the model's own knowledge base, not from the book. That's my guess.
@pedroavex • 1 year ago
Thanks for your video, my friend! One question: if I ingest documents in a certain language (Portuguese, in my case), will the system handle prompts in Portuguese? Can Llama-2 handle it? Thanks!
@engineerprompt • 1 year ago
Yes to the first question, not sure about the second question.
@pedroavex • 1 year ago
@engineerprompt OK, I will try Portuguese and let you know. Thanks!
@EnKiNekiDela • 1 year ago
Great work, thank you. Looking forward to an API. Is there any option now to use existing web UIs with LocalGPT?
@engineerprompt • 1 year ago
Yes, there is a UI which uses the API.
@vedant4613 • 1 year ago
Langflow or Flowise?
@chriscortner3261 • 1 year ago
I noticed you are using a chunk_size of 1000 when splitting documents, but the instructor embeddings only work up to 512 characters. Won't your document embeddings be based on only about half of each document?
@pedroavex • 1 year ago
Yes, I was going to ask the same.
@engineerprompt • 1 year ago
I think you are referring to the size of the embeddings, which is different from the input text it can receive. It will take a chunk of 1000 chars and convert it into a 512-dimensional space. Hope this helps.
@chriscortner3261 • 1 year ago
@engineerprompt Hi, I wish I was. The embedding dimension is 768 and the sequence length is 512. I hope I'm wrong, because I'm also using this embedding model for a project. Look at the HF MTEB leaderboard. There is also a post out there where someone asks the dev what it would take to raise the sequence length.
@loicbaconnier9150 • 1 year ago
You are right: the embedding dim is 768 and the context length is 512, so only the last 512 tokens of a 1000-char chunk will be used.
@mdfarhananis8950 • 1 year ago
@loicbaconnier9150 I think there is a difference between characters and tokens.
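That distinction is easy to check directly: a 1,000-character English chunk usually comes out to roughly 200-300 tokens, which fits within a 512-token sequence length. A minimal sketch, assuming the hkunlp/instructor-large model used as localGPT's default embeddings and the transformers tokenizer (the chunk text is a placeholder):

```python
# Minimal sketch: count the tokens a 1000-character chunk produces for
# the hkunlp/instructor-large encoder (sequence length: 512 tokens).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hkunlp/instructor-large")
chunk = "a 1000-character chunk from one of your ingested documents ..."  # placeholder
token_ids = tokenizer(chunk)["input_ids"]
print(len(token_ids))  # anything beyond 512 would be truncated by the encoder
```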
@nirsarkar • 1 year ago
This is a great video. Thanks a ton! I set it up on my M2 with 24 GB RAM. However, during inference I cannot see the GPU being put to work; the performance cores are pressed into service instead. I have passed device_type as mps. Any hints? Thanks!
@engineerprompt • 1 year ago
Which model type are you running?
@nirsarkar • 1 year ago
@engineerprompt GGML, llama-2-13b-chat (llama-2-13b-chat.ggmlv3.q5_K_S.bin).
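One thing worth checking in that situation: GGML models are served through llama-cpp-python, which only uses the Apple-silicon GPU if the package was built with Metal support, regardless of the device_type flag. A minimal sketch of the two sanity checks, assuming PyTorch is installed (as localGPT requires):

```python
# Minimal sketch: verify that this environment can use the Apple-silicon GPU.
import torch

print(torch.backends.mps.is_available())  # True -> the 'mps' device is usable
print(torch.backends.mps.is_built())      # True -> this PyTorch build includes MPS
```

If both print True and the GPU still idles on a GGML model, the likely culprit is a CPU-only llama-cpp-python build; historically it had to be reinstalled with CMAKE_ARGS="-DLLAMA_METAL=on" to enable Metal.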
@ppvvtt • 9 months ago
Thank you very much for your nice work! 1. Regarding the order of context, history, and question in the template and in input_variables: is it required to be the same? Can a different order have any effect? 2. For example, I use the LLM "Baichuan2-13B-Chat", but I cannot find its prompt template. What exact template should I use?
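On the first question: in LangChain, template placeholders are bound by name, not by position, so the order of the input_variables list does not have to match the order in the template string. A minimal sketch (the template text itself is illustrative):

```python
# Minimal sketch: input_variables order is independent of the placeholder
# order in the template string; variables are matched by name.
from langchain.prompts import PromptTemplate

template = """Use the following context and chat history to answer the question.

Context: {context}
History: {history}
Question: {question}
Answer:"""

# Deliberately listed in a different order than they appear above.
prompt = PromptTemplate(
    input_variables=["question", "context", "history"],
    template=template,
)
print(prompt.format(context="...", history="...", question="..."))
```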
@Orwlit • 1 year ago
Thanks for the repo! But I found that running the Llama-2 model locally is extremely slow on my Mac Mini, and I noticed that the total time for a query in this video is slow too. Does anyone know why the prompt eval time takes so long? 😅😅😅
@khalidbouziane4005 • 1 year ago
Thanks for the video and for taking the localGPT project this far. I have a question: what if I need to add a new document to the source documents that are already embedded (ingested) and stored in ChromaDB? How do I embed the new document and add it to the vector database without ingesting all the existing docs again? In other words, can we avoid re-ingesting previously embedded files every time we add a document, and instead embed only the new doc and add it to the vector database?
@engineerprompt • 1 year ago
It's possible; I will create a video on how to do it.
@arjundev4908 • 1 year ago
@engineerprompt Sorry if my question sounds lame. Is FAISS a vector database? I remember that in your earlier videos you used FAISS, which was later pickled and reused if the same PDFs were searched; that way we didn't have to vectorize the whole doc again. How different is ChromaDB from FAISS?
@engineerprompt • 1 year ago
@arjundev4908 Yes, FAISS is a vector store like Chroma. You can use either. I like to use FAISS for smaller datasets and Chroma for larger ones. Hope this helps.
@arjundev4908 • 1 year ago
@engineerprompt Thank you very much for your response.
@nikow1060 • 1 year ago
I usually put new documents into the SOURCE_... folder. After running ingest, the new embeddings are added to the vector store. After that I remove the new documents and put them in an ALREADY_PROCESSED folder. I am not sure, but I believe this is exactly what you were asking for.
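Doing the same thing programmatically amounts to re-opening the persisted store and appending only the new chunks. A minimal sketch, assuming LangChain's Chroma wrapper and instructor embeddings (the paths and file name are placeholders, not localGPT's actual ingest code):

```python
# Minimal sketch: embed only a new document and append it to an existing
# persisted Chroma store instead of re-ingesting every source document.
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")

# Re-open the store that the original ingest run persisted (placeholder path).
db = Chroma(persist_directory="DB", embedding_function=embeddings)

# Load, split, and append only the new document (placeholder file name).
docs = PyPDFLoader("SOURCE_DOCUMENTS/new_doc.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
db.add_documents(splitter.split_documents(docs))
db.persist()  # write the updated index back to disk
```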
@timer4times2 • 1 year ago
Is the VRAM supposed to fill up as you enter more queries? Using 7B models, it starts at 6 GB of VRAM and then works its way up until it can no longer operate.
@avikhandakar • 1 year ago
@engineerprompt Thanks for the nice tutorial. I have a problem: LLaMA-2 always confuses the context with the history. Can you please help?
@mdfarhananis8950 • 1 year ago
Please teach us how to create a dataset.
@engineerprompt • 1 year ago
Working on it :)
@anonded • 1 year ago
@engineerprompt Thanks! I'm waiting for that vid too!!! 🗿📈
@mdfarhananis8950 • 1 year ago
@engineerprompt Thanks a ton. I keep getting errors in mine.
@jennilthiyam1261 • 9 months ago
I have watched all the videos in the localGPT project. Thank you so much for all your effort and contributions. I want to know whether you are planning to upload any videos on setting up Llama on a local system with memory. I am able to set up Llama-2 13B on my system, but I can only have one-off exchanges; it does not have the memory for an interactive conversation. Also, why is your code not using any GPU? What should I do if I want to use the GPU? It is only using the CPU.
@engineerprompt • 9 months ago
Which OS are you on? You will need to make sure that your virtual env is able to see your GPU.
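A quick way to run that check before launching localGPT: ask PyTorch, inside the activated virtual env, whether it can see the card. A minimal sketch, assuming an NVIDIA GPU and a CUDA-enabled PyTorch install:

```python
# Minimal sketch: confirm the virtual env's PyTorch build can see the GPU.
import torch

print(torch.cuda.is_available())  # False -> CPU-only wheel or driver problem
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected card
```

If this prints False, the usual fix is reinstalling PyTorch with the CUDA wheel that matches the installed driver, then running localGPT with --device_type cuda.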
@nabswai • 1 year ago
awesome
@engineerprompt • 1 year ago
thanks
@muhammadmuzammil7592 • 8 months ago
I am working on it, but I want it to be like ChatGPT: I want to create the embeddings one time, and have it respond to user input based on my embeddings. The problem is that it gives a response but creates the embeddings every time. Can you help me?
@engineerprompt • 8 months ago
Are you using localGPT? It creates the vector store only once; it will not do that with each query.
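The pattern behind that answer is a one-time ingest step that persists the store, plus a query side that only re-opens it, so the only thing embedded per query is the question itself. A minimal sketch of the query side, assuming LangChain's Chroma wrapper with a placeholder path:

```python
# Minimal sketch: the query side re-opens the persisted store; no document
# embedding happens here -- only the query string itself is embedded.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)

retriever = db.as_retriever(search_kwargs={"k": 4})
docs = retriever.get_relevant_documents("What are the symptoms?")  # embeds the query only
```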
@akshayas5487 • 1 year ago
When I run localGPT on a CPU, it takes a long time to answer my query. Is there any alternative that makes the answers faster on a CPU?
@loicbaconnier9150 • 1 year ago
Hi, thanks for everything. You use Chroma as the vector database; is there a reason? Why not FAISS or Qdrant? And as the embedding model, the new gte-large seems like a good idea, no?
@engineerprompt • 1 year ago
If you are running it in memory, Chroma & FAISS are good options. For deployment, I would suggest Qdrant.
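In LangChain the swap is mostly a one-class change, which is why the choice can be deferred. A minimal sketch of the FAISS variant, assuming the same instructor embeddings (the texts are placeholders):

```python
# Minimal sketch: FAISS as a drop-in, in-memory alternative to Chroma for
# smaller datasets; the index can also be saved to and reloaded from disk.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = FAISS.from_texts(["chunk one ...", "chunk two ..."], embeddings)

db.save_local("faiss_index")                      # persist to a folder
db = FAISS.load_local("faiss_index", embeddings)  # reload in a later session
```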
@loicbaconnier9150 • 1 year ago
What about gte-base? A very light embedding model, about 270 MB.
@loicbaconnier9150 • 1 year ago
Could you make a video on using TGI from Hugging Face? Especially the parameters for quantization.
@jennilthiyam1261 • 9 months ago
Hi, can we run this on the GPU instead of the CPU?
@ganeshkgp8807 • 1 year ago
Why is it painfully slow? I guess it's because of the model. Sir, please tell me how to use the MBZUAI/LaMini-T5-738M model.
@contentfreeGPT5-py6uv • 1 year ago
Nice
@engineerprompt • 1 year ago
Thanks
@ppvvtt • 9 months ago
Hi. I'm also running tests that combine chat history (ConversationBufferWindowMemory), local data retrieval, and an LLM (Baichuan2-13B-Chat) to get answers. I have two tests; note that all questions relate to the local data I feed the LLM. Without chat history (ConversationBufferWindowMemory with k = 0), the answers are quite OK: the LLM gives at least information related to my question. With chat history (ConversationBufferWindowMemory with k = 4), when I pass a next question that is really different from the previous one, I get the same answer as for the previous question. This happens several times. Does anyone have the same problem, and is there any way to fix it?
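For context on what k controls: the window memory replays the last k exchanges into the prompt on every turn, so with k = 4 the previous questions and answers travel along with each new question and can dominate it. A minimal sketch, assuming LangChain's ConversationBufferWindowMemory (the conversation content is illustrative):

```python
# Minimal sketch: k bounds how many past exchanges are replayed into the
# prompt; with k=4, earlier Q/A pairs accompany every new question.
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=4, memory_key="history")
memory.save_context({"input": "What causes disease X?"}, {"output": "..."})
memory.save_context({"input": "What are its symptoms?"}, {"output": "..."})

# The text below is what gets injected as {history} in the prompt template.
print(memory.load_memory_variables({})["history"])
```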
@user-qv8kx5jt6r • 1 year ago
Can you run it with device_type cpu the first time and mps the next time?
@AGAsnow • 1 year ago
How could I make it say "I don't have the answer" when the information is not in the context, instead of giving out-of-context answers?
@Axenide • 1 year ago
That is being worked on at the moment. You have to understand that language models don't know what they are saying; they infer their answers from your questions (in the case of chat-tuned models). Think of it as autocomplete on steroids.
@umangternate • 8 months ago
Does localGPT work offline?
@engineerprompt • 8 months ago
Yes. That's the whole idea of the project.
@ravi2438 • 1 year ago
Please create a video teaching DB-GPT in detail.
@engineerprompt • 1 year ago
Will do :)
@sauravmukherjeecom • 1 year ago
How do I disable the memory?