
Converting a LangChain App from OpenAI to OpenSource 

Sam Witteveen
63K subscribers
16K views

Science

Published: 16 Jul 2024

Comments: 68
@julian-fricker · A year ago
This is exactly why I'm learning langchain and creating tools with data I don't care about for now. I know one day I'll flick a switch and have the ability to do all of this locally with open source tools and not worry about the security of my real data. This is the way!
@vhater2006 · A year ago
Good Luck on Privacy ;)
@robxmccarthy · A year ago
Thank you so much for doing all of this work! It would be really interesting to compare the larger models. If GPT-3.5-Turbo is based on a 176B-parameter model, it's going to be very difficult for a 13B model to stack up. 13B models seem more appropriate for fine-tuning, where the limited parameter count can be focused on specific contexts and domains, such as these texts and a QA structure for answering questions over the text. The example QA instructions and labels could be generated using OpenAI to ask questions over the text, as in your first example. This is all very expensive and time-consuming, though, so I think you'd really need a real-world business use case to justify the experimentation and development time required.
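For the QA-generation idea in this comment, here is a minimal sketch against the (pre-1.0) OpenAI chat API; the prompt wording and the helper function are illustrative, not anything from the video:

```python
# Sketch: generating QA fine-tuning pairs with OpenAI over existing text
# chunks, as suggested above. Uses the pre-1.0 openai package API.
import openai  # assumes openai.api_key is configured in the environment

def make_qa_pair(chunk: str) -> str:
    """Ask GPT-3.5 to write one question/answer pair grounded in `chunk`."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Write one question and its answer based only "
                              "on this text:\n\n" + chunk}],
    )
    return resp.choices[0].message.content
```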
@Rems766 · A year ago
Mate, you're doing all the work I had planned, for me. Thanks a lot.
@jarekmor · A year ago
Unique content and format. Practical examples. Something amazing! Don't stop making new videos.
@tejaswi1995 · A year ago
The video I was most waiting for on your channel 🔥
@thewimo8298 · A year ago
Thank you, Sam! I appreciate the guide with the non-OpenAI LLMs!
@JonathanYankovich · A year ago
This might be trivial, but I'd love a video on the difference between running a notebook vs. a CLI vs. an API. All the demos use notebooks, but to make this useful we need APIs and CLIs!
@theh1ve · A year ago
I'd like to see this too. I want my model inference running on one networked machine and a GUI running on another, making API calls.
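Following up on the notebook-vs-API question: a minimal sketch of wrapping a chain in an HTTP API so a GUI on another machine can call it. This assumes a `qa_chain` already built as in the video; FastAPI and uvicorn are one option among many:

```python
# Sketch: exposing a LangChain chain over HTTP for remote callers.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/ask")
def ask(query: Query):
    # qa_chain is assumed to be the RetrievalQA chain built in the notebook
    return {"answer": qa_chain.run(query.question)}

# serve with: uvicorn server:app --host 0.0.0.0 --port 8000
```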
@rudy9546 · A year ago
Top-tier content
@georgep.8478 · A year ago
This is great. Please follow up on fine-tuning a smaller model on the text and EPUB.
@fv4466 · A year ago
As a newcomer, I find your discussion of the differences among models and prompt tuning extremely helpful. Your video pins down the shortcomings of current retrieval-augmented language modeling. It is very informative. Is there a good way to digest the HTML as raw input? Is it always better to convert the HTML pages to text and follow the process described in your video? Are there any tools you recommend?
@clray123 · A year ago
I think it will get interesting when people start tuning these open-source models with QLoRA and some carefully designed task-specific datasets. If you browse through the chat-based datasets these models are pretrained with, there's a lot of crap in there, so no wonder the outputs are not amazing. I believe the jury is still out on to what extent a smaller fine-tuned model could outperform a large general one on a highly specialized task. Although, based on the benchmarks of the Guanaco model family, it seems that raw model size also matters a lot.
@pubgkiller2903 · A year ago
The biggest drawback is that QLoRA will take a long time to generate the answer from the context.
@DaTruAndi · A year ago
Can you look into using the quantized models (GPTQ 4-bit or GGML 4.1), for example with LangChain?
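One possible starting point for GPTQ models with LangChain, assuming the auto-gptq package and a pre-quantized checkpoint; the model name is illustrative, and the exact `from_quantized` arguments vary by checkpoint and library version:

```python
# Sketch: loading a 4-bit GPTQ model and wrapping it as a LangChain LLM.
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
from langchain.llms import HuggingFacePipeline

model_id = "TheBloke/stable-vicuna-13B-GPTQ"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id, device="cuda:0", use_safetensors=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer,
                max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)  # drop-in for the OpenAI LLM
```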
@reinerheiner1148 · A year ago
I've really wondered how open-source models would perform with LangChain vs. GPT-3.5-Turbo, so thanks for making this video. I suspected that the open-source models would probably not perform as well, but I did not think it would be that bad. Could you maybe provide us with a list of the LLMs you tried that didn't work out, so we can cross them off our list of models to try with LangChain? In any case, thanks for making this notebook; it'll make it so much easier for me to mess around with open-source models and LangChain!
@acortis · A year ago
This was very helpful! Thanks so much for doing these videos. May I suggest a video on what is needed to fine-tune some of the LLMs with a specific goal in mind? I'm not sure this is something that can be done in Colab, but knowing the steps and the required resources would be very helpful. Thanks again!
@samwitteveenai · A year ago
I will certainly make some more fine-tuning vids. Any good examples of what you mean by "having a specific goal in mind"?
@acortis
@acortis Год назад
​@@samwitteveenai I saw your video on fine-tuning with PEFT on the English quotes, and I thought the final result was a bit of a hit-and-miss. I was wondering what specific type of datasets would be needed for, say, reasoning or data extraction (a la squadv2). Overall, I have the sense that LLMs are trying to train on too much data (why in the world we are trying to get exact arithmetic is beyond me!). I think that it would be more efficient if there was a more specific model just dedicated to learning English grammar and then smaller, topic-specific, models. Just my gut feeling.
@samwitteveenai · A year ago
@acortis This is something I am working on a lot. The PEFT result was partially due to me not training it very long; it was just to give people something they could use to learn on. Reasoning is a task that normally requires bigger models, etc., for few-shot tasks. I am currently training models around 3B for very specific types of tasks around ReAct and PAL. I totally agree about the arithmetic, etc. What I am interested in, though, is models that can do the PAL tasks; I have a video on that from about 2 months ago. I will make some more fine-tuning content. I want to show QLoRA and some other cool stuff in PEFT as well.
@henkhbit5748 · A year ago
Great video; love the comparison with open source. It would be nice if you could show how to fine-tune an open-source model, a small one, with your own instruct dataset. BTW: how do you add new embeddings to an existing Chroma DB? db.add(...)?
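To the Chroma question above: a minimal sketch of appending to an existing store. The persist directory and the Instructor model name are illustrative; LangChain's Chroma wrapper uses `add_texts` / `add_documents` rather than `add`:

```python
# Sketch: adding new embeddings to an already-persisted Chroma store.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma(persist_directory="db", embedding_function=embeddings)

db.add_texts(["a new passage to index"])   # or db.add_documents(new_docs)
db.persist()                               # flush the updated index to disk
```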
@user-sg7cw9ml8w · A year ago
Is it optimal to pass the user query to the retriever directly? Wouldn't asking the language model to decide what to search for (like using a tool) be better? Also, if 3 chunks from 1 doc were found, I wonder if it's better to order them sequentially as they appear in the doc.
@user-bc5ry6ym2f · A year ago
Thank you for such content. Is there any way to do the same without a cloud-native platform and a GPU, if I want to launch something similar on-premises on a CPU?
@PleaseOpenSourceAI · A year ago
Great job, but these HF models are really large: even the 7B ones take more than 12 GB of memory, so you can't really run them on a local CUDA GPU. I'm almost at the point of trying to figure out how to use GPTQ models for these purposes. It's been a month already, and it seems like no one is doing it for some reason. Do you know if there is some big, obvious roadblock on this path?
@darshitmehta3768 · A year ago
Hello Sam, thank you for this amazing video. I am facing the same issue with open-source models as in the video: they answer from their own knowledge even when the data is not present in the PDF or Chroma DB. Do you have any idea how we can achieve OpenAI-like behaviour with open-source models, and which model we could use for that?
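One common mitigation is a restrictive prompt. A minimal sketch, assuming the `llm` and `db` objects built earlier in the notebook; the template wording is illustrative, and smaller models follow such instructions less reliably than the OpenAI models, as the video shows:

```python
# Sketch: constraining a RetrievalQA chain to answer only from context.
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

template = """Use ONLY the following pieces of context to answer the question.
If the answer is not in the context, just say "I don't know".

{context}

Question: {question}
Helpful Answer:"""
prompt = PromptTemplate(template=template,
                        input_variables=["context", "question"])

qa = RetrievalQA.from_chain_type(
    llm=llm,                       # the HF pipeline LLM from the notebook
    chain_type="stuff",
    retriever=db.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
)
```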
@creativeuser9086 · A year ago
Could you please point me to a video you've done about how the embedding model works? Specifically, I want to know how it transforms a whole chunk of data (a paragraph) into 1 embedding vector (instead of multiple vectors, one per token).
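In short: the model still produces one vector per token, then pools them. A minimal sketch of mean pooling, which many sentence-transformers-style models use in some form; the model name is illustrative:

```python
# Sketch: collapsing per-token vectors into a single chunk embedding.
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tok("a whole paragraph of text ...", return_tensors="pt",
             truncation=True)
with torch.no_grad():
    token_vecs = model(**inputs).last_hidden_state    # (1, n_tokens, dim)
mask = inputs["attention_mask"].unsqueeze(-1)         # zero out padding
embedding = (token_vecs * mask).sum(1) / mask.sum(1)  # (1, dim): one vector
```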
@bobchelios9961 · A year ago
I would love some information on the RAG models you mentioned near the end.
@creativeuser9086 · A year ago
Fine-tuning is hard. But RLHF is what takes a model to the next level and on par with the top commercial models. Want to try it?
@samwitteveenai · A year ago
RLHF isn't the panacea that most people make it out to be. I have tried it for some things. I will make a video about it at some point.
@creativeuser9086 · A year ago
@samwitteveenai I guess RLHF is hard to implement and is still in research territory.
@pranjuls-dt1sp · A year ago
Excellent stuff!! 🔥🔥 Just curious: is there a way to extract unstructured information like invoice data, receipt labels, medical bill descriptions, etc. using open-source LLMs? Like using LangChain + Wizard/Vicuna to perform such NLP tasks?
@samwitteveenai · A year ago
You can try the Unstructured package or something like an open-source OCR model.
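A minimal sketch of the Unstructured route, assuming the unstructured package is installed and using an illustrative file name; the extracted text can then be passed to an LLM extraction prompt:

```python
# Sketch: pulling raw text out of an invoice/receipt with Unstructured.
from unstructured.partition.auto import partition

elements = partition(filename="invoice.pdf")        # illustrative path
text = "\n".join(el.text for el in elements if el.text)
```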
@cdgaeteM · A year ago
Thanks, Sam; your channel is great! I have developed a couple of APIs. Gorilla seems to be very interesting. I would love to hear your opinion through a video. Best!
@samwitteveenai · A year ago
Yes, Gorilla does seem interesting. I read the abstract a few days ago and need to go back and check it out properly. Thanks for reminding me!
@creativeuser9086 · A year ago
Can you try it with Falcon-40B?
@DaTruAndi · A year ago
Wouldn't it make more sense to chunk tokenized sequences instead of the untokenized text? You don't know the token length of each chunk, but maybe you should. Also, regarding the handling of special sequences like ### Assistant: would they be represented as special tokens? If so, handling them in token space, e.g. as additional stop tokens for the next answer, may make sense.
@samwitteveenai · A year ago
Yes, but honestly, most of the time it doesn't matter that much. The token-based way is a perfectly valid way to do it, but here I was trying to keep it simple. You can use fancier ways for things like interviews. I have one project with a set of docs that are financial interviews, where I took the time to write a custom splitter for question/answer chunks, and it certainly helps. Another challenge with custom open-source models is the different tokenizers, e.g. the original LLaMA models have a 32k-vocab tokenizer, but the fully open-source ones use 50k+, etc. We want to build the indexes once but test them on multiple models, so in cases like this, token-based indexing doesn't always help much. Often the key thing is to have a good overlap size, and that should be tested.
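For reference, a minimal sketch of the character-based splitting with overlap described above; the chunk_size and chunk_overlap values are illustrative and worth tuning per dataset:

```python
# Sketch: character-based chunking with overlap for indexing.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                          chunk_overlap=200)
chunks = splitter.split_documents(docs)  # docs = output of a document loader
```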
@dhruvilshah9881 · A year ago
Hi, Sam. Thank you for all the videos; I have been with you from the first one and have learned so much from these tutorials. Can you create a video on fine-tuning LLaMA/Alpaca/Vertex AI (text-bison) or any other feasible LLM for retrieval purposes? Retrieval purposes could be: 1) asking something about private data (in GBs/TBs) in a local repository; 2) extracting some specific information from the local data.
@samwitteveenai · A year ago
Thanks for being around from the start :D. I want to get back to showing more fine-tuning, especially now that the truly open LLaMA models are out. I try to show things people can run in Colab, so I probably won't do TBs of data. Do you have any suggested datasets I could use?
@user-wr4yl7tx3w · A year ago
Which LLMs are the instruct embeddings compatible with? Is it a common standard?
@samwitteveenai · A year ago
They will work with any LLM you use for the conversational part. Embedding models are independent of the conversation LLM; they are for retrieval.
@vhater2006 · A year ago
Hello, thank you for sharing. So if I want to use LangChain and HF, I just open a pipeline; finally, I get it. Why not use the big models from HF in your example, a 40B or 65B, to get "better" results?
@samwitteveenai · A year ago
Mostly because people won't have the GPUs to serve them. Also, HF doesn't serve most of the big models for free on their API.
@ygshkmr123 · A year ago
Hey Sam, do you have any idea how I can reduce inference time on an open-source LLM?
@samwitteveenai · A year ago
Multiple GPUs, quantization, Flash Attention, and other hacks. I am thinking about doing a video about this. Any particular model you are using?
@rakeshpurohit3190 · A year ago
Will this be able to give insights into the given doc, like writing pattern, tone, language, etc.?
@samwitteveenai · A year ago
It will pick those up from the docs, and you can also set them in the prompts.
@adriangabriel3219 · A year ago
What dataset would you use for fine-tuning?
@samwitteveenai · A year ago
It depends on the task. Mostly I use internal datasets for fine-tuning.
@HimanshuSingh-ov5gw · A year ago
How much time would this E5 embedding model take to embed large files, or a larger number of files, like 1,500 text files?
@samwitteveenai · A year ago
1,500 isn't that large; on a decent GPU you are probably looking at tens of minutes at most, and likely a lot less depending on each file's length. Of course, once they're indexed, just save the index to reuse in the future.
@HimanshuSingh-ov5gw · A year ago
@samwitteveenai Thanks! BTW, your videos are very helpful!
@kumargaurav2170 · A year ago
The kind of understanding of what the user is actually looking for is currently best delivered by the OpenAI & PaLM APIs, amid all the hype.
@samwitteveenai · A year ago
Totally agree. Lots of people are looking for open-source models, and they can work for certain uses, but GPT-3/4, PaLM Bison/Unicorn, and Claude are the ones that work best for this kind of thing.
@123arskas · A year ago
Hey Sam, awesome work. I wanted to ask you something:
1. Suppose we have a lot of call transcripts from multiple agents.
2. I want to summarize the transcripts for a month (let's say January).
3. The call transcripts can number anywhere from 5 to 600 in a month for a single agent.
4. I want to use the GPT-3.5 models, not the other GPT models.
How would I use LangChain to deal with that much data using async programming? I want the number of tokens and the number of requests to the OpenAI API to stay below the recommended limits so nothing crashes. Is there any place where I can learn to do this sort of task?
@samwitteveenai · A year ago
Take a look at the summarization vids I made, especially the map_reduce stuff: it does lots of small summaries, which you can then summarize again (summaries of summaries, etc.).
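A minimal sketch of that map_reduce approach, assuming the transcripts are already loaded and split into a list of LangChain Document chunks:

```python
# Sketch: map_reduce summarization over many transcript chunks.
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chain = load_summarize_chain(llm, chain_type="map_reduce")
monthly_summary = chain.run(transcript_docs)  # list of Document chunks
```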
@123arskas · A year ago
@samwitteveenai Thank you
@user-wr4yl7tx3w · A year ago
Have you tried the Falcon LLM model?
@samwitteveenai · A year ago
Yes, Falcon-7B was the original model I wanted to make the video with, but it didn't work well.
@alexdantart · A year ago
Please tell me your Colab environment... even on Colab Pro I get: OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 15.77 GiB total capacity; 14.08 GiB already allocated; 100.12 MiB free; 14.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
@samwitteveenai · A year ago
I usually use an A100. You will need Colab Pro+ to run this on Colab.
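If an A100 isn't available, 8-bit loading is one way to squeeze a 13B model into less memory; a sketch assuming the bitsandbytes package is installed, with an illustrative checkpoint name:

```python
# Sketch: 8-bit weight loading to reduce GPU memory vs. fp16.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/stable-vicuna-13B-HF"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread layers across available GPU/CPU memory
    load_in_8bit=True,   # requires bitsandbytes; ~1 byte per parameter
)
```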
@pubgkiller2903 · A year ago
Thanks, Sam, it's great. Would you please implement the same concept with Falcon?
@samwitteveenai · A year ago
I did try to do the video with Falcon-7B, but the outputs weren't good at all.
@pubgkiller2903 · A year ago
@samwitteveenai One question: can these big models like Falcon, StableVicuna, etc. work on a Windows laptop in a Jupyter Notebook, or do they require a Unix system?
@fv4466 · A year ago
@samwitteveenai Wow! I thought it was highly praised.
@andrijanmoldovan · A year ago
Would this work with the "TheBloke/guanaco-33B-GPTQ" 4-bit GPTQ model for GPU inference (or another GPTQ model)?
@samwitteveenai · A year ago
Possibly, but it would need different loading code, etc.