The best tool for this is ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-bcK7LldB3dk.html I like some of the transitions, but sometimes they're a bit too much and seem random. Since we use these persistent elements that transition across pages to indicate some kind of relationship between the previous and the next states, some of your transitions confuse me because I can't immediately see what the relationship is. For example, at 1:23 the selectable tiles (which weren't selected) transition into two switches... does that mean anything? Are they related in some way? I see this as random and a bad use of the design language. However, at 3:14 I like the transition from the switches to the ticks on a paper; that makes sense to me. Epic presentation though
A couple of things I observed: 1. It's not free, because integration with OpenAI is required. 2. It's too slow: for a two-page PDF it takes somewhere around 10-20 seconds to respond, and I'm on a 48-GPU machine.
Is there any way I can test it for free? I used a PDF with only one page and it says "You exceeded your current quota, please check your plan and billing details"
Liam, is there an option to make the assistant always use the data that has been uploaded to the knowledge base? It doesn't read the KB files every time, and it uses links that don't even exist.
@ahmedmiftah8308 What's the point in merging the PDFs if the chunking is going to break them up anyway? Each section should be able to stand as its own PDF, which makes sense anyway.
What if we have multiple PDFs and we want to fetch the answer from the right one? For example: I have 20 PDFs, and when I ask a question it should fetch the (obviously correct) answer from whichever PDF contains it and show it as the output.
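For readers with the same question: you don't need one index per PDF. Chunks from all the PDFs can live in a single index, each tagged with the file it came from, so the best-matching answer is traced back to its source. Below is a toy stdlib-only sketch of that idea (not the tutorial's actual code; the scoring function, file names, and corpus are all illustrative assumptions -- real LangChain loaders attach a `source` metadata field that plays the same role).

```python
# Toy sketch: chunks from many PDFs go into ONE index, each tagged with the
# PDF it came from, so an answer can always be traced to its source file.
# All names here are hypothetical.

def score(query, text):
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def best_chunk(query, corpus):
    """Return the (text, source) pair most similar to the query."""
    return max(corpus, key=lambda pair: score(query, pair[0]))

corpus = [
    ("Transformers were introduced by Vaswani et al.", "attention.pdf"),
    ("The invoice total for March is 400 euros.", "invoice_03.pdf"),
    ("Employees accrue 25 vacation days per year.", "contract.pdf"),
]

text, source = best_chunk("who introduced transformers", corpus)
print(source)  # -> attention.pdf
```

A real embedding model replaces the word-overlap score, but the retrieval shape is the same: one pooled index, per-chunk source metadata.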
Great tutorial! I have hundreds of research papers in PDF format. Can I use this approach to build a vector DB and then chat with ChatGPT? Is there a limit to the size of the DB? Any pitfalls to avoid? Thanks!
Hi Liam, great video. I do have a question about the following code: I notice that we don't have to explicitly turn the query into an embedding before it performs a search against the vector DB. Is that because similarity_search internally calls the OpenAI embedding model to embed the query?
query = "Who created transformers?"
docs = db.similarity_search(query)
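That is indeed the idea: the store keeps a reference to the embedding function you handed it at construction time and applies it to the query inside `similarity_search`, so the caller only ever passes a string. Here is a toy stdlib-only illustration of that pattern (the class and the letter-frequency "embedding" are hypothetical stand-ins, not LangChain's API).

```python
# Toy illustration of why you don't embed the query yourself: the store
# remembers the embedding function and applies it inside similarity_search().

def toy_embed(text):
    """Stand-in embedding: letter-frequency vector over a-z."""
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

class ToyVectorStore:
    def __init__(self, texts, embed_fn):
        self.embed_fn = embed_fn
        self.data = [(embed_fn(t), t) for t in texts]

    def similarity_search(self, query, k=1):
        # The query is embedded HERE, internally -- the caller passes a string.
        q = self.embed_fn(query)
        dist = lambda v: sum((a - b) ** 2 for a, b in zip(v, q))
        return [t for _, t in sorted(((dist(v), t) for v, t in self.data))[:k]]

db = ToyVectorStore(["transformers paper", "cooking recipes"], toy_embed)
print(db.similarity_search("transformer"))  # -> ['transformers paper']
```

In LangChain the same division of labor applies: the embeddings object passed to `FAISS.from_documents` is also used to embed each incoming query.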
Hey Liam, how many PDFs can I use this on? I have 1000+ instructional documents for an information system I use, and I've been trying to create a chatbot with this database embedded for quick question answering. Would I have to combine all the PDFs? Can I put them all through vectorization? What are your thoughts?
Not necessarily, but if you cram it full of thousands of chunks I'd assume recall just gets slower and slower and uses more resources on your system. Best to set up different indexes for different information, or use namespaces (a Pinecone feature).
NameError                       Traceback (most recent call last)
in ()
      1 # Get embedding model
----> 2 embeddings = OpenAIEmbeddings()
      3
      4 # Create vector database
      5 db = FAISS.from_documents(chunks, embeddings)

NameError: name 'OpenAIEmbeddings' is not defined
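For anyone hitting this: a NameError means Python never saw an import that defines the name, usually because the earlier pip-install/import cell wasn't run (or failed). A small demonstration of the error, with the likely fix in the trailing comment (the import path assumes the classic `langchain` package layout used at the time of the tutorial; newer releases may have moved it):

```python
# Toy demonstration (not the tutorial's notebook) of what this NameError
# means: the name was never imported in this Python session.

try:
    OpenAIEmbeddings()  # no import ran, so the name does not exist yet
except NameError as err:
    msg = str(err)

print(msg)  # -> name 'OpenAIEmbeddings' is not defined

# Likely fix: re-run the install/import cell first, i.e. (older langchain):
# from langchain.embeddings import OpenAIEmbeddings
```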
I've written a prompt for GPT-4 that I use with chatGPT in Macromancy formatting to transform it into a legal assistant, and the results have been stellar. Is it possible to encode this prompt into the system you describe so that the bot operates with it in mind?
I've been watching a lot of your videos and they are very helpful but man you gotta stop banging your arms on the table lol - Might I suggest a mic that hangs from the ceiling? Thanks for the content regardless!
Hi Liam, I am getting a FileNotFoundError when running the textract.process command. I have the PDF file in the same project folder as my .ipynb file. I am using Visual Studio Code.
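A common cause of this in VS Code: a relative path like "attention.pdf" is resolved against the process's working directory, which for VS Code notebooks is often the workspace root rather than the notebook's own folder. A small stdlib sketch to diagnose and fix it (the file name and helper are hypothetical, not part of the tutorial):

```python
# Diagnose FileNotFoundError: relative paths are resolved against the
# *working directory*, which in VS Code notebooks may not be the
# notebook's folder. "attention.pdf" is a hypothetical file name.
import os

def resolve(path, base=None):
    """Return an absolute path, anchored at `base` (or the CWD) if relative."""
    if os.path.isabs(path):
        return path
    return os.path.join(base or os.getcwd(), path)

print(os.getcwd())               # where Python is actually running
print(resolve("attention.pdf"))  # where textract would look for the file
# Fix: pass an absolute path, e.g. resolve("attention.pdf", base="/path/to/project")
```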
WARNING:langchain.embeddings.openai:Retrying langchain.embeddings.openai.embed_with_retry.._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details.
This pops up while I'm creating the vector DB.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
Thank you, it worked perfectly despite generating an error on the pip install: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
Hi Liam, I am getting an 'Authentication Error' when running section 2 of the code ("Embed text and store embeddings"). I have not changed anything yet, just running it as is. Any suggestions?
This is not a 5-minute job for almost any PDF, because many contain table data or multiple columns. If you want it done in 5 minutes, it's going to have to be text-only. So your video title is deceptive, and this will not work well.
This is great. Is it possible to retrieve images from the PDF? I have a PDF with many graphics that help understand the content. Do you have any ideas as to how I can provide images as part of the conversation?
Thanks, but it would be interesting to do this without the OpenAI API, since it's paid and analyzing PDFs with it would be very expensive for large projects. It could be another Hugging Face model; I'm trying to do something in this direction. If you have any ideas, let me know!
Can I ask whether you paid for the OpenAI key or did it with the free trial? Because I'm encountering this error: RateLimitError: You exceeded your current quota, please check your plan and billing details.
There is a little error: in the embedding section, OpenAIEmbeddings is not defined. If I'm not wrong, just add the line from langchain.embeddings import OpenAIEmbeddings. (I wonder, out of 115,760 views, how many have really done the tutorial XD)
Thank you very much for this great video!!! One question: in the "Create chat bot with chat memory (OPTIONAL)" part, I received the following message: "DeprecationWarning: on_submit is deprecated. Instead, set the .continuous_update attribute to False and observe the value changing with: mywidget.observe(callback, 'value'). input_box.on_submit(on_submit)" Why? Would you be able to fix it?
👏👏 Hey Liam, your five-minute tutorial is fantastic! Kudos and thanks for putting the effort to produce it. Your app is exactly what any knowledge worker is craving for: We all have gigabytes of pdf files in some folder named "READ", "TO READ" or "__TO READ" (so it stays on top of the root :), but never get to it (probably distracted by all these tutorials to become more productive we love to watch). A bot that can read that stuff for us, so we can continue to wing it is a true godsend. :D
Nice tutorial! I learned a lot from it. I used the learnings and added my own spin (using a data sync tool to pull from custom knowledge base and integrating with Pinecone instead of feeding input from a static doc), wondering what you think: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-qyUmVW88L_A.html Anyway, thanks for making an awesome informative video!
NameError                       Traceback (most recent call last)
in ()
      1 # Get embedding model
----> 2 embeddings = OpenAIEmbeddings()
      3
      4 # Create vector database
      5 db = FAISS.from_documents(chunks, embeddings)

NameError: name 'OpenAIEmbeddings' is not defined
Solution plzzz!!!
I am creating a chatbot to help employees, I have a 220 page contract pdf that I need my chatbot to be able to answer questions about accurately. The issue is, fine-tuning with the data doesn’t produce accurate outputs. Would this be a good way to achieve this?
Great video! I was wondering: why is it a private chatbot when you're using an OpenAI key and sending the information to GPT-3.5? How can you secure sensitive data with your method? Thank you for sharing your knowledge.
Out of complete ignorance: is LangChain the best method currently available to improve the performance of our LLM chatbots? If not, what is, or what other methods are out there that I may be missing? Thanks for answering.
What is the point of doing this? You could just put the data in MySQL and write PHP code to search it. This is overcomplicating the task just so you can say: "hey guys, look what I made with AI". Completely useless.
These are lazy videos... show an example. Do you want people to buy? Show solutions, not features. Sales rules. Show how this solves issues. I just bought ActivePieces, but this video doesn't show me why I should add TaskMagic.
Hi, sorry, there is an issue in Colab with the first script:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. pydrive2 1.6.3 requires six>=1.13.0, but you have six 1.12.0 which is incompatible. yfinance 0.2.36 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
By the way, do you plan to make an adaptation for Mistral AI?
Thanks for the super video. I have a question: in the overview you show that GPT-3.5 is used, i.e. that the query is ultimately processed by 3.5. But in the code I can't find any reference to it. Where is my mistake?
A problem I found with this approach: the bot will reply to irrelevant questions, e.g. "who is Spider-Man?". Even after providing a prompt that clearly instructs the system not to respond if the current context doesn't hold the knowledge, the LLM replies anyway. How do you handle this scenario?
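One common workaround is to gate on the retrieval step rather than trusting the prompt alone: if no retrieved chunk is relevant enough, short-circuit with a refusal and never call the LLM. A toy stdlib sketch of the idea (the scoring function, the threshold, and all names are illustrative assumptions; with a real vector store you would threshold the similarity score instead):

```python
# Toy sketch of score-gated answering: check retrieval relevance first and
# refuse before the LLM ever sees an off-topic question.

def relevance(query, chunk):
    """Crude relevance score: shared lowercase words (stand-in for cosine)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def answer(query, chunks, min_score=1):
    best = max(chunks, key=lambda c: relevance(query, c))
    if relevance(query, best) < min_score:
        return "I don't know based on the provided documents."
    return f"Context: {best}"  # in the real app, this context goes to the LLM

chunks = ["Transformers were introduced in the Attention paper."]
print(answer("who is spiderman", chunks))
# -> I don't know based on the provided documents.
```

LangChain's FAISS wrapper exposes `similarity_search_with_score`, which returns distances you can threshold the same way, though the right cutoff is something you'd tune for your data.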
Thanks a lot man, I'd been trying to get this to work in other ways for days. This was so easy; great tutorial. How would you transfer something like this to a user-friendly UX/UI?
Hi, THANK YOU for sharing your knowledge. Could you please let me know how many PDFs we can load using this technique? And does the LLM remember which PDFs it has been given, or do we have to re-process the PDFs before running each query?
I have no idea what "chunks" are or how to count them. I also didn't understand where and how I have to change the code so the program finds one or more of my PDFs. Is it the path or just the file name? What if I have more than one PDF? Should I convert the PDFs into a chunk file, or just split them and keep them as PDFs? How to insert multiple PDFs into the code wasn't shown. Sorry, but if you do a tutorial and assume people already know how to do it, then you don't need to do it at all.
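For readers with the same confusion: "chunks" are just overlapping slices of the text extracted from the PDF, cut small enough to embed; you don't convert anything into a "chunk file", and you keep your PDFs as PDFs. A toy character-based splitter shows the idea (in the tutorial a LangChain text splitter does this; the sizes and names here are illustrative assumptions):

```python
# What "chunks" are: after the PDF text is extracted, it is cut into
# overlapping pieces small enough to embed. A toy character-based splitter:

def split_text(text, chunk_size=20, overlap=5):
    """Slide a window of `chunk_size` chars, stepping forward by
    `chunk_size - overlap` so consecutive chunks share some text."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnopqrstuvwxyz" * 2, chunk_size=20, overlap=5)
print(len(chunks))  # -> 4
print(chunks[0])    # -> abcdefghijklmnopqrst
```

For multiple PDFs you don't merge the files: you extract text from each one in a loop, split each into chunks, and add all the chunks to the same vector index.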