How to Build Knowledge Graphs With LLMs (python tutorial)

Johannes Jolkkonen | Funktio AI

6K subscribers

73K views

Published: 28 Sep 2024

Comments: 76

@Epistemophilos 9 months ago

Great video without any annoying music, thanks! Would be great to see a from-scratch video about how you actually use this in answering user questions, combining the graph data and LLM capabilities.

@123unhooked 9 months ago

Thank you so much for also including the price tag. Seeing that such proofs of concept add up to only a few cents is really encouraging to go and try it out. Also, everything else in this video was absolute gold! Really complete, really A-to-Z. Thank you so much.

@chrisogonas 5 months ago

That was a great share on knowledge graphs and LLMs. Thanks for putting it together.

@masked00000 8 months ago

Excellent video, thank you for actually coding and showing the process. I was stuck on this for a long time.

@theuser810 3 months ago

Can we do this with LlamaIndex?

@eduugr 2 months ago

Hi Johannes, thank you and congratulations on the video! In your prompts, when you instruct the LLM how to extract entities, you are kind of describing a model for the data, right? (Project, Technology, Client). In more traditional methodologies for generating knowledge graphs, this looks similar to the ontology. Do you think it would be correct to think of it like this?

@sgttomas 7 months ago

amazing! thank you very much.

@moviecules1697 4 months ago

What if you have the neo4j desktop version? How do you access it from your code?

@joffreylemery6414 10 months ago

Great job, Johannes! I'm curious about the benefit of going with Azure OpenAI instead of directly with OpenAI. Thanks, and once again, great job!

@johannesjolkkonen 10 months ago

Hey Joffrey, thank you! The main reason is that on Azure, you can run the model with a dedicated and isolated endpoint, and all data that's passed to that endpoint is covered by Azure's enterprise-grade data privacy guarantees. Another thing which is important for a lot of companies is that you can choose the region in which this endpoint is hosted 🌍🌎

@mikelewis1166 10 months ago

@johannesjolkkonen Can you point me to where Azure offers privacy guarantees to OpenAI users on their platform? We were considering it for our clients, but I cannot find documentation that seems to include Azure OpenAI under their data privacy terms.

@u2b83 10 months ago

Cloud platforms are currently such an annoyance because of the "essential complexity" required for them to capture service charges and maintain security. It makes me think of the dial-up model of internet access back in the day or the clumsy process of installing printer drivers, instead of plugging the thing in and selecting print.

@baltimorewarcats 9 months ago

Finnish sisu!

@u2b83 10 months ago

So this is what happens with the data from the quarterly HR forms/questionnaires lol

@luisdanielmesa 9 months ago

...horrible injection vulnerability

@dinoscheidt 9 months ago

It’s a simple concept video… in a Jupyter notebook… without tests or anything. Injection risks are really the far end of hurdles for public production code here.

@123unhooked 9 months ago

That is just rude. Not saying that it is wrong, but criticizing it here is just so out of place. Shame on you for not respecting the quality product this guy offered.

@jgcornell 7 months ago

That’s why we need to take data cleansing seriously in AI

@JordyBackes 6 months ago

😂

@trinityblood5622 6 months ago

This guy must be a hardcore relational database guy having no relationships outside his primary/foreign keys. Don’t put such constraints in your life bro... Remove duplicate elements from your life and traverse the nodes of real world concepts. 😂

@chriseun0503 3 days ago

Johannes - thank you for the video! what r ur thoughts on building a KG-native CRM?

@lhxperimental 10 months ago

Thank you. Subscribed. There are so many AI channels that just talk about how you can build this and that with LLMs and other word soup techniques, but don't actually show the process.

@PijanitsaVode 5 months ago

40 minutes of cooking, no linguistics, but that's the trend. All in all, one can copy it all except: the Azure subscription, the entity typing and some Neo4j setup, and the data sources?

@3ti65 5 months ago

great video! is there a reason why you didn't have the LLM generate the cypher?

@johannesjolkkonen 5 months ago

Thanks! Generating just the relationship-triplets is a simpler, less error-prone task for the LLM than generating complete Cypher with correct syntax. And because converting those triplets to Cypher is just a matter of some string-parsing, we might as well use Python for that. It's always a good idea to do as much as possible with just plain old code, using LLMs just where necessary. A bit more work maybe, but a lot more reliable (:
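
A minimal sketch of the triplet-to-Cypher step described in the reply above. The pipe-separated `source|RELATION|target` format, the `Entity` label and the function name are assumptions made for illustration, not the format used in the video. Node ids are passed as query parameters, while the relationship type is interpolated into the query text, so the sketch validates it first.

```python
import re

# Assumed triplet format: "source_id|RELATION_TYPE|target_id" (hypothetical).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def triplet_to_cypher(triplet: str) -> tuple[str, dict[str, str]]:
    """Convert one triplet string into a parameterized MERGE query."""
    parts = [part.strip() for part in triplet.split("|")]
    if len(parts) != 3:
        raise ValueError(f"Expected 3 fields, got {len(parts)}: {triplet!r}")
    source, relation, target = parts
    # The relationship type ends up in the query text itself, so reject
    # anything that is not a plain identifier.
    if not IDENTIFIER.fullmatch(relation):
        raise ValueError(f"Invalid relationship type: {relation!r}")
    query = (
        "MERGE (a:Entity {id: $source}) "
        "MERGE (b:Entity {id: $target}) "
        f"MERGE (a)-[:{relation}]->(b)"
    )
    return query, {"source": source, "target": target}


query, params = triplet_to_cypher("project1|USES_TECH|technology3")
print(query)
print(params)
```

With the official `neo4j` Python driver, the resulting pair could then be passed to `session.run(query, params)`. Using MERGE rather than CREATE means re-running the same triplet does not create a second copy of the nodes or the relationship.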

@3ti65 5 months ago

@johannesjolkkonen I see, makes sense. Appreciate the answer! :)

@KCM25NJL 10 months ago

First of all, first-class presentation! I've been considering building something quite similar to utilise knowledge graphs as a method of storing long term memory for ChatGPT by proxy of function calling. The vague idea I have floating in my head is that the relationships could be automated using the LLM at inference time with some well formatted prompts. The last part of the video where you showcase Cypher generation is probably the missing piece of the puzzle for connecting the storage (Neo4j) and this is great for updating the knowledge graph. I just hope you get a chance to showcase a bi-directional example of this in your part 2, as right now I'm not strong on knowledge graph ingestion in a way that makes sense for seamless LLM output when a knowledge graph is used to supplement it.

@nikhilshingadiya7798 10 months ago

Awesome, man ❤❤❤🎉🎉 I love the way you present. If I had a button to subscribe more, I would hit it a million times.

@krishnakandula6587 4 months ago

How are the prompt templates written? Are there any guidelines for writing them?

@Milind-eu4fc 3 months ago

Can I integrate the same solution with Memgraph?

@HaniehKh-v9i 5 months ago

Hi. Thanks a lot for the helpful content. I have a question. When I run the ingestion_pipeline() function, I only get two entities in Neo4j. Those that have a space in their name are not covered. Could you please guide me to solve the issue?

@johannesjolkkonen 5 months ago

Hey! Yeah, one important thing to ensure is that the node and relationship types and property keys don't have spaces or special characters, as these are not allowed and will fail. I recall I had some sanitization already in the Cypher generation to ensure this, but in any case you should be able to fix it pretty easily by running some .replace(" ", "") in your code, getting rid of spaces in the names before generating and running the Cypher statements. Hope this helps!
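
The sanitization mentioned above could look roughly like this. The function name and the choice to drop every character other than ASCII letters, digits and underscores are assumptions; the original notebook may handle this differently.

```python
import re


def sanitize_name(name: str) -> str:
    """Make a string usable as an unquoted label, relationship type or property key."""
    # Drop spaces and any other character that is not a letter, digit or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", name)
    if not cleaned:
        raise ValueError(f"Nothing left after sanitizing {name!r}")
    # An unquoted name should not start with a digit, so prefix an underscore.
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


print(sanitize_name("Sales Analytics"))
print(sanitize_name("3D-Model"))
```

Applying this to every type and key before building the Cypher statements is slightly stricter than a plain `.replace(" ", "")`, since it also removes hyphens, quotes and similar characters.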

@hy3na-xyz 10 months ago

This is awesome, bro, just subscribed. Is this the same for open-source models? I wanted to host using LM Studio etc.

@johannesjolkkonen 10 months ago

Sure, it's exactly the same in principle! Of course the quality of entity extraction might vary between models, with OpenAI's models being top level. But this works fine with GPT-3.5, so you could most likely get similar results with Llama 2 (:

@w_chadly 9 months ago

thank you for sharing this! this is going to help so many organizations who can't afford teams of data analysts. to have this much insight into their data.... 🤯

@hassanullah1997 10 months ago

Great video Johannes, thanks! Just wondering whether you could do a retrieval example of this? Would be great to see how it compares to a vector store. When you read online, there's a lot saying that retrieval is slower and less efficient, but I'm not sure what to think. Would be great to get your insight with a video to explain.

@johannesjolkkonen 10 months ago

Hey Hassan, thank you very much! I'm going to create a video of the retrieval component very soon, and that will be out within the next couple of days (:

@hassanullah1997 10 months ago

@johannesjolkkonen Good stuff. Look forward to it brother :) Do you offer consulting on stuff like this? Working on a startup where I think KG will play a role. Do you have an email I can send information to? Keep up the good work :)

@andydataguy 10 months ago

@johannesjolkkonen let’s gooooo

@ryanslab302 9 months ago

Make your text as big as possible when sharing your screen. Thank you for your video.

@johannesjolkkonen 9 months ago

Noted!

@EmilioGagliardi 10 months ago

Excellent presentation. Love the detail and depth. Have you had to perform this on email text? I'm processing a large number of emails, so the grammar and hence the entities and relationships are not so clearly delineated. Was wondering if you've seen anything on performing this kind of LLM extraction on email texts or if you have any suggestions. I just started my journey into graph and it's super cool, so I definitely enjoy this content. Cheers!

@johannesjolkkonen 10 months ago

Thank you Emilio! I haven't tried this myself, but it's a great idea. Extracting entities from the contents themselves might be hard for the reasons you said, but you could definitely get the sender-recipient relationships on a graph, and then use LLMs to add things like sentiment scores and email-thread summaries to those relationships. Maybe also segment email-conversations under some categories, to get a more high-level understanding of what themes people are discussing over emails. This is not so different to how graphs are already being used in social networks and content recommenders, but the LLMs definitely add more possibilities to the picture. Keep me posted if you end up doing something like this!

@kamiln8398 10 months ago

On my to-do list. Was looking for it, many thanks!

@michalstun5187 8 months ago

Great video, thanks for that! I would also be interested in data quality here. I noticed a few inconsistencies in your input data. How did the LLM cope with that? How accurate is the output knowledge graph? Can you make a more detailed comparison or share the output file, please?

@kewpietonkatsu 10 months ago

Very good round up. I just started to follow you. This is as useful as papers.

@JelckedeBoer 10 months ago

Thanks for sharing, great content!

@Manu-m8w6m 7 months ago

Is there any idea on how we will scale it? If the documents are large, then entity duplication can happen in the graph, right? How will we solve that?

@andydataguy 10 months ago

You absolute Chad! 🙏🏾

@pichirisu 9 months ago

So what's the difference between drawing this on a whiteboard from statistical information and using a graph that does what you could draw on a whiteboard from statistical information?

@AshWickramasinghe 9 months ago

Is it possible to use a free alternative for Neo4j?

@johannesjolkkonen 9 months ago

Hey! As mentioned in the tutorial, Neo4j Aura offers your first instance for free, up to 1 million nodes. I'm not aware of a graph database that would offer free and unlimited capacity

@johny1n 5 months ago

Do you think you can make it for a github repo or incoming code?

@andydataguy 10 months ago

Looking forward to part 2

@vivalancsweert9913 10 months ago

That was incredibly interesting and inspiring! Thank you!

@Music4ever326 9 months ago

Great video! One thing I would like to add: I think that for larger datasets it is faster and more efficient to use the import tool that comes with Neo4j Aura, instead of executing a separate query for each node or relationship.

@MrDonald911 10 months ago

For the second part with the chat, are you using code that basically converts natural language to Cypher, then runs the Cypher on the KG, returns the result, and turns that result back into natural language? That would be cool to see. Also, I think I saw a similar video of yours. Are you using Google's Vertex AI by any chance, with text-bison models? Thanks

@johannesjolkkonen 9 months ago

Yeah, that's exactly the idea. That video will be coming this Thursday or Friday! I haven't been using Vertex AI actually, so that video was probably not mine 😅
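
The flow confirmed above (question to Cypher, Cypher to graph results, results to a natural-language answer) can be sketched as three pluggable steps. All function names here are hypothetical, and the stand-in functions replace the LLM and database calls so the sketch runs on its own; it is not the code from the follow-up video.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]


def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], Rows],
    summarize: Callable[[str, Rows], str],
) -> str:
    """Question -> Cypher -> graph results -> natural-language answer."""
    cypher = generate_cypher(question)  # an LLM call in a real setup
    rows = run_query(cypher)            # executed against the graph database
    return summarize(question, rows)    # a second LLM call in a real setup


# Stand-ins so the sketch runs without an LLM or a database.
def fake_generate_cypher(question: str) -> str:
    return "MATCH (p:Project) RETURN p.name AS name"


def fake_run_query(cypher: str) -> Rows:
    return [{"name": "Sales Dashboard"}]


def fake_summarize(question: str, rows: Rows) -> str:
    names = ", ".join(row["name"] for row in rows)
    return f"Found {len(rows)} project(s): {names}"


answer = answer_question(
    "Which projects exist?", fake_generate_cypher, fake_run_query, fake_summarize
)
print(answer)
```

Keeping the three steps as separate callables makes it easy to test the plumbing with fakes like these before wiring in a real model and database.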

@dougclendening5896 9 months ago

I'm a little confused. Why would you want to do it this way instead of using a hardcoded config of value types and their relationships? You're calling it unstructured data, but it's anything but unstructured. It has clear fields and values. So, I'm trying to understand the benefit here.

@johannesjolkkonen 9 months ago

Hey Doug, that's a valid question. It's true that the markdown files here are structured very neatly, with headings that work almost like fields. In this case, you could get something similar with just standard text-parsing, extracting values from the markdown based on the headers. However, the approach of using LLMs can be generalized to more complex situations, working with longer and messier documents (like PDFs) where the entities and relationships are more implicit, and text-parsing won't get you there. Hope that makes sense!
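
For neatly structured files, the plain text-parsing alternative mentioned above might look like the sketch below. The heading names and the sample brief are made up for illustration; the actual project briefs may be laid out differently.

```python
def parse_markdown_fields(text: str) -> dict[str, str]:
    """Map each markdown heading to the text that follows it."""
    fields: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("#"):
            # A heading starts a new field; strip the leading '#' characters.
            current = line.lstrip("#").strip()
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}


# Hypothetical project brief, only for demonstration.
brief = """# Project
Sales Dashboard

## Client
AlphaCorp

## Technologies
AWS, Python
"""

print(parse_markdown_fields(brief))
```

This costs nothing per document, but it only works while every file follows the same heading layout, which is the trade-off discussed in this thread.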

@dougclendening5896 9 months ago

@johannesjolkkonen it does make sense. I was weighing the cost-benefit ratio of using an LLM and token processing for such neatly structured data. It would be very expensive vs parsing them with a config.

@karlarsch7068 9 months ago

Well, thank you. That was a very interesting 40 minutes. Are you aware that your mic picks up the noise your arms make on the desk? And I don't know if "python" in a title is very smart, it scares at least me ;) Nah, it's probably fine, just kidding.

@gatuhcreations 9 months ago

you probably should not show your openai key like that

@johannesjolkkonen 9 months ago

I re-generated all the credentials shown here before publishing, so these ones don't work anymore (: But it's a great point that I probably should've mentioned in the video, always rotate your creds!

@asads30 2 months ago

Why do you care 😂

@MegaNightdude 7 months ago

😊

@jebisetitut 10 months ago

Great job!

@johannesjolkkonen 10 months ago

Thank you Tomaz!

@satri101 9 months ago

Great video! I really learned a lot and enjoyed the video. Thanks!

@Doggy_Styles_Coding 10 months ago

This is awesome, thanks for the work. Your setup reminds me of Windows XP :D

@Yogic-ignition 10 months ago

A few questions here:
1. Can I retrieve the source documents from which the response was generated using the graph knowledge?
2. How can I avoid duplication of data if I am planning to ingest data from multiple sources (creating a data ingest pipeline)?
3. How will I update data in the database that was true last week but no longer is (for example, until last week my device had 3 ports, but now it has 6)?
Thanks in advance!

@johannesjolkkonen 10 months ago

Hey Mukesh, great questions!
1. Sure. You can store some metadata about the source documents (document title, link, etc.) in each node and relationship, and then include that metadata in the query results when querying the database. You could also have the documents as nodes of their own, with relationships to all nodes that originate from the document.
2. This is a key challenge in these applications. Sadly, I haven't done work on this myself yet, but here are a few good resources about it: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-dNGV4sLkOcA.html and margin.re/2023/06/entity-resolution-in-reagent/ A lot of people have asked about this same thing, and it'll definitely be a topic for a video soon (:
3. Assuming you have a good way to id the nodes, it's pretty easy to match the nodes by id and update their attributes with SET statements. See here: neo4j.com/docs/getting-started/cypher-intro/updating/
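
As a rough illustration of the match-by-id-and-SET idea from this reply, the sketch below builds a parameterized update query. The `Device` label, the `id` property, the `source_document` key and the function name are all assumptions for the example, not names from the video.

```python
from typing import Any


def build_update_query(
    label: str, node_id: str, properties: dict[str, Any]
) -> tuple[str, dict[str, Any]]:
    """Build a Cypher query that matches one node by id and updates its properties."""
    # The label is placed in the query text, so only accept a plain identifier.
    if not label.isidentifier():
        raise ValueError(f"Invalid label: {label!r}")
    # "SET n += $props" adds or overwrites the given keys and leaves the rest untouched.
    query = f"MATCH (n:{label} {{id: $id}}) SET n += $props RETURN n"
    return query, {"id": node_id, "props": properties}


query, params = build_update_query(
    "Device",
    "device42",
    # Hypothetical values: the new port count plus where the fact came from.
    {"ports": 6, "source_document": "device_spec.md"},
)
print(query)
print(params)
```

Storing a key such as `source_document` alongside the updated value is one way to keep the source metadata from the first answer available at query time.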

@Yogic-ignition 10 months ago

@johannesjolkkonen Thank you for your time and help, highly appreciated 😊 Looking forward to the data deduplication and ingest pipeline video. *The notification bell icon is ON*

@aravindarjun4814 9 months ago

Basically, I am getting this error:
Running pipeline for 11 files in project_briefs folder
Extracting entities and relationships for ./data/project_briefs\AlphaCorp AWS-Powered Sales Analytics Dashboard.md
Error processing ./data/project_briefs\AlphaCorp AWS-Powered Sales Analytics Dashboard.md: Connection error.
Extracting entities and relationships for ./data/project_briefs\AlphaCorp Customer Support Chatbot.md
Error processing ./data/project_briefs\AlphaCorp Customer Support Chatbot.md: Connection error.
This happens when I execute it. Can you help me fix it?

@johannesjolkkonen 9 months ago

Hey Aravind! It seems like the entity extraction / LLM step is working, but there's an issue connecting to Neo4j. I would check that:
- Your Neo4j instance is running
- Your connection URL is in the correct format (neo4j+s://{your-database-id}.databases.neo4j.io:7687)
- Your username and password are correct
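
A quick way to rule out the URL-format item from the checklist above is a small check like the one below. It only inspects the string and does not contact the database; the helper name is hypothetical, and the sample id `abc123` is made up.

```python
from urllib.parse import urlparse

# URI schemes accepted by the Neo4j drivers.
VALID_SCHEMES = {"neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc"}


def check_neo4j_uri(uri: str) -> list[str]:
    """Return a list of problems found in a Neo4j connection URI (empty if none)."""
    problems = []
    if "{" in uri or "}" in uri:
        problems.append("URI still contains a {placeholder}")
    parsed = urlparse(uri)
    if parsed.scheme not in VALID_SCHEMES:
        problems.append(f"Unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("Missing host name")
    return problems


print(check_neo4j_uri("neo4j+s://abc123.databases.neo4j.io:7687"))
print(check_neo4j_uri("https://abc123.databases.neo4j.io"))
```

If the URI passes, the other two items (instance running, credentials correct) can be checked with the official driver, for example by creating a driver with `GraphDatabase.driver(uri, auth=(user, password))` and calling `driver.verify_connectivity()`.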