Incredible video! Can’t thank you enough for your tutorial. I’ve been trying to decide whether a vector database or knowledge graph would be the most interesting/most efficient way to scale complexity. Answer: BOTH!! Research complete. ❤🎉
This is so good, Leann! I'm just jumping into the field of Knowledge Graphs! This will be huge for RAG applications! Why did you stop publishing videos?
Hi Kris! Thanks for the encouragement :) I didn't stop posting. Instead, I'm posting videos on another channel I work for (e.g. ru-vid.comoJVRWGfqfjQ) I will try hard to post more videos about knowledge graphs and LLMs.
Great content! I am a complete beginner: I have a Neo4j db already populated, and I want to "only" do the chatbot portion connected with GPT-4. Would you mind guiding me on which .py I should use in this use case? In the meantime, I am getting an "UnboundLocalError: local variable 'nodes' referenced before assignment". Not sure what to do... Thanks!!
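Not the video author, but in case a sketch helps: that error usually means `nodes` is only assigned inside a branch that never ran (e.g. the query returned nothing). The function names below are hypothetical, not from the repo's functions.py:

```python
# Hypothetical minimal reproduction of the 'nodes' UnboundLocalError.
def get_nodes(records):
    if records:                          # if records is empty/None...
        nodes = [r["n"] for r in records]
    return nodes                         # ...'nodes' was never assigned here

# Fix: give the variable a default before the conditional branch.
def get_nodes_fixed(records):
    nodes = []
    if records:
        nodes = [r["n"] for r in records]
    return nodes
```

So the first thing to check is whether your Neo4j query is actually returning results before the code tries to build `nodes` from them.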
Great video! A question I had: are knowledge graphs good at taking a user query such as "What happened to Sam Altman, and also when was the OpenAI board created?" I've been working with RAG and a vector db for the last 5 months, and when you run that query with similarity search, it sometimes doesn't give you both topics in your retrieved documents. Is the knowledge graph good for this, or does it also suffer some issues? I know there are some step-back prompting ideas to cover this, but I wanted to know your thoughts.
Awesome question! Yes, I believe so, though it's essential to test to confirm. Here's why I'm leaning that way: knowledge graphs store information along paths that are easy to navigate. Think about it: if "date the OpenAI board was created" is a property nested under the "OpenAI Board" node, which is connected to "Sam Altman" through relationships like "Fired as CEO" and "Returned as CEO", then when the LLM understands the connections between the OpenAI board and Sam Altman, it's also picking up that property info, like the board's creation date. You're spot on about vector search sometimes missing context, especially with lots of documents. That's why mixing knowledge graphs with vector search helps keep things balanced: vector search for the granular view, knowledge graphs for the big picture. I've seen the small-to-big retrieval method too but haven't tried it yet; let me know if you've experimented with something similar. I hope this is clear. I was hoping to attach a screenshot to illustrate it more visually, but RU-vid doesn't allow me to. If you'd like to discuss this further, feel free to add me on LinkedIn (Leann Chen) or email leannchen86@gmail.com.
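A toy sketch of the idea above (not the video's code, and the property value is a placeholder): once "Sam Altman" is linked to the "OpenAI Board" node, a one-hop traversal reaches the board node, and its properties come along for free:

```python
# Toy in-memory graph illustrating KG path navigation (hypothetical data).
graph = {
    "Sam Altman": {
        "FIRED_AS_CEO": ["OpenAI Board"],
        "RETURNED_AS_CEO": ["OpenAI Board"],
    },
    "OpenAI Board": {},
}
properties = {
    "OpenAI Board": {"created": "<board creation date>"},  # placeholder value
}

def one_hop(entity):
    """All nodes reachable in one hop, regardless of relationship type."""
    return {n for targets in graph.get(entity, {}).values() for n in targets}

# Traversing from "Sam Altman" surfaces the board node AND its properties:
for node in one_hop("Sam Altman"):
    print(node, properties.get(node, {}))
```

In Neo4j the equivalent would be a Cypher pattern match from the person node to the board node, returning the board's properties alongside the relationship.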
@@lckgllm Great, thank you! I'm hopeful that knowledge graphs will work out for these semi-related queries; my team is currently researching them. My only concern is non-related queries such as "Who is Sam Altman? Also, I was wondering if I can make an appointment for the Apple Store" - in this case, the graph between Apple Store and Sam Altman won't be connected, and the vector embedding of the entire sentence will likely miss 1 of the 2 contexts. Using an agent could be the solution, but it increases latency. I've had some success with step-back prompting the single query into 2 separate queries to retrieve documents from the vector db.
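The decomposition idea described above can be sketched roughly like this. `llm` and `vector_search` are hypothetical stand-ins, not a specific API; the point is one retrieval per sub-query, then merging:

```python
# Hedged sketch of query decomposition before retrieval (hypothetical helpers).
def decompose(question, llm):
    """Ask the LLM to split a compound question into independent sub-questions."""
    prompt = (
        "Split the following question into independent sub-questions, "
        "one per line:\n" + question
    )
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]

def retrieve(question, llm, vector_search, k=3):
    docs = []
    for sub in decompose(question, llm):
        docs.extend(vector_search(sub, k=k))   # one similarity search per topic
    return docs

# Demo with a fake LLM that returns two sub-questions:
fake_llm = lambda p: "Who is Sam Altman?\nCan I book an Apple Store appointment?"
fake_search = lambda q, k: [f"doc for: {q}"]
print(retrieve("Who is Sam Altman, also can I book an Apple Store appointment?",
               fake_llm, fake_search))
```

The trade-off is the extra LLM call for decomposition, but it avoids one embedding having to carry two unrelated topics.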
Thank you for the crisp view on KG+RAG. Can we create a KG from multiple CSV files? Currently, CSV agents are lagging behind in answering questions based on content: they only search for a column matching the question rather than the content passed.
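Not from the video, but one hedged way to do this is to turn each CSV row into Cypher MERGE statements so the row content (not just column names) lands in the graph. Column names and the label here are hypothetical; adapt them to your files:

```python
import csv
import io

# Hedged sketch: CSV rows -> Cypher MERGE statements (no escaping/typing here;
# a real pipeline should use parameterized queries instead of string building).
def rows_to_cypher(csv_text, label):
    stmts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = ", ".join(f"{k}: '{v}'" for k, v in row.items())
        stmts.append(f"MERGE (:{label} {{{props}}})")
    return stmts

sample = "name,role\nSam Altman,CEO\n"
print(rows_to_cypher(sample, "Person"))
```

Repeat per file (one label or relationship pattern per CSV), and the graph can then be queried by content rather than by matching column names.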
Hi, thank you for this amazing video. I have a question about KG creation in Neo4j: may I know the prompt you used to refine the results and generate the Cypher queries for openaiKG.ipynb at 5:56?
Thanks for the question! The original prompt for ChatGPT-4 was: "Please convert the entities/relationships results from spacy-llm into cypher queries for neo4j knowledge graph." Let me know if you have other questions :)
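For anyone curious what that conversion looks like, here's a hedged sketch of doing it programmatically instead of via the prompt: given (subject, relation, object) triples like spacy-llm's relation output, emit Cypher MERGE statements. This is an illustration, not the exact output GPT-4 generated in the video:

```python
# Hedged sketch: entity/relationship triples -> Cypher MERGE statements.
def triples_to_cypher(triples):
    stmts = []
    for subj, rel, obj in triples:
        rel_type = rel.upper().replace(" ", "_")   # e.g. "fired as ceo" -> FIRED_AS_CEO
        stmts.append(
            f"MERGE (a:Entity {{name: '{subj}'}}) "
            f"MERGE (b:Entity {{name: '{obj}'}}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
    return stmts

print(triples_to_cypher([("Sam Altman", "fired as ceo", "OpenAI Board")])[0])
```

MERGE (rather than CREATE) keeps the graph deduplicated when the same entity appears in multiple triples.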
Can we set the Azure OpenAI creds (API key, model name, endpoint, type, and version) in the .ipynb file and run it? Also, please list the dependency libraries of the functions.py file, as visual, Node, Edge, and Cypher_graph are not getting initialised in VS Code while running the file...
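Not the author, but on the first part: yes, Azure OpenAI credentials can be set at the top of the notebook before any client is created. A minimal sketch, assuming the environment-variable style (exact variable names vary across openai/langchain SDK versions, so check yours; all values below are placeholders):

```python
import os

# Hedged sketch: Azure OpenAI settings in-notebook via env vars (placeholders).
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"   # example API version
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com"
os.environ["OPENAI_API_KEY"] = "<your-azure-openai-key>"
# The model/deployment name is usually passed per call, e.g. your Azure
# deployment name instead of a raw model name like "gpt-4".
```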
Thanks for sharing! I'm encountering this error: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details'}}. Any ideas on how to handle/work around rate limits for people on the free tier? Thanks!
Are you using gpt-4 under the free tier? gpt-4 isn't available in the free tier, but gpt-3.5 is supported and you can probably use that instead. Hope this helps!
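If the 429s persist even on gpt-3.5, the generic pattern is to retry with exponential backoff. A hedged sketch (`fn` stands in for your API call; the OpenAI SDK also has its own rate-limit exception type you'd catch instead of `RuntimeError`):

```python
import random
import time

# Hedged sketch: retry a callable on 429-style errors with exponential backoff.
def with_backoff(fn, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError as e:          # swap in the SDK's RateLimitError
            if "429" not in str(e) or attempt == max_retries - 1:
                raise                      # not a rate limit, or out of retries
            # backoff: base_delay, 2x, 4x, ... plus a little jitter
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```

Note that a quota error (billing exhausted) won't go away with retries, only a transient rate limit will.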
Hi! Thanks for pointing that out. I just added the zeroshot.cfg to github.com/leannchen86/openai-knowledge-graph-streamlit-app You can also find the document here: spacy.io/usage/large-language-models#zero-shot-prompts
Thanks for the question! Querying from MongoDB would be an entirely different video 😂, since Langchain has different wrappers around the indexing compared to Neo4j.
It heavily depends on what your MongoDB database looks like, @swetharangaraj4521. I would need more information to give constructive feedback on your question. I highly recommend looking into Langchain's documentation to get some ideas: python.langchain.com/docs/integrations/vectorstores/mongodb_atlas