I'm actually waiting for Phi-3-small with its 128k context length for talking to my documents, which are a set of different doc types: docx, xlsx, txt, and Python scripts. They are all relevant and I want to put them all in a RAG setup, though maybe routing would be helpful for that too. Still, I need a really big context for that. Or I should somehow train on them. The only option I know for training on a big document set is to ask another model to generate questions and then ask it to answer those questions. Anyway, any of these would be helpful.
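In case it helps, here's a rough sketch of that generate-questions-then-answer idea, assuming the `openai` Python client; the model name, prompt, and chunk list are placeholders to swap for whatever you actually use:

```python
# Sketch of synthetic QA-pair generation for fine-tuning on a document set:
# ask a stronger model to write question/answer pairs grounded in each chunk,
# then collect the pairs as training data for the small model.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def qa_pairs_for_chunk(chunk: str, n: int = 3) -> list[dict]:
    """Generate n question/answer pairs answerable from a single chunk."""
    prompt = (
        f"Write {n} question-answer pairs that can be answered using only "
        "the text below. Return a JSON list of objects with 'question' and "
        f"'answer' keys.\n\n{chunk}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable generator model works
        messages=[{"role": "user", "content": prompt}],
    )
    # In practice you'd validate the output and retry on malformed JSON.
    return json.loads(resp.choices[0].message.content)

# Run over every chunk of the document set and collect the pairs,
# then fine-tune the small model on the resulting dataset.
chunks = ["Phi-3-small supports a 128k context window ..."]  # placeholder chunks
dataset = [pair for c in chunks for pair in qa_pairs_for_chunk(c)]
```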
Brilliant content! I think it is more interesting to test a model by looking at practical applications rather than asking a series of questions that could be in the training data. You should consider making a series of videos in this format.
Excellent demo of Phi-3's RAG abilities. Beyond a 3-billion-parameter language model that runs well on a smartphone with at least 6 GB of RAM, we will also want a speech recognition model, and a dynamic graph neural network that can merge with a vector store to provide long-term memory.
Question: Is there a local model that you would recommend for RAG? I've been building RAG systems since GPT-3 (not 3.5), and I've yet to find a local model that comes close to simply understanding what's being asked at a given point in the conversation, extracting relevant info from stuffed context, and providing a response. I could even have GPT-3 (pre-ChatGPT) quote the sentence from which it got its answer. My experience so far locally is that all of the moving parts outside the local model have to be damn near 100% perfect to work correctly, and even then the model will muck it up somehow every now and again, to the point it's unreliable. Which models do you recommend for this specific use case?
I personally like the Zephyr models if you are looking for smaller LLMs. For bigger local LLMs, Llama-3 70B is good (in my use cases), and also Command R+.
I don't get the concept of multiple vector stores. How do they differ? Do they store different documents? Use different embedding models? Or maybe the chunking strategies are different?
In this case, each store contains different docs. Imagine you have different knowledge bases for different departments, and you want to retrieve info from only the relevant department based on the query.
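A minimal sketch of that kind of routing, assuming sentence-transformers for embeddings; the department names, descriptions, and one-store-per-department layout are made-up examples:

```python
# Route a query to the per-department vector store whose description it
# matches best, then search only that store.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# One short description per knowledge base; each name would map to its own store.
routes = {
    "hr":      "employee policies, benefits, onboarding, leave requests",
    "finance": "invoices, budgets, expense reports, quarterly numbers",
    "eng":     "python scripts, architecture docs, deployment runbooks",
}
route_vecs = {name: model.encode(desc) for name, desc in routes.items()}

def pick_store(query: str) -> str:
    """Return the name of the store whose description is closest to the query."""
    q = model.encode(query)
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(route_vecs, key=lambda name: cos(q, route_vecs[name]))

print(pick_store("How many vacation days do I get per year?"))  # -> "hr"
```

You could also have the LLM itself pick the route from the descriptions, but embedding similarity is cheaper and usually good enough for coarse department-level routing.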