
How vector search and semantic ranking improve your GPT prompts 

Microsoft Mechanics
345K subscribers
21K views
Published: Sep 28, 2024

Comments: 15
@CollaborationSimplified · 11 months ago
These are great sessions!! It really does help to better understand what's happening under the hood - well done Pablo, Jeremy and production team!
@acodersjourney · 11 months ago
Your videos make software dev more accessible.
@Dineshkumar-wj8so · 18 days ago
Amazing explanation!!! A must-watch video before grinding through the documentation.
@MSFTMechanics · 14 days ago
Glad you liked it
@rjarpa · 4 months ago
Great video, now it's easy to understand the solution stack.
@timroberts_usa · 11 months ago
Thank you for this clarification, much appreciated.
@TheUsamawahabkhan · 11 months ago
Love it. I'd like to see Llama on Azure with Cognitive Search, and can we also plug in an external vector database with Cognitive Search?
@hughesadam87 · 9 months ago
Are these AI tools available in Azure govcloud or just commercial?
@uploaderfourteen · 11 months ago
Jeremy - Great to see your work on this advancing so well! One of the issues still outstanding with vector- or keyword-based retrieval is that, by only retrieving chunks, you aren't providing the model with the deeper 'train of thought' or 'line of reasoning' that characterises your data source as a whole (the semantic horizon is limited to the chunk size). As a consequence, it seems that you can't get the model to reason over the entire data source. For example, let's imagine that your data source was Moby Dick (and let's pretend this was outside the training data)... neither vector nor keyword search would allow you to ask "what is the moral of the story?", as this requires developing a meta-narrative concept over all possible chunks. The only way current language models can do this is to somehow fit the whole novel in context - but even then there are issues with how attention is dispersed over the text. In time it would be great to see whether Microsoft Mechanics can innovate around this problem somehow, as being able to reason over the full, non-chunked data source would unlock much more intelligent and useful insights.
@fallinginthed33p · 11 months ago
Maybe there could be multiple passes to combine different vector results into one large query that attempts to answer the user's question. That context window limit is a real problem. Human brains remember both tiny details selectively and the overall gist of a document.
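A rough sketch of the multi-pass idea suggested above: retrieve more chunks than fit in one prompt, condense each batch against the question, then answer over the combined summaries. The `vector_search` and `llm` callables are hypothetical placeholders standing in for any retriever and any chat model, not a specific Azure Cognitive Search or OpenAI API.

```python
def summarize_batch(llm, chunks, question):
    # First pass (map): compress one batch of retrieved chunks into a short
    # summary that keeps only the points relevant to the question.
    text = "\n\n".join(chunks)
    return llm(f"Summarize the following passages, keeping only points "
               f"relevant to the question '{question}':\n\n{text}")

def multi_pass_answer(llm, vector_search, question, batch_size=5, num_batches=4):
    # Retrieve more chunks than a single prompt could hold, split into batches.
    chunks = vector_search(question, top_k=batch_size * num_batches)
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    # Condense each batch separately, then answer over the summaries (reduce).
    summaries = [summarize_batch(llm, batch, question) for batch in batches]
    notes = "\n\n".join(summaries)
    return llm(f"Using only the notes below, answer: {question}\n\n{notes}")
```

This trades extra LLM calls for a wider effective context, but it still can't recover reasoning that never made it into any retrieved chunk.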
@uploaderfourteen · 11 months ago
@@fallinginthed33p Agreed! I'd be interested to see how well combining vector results works. Alternatively, we know that LLMs can determine the 'gist' of a document if it's in their training data. Based on that observation, I'd like to see (a) some deep research into exactly how the model extracts that 'gist' from its training set (I'm not sure this is fully understood yet), (b) that process decomposed into its fundamental steps, and then (c) an attempt to replicate that process through a kind of pseudo-training. My hunch is that there is, somewhere, a relatively easy solution to this... the human brain seems to nail it very easily even with very little training data, so there must be a trick we're missing with respect to LLMs. I can skim-read a small book in a very short time (barely taking in the details) and then make a fairly accurate overall appraisal of its content, purpose, key message, etc... LLMs should in theory be able to outclass this through some fairly straightforward mechanism as yet not understood.
@fallinginthed33p · 11 months ago
@@uploaderfourteen I think in a nutshell, humans are doing both training and inference every time we read. Our context window includes the current document and past documents, and each pass updates the past documents store with new data and weights. LLMs can't do that yet: each inference run is a blank slate that depends heavily on trained weights, but to update those weights through training requires a huge amount of computing power.
@pupthelovemonkey · 10 months ago
@@fallinginthed33p Do the re-ranking steps and human feedback not feed back into the model to update its weights? For example, a conversation on Bing Chat where you successfully drill down into a complex answer through a bit of back and forth, like a coding problem where Bing Chat was giving you a solution with a small error or punctuation mistake.
@fallinginthed33p · 10 months ago
@@pupthelovemonkey It might. It's known that OpenAI uses the chat interactions on its web interface to train its models; I don't know about Microsoft, though. You can already do something similar using LoRA techniques on open-source models. Training doesn't happen immediately, unlike with human brains: you need updated weights or a training dataset, and then spend hours or days running a training job.
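For readers curious what the LoRA approach mentioned above looks like, here is a minimal setup sketch using the Hugging Face `peft` library; the base model name and hyperparameters are illustrative assumptions, not something shown in the video.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed open-weight base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices alongside frozen base weights,
# which is why it is far cheaper than full fine-tuning, though still a batch
# job of hours rather than an instant update after each conversation.
lora = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights train
# The adapted `model` would then go into a normal training loop or Trainer.
```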