In this video, we demonstrate how documents are ingested, chunked with overlap, embedded, and vectorized in Azure AI Studio Prompt Flows, using Azure AI Search as the vector database.
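To make the chunking and overlap step concrete, here is a minimal sketch in Python. It is not the code Azure AI Studio runs internally. The function name `chunk_tokens`, the parameter names, and the overlap of 64 are all assumptions made for illustration, and whitespace-separated words stand in for real model tokens.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into chunks of at most chunk_size tokens,
    where each chunk repeats the last `overlap` tokens of the previous one.

    Hypothetical helper for illustration only; not an Azure API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting is a stand-in for a real tokenizer (assumption).
words = ("some long document text " * 200).split()
chunks = chunk_tokens(words, chunk_size=256, overlap=64)
```

A smaller chunk size produces more, shorter chunks to embed, and the overlap keeps sentences that straddle a boundary present in both neighbouring chunks.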
Thanks @LinoTV for this great tutorial. The default chunk size is 1024 tokens. Do you have any idea how to change this value, for example to 256 tokens?
I noticed you had to wait some time for the indexing to finish. Do you recommend indexing the documents separately on the Azure Search service, which should be faster, and then connecting the index? Thanks!
You can, if you would like. I found the speed inconsistent, depending on the state of Azure at the time. There is no guarantee that running the indexing directly will be faster.
Based on the OpenAI assistant documentation, I found that the limit is 20 files, and each cannot be larger than 512 MB. In testing, though, I could not upload even a file much smaller than that, so I think it is a work in progress.