Nice look-under-the-hood paper-reading walkthrough. I was kind of hoping for a practical session, something like "load it and run it in a LlamaIndex pipeline," but thanks for getting it out there. You guys rock!
Very interesting, thank you! I have a question: is the output of the retrieval the page itself (or a set of ranked pages)? Thank you!
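(For context on the question above: in ColPali-style late-interaction retrieval, the output is typically a ranked list of page images, scored via MaxSim between query-token embeddings and page-patch embeddings. Here is a minimal sketch of that scoring step with toy numpy arrays; the function names, embedding shapes, and values are illustrative assumptions, not the actual model API.)

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query-token embedding,
    take its max similarity over the page's patch embeddings, then sum."""
    sims = query_emb @ page_emb.T          # shape: (n_query_tokens, n_page_patches)
    return float(sims.max(axis=1).sum())

def rank_pages(query_emb: np.ndarray, page_embs: list) -> list:
    """Return page indices sorted by descending MaxSim score."""
    scores = [maxsim_score(query_emb, p) for p in page_embs]
    return sorted(range(len(page_embs)), key=lambda i: -scores[i])

# Toy example with made-up 2-dim embeddings.
q = np.array([[1.0, 0.0], [0.0, 1.0]])            # two query tokens
pages = [
    np.array([[0.9, 0.1], [0.1, 0.9]]),           # page 0: matches both tokens well
    np.array([[0.2, 0.2], [0.3, 0.1]]),           # page 1: weak match
]
print(rank_pages(q, pages))                       # page 0 ranks first
```

So the retriever itself returns ranked pages; any downstream text generation happens in a separate VLM/LLM step.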
It is interesting; however, despite what the young presenters say in the session, using the document image instead of the text content is not at all new. This approach to knowledge retrieval from document images has already been implemented and tested internally by several companies, just not announced or discussed publicly. There are pros and cons. Among the cons are the ViT architecture and its tiling approach. Another issue is that the pre-trained multi-modal embedding model carries very generic "world knowledge" and cannot easily adapt to highly specialized illustration/text content.
Also, I would be interested to know the extra cost of this VLM/LLM-based solution. Calling PaliGemma is not free :)