Marker: Get Your PDFs Ready for RAG & LLMs | High Accuracy Open-Source Tool

Views: 3.4K

PDFs are widely used in business, academia, and elsewhere because they keep formatting consistent, but extracting their content can be tricky, especially when it includes images, tables, and formulas. Extraction is a key step in preparing text for RAG (Retrieval-Augmented Generation) applications and large language models (LLMs).
In this video, we’ll show you how converting PDFs to plain text simplifies data processing for LLMs. Discover the power of Markdown in preserving information and formatting during conversion, ensuring your LLM interprets content accurately.
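One reason Markdown output is convenient for RAG is that its headings give natural chunk boundaries. The following is a minimal sketch, assuming the PDF has already been converted to a Markdown string; the function name `split_markdown_by_heading` and the sample text are hypothetical and are not part of Marker or the video's code.

```python
import re

def split_markdown_by_heading(markdown_text):
    """Split Markdown into (heading, body) chunks at ATX headings (#, ##, ...)."""
    chunks = []
    heading = ""      # text before the first heading gets an empty heading
    body_lines = []
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # Close the previous chunk before starting a new one.
            if heading or any(l.strip() for l in body_lines):
                chunks.append((heading, "\n".join(body_lines).strip()))
            heading = match.group(2).strip()
            body_lines = []
        else:
            body_lines.append(line)
    # Flush the final chunk.
    if heading or any(l.strip() for l in body_lines):
        chunks.append((heading, "\n".join(body_lines).strip()))
    return chunks

# Hypothetical sample standing in for converted PDF output.
sample = "# Intro\nPDFs are common.\n\n## Tables\n| a | b |\n|---|---|\n| 1 | 2 |\n"
chunks = split_markdown_by_heading(sample)
for heading, body in chunks:
    print(heading, "->", len(body), "characters")
```

Each chunk keeps its table or paragraph intact under its heading, so it can be embedded and retrieved as one unit. This sketch does not handle `#` lines inside fenced code blocks, which it would treat as headings.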
#ai #llm #opensourcellm #generativeai #pdfs
Blog: www.dataedgehub.com
LINKS:
Code: www.dataedgehub.com/2024/07/u...
GitHub code: github.com/VikParuchuri/marker
PyTorch installation: pytorch.org/
• Advanced Function Call...
• MiniCPM-Llama3-V 2.5 -...

Hobbies

Published: May 31, 2024

Comments: 12

@abdinegara3135 2 months ago

Hey man, I really appreciate your video. Actually, you deserve more viewers ❤

@DassS-dass 2 months ago

It's great 👍

@mukeshkund4465 1 month ago

Appreciate it. How can we build RAG on top of this? If you could make a video on that, it would be very helpful.

@DataEdge01 1 month ago

Noted, thanks.

@isagiyoichi-mg2ds 1 month ago

Same request

@ignaciopincheira23 1 month ago

Could you add the description of each image to the text with the aim of having a single Markdown file, similar to the original PDF? This way, it would be possible to pass a file to a language model that is readable and maintains its content.

@DataEdge01 1 month ago

Noted!

@intellect5124 1 month ago

Very informative video. Could you try to build a system that can run on a large number of PDFs and further convert these to .md files for an LLM to query or generate specific prompts with a UI?

@DataEdge01 1 month ago

Noted, thanks!

@atomobianco 2 months ago

Details matter. You say the index is well formatted into a table, but it seems to me that the Markdown displays two columns while the PDF index had only one.

@DataEdge01 2 months ago

The limitations were addressed at the beginning of the video.