Could you add the description of each image to the text with the aim of having a single Markdown file, similar to the original PDF? This way, it would be possible to pass a file to a language model that is readable and maintains its content.
Very informative video. Could you try to build a system that can run on a large number of PDFs and further convert these to .md files for an LLM to query or generate specific prompts with a UI?
Details matter: you say the index is well formatted into a table, but it seems to me that the Markdown displays two columns while the PDF index had only one column.