GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

Подписаться 3,2 тыс.

Просмотров 14 тыс.

50% 1

OpenAI recently released the GPT Vision API allowing developers to use the amazing vision analysis capability available inside ChatGPT plus. I wanted to test the results of doing text extract from a picture of a form to see how accurate the OCR capabilities were. Also, how well it structured the data file that was outputted. As you will see in the video, the results were very impressive.
Link to project in GitHub:
github.com/AI-...

Опубликовано:

9 сен 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 48

@murilodsa 10 дней назад

Very good. It helped a lot!

@jonathanvandenberg3571 9 месяцев назад

Great stuff, and cool ideas at the end!

@maciejlegowicz5834 6 месяцев назад

Lovely. Really inspiring. With little knowledge of Python, I managed to do this. Seeing only 5.1k views in 3 months make me happy as it looks like not so many people are interested in this subject ;-) a lot of business opportunities

@aiunleashed509 6 месяцев назад

Really glad you got it going! I'm going to have a similar video on using new google gemini to do this coming up

@kevon217 9 месяцев назад

Super impressive capabilities these days.

@aiunleashed509 9 месяцев назад

It really is. Google started rolling out Gemini yesterday which will have vision capability as well...getting crazier by the day!

@anasbahttoch0913 5 месяцев назад

Thanks Man, it helped a LOT!

@vadymivanenko8591 7 месяцев назад

In my case it cannot recognize some characters. He confuses the 2 with Z, 6 with G, etc. It happens only in lines of random characters like (G12300HO). I don't even know how to teach it. I have set the temperature to zero.

@eugenl4340 8 месяцев назад

Thanks for the video!

@Great_Muzik 4 месяца назад

Great video! Please do a tutorial on how to convert scanned pdf files in a Google Drive folder to Excel using GPT-4 Vision. Thanks!

@aiunleashed509 3 месяца назад

Great suggestion! I will look to add that to future video

@micbab-vg2mu 9 месяцев назад

This API is great - thank you for the video. I wonder if it is able to recognise hand writing in diffrent languges than ENG.

@aiunleashed509 9 месяцев назад

Thanks! Generally speaking OpenAI says "The models are optimized for use in English, but many of them are robust enough to generate good results for a variety of languages." I have found in my testing that it will usually work pretty well on languages whose character set is close to English (like French for example). But if it's totally different characters like Japanese it doesn't perform as well.

@kirk7784 5 месяцев назад

I'm curious what are strategies for parsing born digital pdfs, the data is already there so it just needs to go and grab it without ocr right? How would that work?

@aiunleashed509 5 месяцев назад

Yes you are right. If the PDF already has a text layer being digital born, you can check that first. However, sometimes this text layer isn't good, and misses some text that is displayed as graphics. If you are optimizing for quality I would probably use both methods and let the AI structure from both datasets. This will cost much more in API credits though....

@uplifthabesha754 2 месяца назад

'message': 'The model `gpt-4-vision-preview` has been deprecated

@aiunleashed509 2 месяца назад

you can just use gpt-4o now, its multimodal including vision. Check my channel I did a video on it.

@benjaminsaladin1345 6 месяцев назад

I had the exact same idea today, especially with the functions calling. Did you manage to get that to work? Cool video, btw🎉

@aiunleashed509 6 месяцев назад

I haven't had a chance but will be doing more auto classification on a new project I am working on so will do some followup videos soon

@DuneKraftwerk 7 месяцев назад

Nice video btw. Sorry i do not share your excitement as I tried with more beefy images, like engineering drawings like P&ID, Location or connection diagram. I am able to get some information when I zoom the images to a certain level, but full scale GPT tell me to use a OCR software. Also, you cannot get the bonding rectangles of your text for further processing with html and css. So I will stick with Google Vision API for now to do this, i guess it is less expensive than GPT anyway offering a free tier of 1000 images/month and much faster.

@aiunleashed509 7 месяцев назад

Fair enough, my experience has been more on automating corporate document processing. I had good luck with those type of docs like invoices and forms. It seems from the comments it falls apart quickly with more complicated use case like you describe. But remember this API is still in preview, and will get much better from here

@i.am.rossalex 7 месяцев назад

If an image is not good quality, but readable for human, it can recognize a text with mistakes. Tested on estimates that was sent via Whatsapp.

@aiunleashed509 7 месяцев назад

Yes that's a great use case. It's also nice in that it can automatically correct spelling and format from the estimate. Lets say its handwritten and the estimator spelled a product wrong, it can usually detect and fix that leading to better data and analytics

@i.am.rossalex 7 месяцев назад

@@aiunleashed509 but it made mistakes in digits, not in words. So it will be difficult to fix.

@pabloenzozanitti3411 6 месяцев назад

Great video. Quality appears to have degraded heavily since this video. Sometimes it outright refuses to scan images containing names as they're personal information

@aiunleashed509 6 месяцев назад

Thanks for watching. I agree, and have had many comments that the quality has degraded since initial release which is disappointing (it's supposed to get better over time right). That is definitely the advantage of using a local open source model for this purpose. I've got my eye on this for future videos

@tommydavies2426 7 месяцев назад

Hi there, I have tried to implement something similar and I get the response saying things like this: I'm sorry, but I am unable to access external links or view images, so I cannot analyze the image or read any text from it. My capabilities are limited to processing and generating text-based information. If you can provide the text from the image, I'd be happy to help analyze or discuss it with you. However, if I use GPT4 chat window as normal, upload my invoice, it can read it no problem. Have you came across this?

@aiunleashed509 7 месяцев назад

Hi, I haven't seen that myself, but haven't used the vision API in a few weeks. I heard a new version is coming soon, so maybe some degradation on this preview. Could you send me a link to one of the test images and I will give it a try

@pabloenzozanitti3411 6 месяцев назад

Make sure you have set the vision model instead of a regular gpt model

@remisanlis5344 2 месяца назад

Hi, very interesting video ! Watching from Paris, France ! I'm currently developing a solution based on that, is it possible to talk together briefly on messenger or something else ?

@aiunleashed509 2 месяца назад

Hey, would love to chat email me: int.unleashed@gmail.com

@glowmarkdesigns 7 месяцев назад

Hey! I have an excellent use for this to help small business but have no idea how to make it work. Could we talk about it to see if it's something that could be done?

@aiunleashed509 7 месяцев назад

For sure, just email: int.unleashed@gmail.com and we can chat

@kurniadrajat4267 7 месяцев назад

this is free api from chatgpt? or pay? , i cant run this program

@aiunleashed509 7 месяцев назад

This is a pay per usage API. because it deals with images which are more token intensive it is more expensive than the standard

@SunilSamson-w2l 2 месяца назад

Does it also work with PDF files instead of .img ?

@aiunleashed509 2 месяца назад

it says: We currently support PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif). So no you would have to convert it first

@SunilSamson-w2l 2 месяца назад

@@aiunleashed509 thank you !

@adammessier4431 8 месяцев назад

Hi, is it necessary to pay to get access to this API.

@aiunleashed509 8 месяцев назад

Yes it is billed like the other APIs. Check out the details here under GPT-4 Turbo -> gpt-4-1106-vision-preview. openai.com/pricing It even has a pricing calculator where you put in the resolution of the image and it says how much it would cost to analyze

@pabloenzozanitti3411 6 месяцев назад

It is. You need credits in your OpenAI account, but free credits won't do, you need to fund at least $5 to enable this model

@abhishekgaikwad6105 6 месяцев назад

Will it work for bad hand written text ??

@aiunleashed509 6 месяцев назад

In my testing it did pretty well, but of course their is limits. I am actually working on testing out the different Vision APIs with handwriting and should have a video about it soon