Extract Tables from PDF and convert to Excel sheet with Paddle OCR text detection and recognition.

Neuralearn

48K views

Published: Oct 4, 2024

Comments: 261

@JujutsuMan 10 months ago

Impressive content for Deep Learning OCR! Many thanks!

@neuralearn 10 months ago

You're welcome :)

@BailinCAI 4 months ago

Impressive. I'm struggling right now with my little side project using OCR; you helped a lot, man, appreciate it

@AmanChauhan-hr1wh 4 months ago

Hi, is this notebook actually working for you? For me it's not. Can you please help?

@BailinCAI 4 months ago

@AmanChauhan-hr1wh Well, I just used his method and didn't copy everything from him. The result I implemented myself was not really 100% correct, so I ended up using the Azure API; it's really 100% correct and the processing is very fast as well

@aerocyropyros 2 months ago

Thanks lad, it gave me some ideas on how to apply PaddleOCR in my research, mate

@vishaldas6346 1 year ago

Hi, you have done a phenomenal job explaining PaddleOCR in detail. Can you please let me know if we can train PaddleOCR on custom datasets for extracting data from tables of different lengths in PDFs or images?

@christianrazvan 1 year ago

Well, that is a very simple and readable table; it's easy enough to do with basic if logic... but try a table with no borders, or with content very near the borders, on a scanned image

@niroshiniedayaratne4066 2 years ago

Thank you very much for this! Very insightful!

@neuralearn 2 years ago

Glad it was helpful :)

@Jean-nf1yh 5 months ago

Broo, this is awesome, thank you very much!!!

@neuralearn 5 months ago

You're welcome :)

@toto2321 1 year ago

Thank you, man, the best at explaining what is actually happening. Thank you so much

@neuralearn 1 year ago

You're welcome :)

@mohamedmagdy3872 1 year ago

Brilliant work!! I would like to thank you for giving me access to the notebook. Keep going, broo 💙💙

@neuralearn 1 year ago

My pleasure :) Feel free to check out our other videos

@alirezaghasrimanesh2431 9 months ago

Thanks for your great tutorial!!!!

@ajithn7336 6 months ago

Hello neuralearn, thanks for your great tutorial. Could you please provide notebook access?

@leonc5510 1 year ago

Thank you for the tutorial, I have requested notebook access

@neuralearn 1 year ago

Please check your mail :)

@puneetbansal8567 1 year ago

Hi Neuralearn, thanks for creating a great tutorial. It's very useful. Can you please provide notebook access?

@HueoanThi-nv6ei 2 days ago

I have a problem with layout parser. It seems to be due to a conflict between paddleocr and protobuf. What should I do to fix it? Thanks

@brmaaouia9715 7 months ago

What if it doesn't detect the table as a table, but as a figure or text?

@manoubilahbib2572 2 months ago

That was awesome, thanks

@ShivamSingh-sm2oy 6 months ago

Hey, thanks for the wonderful tutorial, man! Can you please provide access to the notebook?

@aishwaryachowta6598 1 year ago

Thank you for the tutorial!!!

@neuralearn 1 year ago

The pleasure is ours :)

@malakkhiari1419 1 year ago

How can I fix this error: "ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory"? It is caused by the line of code "import layoutparser as lp"

@bharattyagi1405 1 month ago

Hi, could you please provide access to the Colab notebook?

@emailvarun 1 year ago

Hi, thank you for this. Can you please help me with notebook access? Also, can you please help me understand whether I will be able to cover most table formats with this?

@RohitSharma-to7yy 1 year ago

Hi. The content is very impressive. Would love to see the notebook and build upon this to create tables in Google Docs instead. Please share the notebook

@_keto444 2 months ago

40:40 I got the following error: TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'. How can I solve it?

@robertdolovcak9860 10 months ago

This is the first I've heard about PaddleOCR. Seems like a very good tool. I really appreciate the work you have done and would also like to try this. Can you please allow access to the Google Colab code for this?

@neuralearn 10 months ago

Hello my dear Robert, please check your mail

@rrrsranjith 2 years ago

Excellent video 🔥

@neuralearn 2 years ago

Glad you loved it :)

@kenjeroldarellano4617 1 year ago

Hi Neuralearn, thanks for creating a very useful tutorial. Can you please provide notebook access for my study?

@PadmajaPhadke-e1l 6 months ago

I want to convert a CSV file into a JSON file in this format: { field1: {col1: text, col2: text, col3: text}, field2: {col1: text, col2: text, col3: text} }. Can you please help me create this JSON file? Thank you
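
One possible way to get the nested structure asked about above, as a minimal sketch using only the Python standard library. It assumes the CSV has a header row and that one column (here hypothetically named `field`) holds the key for each record; the sample data is made up for illustration.

```python
import csv
import io
import json


def csv_to_nested_json(csv_text: str, key_column: str) -> str:
    """Turn each CSV row into {key: {remaining columns}} and return JSON text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nested = {}
    for row in reader:
        key = row.pop(key_column)  # the value that names this record
        nested[key] = row          # the remaining columns become the inner object
    return json.dumps(nested, indent=2)


# Hypothetical sample data; replace with the contents of your own CSV file.
sample = "field,col1,col2,col3\nfield1,a,b,c\nfield2,d,e,f\n"
result = csv_to_nested_json(sample, key_column="field")
print(result)
```

For a file on disk, the same function can be fed `open(path, newline="").read()`. Rows that share a key would overwrite each other in this sketch.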

@quocvuong6752 1 year ago

Thank you so much, I really appreciate the informative video. Could you please allow access to the Google Colab? It would be super helpful.

@neuralearn 1 year ago

Hello my dear Quốc, please check your mail :)

@siddharthpatel2193 1 year ago

Can I get the code? I followed the video and wrote the code, and everything is working, but due to some issue out_array at the end is the same value. Update: Solved. Thanks, this is the best tutorial on this topic (saying this after going through countless tutorials, research papers and blogs in the past 3 months).

@neuralearn 1 year ago

@ZaheerH4ck3r 1 year ago

Bro, you're doing good work

@neuralearn 1 year ago

Thanks for the kind words :)

@ZaheerH4ck3r 1 year ago

@neuralearn I have a question: I have a PDF file that is 560 pages long, and other libraries do convert its data into an Excel file, but the result is garbage. If I use this model, will I be able to convert it?

@neuralearn 1 year ago

I think you should just go ahead and try. It's free :)

@PurushothamReddy-ff6vp 5 months ago

Hello, I'm facing trouble when there are multiple lines within the same row: it treats them as new rows. How do I fix this? Thank you!
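
One heuristic that may help with wrapped rows, as a rough sketch rather than the method from the video: treat a row as a continuation of the previous one when its first cell is empty, and fold its text into the row above. The assumptions here (rows are lists of cell strings with the same number of columns, and a wrapped line leaves the first column empty) are hypothetical and will not hold for every table.

```python
def merge_continuation_rows(rows):
    """Fold rows whose first cell is empty into the row above them."""
    merged = []
    for row in rows:
        is_continuation = bool(merged) and not row[0].strip()
        if is_continuation:
            previous = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    # Append the wrapped text to the matching cell above.
                    previous[i] = (previous[i] + " " + cell.strip()).strip()
        else:
            merged.append([cell.strip() for cell in row])
    return merged


# Made-up example: the second data row is a wrapped line of the first.
rows = [
    ["Item", "Description"],
    ["1", "Long text that"],
    ["", "wraps to a second line"],
    ["2", "Short"],
]
merged_rows = merge_continuation_rows(rows)
print(merged_rows)
```

If the first column can legitimately be empty in your tables, a different signal (for example, the vertical gap between text boxes) would be needed instead.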

@moez.mazhar 1 year ago

Hi, I've followed your procedure as is, but I'm getting "ValueError: Can't convert Python sequence with mixed types to Tensor." on the Non-Max Suppression portion. Can you tell me what might be causing that, please?

@pokopiko429 1 year ago

Congrats, one of the best videos I've seen on this topic! Could you please grant me access to the Google Colab?

@neuralearn 1 year ago

After requesting access, please check your mail inbox or spam

@EarningsApps 1 year ago

Pls give access to the notebook... great and informative tutorial!!

@neuralearn 1 year ago

Please check your mail :)

@pavitrabiradar-h3p 1 year ago

Hello, thanks for sharing this video. Will this method work for nested tables?

@statosys 7 months ago

Requesting access to the Colab notebook, thank you so much.

@henrydo9731 11 months ago

I have a question: if I have a table that spans 2 pages (half of it is on the 1st page and the other half on the 2nd page), how could I solve this problem?
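
A simple starting point for the two-page case, sketched under the assumption that each page's table has already been extracted separately into a list of rows: concatenate the two lists, dropping the first row of the second page if it just repeats the header. The function name and sample rows are hypothetical.

```python
def join_page_tables(first_page_rows, second_page_rows):
    """Concatenate two row lists, skipping a header repeated on the second page."""
    if first_page_rows and second_page_rows:
        if second_page_rows[0] == first_page_rows[0]:
            second_page_rows = second_page_rows[1:]  # drop the repeated header
    return first_page_rows + second_page_rows


# Made-up example rows from two consecutive pages.
page1 = [["Name", "Score"], ["Ann", "9"]]
page2 = [["Name", "Score"], ["Bob", "7"]]
joined = join_page_tables(page1, page2)
print(joined)
```

Deciding whether two tables on consecutive pages really belong together (same column count, table at the bottom of one page and the top of the next) is left out of this sketch.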

@NileshKumar-ug1hl 1 year ago

Hi, can you please provide notebook access?

@francescodimartino8896 11 months ago

Amazing job! Could you please share the Google Colab with me? 🙏

@cissemy 1 year ago

Great. Is it possible to use this model for matrix recognition? How many rows and columns, elements of the matrix?

@mkthedawn 1 year ago

Awesome 👍👍👍

@neuralearn 1 year ago

Thanks 🤗

@adillaanam4058 1 year ago

hi! tysm for the video. would you pls allow access to the notebook? ty!!

@balasubramaniyang6506 1 year ago

Hi, nice explanation. Can you provide access? It's very helpful for us.

@khushibaghel220 9 months ago

Hey! I want to try out your tutorial. Could you please give me access to your notebook?

@neuralearn 9 months ago

Hello, check your mail :)

@nealgilmore337 1 year ago

Hello @neuralearn - love the demo! Can you provide me access to the Colab?

@neuralearn 1 year ago

Done!

@harshithprakash2433 1 year ago

Awesome video and an interesting approach to the problem. Would you mind giving me access to that notebook?

@neuralearn 1 year ago

Hello my dear Harshith, please check your mail :)

@snehitdua153 1 year ago

I'm getting an error when loading the model: ValueError: (InvalidArgument) Device id must be less than GPU count, but received id is: 0. GPU count is: 0. [Hint: Expected id < GetGPUDeviceCount(), but received id:0 >= GetGPUDeviceCount():0.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:242)

@malakkhiari1419 1 year ago

How to get access to your notebook?

@neuralearn 1 year ago

Done!

@IsratjahanFateha9106 10 months ago

Can I have access to your Colab notebook, please? I requested access yesterday

@neuralearn 10 months ago

Hi, check your mail box or spam

@tommy-dz1yg 2 years ago

amazing vid!!!!

@neuralearn 2 years ago

Glad you enjoyed it :) More on the way!!!

@anirbanghorai3699 2 years ago

EXCELLENT! CAN YOU PLS POST A VIDEO ON Paddle OCR custom training (both detection + recognition) steps? I have my own data and want to do transfer learning

@neuralearn 2 years ago

We are glad this was helpful :) We shall work on that and publish as soon as possible!

@anirbanghorai3699 2 years ago

@neuralearn glad you responded... waiting for the custom training video

@ajaychinni3148 1 year ago

Please approve the access request for the Google Colab notebook. I am very interested in the code

@AdilKhan-c5q 1 year ago

Very informative tutorial. I really appreciate the work you have done with this code. I also want to try this. Can you please allow access to the Google Colab code for this?

@neuralearn 1 year ago

Hello my dear Adil, please check your mail :)

@chafikhermouche5136 1 year ago

Hello, thank you for the tutorial!! Can I get the code, please?

@dishaparmar2609 9 months ago

Amazing video! Very helpful! Could you please provide the source code?

@kikigaming4595 1 year ago

How do I install layout parser? The GitHub repo now doesn't have any file such as layout parser

@beratoren7627 1 year ago

This was an amazing tutorial! I really want to try and further tweak this. Can you please grant me access to the Google Colab code?

@neuralearn 1 year ago

Hello, please check your mail inbox or spam

@aishwaryadinesh7641 1 year ago

Hi, I'm getting this error - (External) CUDA error(100), no CUDA-capable device is detected. [Hint: 'cudaErrorNoDevice'. This indicates that no CUDA-capable devices were detected by the installed CUDA driver. ] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:66). Can you help me out with this, please?

@youssefmouknii5033 1 year ago

Thank you so much for this video. Could you please allow access to the Google Colab?

@neuralearn 1 year ago

Hello my dear Youssef, glad this video is helpful :) Please check your mail inbox or spam

@dinaharan0213 1 year ago

Hi, I installed paddlepaddle instead of paddlepaddle-gpu because I don't have a GPU in my local system. I'm getting "AttributeError: module 'numpy' has no attribute 'int'". Is it possible to run this project on a local system without a GPU?

@edwinjoe6044 1 year ago

I'm facing this error too...☹

@neuralearn 1 year ago

Hello my dear Dinaharan, here is a notebook which works for CPU runtime: colab.research.google.com/drive/1vZHrahaaubhWMz83jlPuvA1na_v98fUP

@dinaharan0213 1 year ago

Hi, I am very happy to get your reply and amazed by your help. I am glad to have a YouTuber like you. I really appreciate your efforts for your subscribers. Thank you very much. 🤗😇👏👏👏

@neuralearn 1 year ago

My pleasure :)

@ss_d25 10 months ago

Hi, great video. Can you please provide access to this notebook? Thanks a lot in advance.

@neuralearn 10 months ago

Hi, check your mail box or spam

@ayushbansal999 1 year ago

Hi, could you please provide me with access to this Colab notebook?

@neuralearn 1 year ago

Hello my dear Ayush, please check your mail inbox or spam

@snehitdua153 1 year ago

Hey, can you please provide the link for the PDF used in the video? Thanks

@therafee 11 months ago

Why do we need to clone the Paddle repository at 15:57?

@kanakjaiswal136 1 year ago

It was excellently explained. I wanted to try it out but got many errors. So, could you please grant me access to the Google Colab code?

@neuralearn 1 year ago

Done!

@josephebenezer8869 1 year ago

Hi, could you grant me access to the notebook, please?

@manojaar2008 2 years ago

Super!!!

@neuralearn 2 years ago

😊

@SaniyaFarash 9 months ago

Very informative video. Can you please share the code with me? It would be very helpful.

@revanthkumar3406 1 year ago

Hey, really great video ❤, can you provide access to the notebook?

@neuralearn 1 year ago

Hello my dear Kumar, please check your mail inbox or spam

@etarhunisuhaib2031 1 year ago

Thanks for this video. Let's say we have a page with free text and tables. Once we have our tables, how can we extract the remaining text? When I'm using the parser, it also extracts the table text from the page. I want to use your approach for tables and extract only the remaining text.

@Ankur-be7dz 1 year ago

For extracting only text, use pdfminer

@andrewlachance2062 9 months ago

Just match the consecutive text from the table and parse the PDFs, skipping over that text
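
An alternative to matching the text itself is to filter by position, sketched below under some assumptions: each word comes with a bounding box `(x0, y0, x1, y1)`, the detected table regions are boxes in the same coordinate system, and a word counts as table text when its center falls inside a table box. The data layout and function names are hypothetical, not taken from the video or from pdfminer.

```python
def inside(box, x, y):
    """Return True if point (x, y) lies within box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def text_outside_tables(words, table_boxes):
    """Join the words whose center point is not inside any table box."""
    kept = []
    for text, (x0, y0, x1, y1) in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if not any(inside(tb, cx, cy) for tb in table_boxes):
            kept.append(text)
    return " ".join(kept)


# Made-up page: one table region and three words, one of them inside the table.
words = [
    ("Intro", (10, 10, 60, 20)),
    ("Cell", (110, 110, 150, 120)),
    ("Outro", (10, 300, 70, 310)),
]
table_boxes = [(100, 100, 400, 250)]
remaining = text_outside_tables(words, table_boxes)
print(remaining)
```

Word boxes from a PDF text extractor and table boxes from an image-based detector usually use different units and origins, so the coordinates would need to be converted to a common system first.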

@khaoulafattah 1 year ago

Thank you for the explanation @Neuralearn, can you please provide me access to the Colab?

@neuralearn 1 year ago

Please check your mail inbox or spam :)

@sameerdeshmukh1527 1 year ago

Thank you. Can you please grant me access to the notebook?

@neuralearn 1 year ago

Please check your mail :)

@Sara-fp1zw 1 year ago

Can you please give me access to the notebook?

@xavier6649 1 year ago

Hey, great work. Can you give access to your Colab Drive? Thanks

@neuralearn 1 year ago

Please check your mail :)

@rupakjha539 11 months ago

Hi Neuralearn team, can you please provide me access to the Google Colab code?

@PrashantKumar-nb5ig 1 year ago

Maybe adding download links would have been more helpful.

@neuralearn 1 year ago

Please check your mail :)

@pratikmore4044 1 year ago

I am getting the following error and am not sure how I can resolve it: Error: Can not import paddle core while this file exists: /usr/local/lib/python3.10/dist-packages/paddle/fluid/libpaddle.so. Tried reinstalling paddlepaddle, but that didn't work.

@owaisasghar2033 9 months ago

Sir, is the issue solved?

@stevevu8654 1 year ago

It's fascinating. Would you mind giving me access to the Colab code?

@neuralearn 1 year ago

Hello my dear Steve. Please check your mail :)

@ramyas9837 9 months ago

Which Python version?

@frekin31 1 year ago

Thank you so much for your tutorial! Can you please grant me access to the Google Colab code?

@neuralearn 1 year ago

Hello, please check your mail inbox or spam :)

@KartikSharma-hd7rd 1 year ago

Excellent tutorial, can you please grant access to the Google Colab notebook :)

@neuralearn 1 year ago

Sure :) Check your mail!

@youseffarouk6189 1 year ago

How can I use PaddleOCR for receipts?

@edwinjoe6044 1 year ago

Hi @Neuralearn. I am getting this "ValueError: (InvalidArgument) Device id must be less than GPU count, but received id is: 0. GPU count is: 0. [Hint: Expected id < GetGPUDeviceCount(), but received id:0 >= GetGPUDeviceCount():0.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:242)". I have an Intel® HD Graphics 2500 graphics card, so I can't run the project on my system. How can I run the program on my system?

@neuralearn 1 year ago

Hello my dear Joe, here is a notebook which works for CPU runtime: colab.research.google.com/drive/1vZHrahaaubhWMz83jlPuvA1na_v98fUP

@edwinjoe6044 1 year ago

@neuralearn Thank you, bro. Thanks for the support 🤗

@hussainahmedsiddiqui3742 1 year ago

Amazing tutorial, is this code available for use? I would appreciate it!

@neuralearn 1 year ago

Please check your mail :)

@snehalvats382 1 year ago

Hey there! It is a wonderful video on how to work with OCR and tables. I have requested notebook access; could you please grant it? Thank you once again for this tutorial

@neuralearn 1 year ago

Hello my dear Snehal, please check your mail :)

@snehalvats382 1 year ago

@neuralearn Dear team, I have not yet received the confirmation. It's the same email as the one I'm replying with.

@amilaviraj1014 1 year ago

This is a very informative tutorial! Could you please give me access to the Google Colab code?

@neuralearn 1 year ago

Hi my dear Amila, please check your mail inbox or spam :)

@codagebdarija5 1 year ago

I was denied access when I tried to open your Colab notebook. Could you please give me access?

@neuralearn 1 year ago

Please check your mailbox

@codagebdarija5 1 year ago

@neuralearn Thank you very much

@neuralearn 1 year ago

You're welcome

@AnkushPomendkar-s6f 10 months ago

This tutorial is very helpful and informative. Can you share this code with me?

@neuralearn 10 months ago

Hi, check your mail box or spam

@googlecloudguru224 1 year ago

Please provide access to this notebook

@neuralearn 1 year ago

Access granted!

@walkwithus6536 1 year ago

Hi, if we have multiple tables (huge tables), will this method work?

@neuralearn 1 year ago

Yes, it should work. I think it's best to try it for yourself :)

@sudeshkumar5600 1 year ago

Hi, it is very interesting to me. I really want to try this out. Could you please grant me access to the Google Colab code?

@neuralearn 1 year ago

Done!

@kibtiachowdhury6011 2 years ago

Hi, I want to get only the paragraph text, without any figures or tables, from any type of PDF. How can I do this?

@neuralearn 1 year ago

You can pick text by changing [if l.type == 'Table':] to [if l.type == 'Text':]
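
The idea in the reply above can be sketched without the notebook. In the snippet below, `Block` is a hypothetical stand-in for the layout elements returned by the layout detector, which are assumed to expose a `type` label such as 'Text', 'Table' or 'Figure'; the sample layout is made up.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one detected layout element."""
    type: str
    text: str


def blocks_of_type(layout, wanted):
    """Keep only the layout elements whose type label equals `wanted`."""
    return [block for block in layout if block.type == wanted]


# Made-up detection result for one page.
layout = [
    Block("Title", "Results"),
    Block("Text", "First paragraph."),
    Block("Table", "a | b | c"),
    Block("Text", "Second paragraph."),
    Block("Figure", ""),
]
paragraphs = [block.text for block in blocks_of_type(layout, "Text")]
print(paragraphs)
```

With the real detector output, the same comparison would be applied to each element's `type`, exactly as in the `if l.type == ...` line quoted in the reply.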

@kinetic_kane9033 1 year ago

Hello, can I please get viewing access to the Colab notebook?

@neuralearn 1 year ago

Hello Kane, please request access and check your mail in 5 minutes

@salmankavish3134 1 year ago

@Neuralearn Brother, can you please grant me access to the Google Colab?

@neuralearn 1 year ago

Hello my dear Salman, please check your mail :)

@therafee 11 months ago

@neuralearn Hello, could you tell me where the test.pdf file is? I have access to the notebook, but it throws an error: PDFPageCountError: Unable to get page count. I/O Error: Couldn't open file '/content/bahdanau attention.pdf': No such file or directory