How to Fine-tune LayoutLMv3 with Annotated Documents Using PaddleOCR | Part-1: Annotate using Paddle

10K views

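In this tutorial, we will learn how to fine-tune LayoutLMv3 with annotated documents using PaddleOCR. LayoutLMv3 is a pre-trained multimodal Transformer for document understanding that jointly models text, layout, and image; it does not read text from pixels on its own but takes words and their bounding boxes, typically produced by an OCR engine, as input. PaddleOCR is an open-source OCR toolkit that supports a variety of languages and document types, and here it supplies those words and boxes.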
To fine-tune LayoutLMv3 with annotated documents, we will need:
1. PaddleOCR
2. Label Studio
3. Hugging Face Transformers
Code link: github.com/manikanthp/LayoutL...
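To make the hand-off between the two tools concrete: LayoutLMv3 expects one bounding box per word, scaled to a 0-1000 range. Below is a minimal sketch of that conversion, assuming each OCR line arrives as [quad, (text, score)] with the quad given as four [x, y] corner points (the shape classic PaddleOCR 2.x `ocr.ocr()` results use). The helper names are hypothetical and are not taken from the linked repo. As a simplification, each OCR text line is treated as one "word" entry.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def quad_to_bbox(quad: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Collapse a 4-point polygon into an axis-aligned (x0, y0, x1, y1) box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def normalize_bbox(bbox, width: int, height: int) -> List[int]:
    """Scale pixel coordinates to the 0-1000 range LayoutLMv3 expects."""
    x0, y0, x1, y1 = bbox
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp so boxes touching or crossing the image border stay inside [0, 1000].
    return [min(max(v, 0), 1000) for v in scaled]


def ocr_lines_to_layoutlm_inputs(lines, width: int, height: int):
    """Split OCR lines into parallel `words` and `boxes` lists.

    Assumes each line looks like [quad, (text, score)].
    """
    words, boxes = [], []
    for quad, (text, _score) in lines:
        words.append(text)
        boxes.append(normalize_bbox(quad_to_bbox(quad), width, height))
    return words, boxes


if __name__ == "__main__":
    sample = [[[[50, 20], [250, 20], [250, 60], [50, 60]], ("Invoice", 0.98)]]
    words, boxes = ocr_lines_to_layoutlm_inputs(sample, width=1000, height=500)
    print(words, boxes)  # ['Invoice'] [[50, 40, 250, 120]]
```

The resulting `words` and `boxes` match what Hugging Face's `LayoutLMv3Processor` accepts when it is created with `apply_ocr=False`, e.g. `processor(image, words, boxes=boxes, word_labels=labels)`. With the default `apply_ocr=True`, the processor runs Tesseract itself instead, which is one reason to plug in an external OCR engine such as PaddleOCR.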
LayoutLMv3, Fine-tune, Annotated Documents, PaddleOCR, Text Recognition, Document Layout Analysis, Computer Vision, Natural Language Processing, Deep Learning

Science

Published: 11 Jun 2023

Comments: 30

@A--NabiilahNadaIswari 1 month ago

Hi, can this be run on Python versions above 3.9?

@mochen-h7m 1 month ago

Hello, this is a very good video. Now I have a question: why use PaddleOCR to extract information from images when you can read and extract information from images directly using LayoutLMv3?

@user-tg1rg7pj5d 8 months ago

Brother, I created a virtual environment but I am still facing an issue while installing paddleocr. It is because of the PyMuPDF library... please address this if possible.

@AIOdysseyhub 8 months ago

Delete the current virtual env, create a new env, and first install the Paddle libraries as shown in the video. Check that they installed properly, and only then install the PyMuPDF libraries. I have installed them multiple times without any issue, so it should be the same for you. Thanks for reaching out. Please subscribe to the channel, and if your issue is not solved, please let me know. Thank you 😊😊😊😊
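The reply suggests confirming that the Paddle libraries installed properly before adding PyMuPDF. A minimal sketch of such a check, meant to be run with the env's own interpreter. The module list is an assumption based on the packages named in this thread (`paddle` is the import name of paddlepaddle, and `fitz` is the import name PyMuPDF has long used).

```python
import importlib.util
import sys

# Import names to probe (assumed from the packages discussed in this thread).
MODULES = ["paddle", "paddleocr", "paddleclas", "fitz", "torch", "transformers"]


def check_modules(names=MODULES):
    """Return {module: True/False} by locating each module without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def main():
    print("Python", sys.version.split()[0], "at", sys.executable)
    status = check_modules()
    for name, found in status.items():
        print(f"{name:14s} {'OK' if found else 'MISSING'}")
    if status.get("paddle"):
        import paddle

        paddle.utils.run_check()  # Paddle's built-in installation self-test


if __name__ == "__main__":
    main()
```

Locating modules with `find_spec` rather than importing them keeps the check itself from tripping over a broken native library; `paddle.utils.run_check()` then exercises Paddle for real.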

@AjitKumarMCS 11 months ago

If you could provide a requirements.txt for PaddleOCR, it would be very helpful, because I am getting errors while installing paddleocr due to version dependencies.

@AIOdysseyhub 11 months ago

Hi Ajith, make sure to create a new virtualenv; without an env you will certainly get errors. Below is my Pipfile:

    [[source]]
    url = "https://pypi.org/simple"
    verify_ssl = true
    name = "pypi"

    [packages]
    label-studio = "*"
    pytesseract = "*"
    paddlepaddle = "*"
    paddleocr = "*"
    paddleclas = "*"
    fitz = "*"
    pymupdf = "*"
    torch = "*"
    transformers = "*"
    numpy = "==1.25.0"
    torchmetrics = "*"

    [dev-packages]

    [requires]
    python_version = "3.9"

Otherwise, create a virtualenv using pipenv, enter the env, and run the commands below:
1) pipenv install paddlepaddle
2) pipenv install paddleclas
3) pipenv install paddleocr

That's it, it will install. I have tried it many times and it worked fine for me. If it still fails, create a new folder on another local drive and start again from creating the environment, and so on.

@chandanha9532 4 months ago

Hi sir, I cloned the GitHub repo you provided and created a virtual environment. After running pip install paddleclas, I get the error below. I have been trying to resolve it for the past 2 days without success; can you please help?

    error: command '/usr/bin/swig' failed with exit code 1
    [end of output]
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for faiss-cpu
    Running setup.py clean for faiss-cpu
    Failed to build faiss-cpu
    ERROR: Could not build wheels for faiss-cpu, which is required to install pyproject.toml-based projects

@medjawherzgolli9507 2 months ago

You should install SWIG and add it to your PATH.
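The faiss-cpu build in the log above fails when it invokes `swig`, so the command has to be installed and visible on PATH to the same shell that runs pip. A small sketch for confirming that from Python before retrying `pip install paddleclas`; the default tool list is an assumption.

```python
import shutil


def find_build_tools(tools=("swig",)):
    """Map each command-line tool to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(tool, "->", path or "not found on PATH")
```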

@akashsalmuthe9846 11 months ago

Got this error: "ImportError: DLL load failed while importing libpaddle: The specified module could not be found."

@AIOdysseyhub 11 months ago

Hi Akash, please create a virtual environment; don't install the libraries in the global Python environment, as that will definitely cause errors. After creating the env, run the commands to install the libraries; if you are using pipenv to create the env, prefix them with "pipenv install". This will solve that issue. If you are still having it, please let me know. Thanks for reaching out. Please subscribe to the channel and like the video if it helps you; it keeps me motivated. Thank you again 😊😊😊😊

@akashsalmuthe9846 11 months ago

@AIOdysseyhub, working with venv, I tried Python 3.7, 3.9, and 3.10 and got the same error. I also tried Colab but was not able to resolve this error.

@AIOdysseyhub 11 months ago

@akashsalmuthe9846, I am using 3.9 and it worked for me. If it's still throwing the same error, move to a different folder on a different drive; do not copy the old one, create it from scratch instead. You might be using a nested folder where an env has already been created in the parent folder, so switch to a different drive entirely: create a new folder, then create the env, then install. It should work; you can test it with 3.9 itself. Let me know if it's working. Thanks,

@akashsalmuthe9846 11 months ago

@AIOdysseyhub I will try it.

@user-ge5wr5ue1b 11 months ago

Not able to install paddleclas; I've tried everything.

@AIOdysseyhub 11 months ago

Not sure why 🤔. I just tried it for another project and it worked. What error are you getting?

@user-ge5wr5ue1b 11 months ago

I don't know what's happening. If I install it in the base environment it installs successfully, but when I try to install it in a conda env it throws this error:

    from /home/lokendra/.local/lib/python3.10/site-packages/~aiss_cpu.libs
    error: legacy-install-failure
    × Encountered error while trying to install package.
    ╰─> faiss-cpu

@AIOdysseyhub

@NishantSah 9 months ago

There are major issues installing paddleclas, and many other people are facing them too. I have been trying for 2+ days.

@AIOdysseyhub 9 months ago

@NishantSah, have you created a new environment using pipenv and installed paddleclas there, or are you trying to install it in the global Python? If you are not installing it in an env, you will definitely get errors.
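Whether pip is really installing into the intended env is the crux of several reports here; the faiss-cpu path under `~/.local/lib/python3.10/site-packages` quoted above, for example, is the user site-packages directory rather than a conda env. A hedged sketch for checking which interpreter and environment are active before installing. It relies only on standard `sys`/`site` attributes and the `CONDA_PREFIX` variable that conda sets on activation.

```python
import os
import site
import sys


def describe_environment():
    """Summarise where `python -m pip install` with this interpreter would install."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # venv / virtualenv / pipenv envs move sys.prefix away from the base install.
        "in_venv": sys.prefix != sys.base_prefix,
        # conda exports CONDA_PREFIX for the activated env (None if unset).
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Per-user site-packages, e.g. ~/.local/lib/pythonX.Y/site-packages on Linux.
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:18s} {value}")
```

Installing with `python -m pip install ...` from the interpreter this reports ties the install to that environment, instead of whichever `pip` happens to come first on PATH.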

@user-cq5wl1jk5r 10 months ago

Hi, after executing the code, the details about the image, its bounding box numbers, etc. are not shown in the output. Can you please help me with this? It shows the error: "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized. OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results."

@AIOdysseyhub 10 months ago

Hi, sorry for the delayed response. The error might be caused by a library clash. Have you created a new env and installed all the required libraries in it? If not, please do that; it can solve your error. Please let me know whether this helped. Thanks for the support, and please subscribe to the channel for more videos like this.
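The OMP message quoted above is cut off; Intel's full text goes on to describe an unsafe, unsupported workaround: setting the environment variable `KMP_DUPLICATE_LIB_OK=TRUE`. A clean env, as the reply recommends, remains the proper fix. If you need a stopgap while investigating, a sketch of applying it from Python follows; it must run before paddle or torch are imported, and it only silences the check.

```python
import os


def allow_duplicate_openmp():
    """Unsafe stopgap for 'OMP: Error #15'.

    Call before importing paddle or torch. It only disables the duplicate-runtime
    check; Intel's own message warns this may crash or give incorrect results.
    """
    # setdefault keeps any value the user has already exported.
    os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")


allow_duplicate_openmp()

# Only now import the heavy libraries, e.g.:
# from paddleocr import PaddleOCR
# import torch
```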

@user-cq5wl1jk5r 10 months ago

Really helpful @AIOdysseyhub

@user-lm8ye5jt3l 11 months ago

Where is the next tutorial in this series?

@AIOdysseyhub 11 months ago

Hi, I have made this in three parts; you can find them in my channel's playlist. I am also working on other parts, but since I am occupied with full-time work it's getting delayed. I will surely upload the others; meanwhile, you can check the existing parts. If you get stuck somewhere, you can comment here. Thanks for the support ☺️ And please subscribe to the channel and like the videos if they're helpful 😄

@priyanakavasakan2894 9 months ago

Could you share your LinkedIn profile?

@AIOdysseyhub 9 months ago

www.linkedin.com/in/manikanthp559/ You can also find the links on the About page of this YouTube channel.

@tranphu2768 8 months ago

    File "C:\Users\Admin\PycharmProjects\LayoutLMV3_Fine_Tuning-main\Create_LMv3_dataset_with_paddleOCR.py", line 83, in extracted_tables_to_label_studio_json_file_with_paddleOCR
        four_co_ord = [co_ord[0][0], co_ord[1][1], co_ord[2][0] - co_ord[0][0], co_ord[2][1] - co_ord[1][1]]
                       ~~~~~~~~~^^^
    TypeError: 'float' object is not subscriptable

@musaibahmed3145 7 months ago

Did you find a solution for this?

@AIOdysseyhub 7 months ago

co_ord is not a list; it's a float object. Please check co_ord: print it just before the line where it is used, and trace the co_ord variable back to where it changes to a float value. Let me know whether this helps.
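Following the reply's advice to inspect `co_ord`, here is a sketch of a defensive version of the failing line 83 expression. It assumes `co_ord` should hold the four corner points [[x, y], ...] of one OCR box, in which case the expression yields [x, y, width, height]. If `co_ord[0]` is a bare number (for example, a single [x, y] point reached by indexing one level too deep into the OCR result), `co_ord[0][0]` raises exactly `'float' object is not subscriptable`. The function names are hypothetical and this is not the repo's code. The second helper assumes the values are then written into a Label Studio rectangle result, whose x, y, width, and height are percentages of the image size.

```python
from numbers import Real


def quad_to_xywh(co_ord):
    """Convert four corner points into [x, y, width, height].

    Mirrors the traceback's expression
    [co_ord[0][0], co_ord[1][1], co_ord[2][0] - co_ord[0][0], co_ord[2][1] - co_ord[1][1]]
    but fails with a descriptive message instead of a bare TypeError.
    """
    if isinstance(co_ord, Real) or (
        isinstance(co_ord, (list, tuple)) and co_ord and isinstance(co_ord[0], Real)
    ):
        raise TypeError(
            f"expected 4 corner points [[x, y], ...], got {co_ord!r}; "
            "check how many levels of the OCR result you indexed into"
        )
    if len(co_ord) != 4 or any(len(p) != 2 for p in co_ord):
        raise ValueError(f"expected 4 (x, y) points, got {co_ord!r}")
    x, y = co_ord[0][0], co_ord[1][1]
    return [x, y, co_ord[2][0] - co_ord[0][0], co_ord[2][1] - co_ord[1][1]]


def to_label_studio_percent(xywh, img_width, img_height):
    """Express a pixel [x, y, w, h] box as percentages of the image size."""
    x, y, w, h = xywh
    return [
        100 * x / img_width,
        100 * y / img_height,
        100 * w / img_width,
        100 * h / img_height,
    ]
```

Calling `quad_to_xywh` on each box (after `print(type(co_ord), co_ord)` as suggested) shows immediately whether the loop is handing over a whole box or a single point.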

@user-fg7tl2gu3k 9 months ago

ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [6, 96, 3, 20] and the shape of Y = [6, 96, 4, 20]. Received [3] in X is not equal to [4] in Y at i:2. [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i]

@AIOdysseyhub 8 months ago

Have you manually changed the JSON file to one-hot encoding? If that doesn't solve it, please let me know. Thank you!
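The error text above states the rule being violated: two shapes broadcast only if, aligned from the last axis, each pair of dimensions is equal or one of them is 1 (Paddle follows the NumPy convention here). Below is a small sketch that finds the offending axis, which for the shapes in that message is axis 2 (3 vs. 4). It does not tell you which tensor in the pipeline is wrong, so the data check suggested in the reply still applies.

```python
from itertools import zip_longest


def first_broadcast_mismatch(shape_x, shape_y):
    """Return the axis (indexed in the longer shape) where broadcasting fails, or None."""
    ndim = max(len(shape_x), len(shape_y))
    # Walk both shapes from the last axis; missing leading dims count as 1.
    pairs = zip_longest(reversed(shape_x), reversed(shape_y), fillvalue=1)
    for offset, (dx, dy) in enumerate(pairs):
        if dx != dy and dx != 1 and dy != 1:
            return ndim - 1 - offset
    return None


print(first_broadcast_mismatch([6, 96, 3, 20], [6, 96, 4, 20]))  # 2
```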