
LayoutLM: Pre-training of Text and Layout for Document Image Understanding (Paper Summary) 

TechViz - The Data Science Guy
12K views

#ai #documentparsing #languagemodel #transformers
LayoutLM v1/v2 proposes a pre-training objective that helps a model understand documents better by incorporating layout, text, and the actual image snippets of the text. It fits use cases such as resume parsing, bill parsing, and table parsing very well.
⏩ Abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42).
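To make "jointly model text and layout" a little more concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes details from the paper rather than from this description: each word's bounding box is rescaled to a 0-1000 grid, and the word embedding is summed with four position embeddings (x0, y0, x1, y1) drawn from one shared X table and one shared Y table. The table sizes, the toy embedding width, and the helper names (`normalize_bbox`, `make_table`, `layout_embedding`) are hypothetical and chosen only for illustration; the 1-D position and image embeddings are left out.

```python
import random

MAX_COORD = 1000  # boxes are rescaled to the 0-1000 range (assumption from the paper)
DIM = 8           # toy embedding width, chosen only for illustration


def normalize_bbox(bbox, page_width, page_height):
    """Rescale an integer pixel box (x0, y0, x1, y1) to integers in 0..1000."""
    x0, y0, x1, y1 = bbox
    return [
        MAX_COORD * x0 // page_width,
        MAX_COORD * y0 // page_height,
        MAX_COORD * x1 // page_width,
        MAX_COORD * y1 // page_height,
    ]


def make_table(rows, dim, rng):
    """Build a random lookup table that stands in for a learned embedding matrix."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]


def layout_embedding(token_id, box, word_table, x_table, y_table):
    """Sum the word embedding with the four 2-D position embeddings."""
    x0, y0, x1, y1 = box
    parts = [
        word_table[token_id],
        x_table[x0],  # left edge, X table
        y_table[y0],  # top edge, Y table
        x_table[x1],  # right edge, same X table
        y_table[y1],  # bottom edge, same Y table
    ]
    return [sum(values) for values in zip(*parts)]


rng = random.Random(0)
word_table = make_table(100, DIM, rng)          # hypothetical 100-token vocabulary
x_table = make_table(MAX_COORD + 1, DIM, rng)
y_table = make_table(MAX_COORD + 1, DIM, rng)

# A word box on a hypothetical 850 x 1100 pixel page.
box = normalize_bbox((85, 110, 425, 220), page_width=850, page_height=1100)
vector = layout_embedding(42, box, word_table, x_table, y_table)
print(box)          # [100, 100, 500, 200]
print(len(vector))  # 8
```

Because the X and Y tables are shared, words whose edges line up on the page reuse the same position vectors, which is one plausible way the model can pick up on columns and table cells.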
⏩ OUTLINE:
0:00 - Background and Abstract
03:58 - LayoutLM pre-training mechanism, architecture and intuition
⏩ Paper Title: LayoutLM: Pre-training of Text and Layout for Document Image Understanding
⏩ Paper: arxiv.org/abs/1912.13318
⏩ Authors: Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
⏩ Organisation: Harbin Institute of Technology, Beihang University, Microsoft Research Asia
⏩ Code: github.com/microsoft/unilm/tr...
Enjoy reading articles? Then consider subscribing to a Medium membership. It is just $5 a month for unlimited access to all free and paid content. Subscribe now - / membership
*********************************************
If you want to support me financially, which is totally optional and voluntary :) ❤️
You can consider buying me chai (because I don't drink coffee :)) at www.buymeacoffee.com/TechvizC...
*********************************************
⏩ IMPORTANT LINKS
Research Paper Summaries: • Simple Unsupervised Ke...
*********************************************
⏩ RU-vid - / @techvizthedatascienceguy
⏩ LinkedIn - / prakhar21
⏩ Medium - / prakhar.mishra
⏩ GitHub - github.com/prakhar21
*********************************************
⏩ Please feel free to share the content and subscribe to my channel - / @techvizthedatascienceguy
Tools I use for making videos :)
⏩ iPad - tinyurl.com/y39p6pwc
⏩ Apple Pencil - tinyurl.com/y5rk8txn
⏩ GoodNotes - tinyurl.com/y627cfsa
#techviz #datascienceguy #documentAI #naturallanguageprocessing #resumeparsing #transformers
About Me:
I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 3+ years of industry work experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).

Published: 4 Aug 2024

Comments: 17
@TechVizTheDataScienceGuy, 2 years ago
Watch more paper summaries at ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ykClwtoLER8.html
@sudhirpol1895, 1 year ago
The content is really good, but one thing: in the Hugging Face implementation they have not used OCR output for the fine-tuning task. During pre-training it is not a multimodal model, but during fine-tuning it should be called a multimodal model, right?
@TheMarComplex, 1 year ago
This was pretty interesting, love to know about the V1 architecture as well!
@marinamaher8211, 1 year ago
Great, thanks for this clear explanation. If you do V2 & V3, it will be awesome.
@TheMarComplex, 1 year ago
Thanks!
@yosefasefaw4207, 1 year ago
thanks a lot! you are amazing
@TechVizTheDataScienceGuy, 1 year ago
You’re welcome ☺️
@mariussame9357, 1 year ago
Hi! Thanks for the video! I want to ask you a question. I'm working on different use cases, and most of the time the goal is to extract information, so I found this model really interesting. The problem is that I'm French, so the texts I want to extract information from are in French, and I assume this model was pre-trained on English documents. Do you think I can still fine-tune the model on my French documents, or do you have any recommendation?
@AjitKumarMCS, 1 year ago
Nice summary. Please make a video on LayoutLMv2 as well.
@neeleshshukla242, 2 years ago
Nice summary. By the way, which editor are you using? It looks like a good way to annotate and add notes online.
@TechVizTheDataScienceGuy, 2 years ago
Hey Neelesh, thanks for the appreciation. I use the GoodNotes editor for annotations. You can find the link in the description of any video.
@user-lj7bw2db1l, 9 months ago
Do one for V3; it's a bit different.
@shloimielevitsky5983, 8 months ago
Great video. Can you do a version 2 vs. version 3 comparison?
@shloimielevitsky5983, 6 months ago
Have you done one of those models? What about the LiLT model?
@yashumahajan7, 1 year ago
Please create a video on LayoutLMv2.
@TechVizTheDataScienceGuy, 1 year ago
Sure. Thanks!
@arnavdman, 2 years ago
This was pretty interesting, love to know about the V1 architecture as well!