#ai #documentparsing #languagemodel #transformers
LayoutLM v1/v2 propose pre-training objectives that help a model understand documents better by incorporating layout, text, and the actual image snippets of the text. They fit very well in use cases like resume parsing, bill parsing, table parsing, etc.
⏩ Abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42).
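To make the "text + layout" idea from the abstract concrete, here is a minimal sketch of how a token's word embedding can be combined with embeddings of its bounding-box coordinates. It is a toy illustration under stated assumptions, not the paper's implementation: the hidden size, the random tables, the tiny vocabulary and the helper names (make_table, embed_token) are all made up for this example, and the 1-D position, segment and image embeddings are left out. The 0..1000 coordinate grid follows the convention commonly used with LayoutLM checkpoints.

```python
import random

GRID = 1000   # coordinates are scaled to a 0..1000 grid
HIDDEN = 8    # toy hidden size (assumption, real models are much larger)


def normalize_bbox(bbox, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0..1000 grid."""
    x0, y0, x1, y1 = bbox
    return [
        int(GRID * x0 / page_width),
        int(GRID * y0 / page_height),
        int(GRID * x1 / page_width),
        int(GRID * y1 / page_height),
    ]


def make_table(rows, dim, rng):
    """Hypothetical helper: a randomly initialised embedding table."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


rng = random.Random(0)
vocab = {"Invoice": 0, "Total": 1, "42.00": 2}   # toy vocabulary (assumption)
word_emb = make_table(len(vocab), HIDDEN, rng)
x_emb = make_table(GRID + 1, HIDDEN, rng)        # looked up for x0 and x1
y_emb = make_table(GRID + 1, HIDDEN, rng)        # looked up for y0 and y1


def embed_token(word, bbox, page_width, page_height):
    """Sum the word embedding with the four coordinate embeddings."""
    x0, y0, x1, y1 = normalize_bbox(bbox, page_width, page_height)
    vectors = [
        word_emb[vocab[word]],
        x_emb[x0], y_emb[y0],
        x_emb[x1], y_emb[y1],
    ]
    # Element-wise sum across the five vectors.
    return [sum(values) for values in zip(*vectors)]


box = normalize_bbox((100, 50, 300, 100), page_width=1000, page_height=500)
print(box)  # -> [100, 100, 300, 200]
vector = embed_token("Total", (100, 50, 300, 100), 1000, 500)
print(len(vector))  # -> 8
```

Because the layout signal enters as extra embeddings added to the word embedding, the same word placed at two different positions on the page gets two different input vectors, which is what lets the model tell a "Total" in a table header from a "Total" next to an amount.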
⏩ OUTLINE:
0:00 - Background and Abstract
03:58 - LayoutLM pre-training mechanism, architecture and intuition
⏩ Paper Title: LayoutLM: Pre-training of Text and Layout for Document Image Understanding
⏩ Paper: arxiv.org/abs/1912.13318
⏩ Authors: Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
⏩ Organisations: Harbin Institute of Technology, Beihang University, Microsoft Research Asia
⏩ Code: github.com/microsoft/unilm/tr...
Enjoy reading articles? Then consider subscribing to a Medium membership. It is just $5 a month for unlimited access to all free/paid content. Subscribe now - / membership
*********************************************
If you want to support me financially, which is totally optional and voluntary :) ❤️
You can consider buying me chai (because I don't drink coffee :) ) at www.buymeacoffee.com/TechvizC...
*********************************************
⏩ IMPORTANT LINKS
Research Paper Summaries: • Simple Unsupervised Ke...
*********************************************
⏩ YouTube - / @techvizthedatascienceguy
⏩ LinkedIn - / prakhar21
⏩ Medium - / prakhar.mishra
⏩ GitHub - github.com/prakhar21
*********************************************
⏩ Please feel free to share the content and subscribe to my channel - / @techvizthedatascienceguy
Tools I use for making videos :)
⏩ iPad - tinyurl.com/y39p6pwc
⏩ Apple Pencil - tinyurl.com/y5rk8txn
⏩ GoodNotes - tinyurl.com/y627cfsa
#techviz #datascienceguy #documentAI #naturallanguageprocessing #resumeparsing #transformers
About Me:
I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 3+ years of industry work experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).
Aug 4, 2024