LayoutLM Annotated Paper
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Diving deeper into the domain of understanding documents, today we have a brilliant paper by folks at Microsoft. The main idea of this paper is to jointly model the text as well as layout information for documents. The authors talk about the importance of layout features in the form of 2D positional embeddings and Visual features in the form of token-wise image embeddings along with the textual features for state of the art document understanding. This paper is a solid milestone in this domain and is now actively used as a benchmark of comparison for the latest research in the area.
Please feel free to read along with the paper with my notes and highlights.
Color | Meaning |
---|---|
Green | Topics about the current paper |
Yellow | Topics about other relevant references |
Blue | Implementation details/ maths/experiments |
Red | Text including my thoughts, questions, and understandings |
Follow me on Github and star this repo for regular updates. GitHub
Also, Follow me on Twitter.
PS: For now, the PDF Above does not render properly on mobile devices, so please download the pdf from the above button or get it from my Github
CITATION
@article{Xu_2020,
title={LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
ISBN={9781450379984},
url={http://dx.doi.org/10.1145/3394486.3403172},
DOI={10.1145/3394486.3403172},
journal={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
publisher={ACM},
author={Xu, Yiheng and Li, Minghao and Cui, Lei and Huang, Shaohan and Wei, Furu and Zhou, Ming},
year={2020},
month={Jul}
}