LayoutLM Annotated Paper

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Diving deeper into document understanding, today we have a brilliant paper by folks at Microsoft. The main idea of the paper is to jointly model the text and the layout information of a document. The authors argue that state-of-the-art document understanding needs three kinds of features: textual features, layout features in the form of 2D positional embeddings, and visual features in the form of token-wise image embeddings. This paper is a solid milestone in this domain and is now widely used as a baseline for comparison in the latest research in the area.
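To make the 2D positional embedding idea concrete, here is a minimal pure-Python sketch. It assumes each token comes with a bounding box (x0, y0, x1, y1), that coordinates are scaled onto a 0 to 1000 integer grid, and that the token's input vector is the sum of its word embedding and four position embeddings looked up from one shared X table and one shared Y table. The function names, the toy table sizes, and the random initialization are all hypothetical and only for illustration; the token-wise image embeddings are left out entirely, and this is not the authors' implementation.

```python
import random

MAX_COORD = 1000  # assumed size of the normalized coordinate grid
HIDDEN = 8        # toy embedding width, far smaller than a real model


def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) onto a 0..MAX_COORD integer grid."""
    x0, y0, x1, y1 = box
    return (
        int(MAX_COORD * x0 / page_width),
        int(MAX_COORD * y0 / page_height),
        int(MAX_COORD * x1 / page_width),
        int(MAX_COORD * y1 / page_height),
    )


def make_table(rows, width, rng):
    """Build a toy embedding table of random vectors (stands in for learned weights)."""
    return [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(rows)]


def embed_token(token_id, box, word_table, x_table, y_table):
    """Sum the word embedding with the four 2D position embeddings."""
    x0, y0, x1, y1 = box
    vectors = [
        word_table[token_id],
        x_table[x0],  # left edge
        y_table[y0],  # top edge
        x_table[x1],  # right edge
        y_table[y1],  # bottom edge
    ]
    # Element-wise sum across the five vectors.
    return [sum(parts) for parts in zip(*vectors)]


rng = random.Random(0)
word_table = make_table(100, HIDDEN, rng)        # hypothetical 100-token vocabulary
x_table = make_table(MAX_COORD + 1, HIDDEN, rng)
y_table = make_table(MAX_COORD + 1, HIDDEN, rng)

# A word occupying pixels (50, 100)-(250, 140) on a 500 x 1000 pixel page.
box = normalize_box((50, 100, 250, 140), page_width=500, page_height=1000)
vector = embed_token(7, box, word_table, x_table, y_table)
```

Because the position vectors are simply added to the word vector, two occurrences of the same word at different places on the page get different input representations, which is what lets the model pick up on layout.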

Please feel free to read along with the paper using my notes and highlights.

Color | Meaning
Green | Topics about the current paper
Yellow | Topics about other relevant references
Blue | Implementation details/maths/experiments
Red | Text including my thoughts, questions, and understandings

Follow me on GitHub and star this repo for regular updates.

Also, follow me on Twitter.

PS: For now, the PDF above does not render properly on mobile devices, so please download the PDF using the button above or get it from my GitHub.


CITATION

@article{Xu_2020,
   title={LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
   ISBN={9781450379984},
   url={http://dx.doi.org/10.1145/3394486.3403172},
   DOI={10.1145/3394486.3403172},
   journal={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
   publisher={ACM},
   author={Xu, Yiheng and Li, Minghao and Cui, Lei and Huang, Shaohan and Wei, Furu and Zhou, Ming},
   year={2020},
   month={Jul}
}