LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Diving deeper into the domain of document understanding, today we have a brilliant paper by folks at Microsoft. The main idea of this paper is to jointly model the text and the layout information of documents. The authors argue that state-of-the-art document understanding needs more than textual features: it also needs layout features, in the form of 2D positional embeddings, and visual features, in the form of token-wise image embeddings. This paper is a solid milestone in this domain and is now widely used as a baseline for comparison in the latest research in the area.
Please feel free to read the paper along with my notes and highlights.
| Color | Meaning |
|---|---|
| Green | Topics about the current paper |
| Yellow | Topics about other relevant references |
| Blue | Implementation details / maths / experiments |
| Red | Text including my thoughts, questions, and understandings |