WebFormer Annotated Paper

1 minute read

WebFormer: The Web-page Transformer for Structure Information Extraction

Understanding tokens from unstructured web pages is challenging in practice due to a variety of web layout patterns, this is where WebFormer comes into play. In this paper, the authors propose a novel architecture, WebFormer, a Web-page transFormer model for structure information extraction from web documents. This paper also introduces rich attention patterns between HTML tokens and text tokens, which leverages the web layout for effective attention weight computation. This can prove to be a big leap in web page understanding as it provides great incremental results and a way forward for the domain.

Please feel free to read along with the paper with my notes and highlights.

Color	Meaning
Green	Topics about the current paper
Yellow	Topics about other relevant references
Blue	Implementation details/ maths/experiments
Red	Text including my thoughts, questions, and understandings

Follow me on Github and star this repo for regular updates. GitHub

Also, Follow me on Twitter.

PS: For now, the PDF Above does not render properly on mobile devices, so please download the pdf from the above button or get it from my Github

CITATION

@misc{wu2021fastformer,
      title={WebFormer: The Web-page Transformer for Structure Information Extraction}, 
      author={Qifan Wang and Yi Fang and Anirudh Ravula and Fuli Feng and Xiaojun Quan and Dongfang Liu},
      year={2022},
      eprint={2202.00217},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Twitter Facebook LinkedIn

Akshay Uppal

WebFormer Annotated Paper

WebFormer: The Web-page Transformer for Structure Information Extraction

You May Also Enjoy

DiT Annotated Paper

LayoutLMv2 Annotated Paper

Fastformer Annotated Paper

LayoutLM Annotated Paper