BERT Annotated Paper


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding


The revolutionary paper by Google that advanced the state of the art on a wide range of NLP tasks and laid the stepping stone for many other influential architectures.

This paper led the way and set a direction for the entire field, showing the clear benefits of pre-trained models (trained on huge datasets) and transfer learning, independent of the downstream task.

Please feel free to read the paper along with my notes and highlights.

| Color  | Meaning                                                    |
|--------|------------------------------------------------------------|
| Green  | Topics about the current paper                             |
| Yellow | Topics about other relevant references                     |
| Blue   | Implementation details / maths / experiments               |
| Red    | Text including my thoughts, questions, and understandings  |

At the end, I have added the architectural details, my insights on the Transformer architecture, and some ideas about positional embeddings (a small illustrative sketch follows below).
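
On the positional-embedding point: BERT learns its position embeddings as trainable parameters, whereas the original Transformer used fixed sinusoidal encodings. The snippet below is my own minimal sketch of the sinusoidal form for reference, not code from the paper.

```python
import numpy as np

def sinusoidal_position_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings from 'Attention Is All You Need'.

    BERT itself learns position embeddings instead, but the sinusoidal
    form is the usual reference point when discussing them.
    """
    positions = np.arange(max_len)[:, None]              # (max_len, 1)
    dims = np.arange(d_model)[None, :]                   # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                      # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dims: cosine
    return pe

# Example: encodings for a 128-token sequence at BERT-base hidden size 768.
print(sinusoidal_position_encoding(128, 768).shape)      # (128, 768)
```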

You can also look into the introduction to implementing BERT with TensorFlow:
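
As a minimal sketch of that kind of setup, the snippet below loads a pre-trained BERT and encodes one sentence. It assumes the Hugging Face `transformers` library with its TensorFlow backend, which is not necessarily what the linked introduction uses.

```python
# Minimal sketch: load a pre-trained BERT and encode a single sentence.
# Assumes the Hugging Face `transformers` package with TensorFlow installed;
# the checkpoint name here is illustrative, not taken from the original post.
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# Tokenize into input_ids / attention_mask / token_type_ids as TF tensors.
inputs = tokenizer("BERT pre-trains deep bidirectional representations.",
                   return_tensors="tf")
outputs = model(inputs)

# One contextual vector per token (shape: [batch, seq_len, 768]),
# plus the pooled [CLS] representation often used for classification.
print(outputs.last_hidden_state.shape)
print(outputs.pooler_output.shape)
```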

Please feel free to fork or follow the GitHub Repo for all the Annotated Papers.

PS: For now, the PDF above does not render properly on mobile devices, so please download it using the button above or get it from my GitHub.


CITATION

@inproceedings{47751,
  title  = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina N. Toutanova},
  year   = {2018},
  url    = {https://arxiv.org/abs/1810.04805}
}