BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

The revolutionary paper by Google that raised state-of-the-art performance on various NLP tasks and set the stepping stone for many subsequent architectures.
This paper set a direction for the entire field: it shows clear benefits of pre-trained models (trained on huge datasets) and transfer learning, independent of the downstream task.
Feel free to read along with the paper using my notes and highlights.
| Color | Meaning |
|---|---|
| Green | Topics about the current paper |
| Yellow | Topics about other relevant references |
| Blue | Implementation details / maths / experiments |
| Red | Text including my thoughts, questions, and understandings |
I have added the architectural details, my insights on the Transformer architecture, and some ideas about positional embeddings at the end.