BERT Annotated Paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
The revolutionary paper by Google that advanced the state-of-the-art performance on various NLP tasks and served as a stepping stone for many later architectures.
This paper leads the way and sets a direction for the entire field. It shows the clear benefits of using pre-trained models (trained on huge datasets) and transfer learning, largely independent of the downstream task.
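To make the transfer-learning idea concrete, here is a minimal sketch of the pre-train-then-fine-tune pattern, assuming the Hugging Face `transformers` library (not the authors' original codebase); the `bert-base-uncased` checkpoint and the two-label head are just placeholders for some downstream classification task.

```python
# A minimal sketch of the pre-train -> fine-tune pattern the paper popularised,
# assuming the Hugging Face `transformers` library purely for illustration.
# The same pre-trained encoder is reused for any downstream task; only a small
# task-specific head is added and fine-tuned.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # encoder weights pre-trained on huge unlabeled corpora
    num_labels=2,         # freshly initialised classification head
)

inputs = tokenizer(
    ["BERT transfers well to new tasks."], return_tensors="tf", padding=True
)
logits = model(inputs).logits  # shape: (batch_size, num_labels)

# Fine-tuning then only needs a few epochs on task-specific labels, e.g.:
# model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
#               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(dict(inputs), labels, epochs=3)
```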
Please feel free to read the paper along with my notes and highlights.
| Color | Meaning |
| --- | --- |
| Green | Topics about the current paper |
| Yellow | Topics about other relevant references |
| Blue | Implementation details / maths / experiments |
| Red | My thoughts, questions, and understandings |
At the end, I have added the architectural details, my insights on the Transformer architecture, and a brief idea about positional embeddings.
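As a quick illustration of the positional-embedding idea, below is a small NumPy sketch of the fixed sinusoidal encodings from the original Transformer paper; note that BERT itself learns its position embeddings during pre-training, and the `max_len` / `d_model` values here are arbitrary demo choices.

```python
# Sinusoidal positional encodings from "Attention Is All You Need".
# BERT learns its position embeddings instead; this fixed formula is shown
# only to illustrate how order information can be injected into token vectors.
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(max_len)[:, None]    # (max_len, 1)
    dims = np.arange(d_model)[None, :]         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates           # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=128, d_model=768)
print(pe.shape)  # (128, 768)
```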
You can also look into the introduction to BERT implementation with TensorFlow:
Please feel free to fork or follow the GitHub Repo for all the Annotated Papers.
PS: For now, the PDF above does not render properly on mobile devices, so please download it using the button above or get it from my GitHub.
CITATION
@inproceedings{47751,
title = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
author = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina N. Toutanova},
year = {2018},
URL = {https://arxiv.org/abs/1810.04805}
}