BERT Annotated Paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding is the revolutionary paper by Google that raised the state-of-the-art performance on a range of NLP tasks and served as a stepping stone for many later architectures. The paper set a direction for the entire field, showing the clear benefits of pre-trained models (trained on huge datasets) and of transfer learning, independent of the downstream task. Please feel free to read along with the paper using my notes and highlights.