RoBERTa Annotated Paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Soon after BERT was released in late 2018, a floodgate of transformer-based networks opened. Still, the full capabilities of BERT went unnoticed until RoBERTa. In this paper, the authors question and improve BERT's hyperparameters and training paradigm through carefully crafted experiments, arriving at a robust, better-performing network without changing BERT's core architecture. Please feel free to read along with the paper using my notes and highlights.
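Because RoBERTa keeps BERT's architecture and changes only the pretraining recipe, a pretrained checkpoint can be used as a drop-in encoder. Here is a minimal sketch, assuming the Hugging Face `transformers` library and the public `roberta-base` checkpoint (neither is part of the paper itself):

```python
# Minimal sketch: load a pretrained RoBERTa checkpoint and extract
# contextual embeddings. Assumes `torch` and `transformers` are installed
# and the public `roberta-base` checkpoint (not part of the paper itself).
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# RoBERTa uses the same Transformer encoder as BERT; only the pretraining
# recipe (data, masking strategy, hyperparameters) differs.
inputs = tokenizer("RoBERTa keeps BERT's architecture.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-token contextual embeddings: (batch, seq_len, hidden_size=768)
print(outputs.last_hidden_state.shape)
```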