RoBERTa Annotated Paper

1 minute read

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Soon after BERT was released in late 2018, it opened a floodgate of transformer-based networks. Still, the full capabilities of BERT went largely unnoticed until RoBERTa. In this paper, the authors question and improve BERT's hyperparameters and training paradigm through carefully crafted experiments, arriving at a more robust and better-performing network without changing BERT's core architecture. One such change is dynamic masking, sketched below.
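Among the training-paradigm changes the paper studies is dynamic masking: instead of masking each training sequence once during preprocessing (as in the original BERT pipeline), a new masking pattern is generated every time a sequence is fed to the model. Here is a minimal, illustrative Python sketch of that idea; it is not the authors' implementation, the function name is my own, and it omits details such as BERT's 80/10/10 token-replacement scheme and subword tokenization.

import random

def dynamically_mask(tokens, mask_token="<mask>", mask_prob=0.15):
    # Pick a fresh random set of positions to mask on every call,
    # rather than reusing one fixed mask created at preprocessing time.
    return [mask_token if random.random() < mask_prob else tok for tok in tokens]

sentence = "the quick brown fox jumps over the lazy dog".split()
# Each call yields a different masking pattern for the same sentence,
# so across epochs the model sees the sequence masked in many ways.
print(dynamically_mask(sentence))
print(dynamically_mask(sentence))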

Please feel free to read the paper along with my notes and highlights.

Color legend:
Green: Topics about the current paper
Yellow: Topics about other relevant references
Blue: Implementation details, maths, and experiments
Red: My thoughts, questions, and understandings

I highly recommend going through the BERT paper before this one. If you have not, check it out here.

Follow me on GitHub and star this repo for regular updates. GitHub

PS: For now, the PDF above does not render properly on mobile devices, so please download the PDF using the button above or get it from my GitHub.


CITATION

@misc{liu2019roberta,
      title={RoBERTa: A Robustly Optimized BERT Pretraining Approach}, 
      author={Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
      year={2019},
      eprint={1907.11692},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}