RoBERTa Annotated Paper

1 minute read

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Soon after BERT was released in late 2018, it opened a floodgate of transformer-based networks. Still, the full capabilities of BERT went largely unnoticed until RoBERTa. In this paper, the authors question and improve BERT's hyperparameters and training paradigm through carefully crafted experiments, arriving at a more robust and better-performing network without changing BERT's core architecture. One such change is dynamic masking, sketched below.
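Among the training-paradigm changes the paper studies is dynamic masking: instead of masking each training sequence once during preprocessing (as in the original BERT pipeline), a new masking pattern is generated every time a sequence is fed to the model. Here is a minimal, illustrative Python sketch of that idea; it is not the authors' implementation, the function name is my own, and it omits details such as BERT's 80/10/10 token-replacement scheme and subword tokenization.

import random

def dynamically_mask(tokens, mask_token="<mask>", mask_prob=0.15):
    # Pick a fresh random set of positions to mask on every call,
    # rather than reusing one fixed mask created at preprocessing time.
    return [mask_token if random.random() < mask_prob else tok for tok in tokens]

sentence = "the quick brown fox jumps over the lazy dog".split()
# Each call yields a different masking pattern for the same sentence,
# so across epochs the model sees the sequence masked in many ways.
print(dynamically_mask(sentence))
print(dynamically_mask(sentence))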

Please feel free to read the paper along with my notes and highlights.

Color legend:
Green: Topics about the current paper
Yellow: Topics about other relevant references
Blue: Implementation details, maths, and experiments
Red: My thoughts, questions, and understandings

I highly recommend going through the BERT paper before this one. If you have not, check it out here.

Follow me on GitHub and star this repo for regular updates. GitHub

PS: For now, the PDF above does not render properly on mobile devices, so please download the PDF using the button above or get it from my GitHub.


CITATION

@misc{liu2019roberta,
      title={RoBERTa: A Robustly Optimized BERT Pretraining Approach}, 
      author={Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
      year={2019},
      eprint={1907.11692},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}