RoBERTa Annotated Paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Soon after BERT was released in late 2018, it opened the floodgates for transformer-based networks. However, the full capabilities of BERT went unnoticed until RoBERTa. In this paper, the authors question and improve the hyperparameters and training paradigm of BERT through carefully crafted experiments, arriving at a robust, better-performing network without changing BERT's core architecture.
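Since RoBERTa keeps BERT's architecture and changes only the pretraining recipe, the two models load into essentially the same encoder stack. Below is a minimal sketch (assuming the Hugging Face `transformers` package is installed) illustrating this; the checkpoint names `bert-base-uncased` and `roberta-base` are the public base-size checkpoints, not something defined in this repo.

```python
# Minimal sketch: RoBERTa reuses BERT's core Transformer encoder; the two
# base-size checkpoints differ in tokenization and pretraining, not architecture.
from transformers import BertModel, RobertaModel

bert = BertModel.from_pretrained("bert-base-uncased")
roberta = RobertaModel.from_pretrained("roberta-base")

# Both are 12-layer Transformer encoders with hidden size 768.
print(bert.config.num_hidden_layers, bert.config.hidden_size)        # 12 768
print(roberta.config.num_hidden_layers, roberta.config.hidden_size)  # 12 768
```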
Please feel free to read along with the paper using my notes and highlights.
| Color | Meaning |
|---|---|
| Green | Topics about the current paper |
| Yellow | Topics about other relevant references |
| Blue | Implementation details / maths / experiments |
| Red | Text including my thoughts, questions, and understandings |
I highly recommend going through the BERT paper before this one. If you have not, check it out here.
Follow me on GitHub and star this repo for regular updates.
PS: For now, the PDF above does not render properly on mobile devices, so please download it using the button above or get it from my GitHub.
CITATION
@misc{liu2019roberta,
  title={RoBERTa: A Robustly Optimized BERT Pretraining Approach},
  author={Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
  year={2019},
  eprint={1907.11692},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}