Cloze-driven Pretraining of Self-attention Networks
EMNLP/IJCNLP (1), pp. 5359–5368, 2019.
We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where each word is ablated and must be predicted given the rest of the text. Experiments demonstrate large ...
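As a rough illustration of the cloze objective described above, here is a minimal PyTorch sketch, not the paper's implementation: each position is replaced by a mask token in turn and predicted from the surrounding context. The model sizes, mask token id, and vocabulary below are hypothetical toy settings.

import torch
import torch.nn as nn

# Hypothetical toy settings; the paper's actual model is far larger.
vocab_size, d_model, mask_id = 1000, 64, 0

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
to_vocab = nn.Linear(d_model, vocab_size)

def cloze_loss(tokens):
    """Ablate each position in turn; predict it from the rest of the text."""
    _, seq_len = tokens.shape
    total = 0.0
    for i in range(seq_len):
        masked = tokens.clone()
        masked[:, i] = mask_id            # ablate the i-th word
        hidden = encoder(embed(masked))   # bi-directional encoding of the rest
        logits = to_vocab(hidden[:, i])   # predict the ablated word
        total = total + nn.functional.cross_entropy(logits, tokens[:, i])
    return total / seq_len

tokens = torch.randint(1, vocab_size, (2, 8))  # toy batch of token ids
cloze_loss(tokens).backward()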