Byte Pair Encoding is Suboptimal for Language Model Pretraining
EMNLP, pp. 4617-4624, 2020.
The success of pretrained transformer language models in natural language processing has led to a wide range of different pretraining setups. These models employ a variety of subword tokenization methods, most notably byte pair encoding (BPE) (Sennrich et al., 2016; Gage, 1994), the WordPiece method (Schuster and Nakajima, 2012), and un...