Byte Pair Encoding is Suboptimal for Language Model Pretraining

Kaj Bostrom, Greg Durrett

Findings of EMNLP, pp. 4617-4624, 2020.

Other Links: arxiv.org | dblp.uni-trier.de | academic.microsoft.com

Abstract:

The success of pretrained transformer language models in natural language processing has led to a wide range of pretraining setups. These models employ a variety of subword tokenization methods, most notably byte-pair encoding (BPE) (Sennrich et al., 2016; Gage, 1994), the WordPiece method (Schuster and Nakajima, 2012), and unigram language modeling (Kudo, 2018), to segment text. …
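For readers unfamiliar with the BPE procedure the abstract refers to (Sennrich et al., 2016), below is a minimal Python sketch of the greedy merge-learning step. The toy corpus, the function name learn_bpe_merges, and the "</w>" end-of-word marker are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def learn_bpe_merges(word_freqs, num_merges):
    """Greedily learn BPE merge operations from a word-frequency table.

    `word_freqs` maps space-separated symbol sequences (one entry per word
    type) to corpus frequencies, e.g. {"l o w </w>": 5, "l o w e r </w>": 2}.
    """
    vocab = dict(word_freqs)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Merge the most frequent pair everywhere it occurs (the greedy step).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged_symbol = "".join(best)
        new_vocab = {}
        for word, freq in vocab.items():
            symbols = word.split()
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged_symbol)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[" ".join(out)] = freq
        vocab = new_vocab
    return merges, vocab


if __name__ == "__main__":
    toy_corpus = {
        "l o w </w>": 5,
        "l o w e r </w>": 2,
        "n e w e s t </w>": 6,
        "w i d e s t </w>": 3,
    }
    merges, segmented = learn_bpe_merges(toy_corpus, num_merges=10)
    print(merges)      # learned merge operations, most frequent first
    print(segmented)   # resulting segmentation of the toy vocabulary
```

By contrast, the unigram language model method (Kudo, 2018) that the paper compares against builds a subword vocabulary by probabilistic pruning rather than greedy pair merging.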
