Neural Machine Translation with Byte-Level Subwords
National Conference on Artificial Intelligence, 2020.
Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords or words. Rare characters from noisy text or character-rich languages such as Japanese and Chinese, however, can unnecessarily take up vocabulary slots and limit the vocabulary's compactness. Representing text at the level of bytes and...
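The compactness argument can be illustrated with a minimal sketch. This is not the paper's method. It only shows the byte-level starting point: UTF-8 encodes every character as 1 to 4 bytes drawn from just 256 possible values, so a byte-level base vocabulary can never exceed 256 symbols, however many rare characters the text contains. The helper names `to_bytes`, `from_bytes` and `byte_vocab` are hypothetical illustrations.

```python
# Sketch only: a byte-level base vocabulary, bounded at 256 symbols.
# Helper names are hypothetical and not taken from the paper.

def to_bytes(text: str) -> list[int]:
    """Map text to its UTF-8 byte values; each value lies in 0..255."""
    return list(text.encode("utf-8"))


def from_bytes(byte_ids: list[int]) -> str:
    """Invert to_bytes; raises UnicodeDecodeError on an invalid byte sequence."""
    return bytes(byte_ids).decode("utf-8")


def byte_vocab(corpus: list[str]) -> set[int]:
    """Distinct byte values in a corpus; at most 256, unlike a character set."""
    return {b for line in corpus for b in line.encode("utf-8")}


corpus = ["hello world", "机器翻译", "ニューラル"]
print(to_bytes("机"))  # → [230, 156, 186]: one CJK character, three bytes
print(from_bytes(to_bytes("机器翻译")))  # round-trips to the original text
print(len(byte_vocab(corpus)) <= 256)  # → True for any corpus
```

The trade-off is sequence length: each Chinese or Japanese character above becomes three byte symbols, so byte-level models see longer inputs than character-level ones.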