Neural Machine Translation with Byte-Level Subwords

Changhan Wang

National Conference on Artificial Intelligence, 2020.


Abstract:

Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords, or words. However, rare characters from noisy text or from character-rich languages such as Japanese and Chinese can unnecessarily take up vocabulary slots and limit the vocabulary's compactness. Representing text at the level of bytes and...
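
To make the byte-level idea concrete, the following is a minimal sketch that encodes a string as UTF-8 byte values, so every symbol falls in the range 0-255 no matter how rare the original character is. It shows only the raw byte representation, not the paper's byte-level subword method; the function names `to_byte_tokens` and `from_byte_tokens` are hypothetical and chosen for illustration.

```python
from typing import List


def to_byte_tokens(text: str) -> List[int]:
    """Encode text as a list of UTF-8 byte values, each in 0..255."""
    return list(text.encode("utf-8"))


def from_byte_tokens(tokens: List[int]) -> str:
    """Decode a list of UTF-8 byte values back into text."""
    return bytes(tokens).decode("utf-8")


# A short sample mixing Japanese characters with ASCII (illustrative only).
sample = "翻訳 ok"
tokens = to_byte_tokens(sample)

# Each of the two CJK characters takes 3 bytes in UTF-8; the space and
# the two ASCII letters take 1 byte each.
print(len(sample), "characters ->", len(tokens), "byte tokens")
print(tokens)
print(from_byte_tokens(tokens) == sample)
```

The trade-off this illustrates is that a byte vocabulary never needs more than 256 entries, but sequences become longer than their character-level equivalents.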
