Understanding Knowledge Distillation in Non-autoregressive Machine Translation

Abstract:

Non-autoregressive machine translation (NAT) systems predict a sequence of output tokens in parallel, achieving substantial improvements in generation speed compared to autoregressive models. Existing NAT models usually rely on the technique of knowledge distillation, which creates the training data from a pretrained autoregressive model.
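For readers unfamiliar with the setup, the sequence-level knowledge distillation described in the abstract amounts to re-labeling the source side of the training corpus with a trained autoregressive teacher's translations, then training the NAT student on those pairs. A minimal sketch follows, assuming hypothetical `ar_teacher.translate` and `nat_model.train_step` interfaces; these names are illustrative placeholders, not the paper's code.

```python
# Sketch of sequence-level knowledge distillation for NAT training.
# Assumptions: `ar_teacher` is any pretrained autoregressive translation
# model exposing translate(src) -> str (typically via beam search), and
# `nat_model` exposes train_step(src, tgt). Both APIs are hypothetical.

def build_distilled_corpus(ar_teacher, source_sentences):
    """Re-label the corpus: replace each reference translation
    with the autoregressive teacher's output."""
    return [(src, ar_teacher.translate(src)) for src in source_sentences]

def train_nat(nat_model, ar_teacher, source_sentences, epochs=10):
    distilled = build_distilled_corpus(ar_teacher, source_sentences)
    for _ in range(epochs):
        for src, tgt in distilled:
            # The NAT student predicts all target tokens in parallel,
            # conditioned only on the source (no target-side history),
            # so it trains on the teacher's simpler, more deterministic
            # outputs rather than the original references.
            nat_model.train_step(src, tgt)
```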
