Semi-Autoregressive Training Improves Mask-Predict Decoding
Abstract:
The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, produci...More
Code:
Data:
Full Text
Tags
Comments