Generating Diverse Translation by Manipulating Multi-Head Attention
AAAI Conference on Artificial Intelligence, 2020.
The Transformer model has been widely used for machine translation and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery…
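As a rough illustration of the phenomenon described above (this is not the paper's code, and the toy tensors below are made up): each decoder head's encoder-decoder attention at a given target position is a distribution over source positions, and taking the argmax per head gives one "aligned" source word per head, which can differ across heads.

```python
import numpy as np

# Hypothetical sketch: random per-head attention logits for ONE target position.
# In a real Transformer these would come from the final decoder layer's
# encoder-decoder attention; here we just sample them to show the shapes.
rng = np.random.default_rng(0)
num_heads, src_len = 8, 5
logits = rng.normal(size=(num_heads, src_len))

# Softmax per head: each row is a distribution over source positions.
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# One aligned source index per head; heads may point at different source words,
# i.e. different word translation candidates.
head_alignments = attn.argmax(axis=-1)
print(head_alignments)
```

With real attention weights, inspecting `head_alignments` across target positions is one simple way to see whether different heads attend to different source words.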