Generating Diverse Translation by Manipulating Multi-Head Attention

AAAI Conference on Artificial Intelligence (AAAI), 2020.


Abstract:

The Transformer model has been widely used for machine translation tasks and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery…
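To make the reported phenomenon concrete, the sketch below shows one way to inspect per-head encoder-decoder attention at a single decoding step and to bias decoding toward a chosen head. All names, shapes, and the head-masking scheme are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch (assumed setup, not the paper's code): given the
# encoder-decoder attention weights of the final decoder layer at one
# decoding step, see which source token each head attends to most.
import numpy as np

num_heads, src_len = 8, 6
rng = np.random.default_rng(0)

# Hypothetical attention weights for one target position:
# shape (num_heads, src_len); each row is a softmax over source tokens.
logits = rng.normal(size=(num_heads, src_len))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

src_tokens = ["we", "report", "an", "interesting", "phenomenon", "</s>"]

# Each head may align to a different source word, which in turn can
# correspond to a different candidate for the next target word.
for h in range(num_heads):
    aligned = int(attn[h].argmax())
    print(f"head {h}: strongest alignment to '{src_tokens[aligned]}' "
          f"(weight {attn[h, aligned]:.2f})")

# One assumed way to "manipulate" the heads: mask out all but one head
# before their outputs are combined, so decoding follows that head's
# alignment and may yield a different translation per kept head.
keep_head = 3
head_mask = np.zeros(num_heads)
head_mask[keep_head] = 1.0  # zero out every head except `keep_head`
```

Repeating such a decoding pass while keeping a different head each time would, under these assumptions, yield a set of diverse translation candidates rather than a single beam-search output.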
