Better Localness for Non-Autoregressive Transformer

ACM Transactions on Asian and Low-Resource Language Information Processing (2023)

Abstract
The Non-Autoregressive Transformer has attracted much attention from researchers due to its low inference latency. Although its performance has improved significantly in recent years, a gap remains between the non-autoregressive transformer and the autoregressive transformer. Motivated by the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix based on the query tokens, key tokens, and their relative distance, and unify the localness module for both the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.
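
The abstract only sketches the mechanism. As an illustration, the snippet below is a minimal sketch, in PyTorch, of how a locality term that depends on query content, key content, and relative distance can bias attention logits; it is not the authors' exact formulation, and the module name, the gating layer, and all hyperparameters are hypothetical.

```python
# Minimal sketch, assuming a content-gated distance bias added to attention
# logits; this illustrates the idea of localness, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalnessAttention(nn.Module):
    def __init__(self, d_model, max_distance=16):
        super().__init__()
        self.scale = d_model ** -0.5
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # hypothetical: one learnable bias per clipped relative distance
        self.dist_bias = nn.Embedding(2 * max_distance + 1, 1)
        # hypothetical: gate predicted from query/key content, controlling
        # how strongly each pair is pulled toward nearby positions
        self.gate = nn.Linear(d_model, 1)
        self.max_distance = max_distance

    def forward(self, query, key, value):
        # query: (B, Lq, d_model); key/value: (B, Lk, d_model)
        q, k, v = self.q_proj(query), self.k_proj(key), self.v_proj(value)
        logits = torch.matmul(q, k.transpose(-2, -1)) * self.scale

        # relative distances, clipped and shifted to index the bias table
        q_pos = torch.arange(query.size(1), device=query.device)
        k_pos = torch.arange(key.size(1), device=key.device)
        rel = (q_pos[:, None] - k_pos[None, :]).clamp(
            -self.max_distance, self.max_distance) + self.max_distance
        dist_term = self.dist_bias(rel).squeeze(-1)            # (Lq, Lk)

        # content-dependent gate, broadcast to (B, Lq, Lk)
        gate = torch.sigmoid(self.gate(q) + self.gate(k).transpose(-2, -1))

        # soft "mask": re-weight logits by the gated distance term
        logits = logits + gate * dist_term
        attn = F.softmax(logits, dim=-1)
        return torch.matmul(attn, v)
```

Because the bias is computed from arbitrary query and key sequences, the same module could be applied to both self-attention (query = key) and encoder-decoder cross-attention, which is the unification the abstract refers to.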
Keywords
Non-autoregressive, localness, attention module, translation