Low-Rank and Locality Constrained Self-Attention for Sequence Modeling.

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019)

Cited by 26
Abstract
The self-attention mechanism has become increasingly popular in natural language processing (NLP) applications. Recent studies show that the Transformer architecture, which relies mainly on the attention mechanism, achieves great success on large datasets. However, a known problem is that its generalization ability is weaker than that of CNNs and RNNs on many moderate-sized datasets. We think the reason can be attributed to its u...
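For context, the sketch below shows the standard scaled dot-product self-attention that the abstract refers to, together with a simple banded "locality" mask of the kind suggested by the paper's title. This is a minimal illustrative example only; the mask, the function name, and all parameters are assumptions for illustration and are not the authors' exact formulation.

```python
# Minimal NumPy sketch of scaled dot-product self-attention with an
# optional locality (band) mask. Illustrative only; not the paper's method.
import numpy as np

def self_attention(X, Wq, Wk, Wv, window=None):
    """X: (n, d) token embeddings; Wq, Wk, Wv: (d, d_k) projection matrices.
    If `window` is given, each position may only attend to neighbours within
    that distance (a simple locality constraint)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # query/key/value projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (n, n) scaled dot products
    if window is not None:                       # banded mask: |i - j| <= window
        n = X.shape[0]
        idx = np.arange(n)
        mask = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d_k) outputs

# Toy usage (hypothetical shapes): 6 tokens, 8-dim embeddings, +/-2 window.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv, window=2)
print(out.shape)  # (6, 4)
```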
Keywords
Sparse matrices,Bit error rate,Matrix decomposition,Linguistics,Task analysis,Natural language processing,Data models