Low-Rank and Locality Constrained Self-Attention for Sequence Modeling.

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019)

Cited by 26
Abstract
The self-attention mechanism has become increasingly popular in natural language processing (NLP) applications. Recent studies show that the Transformer architecture, which relies mainly on the attention mechanism, achieves great success on large datasets. However, a known problem is that its generalization ability is weaker than that of CNNs and RNNs on many moderate-sized datasets. We think the reason can be attributed to its u...
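For context, the sketch below shows the standard scaled dot-product self-attention that the abstract refers to, together with a simple banded "locality" mask of the kind suggested by the paper's title. This is a minimal illustrative example only; the mask, the function name, and all parameters are assumptions for illustration and are not the authors' exact formulation.

```python
# Minimal NumPy sketch of scaled dot-product self-attention with an
# optional locality (band) mask. Illustrative only; not the paper's method.
import numpy as np

def self_attention(X, Wq, Wk, Wv, window=None):
    """X: (n, d) token embeddings; Wq, Wk, Wv: (d, d_k) projection matrices.
    If `window` is given, each position may only attend to neighbours within
    that distance (a simple locality constraint)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # query/key/value projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (n, n) scaled dot products
    if window is not None:                       # banded mask: |i - j| <= window
        n = X.shape[0]
        idx = np.arange(n)
        mask = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d_k) outputs

# Toy usage (hypothetical shapes): 6 tokens, 8-dim embeddings, +/-2 window.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv, window=2)
print(out.shape)  # (6, 4)
```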
Keywords
Sparse matrices,Bit error rate,Matrix decomposition,Linguistics,Task analysis,Natural language processing,Data models