Graph Convolutions Enrich the Self-Attention in Transformers!
CoRR(2023)
摘要
Transformers, renowned for their self-attention mechanism, have achieved
state-of-the-art performance across various tasks in natural language
processing, computer vision, time-series modeling, etc. However, one of the
challenges with deep Transformer models is the oversmoothing problem, where
representations across layers converge to indistinguishable values, leading to
significant performance degradation. We interpret the original self-attention
as a simple graph filter and redesign it from a graph signal processing (GSP)
perspective. We propose graph-filter-based self-attention (GFSA) to learn a
general yet effective one, whose complexity, however, is slightly larger than
that of the original self-attention mechanism. We demonstrate that GFSA
improves the performance of Transformers in various fields, including computer
vision, natural language processing, graph pattern classification, speech
recognition, and code classification.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要