Accessing Higher-level Representations in Sequential Transformers with Feedback Memory

arXiv (2020)

Abstract
Transformers are feedforward networks that can process input tokens in parallel. While this parallelization makes them computationally efficient, it prevents the model from fully exploiting the sequential nature of the input: the representation at a given layer can only access representations from lower layers, rather than the higher-level representations already built in previous time steps. In this work, we propose the Feedback Transformer architecture, which exposes all previous representations to all future representations, meaning the lowest representation of the current timestep is formed from the highest-level abstract representation of the past. We demonstrate on a variety of benchmarks in language modeling, neural machine translation, summarization, and reinforcement learning that this increased representation capacity yields improvements over Transformer baselines.
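The mechanism described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name FeedbackMemoryBlock, the use of nn.TransformerDecoderLayer, and the softmax-weighted merging of layer outputs into a single memory slot are illustrative assumptions; only the core idea (every layer of the current step attends to a shared memory built from all layers of past steps) comes from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedbackMemoryBlock(nn.Module):
    # Sketch of feedback memory: at every step, all layers attend to one shared
    # memory built from all layers of earlier steps, so even the lowest layer
    # can read high-level abstractions of the past.
    def __init__(self, d_model, n_layers, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        # Assumed detail: merge per-layer states with learned softmax weights.
        self.layer_weights = nn.Parameter(torch.zeros(n_layers + 1))

    def step(self, x_t, memory):
        # x_t: (batch, 1, d_model) current token; memory: (batch, t, d_model)
        states = [x_t]
        h = x_t
        for layer in self.layers:
            h = layer(h, memory)  # cross-attend to the shared memory of the past
            states.append(h)
        # Collapse this step's per-layer states into a single new memory slot.
        w = F.softmax(self.layer_weights, dim=0)
        new_slot = sum(w_i * s for w_i, s in zip(w, states))
        return h, torch.cat([memory, new_slot], dim=1)


# Usage sketch: process a short sequence one token at a time.
block = FeedbackMemoryBlock(d_model=64, n_layers=4)
memory = torch.zeros(2, 1, 64)            # placeholder initial memory slot
for x_t in torch.randn(8, 2, 1, 64):      # 8 steps, batch of 2
    out, memory = block.step(x_t, memory)
```

Note the trade-off this sketch makes explicit: because each step's memory slot depends on the previous step's full stack of layers, tokens must be processed sequentially rather than in parallel across the sequence.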
Keywords
sequential transformers, feedback memory, higher-level representations