Reservoir Transformers
arXiv (2020)
Abstract
We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
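The abstract describes interleaving frozen, randomly initialized "reservoir" layers with regular, trainable transformer layers. Below is a minimal sketch of that idea, assuming a standard PyTorch encoder stack; the class name `ReservoirTransformerEncoder` and the `reservoir_indices` argument are illustrative choices, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ReservoirTransformerEncoder(nn.Module):
    """Encoder stack in which selected layers are 'reservoirs':
    randomly initialized and never updated during training (sketch)."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, reservoir_indices=(1, 3)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
            for _ in range(num_layers)
        )
        # Freeze the reservoir layers: keep their random initial weights fixed
        # so they receive no gradient updates.
        for i in reservoir_indices:
            for p in self.layers[i].parameters():
                p.requires_grad = False

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Usage: only the non-reservoir layers are passed to the optimizer.
model = ReservoirTransformerEncoder()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
out = model(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
```

Since the frozen layers need no backward pass or optimizer state, training each step is cheaper, which is consistent with the reported gains in wall-clock time to convergence.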
Keywords
transformer