Training RNNs as Fast as CNNs
In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2018. arXiv:1709.02755.
Common recurrent neural network architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU) architecture, a recurrent unit that simplifies the computation and exposes more parallelism. In SRU, the majority of computation for each step is independent of the recurrence and can be easily parallelized.
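The idea in the abstract can be sketched as follows: the heavy matrix multiplications are applied to the whole input sequence at once (no dependence on previous time steps), leaving only a cheap element-wise recurrence to run sequentially. This is a minimal NumPy sketch under assumed, simplified SRU equations; the parameter names (`W`, `Wf`, `Wr`, etc.) are hypothetical and the actual paper's formulation includes further details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, Wf, bf, Wr, br):
    """Simplified SRU forward pass (illustrative, not the paper's exact code).

    x: array of shape (T, d), an input sequence of T steps.
    """
    T, d = x.shape
    # Parallelizable part: batched over all T steps at once,
    # since none of these products depend on the recurrent state.
    x_tilde = x @ W                  # candidate values
    f = sigmoid(x @ Wf + bf)         # forget gates
    r = sigmoid(x @ Wr + br)         # reset / highway gates
    # Sequential part: element-wise only, hence very lightweight.
    c = np.zeros(d)
    h = np.empty((T, d))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h

rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.standard_normal((T, d))
W, Wf, Wr = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
bf, br = np.zeros(d), np.zeros(d)
h = sru_layer(x, W, Wf, bf, Wr, br)
print(h.shape)
```

Because the gates depend only on the inputs, the three matrix products can be fused and computed in parallel across time steps, which is what makes the unit comparable in speed to a convolutional layer.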