DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow.

Song Han,Jeff Pool,Sharan Narang,Huizi Mao,Shijian Tang,Erich Elsen,Bryan Catanzaro,John Tran,William J. Dally

arXiv: Computer Vision and Pattern Recognition（2016）

引用 85|浏览162

暂无评分

摘要

Modern deep neural networks have a large number of parameters, making them very powerful machine learning systems. A critical issue for training such large networks on large-scale data-sets is to prevent overfitting while at the same time providing enough model capacity. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks. In the first D step, we train a dense network to learn which connections are important. In the S step, we regularize the network by pruning the unimportant connections and retrain the network given the sparsity constraint. In the final D step, we increase the model capacity by freeing the sparsity constraint, re-initializing the pruned parameters, and retraining the whole dense network. Experiments show that DSD training can improve the performance of a wide range of CNN, RNN and LSTMs on the tasks of image classification, caption generation and speech recognition. On the Imagenet dataset, DSD improved the absolute accuracy of AlexNet, GoogleNet, VGG-16, ResNet50, ResNet-152 and SqueezeNet by a geo-mean of 2.1 points (Top-1) and 1.4 points (Top-5). On the WSJ’92 and WSJ’93 dataset, DSD improved DeepSpeech2 WER by 0.53 and 1.08 points. On the Flickr-8K dataset, DSD improved the NeuralTalk BLEU score by 2.0 points. DSD training flow produces the same model architecture and doesn’t incur any inference overhead.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要