Compression of Deep Neural Networks by Combining Pruning and Low Rank Decomposition
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2019
Abstract
The large number of weights in deep neural networks makes the models difficult to deploy in low-memory environments such as mobile phones and IoT edge devices, as well as in "inferencing as a service" environments in the cloud. Prior work has reduced model size through compression techniques such as weight pruning and filter pruning, or through low-rank decomposition of the convolution layers. In this paper, we demonstrate that combining multiple techniques achieves not only higher model compression but also a reduction in the compute resources required during inferencing. We apply filter pruning followed by low-rank decomposition, using Tucker decomposition, for model compression. We show that our approach achieves up to 57% higher model compression than either Tucker decomposition or filter pruning alone, at similar accuracy, for GoogleNet. It also reduces FLOPs by up to 48%, thereby making inferencing faster.
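To make the effect of combining the two steps concrete, the following is a minimal sketch of the weight-count arithmetic for a single convolution layer. It assumes a Tucker-2 style split (a 1x1 convolution, a smaller k x k core convolution, then another 1x1 convolution), which is one common way to apply Tucker decomposition to convolution kernels; the abstract does not state which variant, ranks, or pruning ratios the authors use. The layer shape, `keep_ratio`, `rank_ratio`, and all function names are hypothetical and chosen only for illustration.

```python
def conv_params(in_ch, out_ch, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k


def tucker2_params(in_ch, out_ch, k, r_in, r_out):
    """Weights after a Tucker-2 style split: 1x1 -> k x k core -> 1x1.

    r_in and r_out are the assumed ranks along the input- and
    output-channel modes of the kernel tensor.
    """
    return in_ch * r_in + r_in * r_out * k * k + r_out * out_ch


def pruned_then_tucker2_params(in_ch, out_ch, k, keep_ratio, rank_ratio):
    """Prune output filters first, then decompose the smaller kernel.

    keep_ratio: fraction of output filters kept by pruning (assumption).
    rank_ratio: rank as a fraction of each channel dimension (assumption).
    """
    kept_out = max(1, int(out_ch * keep_ratio))
    r_in = max(1, int(in_ch * rank_ratio))
    r_out = max(1, int(kept_out * rank_ratio))
    return tucker2_params(in_ch, kept_out, k, r_in, r_out)


if __name__ == "__main__":
    # Hypothetical layer shape, not taken from the paper.
    in_ch, out_ch, k = 64, 128, 3
    dense = conv_params(in_ch, out_ch, k)
    pruned_only = conv_params(in_ch, int(out_ch * 0.75), k)
    tucker_only = tucker2_params(in_ch, out_ch, k, r_in=32, r_out=64)
    combined = pruned_then_tucker2_params(in_ch, out_ch, k, 0.75, 0.5)
    print("dense:", dense)
    print("filter pruning only:", pruned_only)
    print("Tucker-2 only:", tucker_only)
    print("pruning then Tucker-2:", combined)
```

For this made-up layer, the combined count comes out smaller than either technique alone, because pruning shrinks the output-channel dimension before the decomposition ranks are chosen. This illustrates the general mechanism only and says nothing about accuracy or about the specific figures reported for GoogleNet.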
Keywords
Model compression, Filter pruning, Low-rank decomposition