Toward Better Accuracy-Efficiency Trade-Offs: Divide and Co-Training

IEEE Transactions on Image Processing (2022)

Abstract
The width of a neural network matters because increasing the width necessarily increases the model capacity. However, the performance of a network does not improve linearly with its width and soon saturates. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To demonstrate this, one large network is divided into several small ones with respect to its parameters and regularization components, each holding a fraction of the original network's parameters. We then train these small networks together and expose them to different views of the same data to increase their diversity. During this co-training process, the networks can also learn from each other. As a result, the small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., better accuracy-efficiency trade-offs. The small networks can also achieve faster inference than the large one by running concurrently. All of the above shows that the number of networks is a new dimension of model scaling. We validate our argument with eight different neural architectures on common benchmarks through extensive experiments.
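The abstract describes the divide-and-co-training idea only at a high level, so the following is a minimal sketch of how such a scheme could be set up, assuming a PyTorch-style training loop. The toy backbone, the width-splitting rule (base width divided by the square root of the number of networks), and the mutual-learning term toward the averaged ensemble prediction are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch of dividing one wide network into several small ones and
# co-training them on different views of the same batch. Details such as the
# backbone, splitting rule, and distillation loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_small_nets(base_width: int, num_nets: int) -> nn.ModuleList:
    # Divide one wide network into `num_nets` narrower ones so that the
    # total parameter count stays roughly comparable (assumed rule).
    small_width = max(1, int(base_width / num_nets ** 0.5))
    return nn.ModuleList(
        nn.Sequential(  # toy stand-in for a real backbone such as a ResNet
            nn.Conv2d(3, small_width, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(small_width, 10),
        )
        for _ in range(num_nets)
    )

def co_training_step(nets, views, labels, optimizer, distill_weight=1.0):
    # Each small network sees a different augmented view of the same batch,
    # is supervised by the labels, and additionally learns from the others
    # by matching the averaged (ensemble) prediction.
    logits = [net(v) for net, v in zip(nets, views)]
    ce_loss = sum(F.cross_entropy(l, labels) for l in logits)
    with torch.no_grad():
        ensemble = torch.stack(logits).mean(dim=0)
    kd_loss = sum(
        F.kl_div(F.log_softmax(l, dim=1),
                 F.softmax(ensemble, dim=1),
                 reduction="batchmean")
        for l in logits
    )
    loss = ce_loss + distill_weight * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: two augmented views of a dummy batch drive two co-trained small nets.
nets = make_small_nets(base_width=64, num_nets=2)
optimizer = torch.optim.SGD(nets.parameters(), lr=0.1, momentum=0.9)
views = [torch.randn(8, 3, 32, 32) for _ in range(2)]
labels = torch.randint(0, 10, (8,))
print(co_training_step(nets, views, labels, optimizer))

At inference time, the small networks can be run concurrently and their predictions averaged, which is where the claimed speed and accuracy-efficiency benefits come from.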
Keywords
Training, Neural networks, Convolution, Computer architecture, Costs, Tin, Kernel, Image classification, divide networks, co-training, deep networks ensemble