Large scale distributed neural network training through online distillation

Abstract

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforward to use as it does not require a complicated multi-stage setup or many new hyperparameters. Our first claim is that online distillation enables …
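The abstract sketches the core mechanism of online distillation (codistillation): workers train on disjoint shards of the data, and each one adds a term to its loss encouraging agreement with the predictions a stale copy of a peer model would have made. Below is a minimal NumPy sketch of that per-worker loss, under stated assumptions: the function names and the `distill_weight` coefficient are illustrative and not taken from the paper.

```python
# A minimal sketch of the codistillation loss described in the abstract.
# This is NOT the authors' implementation; names and hyperparameters
# (e.g. `distill_weight`) are illustrative assumptions.
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p_target, p_pred, eps=1e-12):
    # Mean cross-entropy between a target distribution and predictions.
    return -np.mean(np.sum(p_target * np.log(p_pred + eps), axis=-1))

def codistillation_loss(own_logits, labels, stale_peer_logits,
                        num_classes, distill_weight=0.5):
    """Loss for one worker: fit the labels on its own data shard, plus a
    penalty for disagreeing with a (possibly stale) copy of a peer model."""
    p_own = softmax(own_logits)
    one_hot = np.eye(num_classes)[labels]
    hard_loss = cross_entropy(one_hot, p_own)
    # Peer predictions come from rarely transmitted stale weights, so they
    # act as fixed soft targets from this worker's point of view.
    p_peer = softmax(stale_peer_logits)
    distill_loss = cross_entropy(p_peer, p_own)
    return hard_loss + distill_weight * distill_loss

# Toy usage: a batch of 4 examples with 3 classes.
rng = np.random.default_rng(0)
own = rng.normal(size=(4, 3))
peer = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 1])
print(codistillation_loss(own, labels, peer, num_classes=3))
```

Because the peer's predictions are computed from stale weights, they need only be refreshed occasionally, which is what lets this scheme add parallelism without the communication cost of synchronous gradient exchange.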

Volume: abs/1804.03235, 2018

Cited by: 9