Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton. ICLR, 2018. arXiv:1804.03235.
Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a …
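For context, the distillation variant the paper studies is an online one (codistillation): each worker in a distributed training run adds a loss term pulling its predictions toward those of its peers, so no separate teacher-training stage is needed. Below is a minimal PyTorch-style sketch of such a loss term; the function name, the `alpha` weighting, and the tensor shapes are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def codistillation_loss(logits, peer_logits, targets, alpha=0.5):
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(logits, targets)
    # Distillation term: match a peer's predictions, treated as fixed
    # soft targets (detached so no gradient flows back to the peer).
    peer_probs = F.softmax(peer_logits.detach(), dim=-1)
    kl = F.kl_div(F.log_softmax(logits, dim=-1), peer_probs,
                  reduction="batchmean")
    # alpha weights the distillation term relative to the label loss.
    return ce + alpha * kl

# Illustrative usage: two workers' logits on the same batch.
logits = torch.randn(8, 10)        # this worker's outputs
peer_logits = torch.randn(8, 10)   # another worker's outputs
targets = torch.randint(0, 10, (8,))
loss = codistillation_loss(logits, peer_logits, targets)
```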