Large scale distributed neural network training through online distillation

Gabriel Pereyra
Robert Ormándi

ICLR 2018. arXiv:1804.03235.

Abstract:

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation, online distillation, as a way to scale distributed neural network training.
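
Online distillation, as the abstract describes it, trains several copies of the model in parallel; each copy fits the labels while also matching its peers' predictions, so the copies regularize one another without a separate teacher-training stage. Below is a minimal, hypothetical sketch in PyTorch of that loss structure: the two-peer setup, the toy data, the KL-divergence distillation term, and the distill_weight knob are illustrative assumptions, not the paper's exact recipe.

# Minimal sketch of online distillation (codistillation): two peer models
# train on the same batches; each adds a loss term pulling its predictions
# toward its peers' (detached) predictions.
# NOTE: model sizes, toy data, and distill_weight are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def codistillation_step(models, optimizers, x, y, distill_weight=0.5):
    # Compute every peer's logits once; detached soft targets act as teachers.
    logits = [m(x) for m in models]
    teachers = [F.softmax(l.detach(), dim=-1) for l in logits]

    for i, (opt, l) in enumerate(zip(optimizers, logits)):
        # Average the other peers' predictions to form model i's teacher.
        peer_avg = torch.stack(
            [t for j, t in enumerate(teachers) if j != i]).mean(dim=0)
        ce = F.cross_entropy(l, y)                            # label loss
        kl = F.kl_div(F.log_softmax(l, dim=-1), peer_avg,
                      reduction="batchmean")                  # distillation loss
        opt.zero_grad()
        (ce + distill_weight * kl).backward()
        opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    models = [nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
              for _ in range(2)]
    optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
    for _ in range(100):
        x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
        codistillation_step(models, optimizers, x, y)

In the distributed setting the title refers to, each peer would run on its own group of workers and exchange possibly stale predictions or checkpoints rather than sharing memory; this single-process version only illustrates how the label loss and the distillation loss combine.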
