WordReg: Mitigating the Gap between Training and Inference with Worst-Case Drop Regularization
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Abstract
Dropout has emerged as one of the most frequently used techniques for training deep neural networks (DNNs). Although effective, the sub-model sampled by random dropout during training is inconsistent with the full model (without dropout) used during inference. To mitigate this undesirable gap, we propose WordReg, a simple yet effective regularization built on dropout that enforces consistency between the outputs of different sub-models sampled by dropout. Specifically, WordReg first obtains the worst-case dropout by maximizing the divergence between the outputs of two sub-models with different random dropouts. It then encourages agreement between the outputs of the two sub-models with worst-case divergence. Extensive experiments on diverse DNNs and tasks reveal that WordReg achieves notable and consistent improvements over non-regularized models and yields some state-of-the-art results. Theoretically, we verify that WordReg reduces the gap between training and inference.
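The abstract describes a two-step scheme: find a worst-case pair of dropout sub-models by maximizing output divergence, then penalize that divergence to encourage agreement. The paper's exact formulation is not given here; as a rough NumPy sketch (the candidate-sampling approximation of the worst case, the one-hidden-layer model, and all function names are my own assumptions, not the authors' method), the penalty could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, w, mask):
    # One hidden layer; the dropout mask zeroes hidden units (inverted dropout scaling).
    h = np.maximum(x @ w["w1"], 0.0) * mask
    return softmax(h @ w["w2"])

def kl(p, q, eps=1e-12):
    # Mean KL divergence KL(p || q) over the batch.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

def wordreg_penalty(x, w, p_drop=0.5, n_candidates=8):
    """Approximate the worst-case dropout pair by sampling candidate mask
    pairs and keeping the one with maximal symmetric KL divergence.
    (The paper presumably optimizes the masks; sampling is a stand-in.)"""
    hidden = w["w1"].shape[1]
    worst = -np.inf
    for _ in range(n_candidates):
        m1 = (rng.random(hidden) > p_drop) / (1.0 - p_drop)
        m2 = (rng.random(hidden) > p_drop) / (1.0 - p_drop)
        d = 0.5 * (kl(forward(x, w, m1), forward(x, w, m2))
                   + kl(forward(x, w, m2), forward(x, w, m1)))
        worst = max(worst, d)
    return worst  # added to the task loss as a consistency regularizer
```

In training, this penalty would be added to the usual task loss, so gradients push the two most-divergent sub-models toward agreement.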
Keywords
Image Recognition, Language Understanding, Graph Mining, Dropout, Regularization