Parameter rollback averaged stochastic gradient descent for language model.

J. Comput. Methods Sci. Eng. (2022)

Abstract
Recently, AWD-LSTM (ASGD Weight-Dropped LSTM) has achieved good results in language modeling, and many AWD-LSTM-based models have obtained state-of-the-art perplexities. However, large-scale neural language models have been shown to be prone to overfitting. In the original AWD-LSTM paper, the authors adopted an additional retraining step, called fine-tuning, to obtain better results. In this paper, we present a simple yet effective parameter rollback mechanism for neural language models: parameter rollback averaged stochastic gradient descent (PR-ASGD), in which the "step" parameter of ASGD is decreased with a certain probability. Using this strategy, we achieve better word-level perplexities on the Penn Treebank: 56.26 with the AWD-LSTM model and 53.57 with the AWD-LSTM-MoS (AWD-LSTM Mixture of Softmaxes) model.
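The abstract only sketches the mechanism, so the following is a minimal, hypothetical Python sketch of an ASGD-style update in which the averaging step counter is rolled back with some probability. The function name, the rollback probability, the rollback size, and the averaging trigger point t0 are illustrative assumptions, not values or APIs from the paper.

```python
import numpy as np

def pr_asgd(grad_fn, w0, lr=0.1, t0=100, n_iters=1000,
            rollback_prob=0.05, rollback_steps=10, seed=0):
    """ASGD with a probabilistic rollback of the averaging step counter.

    After iteration t0 the averaged iterate ax is updated with weight
    mu = 1 / max(1, step - t0); with probability rollback_prob the step
    counter is decreased, so recent iterates get more weight in the average.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    ax = w.copy()                 # averaged parameters (what ASGD returns)
    step = 0
    for _ in range(n_iters):
        w -= lr * grad_fn(w)      # plain SGD step on the live parameters
        step += 1
        if rng.random() < rollback_prob:
            step = max(1, step - rollback_steps)   # roll the counter back
        mu = 1.0 / max(1, step - t0)               # averaging weight
        if mu < 1.0:
            ax += mu * (w - ax)   # running average after the trigger t0
        else:
            ax = w.copy()         # before t0, the "average" just tracks w
    return ax

# Toy usage: minimise ||w||^2 (gradient 2w); the averaged iterate approaches 0.
w_avg = pr_asgd(lambda w: 2.0 * w, np.ones(5))
print(w_avg)
```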
Keywords
Optimizer, language model, machine learning