Parameter rollback averaged stochastic gradient descent for language model.

J. Comput. Methods Sci. Eng. (2022)

Abstract
Recently, AWD-LSTM (ASGD Weight-Dropped LSTM) has achieved good results in language modeling, and many AWD-LSTM-based models have obtained state-of-the-art perplexities. However, large-scale neural language models have been shown to be prone to overfitting. In the original AWD-LSTM paper, the authors adopted an additional retraining step, called fine-tuning, to obtain better results. In this paper, we present a simple yet effective parameter rollback mechanism for neural language models: parameter rollback averaged stochastic gradient descent (PR-ASGD), in which the "step" parameter of ASGD is decreased with a certain probability. Using this strategy, we achieve better word-level perplexities on the Penn Treebank: 56.26 with the AWD-LSTM model and 53.57 with the AWD-LSTM-MoS (AWD-LSTM Mixture of Softmaxes) model.
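The abstract only sketches the mechanism, so the following is a minimal, hypothetical Python sketch of an ASGD-style update in which the averaging step counter is rolled back with some probability. The function name, the rollback probability, the rollback size, and the averaging trigger point t0 are illustrative assumptions, not values or APIs from the paper.

```python
import numpy as np

def pr_asgd(grad_fn, w0, lr=0.1, t0=100, n_iters=1000,
            rollback_prob=0.05, rollback_steps=10, seed=0):
    """ASGD with a probabilistic rollback of the averaging step counter.

    After iteration t0 the averaged iterate ax is updated with weight
    mu = 1 / max(1, step - t0); with probability rollback_prob the step
    counter is decreased, so recent iterates get more weight in the average.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    ax = w.copy()                 # averaged parameters (what ASGD returns)
    step = 0
    for _ in range(n_iters):
        w -= lr * grad_fn(w)      # plain SGD step on the live parameters
        step += 1
        if rng.random() < rollback_prob:
            step = max(1, step - rollback_steps)   # roll the counter back
        mu = 1.0 / max(1, step - t0)               # averaging weight
        if mu < 1.0:
            ax += mu * (w - ax)   # running average after the trigger t0
        else:
            ax = w.copy()         # before t0, the "average" just tracks w
    return ax

# Toy usage: minimise ||w||^2 (gradient 2w); the averaged iterate approaches 0.
w_avg = pr_asgd(lambda w: 2.0 * w, np.ones(5))
print(w_avg)
```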
Keywords
Optimizer, language model, machine learning