An empirical study of cyclical learning rate on neural machine translation

NATURAL LANGUAGE ENGINEERING (2023)

Abstract
In training deep learning networks, the optimizer and the associated learning rate are often used with little thought or minimal tuning, even though they are crucial for fast convergence to a good-quality minimum of the loss function that also generalizes well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policies to computer vision tasks, we explore how cyclical learning rates can be applied to train transformer-based neural networks for neural machine translation. Through carefully designed experiments, we show that the choice of optimizer and the associated cyclical learning rate policy can have a significant impact on performance. In addition, we establish guidelines for applying cyclical learning rates to neural machine translation tasks.
Keywords
Neural machine translation, Cyclical learning rate, Optimizer, Adam, Batch size
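
For context on the cyclical learning rate policy the abstract refers to, below is a minimal sketch of a triangular cyclical schedule applied to a transformer layer in PyTorch. The model, the use of Adam, and all hyperparameter values (base_lr, max_lr, step_size_up) are illustrative assumptions for this sketch, not the configuration reported in the paper.

```python
import torch
from torch import nn

# Stand-in model; the paper trains full transformer-based NMT models.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Triangular cyclical schedule: the learning rate rises linearly from
# base_lr to max_lr over step_size_up optimizer steps, then falls back.
# cycle_momentum=False is required because Adam has no `momentum`
# parameter group for the scheduler to cycle.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-5,       # lower bound of the cycle (assumed value)
    max_lr=1e-3,        # upper bound of the cycle (assumed value)
    step_size_up=4000,  # steps per half-cycle (assumed value)
    mode="triangular",
    cycle_momentum=False,
)

for step in range(10):                 # skeleton training loop
    x = torch.randn(10, 16, 512)       # dummy batch (seq_len, batch, d_model)
    loss = model(x).pow(2).mean()      # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                   # advance the cycle once per step
```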