HMA: An Efficient Training Method for NLP Models

2021 5th International Conference on Innovation in Artificial Intelligence (ICIAI 2021)

Abstract
Nowadays, deep learning is widely used to solve natural language processing (NLP) problems. Embedding matrices are commonly used in deep NLP models for automatic feature learning. However, the sparsity of embedding matrices makes it challenging to train NLP models efficiently with data parallelism. When training with synchronous optimization methods, aggregating sparse gradients incurs high communication cost and limits the scalability of distributed training. In this paper, we combine Model Average (MA) and synchronous optimization methods and propose HMA, a hybrid training method for deep NLP models. Furthermore, we implement HMA in the Horovod+TensorFlow training framework and conduct an experimental evaluation with representative NLP models. For NLP models with a large number of sparse parameters, HMA saves over 30% of wall-clock time compared with the state-of-the-art distributed training framework, while maintaining the same final training loss.
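The hybrid idea described above can be pictured as a split between parameter types: dense parameters are synchronized every step with allreduce (synchronous SGD), while the large, sparsely updated embedding matrix is trained locally and only averaged across workers periodically, in the spirit of model averaging / local SGD. The sketch below illustrates this split in Horovod + TensorFlow; the toy model, variable names, and averaging interval are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Toy model: a large, sparsely updated embedding matrix plus a small dense layer.
embedding = tf.Variable(tf.random.normal([50000, 128]), name="embedding")
dense_w = tf.Variable(tf.random.normal([128, 2]), name="dense_w")
optimizer = tf.keras.optimizers.SGD(0.01)

# Synthetic data, just to make the sketch runnable.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([256, 10], maxval=50000, dtype=tf.int64),
     tf.random.uniform([256], maxval=2, dtype=tf.int64))).batch(32)

@tf.function
def train_step(ids, labels):
    with tf.GradientTape() as tape:
        hidden = tf.reduce_mean(tf.nn.embedding_lookup(embedding, ids), axis=1)
        logits = tf.matmul(hidden, dense_w)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    emb_grad, dense_grad = tape.gradient(loss, [embedding, dense_w])
    # Dense gradient: averaged across workers every step (synchronous optimization).
    dense_grad = hvd.allreduce(dense_grad)
    # Sparse embedding gradient (IndexedSlices): applied locally, no per-step aggregation.
    optimizer.apply_gradients([(emb_grad, embedding), (dense_grad, dense_w)])
    return loss

def average_embedding():
    # Periodic model averaging of the locally trained embedding matrix.
    embedding.assign(hvd.allreduce(embedding.read_value()))

# Training loop sketch: broadcast initial state, then alternate per-step dense
# synchronization with periodic averaging of the sparse part.
hvd.broadcast_variables([embedding, dense_w], root_rank=0)
avg_every = 100  # illustrative averaging interval (an assumption, not from the paper)
for step, (ids, labels) in enumerate(dataset):
    loss = train_step(ids, labels)
    if (step + 1) % avg_every == 0:
        average_embedding()
```

The communication saving comes from never allreducing the sparse embedding gradients themselves; only the dense gradients are exchanged every step, and the embedding matrix is reconciled at a much coarser interval.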
Keywords
distributed training, deep learning, model average, local SGD