Fast training of a transformer for global multi-horizon time series forecasting on tensor processing units
The Journal of Supercomputing (2022)
Abstract
Time Series Forecasting (TSF) is essential to key domains, and the Transformer neural network has advanced the state of the art on global, multi-horizon TSF benchmarks. The quadratic time and memory complexity of the Vanilla Transformer (VT) hinders its application to Big Data environments; therefore, multiple efficient variants of the VT that lower complexity via sparse self-attention have been proposed. However, lower algorithmic complexity does not directly translate into faster execution, and machine learning models for Big Data are typically trained on accelerators designed for dense-matrix computation, which perform poorly on sparse matrices. To better compare the accuracy-speed trade-off of the VT and its variants, it is essential to test them on such accelerators. We implemented a cloud-based VT on Tensor Processing Units to address this task. Experiments on large-scale datasets show that our Transformer achieves good predictive performance compared to state-of-the-art models while reducing training times from hours to under two minutes.
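To illustrate the quadratic cost the abstract refers to, here is a minimal JAX sketch of dense scaled dot-product self-attention. This is not the paper's implementation; all names and shapes are illustrative. The point is that the intermediate attention matrix has shape (seq_len, seq_len), so doubling the sequence length quadruples its size, while the dense matrix multiplications map directly onto the matrix units that TPUs are built around.

```python
import jax
import jax.numpy as jnp

def self_attention(q, k, v):
    """Dense scaled dot-product self-attention (illustrative sketch).

    q, k, v: arrays of shape (seq_len, d_model). The intermediate
    `scores` matrix is (seq_len, seq_len), which is the source of the
    quadratic time and memory complexity in sequence length.
    """
    scores = q @ k.T / jnp.sqrt(q.shape[-1])  # (seq_len, seq_len)
    weights = jax.nn.softmax(scores, axis=-1)  # attention weights per query
    return weights @ v                         # weighted sum of values

# On a TPU backend, jax.jit compiles these dense matmuls to the
# accelerator's matrix units; sparse-attention variants avoid the
# (seq_len, seq_len) matrix but lose this dense-hardware affinity.
key = jax.random.PRNGKey(0)
q = k = v = jax.random.normal(key, (512, 64))
out = jax.jit(self_attention)(q, k, v)  # shape (512, 64)
```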
Keywords
Deep learning, Self-attention, Cloud computing, TPU