FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing
CoRR(2024)
摘要
Large language models (LLMs) are computationally intensive. The computation
workload and the memory footprint grow quadratically with the dimension (layer
width). Most of LLMs' parameters come from the linear layers of the transformer
structure and are highly redundant. These linear layers contribute more than
80
finetune LLMs efficiently, there are three major challenges to address: 1)
reducing redundancy of the linear layers; 2) reducing GPU memory footprint; 3)
improving GPU utilization when using distributed training. Prior methods, such
as LoRA and QLoRA, utilized low-rank matrices and quantization to reduce the
number of trainable parameters and model size, respectively. However, the
resulting model still consumes a large amount of GPU memory. In this paper, we
present high-performance GPU-based methods that exploit low-rank structures to
pretrain and finetune LLMs for financial applications. We replace one
conventional linear layer of the transformer structure with two narrower linear
layers, which allows us to reduce the number of parameters by several orders of
magnitude. By quantizing the parameters into low precision (8-bit and 4-bit),
the memory consumption of the resulting model is further reduced. Compared with
existing LLMs, our methods achieve a speedup of 1.3X and a model compression
ratio of 2.64X for pretaining without accuracy drop. For finetuning, our
methods achieve an average accuracy increase of 6.3
and financial tasks, respectively, and GPU memory consumption ratio of 6.3X.
The sizes of our models are smaller than 0.59 GB, allowing inference on a
smartphone.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要