PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
CoRR (2024)
Abstract
As the parameters of LLMs expand, the computational cost of fine-tuning the
entire model becomes prohibitive. To address this challenge, we introduce a
PEFT method, Principal Singular values and Singular vectors Adaptation (PiSSA),
which optimizes a significantly reduced parameter space while achieving or
surpassing the performance of full-parameter fine-tuning. PiSSA is inspired by
Intrinsic SAID, which suggests that pre-trained, over-parametrized models
inhabit a space of low intrinsic dimension. Consequently, PiSSA represents a
matrix W within the model by the product of two trainable matrices A and B,
plus a residual matrix W^res for error correction. SVD is employed to
factorize W, and the principal singular values and vectors of W are utilized to
initialize A and B. The residual singular values and vectors initialize the
residual matrix W^res, which remains frozen during fine-tuning. Notably,
PiSSA shares the same architecture with LoRA. However, LoRA approximates Delta
W through the product of two matrices, A, initialized with Gaussian noise, and
B, initialized with zeros, while PiSSA initializes A and B with principal
singular values and vectors of the original matrix W. PiSSA can better
approximate the outcomes of full-parameter fine-tuning at the beginning by
changing the essential parts while freezing the "noisy" parts. In comparison,
LoRA freezes the original matrix and updates the "noise". This distinction
enables PiSSA to converge much faster than LoRA and to achieve better
performance in the end. Thanks to this shared architecture, PiSSA inherits many of
LoRA's advantages, such as parameter efficiency and compatibility with
quantization. Leveraging a fast SVD method, PiSSA's initialization takes only
a few seconds, so switching from LoRA to PiSSA incurs negligible cost.
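The SVD-based split described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the helper name `pissa_init` and the plain-numpy formulation are assumptions, and a real implementation would operate on the model's weight tensors (and use a fast truncated SVD rather than a full one).

```python
import numpy as np

def pissa_init(W, r):
    """Hypothetical sketch of PiSSA initialization: split W into a
    trainable low-rank pair (A, B) built from the top-r singular
    values/vectors, plus a frozen residual W_res from the rest."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Principal part: A @ B equals the rank-r approximation of W.
    # The singular values are split evenly (sqrt) between A and B.
    A = U[:, :r] * np.sqrt(S[:r])            # shape (m, r), trainable
    B = np.sqrt(S[:r])[:, None] * Vt[:r]     # shape (r, n), trainable
    # Residual part: remaining singular values/vectors, kept frozen.
    W_res = U[:, r:] @ np.diag(S[r:]) @ Vt[r:]
    return A, B, W_res

# At initialization the split is exact: W == W_res + A @ B,
# so fine-tuning starts from the original pre-trained weights.
W = np.random.default_rng(0).standard_normal((8, 6))
A, B, W_res = pissa_init(W, r=2)
assert np.allclose(W, W_res + A @ B)
```

Note the contrast with LoRA: there, A is Gaussian noise and B is zero, so A @ B starts at zero and the full original W is frozen; here the top singular directions themselves are the trainable parameters.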