Unraveling Key Factors of Knowledge Distillation
arXiv (2023)
Abstract
Knowledge distillation, a technique for model compression and performance
enhancement, has gained significant traction in Neural Machine Translation
(NMT). However, existing research primarily focuses on empirical applications,
and there is a lack of comprehensive understanding of how student model
capacity, data complexity, and decoding strategies collectively influence
distillation effectiveness. Addressing this gap, our study conducts an in-depth
investigation into these factors, particularly focusing on their interplay in
word-level and sequence-level distillation within NMT. Through extensive
experimentation across datasets such as IWSLT13 En$\rightarrow$Fr, IWSLT14
En$\rightarrow$De, and others, we empirically validate hypotheses related to
the impact of these factors on knowledge distillation. Our research not only
elucidates the significant influence of model capacity, data complexity, and
decoding strategies on distillation effectiveness but also introduces a novel,
optimized distillation approach. This approach, when applied to the IWSLT14
De$\rightarrow$En translation task, achieves state-of-the-art performance,
demonstrating its practical efficacy in advancing the field of NMT.
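For context, word-level distillation in NMT typically trains the student to match the teacher's per-token output distribution, blended with the usual cross-entropy against the gold target. The sketch below is a minimal PyTorch illustration of such a loss, assuming pre-computed teacher and student logits; the function name, temperature, and mixing weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, labels,
                       temperature=1.0, alpha=0.5, pad_id=0):
    """Blend per-token KL(teacher || student) with cross-entropy on gold tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len) gold target token ids
    """
    t = temperature
    # Per-token KL divergence between teacher and student distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="none",
    ).sum(-1) * (t * t)

    # Standard cross-entropy against the gold target tokens.
    ce = F.cross_entropy(
        student_logits.transpose(1, 2), labels,
        ignore_index=pad_id, reduction="none",
    )

    # Average over non-padding positions only.
    mask = (labels != pad_id).float()
    kd = (kd * mask).sum() / mask.sum()
    ce = (ce * mask).sum() / mask.sum()
    return alpha * kd + (1.0 - alpha) * ce
```

Sequence-level distillation, by contrast, replaces the gold references with the teacher's decoded outputs and trains the student with ordinary cross-entropy on that synthetic data, so no change to the loss function is needed.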