Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
CoRR (2024)
Abstract
In the current landscape of foundation model training, there is a significant
reliance on public domain data, which is nearing exhaustion according to recent
research. To further scale up, it is crucial to incorporate collaboration among
multiple specialized and high-quality private domain data sources. However, the
challenge of training models locally without sharing private data presents
numerous obstacles in data quality control. To tackle this issue, we propose a
data quality control pipeline for federated fine-tuning of foundation models.
This pipeline computes scores reflecting the quality of training data and
determines a global threshold for a unified standard, aiming for improved
global performance. Our experiments show that the proposed quality control
pipeline improves the effectiveness and reliability of model training,
leading to better performance.
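The pipeline described in the abstract (each client scores its own data, and the server sets one global threshold so all clients apply a unified standard) might be sketched as follows. The length-based scoring function, the quantile-style cutoff, and all names here are illustrative assumptions, not the paper's actual method:

```python
def local_quality_scores(samples, score_fn):
    """Client side: score each local training sample; higher = better.
    score_fn is a placeholder for any per-sample quality metric."""
    return [score_fn(s) for s in samples]

def global_threshold(all_client_scores, keep_fraction=0.5):
    """Server side: pool the clients' scores and pick one unified cutoff
    that keeps roughly the top keep_fraction of samples globally.
    (A real federated system might exchange only score summaries.)"""
    pooled = sorted(s for scores in all_client_scores for s in scores)
    cut_index = int(len(pooled) * (1 - keep_fraction))
    return pooled[cut_index]

def filter_local_data(samples, scores, threshold):
    """Client side: keep only samples meeting the global standard."""
    return [s for s, sc in zip(samples, scores) if sc >= threshold]

# Toy demo: three clients, with sample length as a stand-in quality metric.
clients = [
    ["good text sample", "ok"],
    ["noisy", "a high quality example"],
    ["x", "decent sentence here"],
]
client_scores = [local_quality_scores(c, len) for c in clients]
thr = global_threshold(client_scores, keep_fraction=0.5)
kept = [filter_local_data(c, s, thr) for c, s in zip(clients, client_scores)]
```

The point of the single global threshold, as the abstract frames it, is that every client filters against the same standard rather than each choosing its own local cutoff.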