Towards Multi-modal Transformers in Federated Learning
CoRR (2024)
Abstract
Multi-modal transformers mark significant progress in different domains, but
siloed high-quality data hinders their further improvement. To remedy this,
federated learning (FL) has emerged as a promising privacy-preserving paradigm
for training models without direct access to the raw data held by different
clients. Despite its potential, a significant research direction remains
unexplored: FL with unpaired uni-modal clients and transformer
architectures. To fill this gap, this paper explores a transfer multi-modal
federated learning (MFL) scenario within the vision-language domain, where
clients possess data of various modalities distributed across different
datasets. We systematically evaluate the performance of existing methods when a
transformer architecture is utilized and introduce a novel framework called
Federated modality complementary and collaboration (FedCola) by addressing the
in-modality and cross-modality gaps among clients. Through extensive
experiments across various FL settings, FedCola demonstrates superior
performance over previous approaches, offering new perspectives on future
federated training of multi-modal transformers.
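To make the setting concrete, the sketch below shows one round of FedAvg-style aggregation over two uni-modal clients, where only parameters shared across modalities are averaged on the server. This is a hypothetical illustration of the general MFL scenario, not the paper's FedCola algorithm; all names (`block.w`, `patch_embed.w`, `token_embed.w`) are invented for the example.

```python
# Hypothetical sketch of server-side aggregation in multi-modal FL
# (illustrative only; NOT the FedCola method from the paper).

def federated_average(client_updates, shared_keys):
    """Average each shared parameter across the clients that hold it;
    modality-specific parameters stay local to their owning client."""
    aggregated = {}
    for key in shared_keys:
        values = [u[key] for u in client_updates if key in u]
        aggregated[key] = sum(values) / len(values)
    return aggregated

# Two uni-modal clients: one vision, one language. Both carry the shared
# transformer block weight "block.w"; each keeps its own embedding layer.
vision_client = {"block.w": 2.0, "patch_embed.w": 0.5}
text_client = {"block.w": 4.0, "token_embed.w": 0.3}

global_update = federated_average([vision_client, text_client], ["block.w"])
# global_update["block.w"] == 3.0
```

In practice each parameter would be a tensor rather than a scalar, and the averaging would be weighted by client dataset sizes, but the shared-versus-modality-specific split is the core idea the scenario rests on.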