Many heads but one brain: FusionBrain-a single multimodal multitask architecture and a competition br

D. D. Bakshandaeva, D. V. Dimitrov, V. S. Arkhipkin, A. V. Shonenkov, M. S. Potanin, D. K. Karachev, A. V. Kuznetsov, A. D. Voronov, A. A. Petiushko, V. F. Davydova, E. V. Tutubalina

ArXiv (2023)

Abstract

Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called FusionBrain, the first competition aimed at creating a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple vision and language tasks. The FusionBrain Challenge combines the following specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. We have created a dataset for each task to test the participants' submissions. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 pairs of images and texts. We also propose a multimodal, multitask architecture as a baseline solution: it is built around a frozen foundation model and has been trained in Fusion mode as well as in Single-task mode. The proposed Fusion approach proves to be competitive with, and more energy-efficient than, the task-specific one.

Keywords

multimodality, multitask, bilinguality, foundation models, FusionBrain challenge
