Multimodal Representation Learning for Real-World Applications

Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI 2022)

Abstract
Multimodal representation learning has shown tremendous improvements in recent years. An extensive body of work on fusing multiple modalities has shown promising results on public benchmarks. However, most prominent works target unrealistic settings or toy datasets, and a considerable gap remains between existing methods and their real-world implications. In this work, we aim to bridge the gap between well-defined benchmark settings and real-world use cases. We explore architectures inspired by existing promising approaches that have the potential to be deployed in real-world instances. Moreover, we also try to move the research forward by addressing questions that can be solved using multimodal approaches and that have a considerable impact on the community. With this work, we attempt to leverage multimodal representation learning methods that apply directly to real-world settings.
Key words
Multimodal Representations, Multimodal Fusion, Cross-modal Processing, Deep Learning Architectures, Machine Learning
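
The abstract refers to fusing multiple modalities without specifying an architecture. As a minimal sketch of the general idea (not the paper's method), the following illustrates late fusion of pre-extracted image and text features by concatenation; all names, dimensions, and the classification head are assumptions chosen for illustration.

# Minimal late-fusion sketch for two modalities (illustrative only; not the
# architecture proposed in the paper). Feature dimensions are assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, hidden_dim=512, num_classes=10):
        super().__init__()
        # Project each modality's pre-extracted features into a shared space.
        self.image_proj = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        # Fuse by concatenation, then classify.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, image_feats, text_feats):
        fused = torch.cat([self.image_proj(image_feats), self.text_proj(text_feats)], dim=-1)
        return self.classifier(fused)

# Usage with random stand-in features for a batch of 4 samples.
model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])

Concatenation-based late fusion is only one of the families of approaches the keywords allude to; attention-based or early-fusion designs would replace the concatenation step with a learned cross-modal interaction.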