Deep Audio-visual Learning: A Survey

Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He

International Journal of Automation and Computing (2021)

Abstract

Audio-visual learning, which aims to exploit the relationship between the audio and visual modalities, has drawn considerable attention since deep learning began to be applied successfully. Researchers tend to leverage these two modalities either to improve performance on tasks previously treated as single-modality or to address new, challenging problems. In this paper, we provide a comprehensive survey of recent developments in audio-visual learning. We divide current audio-visual learning tasks into four subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.

Keywords

Deep audio-visual learning, audio-visual separation and localization, correspondence learning, generative models, representation learning
