Audiovisual Zooming: What You See Is What You Hear

Proceedings of the 27th ACM International Conference on Multimedia (2019)

Abstract
When capturing videos on a mobile platform, the target of interest is often contaminated by the surrounding environment. To alleviate the visual irrelevance, camera panning and zooming provide the means to isolate a desired field of view (FOV). However, the captured audio is still contaminated by signals outside the FOV. This effect is unnatural: for human perception, visual and auditory cues must go hand in hand. We present the concept of Audiovisual Zooming, whereby an auditory FOV is formed to match the visual one. Our framework is built around the classic idea of beamforming, a computational approach to enhancing sound from a single direction using a microphone array. Yet beamforming on its own cannot incorporate the auditory FOV, as the FOV may include an arbitrary number of directional sources. We formulate audiovisual zooming as a generalized eigenvalue problem and propose an algorithm for efficient computation on mobile platforms. To inform the algorithmic and physical implementation, we offer a theoretical analysis of our algorithmic components as well as numerical studies for understanding various design choices of microphone arrays. Finally, we demonstrate audiovisual zooming on two different mobile platforms: a mobile smartphone and a 360° spherical imaging system for video conference settings.
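To make the generalized eigenvalue formulation concrete: a standard max-SNR (GEV) beamformer chooses weights that maximize the ratio of in-FOV to out-of-FOV energy, which is solved by the principal generalized eigenvector of the two spatial covariance matrices. The sketch below illustrates that generic idea in Python/NumPy; it is not the paper's algorithm, and the function name, the synthetic covariance estimates, and the per-frequency-bin setup are all assumptions for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def gev_beamformer_weights(cov_in, cov_out):
    """Return weights maximizing in-FOV vs. out-of-FOV energy ratio.

    Solves the Hermitian generalized eigenvalue problem
        cov_in @ w = lam * cov_out @ w
    and keeps the eigenvector of the largest eigenvalue.
    cov_in / cov_out: (n_mics x n_mics) spatial covariance matrices,
    typically estimated per frequency bin from STFT frames.
    """
    # eigh handles the Hermitian generalized problem;
    # eigenvalues are returned in ascending order.
    eigvals, eigvecs = eigh(cov_in, cov_out)
    w = eigvecs[:, -1]              # principal (max-ratio) eigenvector
    return w / np.linalg.norm(w)    # unit norm for numerical stability

# Hypothetical usage with synthetic covariances for a 4-mic array:
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
cov_in = A @ A.conj().T + 1e-3 * np.eye(4)   # in-FOV covariance (PSD)
cov_out = B @ B.conj().T + 1e-3 * np.eye(4)  # out-of-FOV covariance (PSD)
w = gev_beamformer_weights(cov_in, cov_out)
```

Note that, unlike single-direction beamforming, this ratio formulation accommodates an arbitrary number of directional sources inside the FOV, since the in-FOV covariance aggregates energy over the whole angular region rather than a single steering vector.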
Keywords
audio enhancement, audiovisual zooming, beamforming