See me Speaking? Differentiating on Whether Words are Spoken On Screen or Off to Optimize Machine Dubbing

Multimodal Interfaces and Machine Learning for Multimodal Interaction (2020)

Citations: 4 | Views: 20
Abstract
Dubbing is the art of finding a translation from a source into a target language that can be lip-synchronously revoiced, i.e., that makes the target-language speech appear as if it had been spoken by the original actors all along. Lip synchrony is essential for the full-fledged reception of foreign audiovisual media, such as movies and series, as violated constraints of synchrony between video (lips) and audio (speech) lead to cognitive dissonance and reduce the perceptual quality. Of course, synchrony constraints only apply to the translation when the speaker's lips are visible on screen. Therefore, deciding whether to apply synchrony constraints requires an automatic method for detecting whether an actor's lips are visible on screen for a given stretch of speech. In this paper, we attempt, for the first time, to classify on-screen from off-screen speech based on a corpus of real-world television material that has been annotated word-by-word for the visibility of talking lips on screen. We present classification experiments in which we classify …
Keywords
audiovisual machine translation, dubbing, multi-modal speech processing, activity recognition