DEMV-Matchmaker: Emotional Temporal Course Representation and Deep Similarity Matching for Automatic Music Video Generation

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016)

Abstract
This paper presents DEMV-matchmaker, an emotion-oriented music video (MV) generation system that uses an emotion-oriented deep similarity matching (EDSM) metric as a bridge between music and video. Specifically, an emotional temporal course model (ETCM) is trained on an emotion-annotated MV corpus to learn two relationships separately: that between music and its emotional temporal phase sequence, and that between video and its emotional temporal phase sequence. An emotional temporal structure preserved histogram (ETPH) representation is proposed to retain the information in the recognized emotional temporal phase sequences for constructing the EDSM metric. A deep neural network (DNN) then learns the EDSM metric from the ETPHs of positive (official) and negative (artificial) MV examples. For MV generation, the EDSM metric measures the similarity between the ETPHs of video and music. Objective and subjective experiments show that DEMV-matchmaker performs well and can generate appealing music videos that enhance the viewing and listening experience.
Keywords
Automatic music video generation, deep similarity learning, cross-modal media retrieval
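
The abstract does not define how the ETPH is computed, so the following is only an illustrative sketch of why a histogram that preserves temporal structure can matter for matching. It assumes that emotional temporal phases are integer labels, that temporal structure is kept by concatenating per-segment histograms, and that cosine similarity stands in for the learned EDSM metric. The function names `segment_histogram` and `cosine_similarity` and all parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def segment_histogram(phases, num_phases, num_segments):
    """Concatenate normalized per-segment histograms of phase labels.

    Assumption: this segment-wise layout is one possible way to keep
    temporal order in a histogram; it is not the paper's ETPH definition.
    """
    phases = np.asarray(phases, dtype=int)
    hists = []
    for part in np.array_split(phases, num_segments):
        counts = np.bincount(part, minlength=num_phases).astype(float)
        total = counts.sum()
        hists.append(counts / total if total > 0 else counts)
    return np.concatenate(hists)

def cosine_similarity(a, b):
    """Stand-in for the learned EDSM metric (the paper trains a DNN instead)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy phase sequences (hypothetical data): four phase labels, 0..3.
music = [0, 0, 1, 1, 2, 2, 3, 3]
video_same = [0, 0, 1, 1, 2, 2, 3, 3]
video_reversed = music[::-1]

# A single global histogram cannot tell the two videos apart.
flat_music = segment_histogram(music, num_phases=4, num_segments=1)
flat_reversed = segment_histogram(video_reversed, num_phases=4, num_segments=1)

# Two segments are enough to expose the different temporal order.
seg_music = segment_histogram(music, num_phases=4, num_segments=2)
seg_same = segment_histogram(video_same, num_phases=4, num_segments=2)
seg_reversed = segment_histogram(video_reversed, num_phases=4, num_segments=2)

print(cosine_similarity(flat_music, flat_reversed))
print(cosine_similarity(seg_music, seg_same))
print(cosine_similarity(seg_music, seg_reversed))
```

In this toy setup the global histograms of the music and the reversed video are identical, while the segmented histograms separate them. In the system described above, a DNN trained on official (positive) and artificial (negative) music-video pairs would take the place of the fixed cosine score.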