
Dancing with the Sound in Edge Computing Environments

WIRELESS NETWORKS (2024)

Abstract
Conventional motion prediction methods have achieved promising performance. However, the predicted motion sequences in most of the literature are short, and the rhythm of the generated pose sequence has rarely been explored. To pursue high-quality, rhythmic, and long-term pose sequence prediction, this paper explores a novel dancing-with-the-sound task, which is appealing and challenging in the computer vision field. To tackle this problem, a novel model is proposed that takes the sound as an indicator input and outputs the dancing pose sequence. Specifically, the model is based on the variational autoencoder (VAE) framework, encoding the continuity and rhythm of the sound information into the latent space to generate coherent, diverse, rhythmic, and long-term pose videos. Extensive experiments validate the effectiveness of audio cues in the generation of dancing pose sequences. Concurrently, a novel dataset for audiovisual multimodal sequence generation has been released to promote the development of this field.
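The abstract describes an audio-conditioned VAE that maps a sound signal to a pose sequence of matching length. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the feature sizes (128-dim audio frames, 17 joints x 2D keypoints), the GRU encoder/decoder, and the loss weighting are all illustrative assumptions.

```python
# Minimal sketch of an audio-conditioned VAE for pose-sequence generation.
# NOT the paper's released code; dimensions and architecture are assumptions.
import torch
import torch.nn as nn

class AudioConditionedPoseVAE(nn.Module):
    def __init__(self, audio_dim=128, pose_dim=34, latent_dim=64, hidden_dim=256):
        super().__init__()
        # Encoder: reads the pose sequence together with the audio cue and
        # summarizes the clip into a Gaussian latent distribution.
        self.encoder_rnn = nn.GRU(pose_dim + audio_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: emits one pose frame per audio frame, so the generated
        # sequence follows the music and can be arbitrarily long.
        self.decoder_rnn = nn.GRU(audio_dim + latent_dim, hidden_dim, batch_first=True)
        self.to_pose = nn.Linear(hidden_dim, pose_dim)

    def encode(self, poses, audio):
        h, _ = self.encoder_rnn(torch.cat([poses, audio], dim=-1))
        last = h[:, -1]  # final hidden state summarizes the whole clip
        return self.to_mu(last), self.to_logvar(last)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z, audio):
        # Broadcast the latent code over time; condition every step on audio.
        z_seq = z.unsqueeze(1).expand(-1, audio.size(1), -1)
        h, _ = self.decoder_rnn(torch.cat([audio, z_seq], dim=-1))
        return self.to_pose(h)

    def forward(self, poses, audio):
        mu, logvar = self.encode(poses, audio)
        z = self.reparameterize(mu, logvar)
        return self.decode(z, audio), mu, logvar


def vae_loss(recon, poses, mu, logvar, beta=1e-3):
    # Reconstruction keeps poses accurate; the KL term keeps the latent
    # space smooth so sampling yields diverse yet coherent dances.
    rec = nn.functional.mse_loss(recon, poses)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl


if __name__ == "__main__":
    model = AudioConditionedPoseVAE()
    audio = torch.randn(2, 200, 128)  # 2 clips, 200 audio frames each
    poses = torch.randn(2, 200, 34)   # matching ground-truth pose frames
    recon, mu, logvar = model(poses, audio)
    print(recon.shape, vae_loss(recon, poses, mu, logvar).item())
```

At generation time, one would sample z from the prior and call `decode(z, audio)` on new music, so the output length is set entirely by the audio, which is how a VAE of this form can produce long-term, rhythm-following sequences.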
Keywords
Deep learning, Video sequence generation, Audio-guided generation, Multimodal perception, Edge computing