Multi-modal human action recognition for video content matching

MULTIMEDIA TOOLS AND APPLICATIONS (2020)

Abstract
Human action recognition (HAR) in videos is a challenging task in computer vision. Conventional methods tend to explore spatiotemporal or optical representations of video actions. However, optical representations can be unreliable in some real-life situations, such as object occlusion and dim lighting. To address this issue, this paper presents a novel approach to human action recognition that jointly exploits video and Wi-Fi clues. We leverage the fact that Wi-Fi signals carry discriminative information about human actions and are robust to optical limitations. To validate this idea, we design a practical framework for HAR and build a dataset containing both video clips and Wi-Fi Channel State Information (CSI) of human actions. A 3D convolutional neural network is used to extract video features, and statistical algorithms are used to extract radio features. A classical linear support vector machine serves as the classifier after the video and radio features are fused. Comprehensive experiments on this dataset achieve desirable results, with a maximum accuracy improvement of 10%. This demonstrates our main finding: with the aid of Wi-Fi Channel State Information, the performance of video action recognition methods can be improved significantly, even under optical limitations.
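The abstract outlines a pipeline of 3D-CNN video features, statistical Wi-Fi CSI features, feature fusion, and a linear SVM classifier, but does not specify the network architecture, the particular statistics, or the fusion scheme. The sketch below is an illustrative assumption of that pipeline on stand-in data: a tiny placeholder 3D CNN, simple per-subcarrier amplitude statistics, concatenation-based fusion, and scikit-learn's LinearSVC. It is not the authors' implementation.

```python
# Hedged sketch of the video + Wi-Fi CSI fusion pipeline described in the abstract.
# The 3D CNN, CSI statistics, and fusion scheme below are illustrative assumptions,
# exercised on random stand-in data rather than the paper's dataset.

import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

class Tiny3DCNN(nn.Module):
    """Stand-in 3D CNN video feature extractor (hypothetical architecture)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),            # global pooling -> (N, 32, 1, 1, 1)
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, clips):                   # clips: (N, 3, T, H, W)
        x = self.features(clips).flatten(1)
        return self.fc(x)

def csi_statistics(csi):
    """Simple per-subcarrier statistics of CSI amplitude (assumed features).

    csi: (N, T, S) array of amplitude samples over time T and subcarriers S.
    """
    return np.concatenate(
        [csi.mean(1), csi.std(1), csi.max(1), csi.min(1)], axis=1
    )

# Toy data standing in for the paired video / Wi-Fi CSI dataset.
n, t, h, w, s = 20, 16, 32, 32, 30
clips = torch.randn(n, 3, t, h, w)              # video clips
csi = np.random.randn(n, 200, s)                # Wi-Fi CSI amplitude traces
labels = np.random.randint(0, 4, size=n)        # 4 action classes

# Extract both modalities, fuse by concatenation, and train a linear SVM.
with torch.no_grad():
    video_feats = Tiny3DCNN()(clips).numpy()
radio_feats = csi_statistics(csi)
fused = np.concatenate([video_feats, radio_feats], axis=1)

clf = LinearSVC(max_iter=5000).fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))
```

Concatenation followed by a linear SVM keeps the fusion step simple and interpretable; richer fusion (e.g., learned weighting of the two modalities) would be a natural variation but is not claimed by the abstract.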
Keywords
Human action recognition, Video and Wi-Fi clues, Multi-modal learning, Convolutional neural networks