Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition

The Visual Computer (2023)

Abstract
An existing approach to dynamic hand gesture recognition is to apply multimodal-fusion CRNNs (Convolutional Recurrent Neural Networks) to depth images and the corresponding 2D hand skeleton coordinates. However, an underlying problem with this method is that raw depth images have very low contrast in the hand ROI (region of interest). They do not highlight details that are important for fine-grained hand gesture recognition, such as finger orientation, the overlap between the fingers and the palm, or the overlap between multiple fingers. To address this issue, we propose generating quantized depth images as an alternative input modality to raw depth images. Quantization creates sharp relative contrasts between key parts of the hand, which improves gesture recognition performance. In addition, we explore several ways to tackle the high-variance problem in previously researched multimodal-fusion CRNN architectures. We obtained accuracies of 90.82% and 89.21% (14 and 28 gestures, respectively) on the DHG-14/28 dataset and accuracies of 93.81% and 90.24% (14 and 28 gestures, respectively) on the SHREC-2017 dataset, a significant improvement over previous multimodal-fusion CRNNs.
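
The abstract does not state how the quantized depth images are generated, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a depth map cropped to the hand ROI, zero-valued pixels as background or invalid readings, uniform binning of the ROI's own depth range into a small number of levels, and nearer pixels rendered brighter. The function name `quantize_depth` and its `levels` parameter are hypothetical. The point it illustrates is that a depth spread of only a few centimetres inside the ROI becomes large, discrete gray-level steps instead of nearly identical raw intensities.

```python
from typing import List


def quantize_depth(depth: List[List[float]], levels: int = 4) -> List[List[int]]:
    """Hypothetical sketch: map valid (non-zero) depths in a hand ROI to
    `levels` evenly spaced gray values. Nearer pixels become brighter;
    zero-depth (assumed background/invalid) pixels stay 0."""
    if levels < 1:
        raise ValueError("levels must be >= 1")
    # Depth range is computed only over valid pixels inside the ROI.
    valid = [d for row in depth for d in row if d > 0]
    if not valid:
        return [[0 for _ in row] for row in depth]
    d_min, d_max = min(valid), max(valid)
    span = d_max - d_min
    out: List[List[int]] = []
    for row in depth:
        out_row: List[int] = []
        for d in row:
            if d <= 0:
                out_row.append(0)  # keep background black
                continue
            # Normalise to [0, 1] relative to the ROI's own depth range.
            t = (d - d_min) / span if span > 0 else 0.0
            # Uniform bin index in [0, levels - 1]; the farthest pixel is clipped.
            k = min(int(t * levels), levels - 1)
            # Bin 0 (nearest) -> 255, farthest bin -> 255 // levels.
            out_row.append(255 * (levels - k) // levels)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Toy 2x3 ROI in millimetres: a 30 mm spread becomes four distinct gray levels.
    roi = [[0, 500, 510], [520, 530, 0]]
    print(quantize_depth(roi, levels=4))  # [[0, 255, 191], [127, 63, 0]]
```

Normalising against the ROI's own minimum and maximum, rather than the sensor's full range, is what turns small relative depth differences (for example, a finger lying over the palm) into visible contrast. Whether the paper bins uniformly, per frame, or per sequence is not specified in the abstract.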
Keywords
Convolutional recurrent neural networks, Dynamic hand gesture recognition, Multimodal-fusion networks, Depth image, Hand skeleton joint points