A BiLSTM and CTC Based Multi-Sensor Information Fusion Frame for Continuous Sign Language Recognition
2024 10th International Conference on Electrical Engineering, Control and Robotics (EECR), 2024
Abstract
While sign language recognition has been widely applied in human-robot interaction, applications of continuous sign language recognition (CSLR) remain limited. A major challenge in CSLR is the scarcity of publicly available continuous sign language datasets, most of which are video-only. Additionally, visual information often suffers from hand blur, overlap, and disappearance. To tackle these challenges, we propose a multi-sensor information fusion framework for CSLR based on a Bi-directional Long Short-Term Memory (BiLSTM) network and the Connectionist Temporal Classification (CTC) algorithm. First, an RGB camera and a MYO armband are used to simultaneously collect a continuous sign language dataset comprising three modalities: RGB video, IMU signals, and sEMG signals. Then, keyframes of the RGB videos are extracted using the IMU signals, which saves computational cost and reduces the word error rate (WER) of CSLR. To fully exploit the information from the three modalities, a multimodal-fusion-based end-to-end CSLR model is constructed on top of the BiLSTM network and the CTC algorithm. Comparative experiments verify the effectiveness of the proposed method: combining all three modalities achieves the best performance, with a WER as low as 10.3%.
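The abstract names two technical steps (IMU-driven keyframe extraction and a fused BiLSTM + CTC recognizer) without implementation details. Purely as an illustration of the first step, the sketch below ranks video frames by IMU motion energy and keeps the most active ones; the selection criterion, the energy formula, and the `num_keyframes` budget are assumptions, not the authors' published method.

```python
import numpy as np

def select_keyframes(gyro, accel, num_keyframes=80):
    """Hypothetical IMU-based keyframe selection: keep the frames
    with the highest instantaneous motion energy, preserving order.
    gyro, accel: (time, 3) arrays time-aligned with the video frames."""
    # Motion energy: angular rate magnitude plus acceleration change magnitude.
    accel_delta = np.diff(accel, axis=0, prepend=accel[:1])
    energy = np.linalg.norm(gyro, axis=1) + np.linalg.norm(accel_delta, axis=1)
    idx = np.argsort(energy)[-num_keyframes:]  # most active frames
    return np.sort(idx)  # restore temporal order
```

For the recognizer itself, a minimal PyTorch sketch of a concatenation-fused BiLSTM encoder trained with CTC follows. The fusion-by-concatenation strategy, all feature dimensions, layer sizes, and the 100-gloss vocabulary are illustrative assumptions; only the overall BiLSTM + CTC structure comes from the abstract.

```python
import torch
import torch.nn as nn

class MultimodalBiLSTMCTC(nn.Module):
    """Sketch of a multi-sensor fusion CSLR model: per-frame RGB, IMU,
    and sEMG features are concatenated, encoded by a BiLSTM, and
    mapped to per-frame gloss log-probabilities for CTC training."""
    def __init__(self, rgb_dim=512, imu_dim=36, semg_dim=64,
                 hidden=256, vocab_size=100):
        super().__init__()
        fused_dim = rgb_dim + imu_dim + semg_dim  # simple concatenation fusion
        self.bilstm = nn.LSTM(fused_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # +1 output class for the CTC blank label (index 0 by convention)
        self.classifier = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, rgb, imu, semg):
        # Each input: (batch, time, feature_dim), already time-aligned.
        fused = torch.cat([rgb, imu, semg], dim=-1)
        encoded, _ = self.bilstm(fused)
        # CTC loss expects log-probs of shape (time, batch, classes).
        return self.classifier(encoded).log_softmax(-1).transpose(0, 1)

# Training-step sketch with PyTorch's built-in CTC loss (dummy tensors).
model = MultimodalBiLSTMCTC()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
rgb = torch.randn(4, 80, 512)              # e.g. 80 keyframes per sequence
imu = torch.randn(4, 80, 36)
semg = torch.randn(4, 80, 64)
targets = torch.randint(1, 101, (4, 12))   # gloss label sequences (no blanks)
input_lens = torch.full((4,), 80, dtype=torch.long)
target_lens = torch.full((4,), 12, dtype=torch.long)

log_probs = model(rgb, imu, semg)
loss = ctc_loss(log_probs, targets, input_lens, target_lens)
loss.backward()
```

CTC is a natural fit here because the per-frame sensor streams are much longer than the gloss sequence and no frame-level alignment is available; the blank label lets the model absorb transition frames between signs.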
Key words
continuous sign language recognition, multi-modal fusion, bi-directional long short-term memory, connectionist temporal classification