USAF: Multimodal Chinese named entity recognition using synthesized acoustic features

Information Processing & Management(2023)

引用 1|浏览11
暂无评分
摘要
Due to the particularity of Chinese word formation, the Chinese Named Entity Recognition (NER) task has attracted extensive attention over recent years. Recently, some researchers have tried to solve this problem by using a multimodal method combining acoustic features and text features. However, the text-speech data pairs required by the above methods are lacking in real-world scenarios, making it difficult to apply widely. To address this, we proposed a multimodal Chinese NER method called USAF, which uses synthesized acoustic features instead of actual human speech. USAF aligns text and acoustic features through unique position embeddings and uses a multi-head attention mechanism to fuse the features of the two modalities, which stably improves the performance of Chinese named entity recognition. To evaluate USAF, we implemented USAF on three Chinese NER datasets. Experimental results show that USAF witnesses a stable improvement compare to text-only methods on each dataset, and outperforms SOTA external-vocabulary-based method on two datasets. Specifically, compared to the SOTA external-vocabulary-based method, the F1 score of USAF is improved by 1.84 and 1.24 on CNERTA and Aishell3-NER, respectively.
更多
查看译文
关键词
Chinese named entity recognition, Multimodal learning, Acoustic feature synthesis, Position embedding
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要