MCSSpell: Optimal Path Selection of Candidate Characters by Integrating Multimodal Information and Copy Mechanism for Chinese Spelling Correction

2023 International Conference on Asian Language Processing (IALP) (2023)

Chinese spelling correction (CSC) is a traditional Natural Language Processing task that aims to detect and correct spelling errors in Chinese text. Many recent studies have adopted BERT, a non-autoregressive language model whose masking method performs well on the CSC task. However, the model still has certain limitations. First, BERT predicts tokens under an independence assumption, meaning it does not learn the relationships between masked tokens. As a result, it fails to model character-level dependencies effectively. Second, BERT tends to rewrite characters into more common, habitual expressions, leading to the problem of over-correction. To address these issues, this paper proposes a novel Candidate Character Optimal Path Selector (CCOPS) that uses attention mechanisms to model the dependencies between adjacent Chinese characters, alleviating the problem of character incoherence. We incorporate a dedicated copy mechanism into the selector, which guides the model to keep the original character when both the original input and the candidate output are plausible, mitigating over-correction. Furthermore, we integrate the multimodal information of characters to guide error correction in terms of semantic, phonetic, and visual similarity. Experiments show that, compared with the latest research, our model improves Detection-Level F1 by 2.7%, 1.2%, and 1.2% on three datasets, and Correction-Level F1 by 2.6%, 0.8%, and 1.3%, respectively.
Multimodal, CSC, Transformer, Copy Mechanism, BERT
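
The copy mechanism mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract gives no formula, so the blending rule below (a fixed `copy_weight` that mixes the model's candidate distribution with a one-hot "keep the input character" distribution) is an assumption, and the function names `apply_copy_gate` and `select_char` are hypothetical.

```python
def apply_copy_gate(candidate_probs, original_char, copy_weight):
    """Blend candidate probabilities with a one-hot 'keep the input' distribution.

    candidate_probs: dict mapping candidate character -> model probability.
    original_char:   the character that appeared in the input text.
    copy_weight:     assumed scalar in [0, 1]; higher values favor copying.
    """
    if not 0.0 <= copy_weight <= 1.0:
        raise ValueError("copy_weight must be in [0, 1]")
    chars = set(candidate_probs) | {original_char}
    blended = {}
    for ch in chars:
        copy_prob = 1.0 if ch == original_char else 0.0
        gen_prob = candidate_probs.get(ch, 0.0)
        blended[ch] = copy_weight * copy_prob + (1.0 - copy_weight) * gen_prob
    return blended


def select_char(candidate_probs, original_char, copy_weight):
    """Return the character with the highest blended probability."""
    blended = apply_copy_gate(candidate_probs, original_char, copy_weight)
    return max(blended, key=blended.get)


# Both characters are plausible: the gate tips the decision toward the input.
ambiguous = {"的": 0.55, "地": 0.45}
kept = select_char(ambiguous, original_char="地", copy_weight=0.2)

# The model is confident the input is wrong: the correction still wins.
confident = {"天": 0.95, "夭": 0.05}
fixed = select_char(confident, original_char="夭", copy_weight=0.2)
```

With these made-up probabilities, the ambiguous case keeps the original character, while the confident case is still corrected. A learned, context-dependent gate would replace the fixed `copy_weight` in a trained model.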