Rethinking the Value of Gazetteer in Chinese Named Entity Recognition

NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I (2022)

Abstract
Gazetteers are widely used in Chinese named entity recognition (NER) to improve span boundary detection and type classification. However, the NLP community still lacks a systematic analysis of gazetteer-enhanced NER models, which limits our understanding of how well gazetteers generalize and how effective they are. In this paper, we first re-examine the effectiveness of several common practices in gazetteer-enhanced NER models, and then carry out a series of detailed analyses of the relationship between model performance and gazetteer characteristics, which can guide the construction of a more suitable gazetteer. Our findings are as follows: (1) the gazetteer has a positive impact on the NER model in most situations; (2) the NER model benefits greatly from high-quality pre-trained lexeme embeddings; (3) a good gazetteer should cover more entities that can be matched in both the training set and the test set.
Keywords
Gazetteer, Chinese named entity recognition, Knowledge enhancement
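
Finding (3) refers to how many entities in the training and test sets can be matched by the gazetteer. The abstract does not define how this is measured, so the following is only a minimal sketch under an assumed definition: coverage is the fraction of distinct entity strings that appear verbatim in the gazetteer. The function name `gazetteer_coverage` and the toy data are hypothetical and are not taken from the paper.

```python
def gazetteer_coverage(gazetteer, entities):
    """Fraction of distinct entity strings that also appear in the gazetteer.

    Assumed definition (exact string match); the paper may measure
    coverage differently.
    """
    distinct = set(entities)
    if not distinct:
        return 0.0
    return len(distinct & set(gazetteer)) / len(distinct)


# Hypothetical toy data, for illustration only.
gazetteer = {"北京", "上海", "清华大学"}
train_entities = ["北京", "清华大学", "张伟"]
test_entities = ["上海", "李娜"]

train_cov = gazetteer_coverage(gazetteer, train_entities)
test_cov = gazetteer_coverage(gazetteer, test_entities)
print(f"train coverage: {train_cov:.2f}, test coverage: {test_cov:.2f}")
```

Under this assumed definition, a gazetteer that scores high on both splits would be the kind that finding (3) describes as preferable.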