Chinese Lexical Simplification

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2021)

Abstract
Lexical simplification, the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning, has attracted much attention in many languages. Although the richness of the Chinese vocabulary makes texts very difficult for children and non-native speakers to read, no prior research has addressed the Chinese lexical simplification (CLS) task. To circumvent the difficulty of acquiring annotations, we manually create the first benchmark dataset for CLS, which can be used to evaluate lexical simplification systems automatically. For a more thorough comparison, we present five types of baseline methods for generating substitute candidates for a complex word: a synonym-based approach, a word embedding-based approach, a BERT-based approach, a sememe-based approach, and a hybrid approach. Finally, we experimentally evaluate these baselines and discuss their advantages and disadvantages. To the best of our knowledge, this is the first study of the CLS task.
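The abstract names word embedding-based and hybrid candidate generation without giving their details. The Python sketch below shows one common way such steps can work, under loudly stated assumptions. The 3-dimensional vectors, the synonym dictionary, and the helper names (`embedding_candidates`, `hybrid_candidates`) are all hypothetical. The merge rule in `hybrid_candidates` (words proposed by both sources rank first) is an illustrative choice, not the paper's hybrid method.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL toy 3-dimensional vectors, invented purely for illustration.
# A real system would load pretrained Chinese word embeddings instead.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "璀璨": [0.9, 0.1, 0.2],    # the "complex" word in this toy example
    "灿烂": [0.85, 0.15, 0.25],
    "明亮": [0.7, 0.3, 0.1],
    "黑暗": [-0.8, 0.2, 0.1],
    "桌子": [0.0, 0.9, -0.4],
}

# HYPOTHETICAL toy synonym dictionary standing in for a real thesaurus.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "璀璨": ["灿烂", "辉煌"],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_candidates(word: str, embeddings: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k words whose vectors are most similar to `word`, excluding `word` itself."""
    target = embeddings[word]
    scored = [(w, cosine(target, vec)) for w, vec in embeddings.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def hybrid_candidates(
    word: str,
    synonyms: Dict[str, List[str]],
    embeddings: Dict[str, List[float]],
    k: int = 2,
) -> List[str]:
    """Merge dictionary and embedding candidates (illustrative rule, not the paper's).

    Words proposed by both sources come first; the remaining candidates follow
    in their original order, with duplicates removed.
    """
    from_dict = synonyms.get(word, [])
    from_emb = embedding_candidates(word, embeddings, k)
    both = [w for w in from_dict if w in from_emb]
    rest = [w for w in from_dict + from_emb if w not in both]
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(both + rest))


if __name__ == "__main__":
    print(embedding_candidates("璀璨", TOY_EMBEDDINGS))
    print(hybrid_candidates("璀璨", TOY_SYNONYMS, TOY_EMBEDDINGS))
```

With these toy values, the embedding step proposes 灿烂 and 明亮, and the hybrid step puts 灿烂 first because both sources agree on it. A real CLS system would still need to rank the candidates by simplicity and fit in context, which this sketch does not attempt.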
Keywords
Task analysis, Benchmark testing, Dictionaries, Bit error rate, Annotations, Speech processing, Feature extraction, Lexical simplification, BERT, unsupervised, pretrained language model