Divide-and-Conquer Text Simplification by Scalable Data Enhancement

Sanqiang Zhao, Rui Ma, Hui Shen, Daqing He

OpenAlex (2022)

Abstract
Text simplification is the task of reducing the complexity of a text while retaining its original meaning. It can help people with low literacy skills or language impairments, such as children and individuals with dyslexia or aphasia, read and understand complicated material. Substitution, deletion, reordering, and splitting are normally considered the four core operations of text simplification, so an ideal model should be able to apply each of them appropriately. However, by examining the degree to which each operation is applied in different datasets, we observe a salient discrepancy between human annotations and the existing training data widely used to train simplification models. To alleviate this discrepancy, we propose an unsupervised data construction method that distills each simplification operation into data through different automatic data enhancement measures. Empirical results show that the resulting dataset, SimSim, enables models to achieve better performance by performing all operations properly.
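The abstract reports measuring how strongly each operation is exercised in a dataset but does not say how that degree is computed. The sketch below is a hypothetical illustration under stated assumptions, not the authors' method: it aligns the tokens of each complex/simple pair with Python's standard `difflib.SequenceMatcher`, counts deleted, inserted, and replaced tokens as rough proxies for deletion and substitution, and uses extra sentence-final punctuation as a crude splitting signal. The function names `operation_profile` and `dataset_profile` and all of the proxies are assumptions introduced here; reordering is not modelled, so a moved token shows up as a deletion plus an insertion.

```python
from difflib import SequenceMatcher

# Hypothetical proxies for illustration only; not the paper's metric.
SENT_END = (".", "!", "?")


def count_sentences(tokens):
    """Crude sentence count: tokens ending in '.', '!' or '?' (at least 1)."""
    return max(1, sum(1 for t in tokens if t.endswith(SENT_END)))


def operation_profile(complex_sent, simple_sent):
    """Token-level operation counts for turning complex_sent into simple_sent.

    deleted     - source tokens dropped with no counterpart
    inserted    - target tokens with no source counterpart
    substituted - tokens inside aligned 'replace' spans (longer side counted)
    split       - extra sentences in the simplified text
    Reordering is NOT modelled: a moved token appears as delete + insert.
    """
    src, tgt = complex_sent.split(), simple_sent.split()
    counts = {"deleted": 0, "inserted": 0, "substituted": 0}
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "insert":
            counts["inserted"] += j2 - j1
        elif tag == "replace":
            counts["substituted"] += max(i2 - i1, j2 - j1)
    counts["split"] = max(0, count_sentences(tgt) - count_sentences(src))
    counts["src_len"] = len(src)
    return counts


def dataset_profile(pairs):
    """Average operation rates over (complex, simple) pairs.

    Token-level operations are normalised by total source tokens;
    splitting is reported as extra sentences per pair.
    """
    totals = {"deleted": 0, "inserted": 0, "substituted": 0, "split": 0}
    n_tokens = n_pairs = 0
    for complex_sent, simple_sent in pairs:
        p = operation_profile(complex_sent, simple_sent)
        for key in totals:
            totals[key] += p[key]
        n_tokens += p["src_len"]
        n_pairs += 1
    rates = {k: totals[k] / max(1, n_tokens) for k in ("deleted", "inserted", "substituted")}
    rates["split_per_pair"] = totals["split"] / max(1, n_pairs)
    return rates
```

Running `dataset_profile` separately on a human-annotated set and on a training corpus would give side-by-side operation rates of the kind the abstract compares; a real analysis would likely need proper tokenization and sentence segmentation rather than whitespace and punctuation heuristics.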
Keywords
enhancement, text, divide-and-conquer