Cross-Corpus Readability Compatibility Assessment for English Texts

Zhenzhen Li, Han Ding,Shaohong Zhang

IEEE ACCESS(2023)

引用 0|浏览10
暂无评分
摘要
Text readability assessment has gained significant attention from researchers in various domains. However, the lack of exploration into corpus compatibility poses a challenge as different research groups utilize different corpora. In this study, we propose a novel evaluation framework, Cross-corpus text Readability Compatibility Assessment (CRCA), to address this issue. The framework encompasses three key components: (1) Corpus: CEFR, CLEC, CLOTH, NES, OSP, and RACE. Linguistic features, GloVe word vector representations, and their fusion features were extracted. (2) Classification models: Machine learning methods (XGBoost, SVM) and deep learning methods (BiLSTM, Attention-BiLSTM) were employed. (3) Compatibility metrics: RJSD, RRNSS, and NDCG metrics. Our findings revealed: (1) Validated corpus compatibility, with OSP standing out as significantly different from other datasets. (2) An adaptation effect among corpora, feature representations, and classification methods. (3) Consistent outcomes across the three metrics, validating the robustness of the compatibility assessment framework. The outcomes of this study offer valuable insights into corpus selection, feature representation, and classification methods, and it can also serve as a beginning effort for cross-corpus transfer learning.
更多
查看译文
关键词
Feature extraction,Measurement,Readability metrics,Deep learning,Machine learning,Training,Indexes,Transfer learning,Compatibility assessment,cross-corpus,text readability assessment,transfer learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要