Do NLP and machine learning improve traditional readability formulas?

Thomas François, Eleni Miltsakaki

PITR '12: Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations (2012)

Abstract

Readability formulas are methods used to match texts with readers' reading level. Several methodological paradigms have been investigated in the field. The most popular paradigm dates back several decades and gave rise to well-known readability formulas such as the Flesch formula (among several others). This paper compares this approach (henceforth "classic") with an emerging paradigm that uses sophisticated NLP-enabled features and machine learning techniques. Our experiments, carried out on a corpus of texts for French as a foreign language, yield four main results: (1) the new readability formula performed better than the "classic" formula; (2) "non-classic" features were slightly more informative than "classic" features; (3) modern machine learning algorithms did not improve the explanatory power of our readability model, but did allow new observations to be classified more accurately; and (4) combining "classic" and "non-classic" features resulted in a significant gain in performance.
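
As an illustration of the "classic" paradigm, the sketch below computes the standard English Flesch Reading Ease score, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). It is only an example of this family of formulas: the abstract does not say which classic formula was used in the experiments on French, and the function names, the regex-based tokenization, and the vowel-group syllable heuristic are all assumptions made here for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease for English text; higher scores mean easier text."""
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: alphabetic runs with an optional apostrophe part.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

The formula relies on just two surface features, sentence length and word length in syllables. That is what separates it from the NLP-enabled features the paper compares against.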

Keywords

Readability formula, Flesch formula, new readability formula, readability model, methodological paradigm, modern machine, new observation, popular paradigm, explanatory power, foreign language, traditional readability formula
