Association measures for collocation extraction

International Journal of Corpus Linguistics (2024)

Abstract
In this study, we propose a new evaluation scheme to assess the strengths and limitations of collocation extraction measures and explore type-sensitive methods for extracting collocations. We introduce the pooling strategy widely used in Information Retrieval and automate the evaluation process using online dictionaries. Sixteen well-known metrics are evaluated for effectiveness and then compared distributionally and linguistically. The results show that Group A methods (e.g. z-score, Dice, PMI) are more effective at extracting low-frequency collocations at relatively small extraction scales. In contrast, Group B methods (e.g. t-test, LMI, LLR) perform better at finding high-frequency collocations, and most of them outperform Group A methods as the extraction scale increases. Moreover, Group A prefers NN collocations, while Group B identifies collocations with a wide range of syntactic structures. This study offers suggestions for researchers seeking hybrid extraction methods, as well as for language educators and dictionary compilers.
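As a rough illustration of how the two groups of measures can be computed from the same co-occurrence counts, the Python sketch below scores a single word pair with six of the metrics named above. The formulas are the common textbook definitions, which is an assumption here: the abstract does not state which variants the study uses. The function name `association_scores` and the counts in the usage note are hypothetical.

```python
import math


def association_scores(o11, f1, f2, n):
    """Score one word pair (w1, w2) from corpus counts.

    o11: co-occurrence frequency of the pair (must be > 0)
    f1, f2: marginal frequencies of w1 and w2
    n: total number of pairs in the corpus
    Textbook definitions; assumed, not taken from the study itself.
    """
    # Observed 2x2 contingency table.
    o12 = f1 - o11
    o21 = f2 - o11
    o22 = n - f1 - f2 + o11

    # Expected frequencies under independence.
    e11 = f1 * f2 / n
    e12 = f1 * (n - f2) / n
    e21 = (n - f1) * f2 / n
    e22 = (n - f1) * (n - f2) / n

    pmi = math.log2(o11 / e11)

    # Log-likelihood ratio; empty cells contribute nothing to the sum.
    cells = [(o11, e11), (o12, e12), (o21, e21), (o22, e22)]
    llr = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)

    return {
        # Group A in the abstract's terms
        "z_score": (o11 - e11) / math.sqrt(e11),
        "dice": 2 * o11 / (f1 + f2),
        "pmi": pmi,
        # Group B in the abstract's terms
        "t_score": (o11 - e11) / math.sqrt(o11),
        "lmi": o11 * pmi,
        "llr": llr,
    }
```

For example, `association_scores(30, 100, 150, 100000)` scores a pair seen 30 times whose words occur 100 and 150 times in a corpus of 100,000 pairs. Comparing the two denominators hints at the frequency effect reported above: the z-score divides by the square root of the expected frequency, which is small for rare words, while the t-score divides by the square root of the observed frequency.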
Keywords
collocation extraction, pooling, association measure, statistical metrics, evaluation