DutchSemCor: Targeting the ideal sense-tagged corpus.

Piek Vossen, Attila Gorog, Ruben Izquierdo, A. Van Den Bosch (VU, Faculteit der Letteren)

LREC 2012 - Eighth International Conference on Language Resources and Evaluation (2012)


Abstract

Word Sense Disambiguation (WSD) systems require large sense-tagged corpora, along with lexical databases, to reach satisfactory results. The number of English-language resources for developing WSD systems has increased in recent years, while most other languages remain under-resourced. Dutch is no exception. To overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. In this paper, we discuss the conflicting requirements that a sense-tagged corpus must meet and our strategies for fulfilling them. We also report on a first series of experiments supporting our semi-automatic approach to building the corpus.


Keywords

Active Learning, Word Sense Disambiguation, Semantic Annotation, Machine Learning
