Reassessing the value of resources for cross-lingual transfer of POS tagging models

Language Resources and Evaluation (2016)

Abstract
When linguistically annotated data is scarce, as is the case for many under-resourced languages, one has to resort to less complete forms of annotation obtained from crawled dictionaries and/or through cross-lingual transfer. Several recent works have shown that learning from such partially supervised data can be effective in many practical situations. In this work, we review two existing proposals for learning with ambiguous labels, both of which extend conventional learners to the weakly supervised setting: a history-based model using a variant of the perceptron, and an extension of the Conditional Random Fields model. Focusing on the part-of-speech tagging task, but considering a large set of ten languages, we show (a) that good performance can be achieved even in the presence of ambiguity, provided that both monolingual and bilingual resources are available; (b) that our two learners exploit different characteristics of the training set and are successful in different situations; (c) that, in addition to the choice of an adequate learning algorithm, many other factors are critical for achieving good performance in a cross-lingual transfer setting.
Keywords
Weakly supervised learning, POS tagging, Cross-lingual transfer
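
To make "learning with ambiguous labels" concrete, here is a minimal sketch of a perceptron-style tagger trained on examples whose supervision is a set of admissible tags rather than a single gold tag. It is an illustration only, not the paper's history-based model or its CRF extension: the update rule (promote the best-scoring admissible tag, demote the prediction), the feature strings, the toy data, and the names `train_ambiguous_perceptron` and `predict` are all assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical tag inventory; the list order only affects tie-breaking.
TAGS = ["NOUN", "VERB", "DET"]


def train_ambiguous_perceptron(data, tagset, epochs=5):
    """Train on (features, allowed_tags) pairs.

    allowed_tags is a non-empty set: any tag in it counts as correct,
    which is how dictionary or projected annotations are modeled here.
    """
    weights = defaultdict(float)  # (feature, tag) -> weight

    def score(features, tag):
        return sum(weights[(f, tag)] for f in features)

    for _ in range(epochs):
        for features, allowed in data:
            predicted = max(tagset, key=lambda t: score(features, t))
            if predicted in allowed:
                continue  # prediction is compatible with the ambiguous label
            # Assumed update: move toward the best-scoring admissible tag
            # and away from the inadmissible prediction.
            target = max(sorted(allowed), key=lambda t: score(features, t))
            for f in features:
                weights[(f, target)] += 1.0
                weights[(f, predicted)] -= 1.0
    return weights


def predict(weights, features, tagset):
    """Return the highest-scoring tag for a bag of feature strings."""
    return max(tagset, key=lambda t: sum(weights.get((f, t), 0.0) for f in features))


# Toy training set (invented): "walk" is ambiguous between NOUN and VERB.
toy_data = [
    ({"w=the"}, {"DET"}),
    ({"w=walk"}, {"NOUN", "VERB"}),
    ({"w=walk", "prev=the"}, {"NOUN"}),
    ({"w=runs"}, {"VERB"}),
]
toy_weights = train_ambiguous_perceptron(toy_data, TAGS)
```

In this sketch an example with several admissible tags never triggers an update as long as the model picks any one of them, so unambiguous examples do most of the work of shaping the weights.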