Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts.
EMNLP-CoNLL (2007)
Abstract
We demonstrate an approach for inducing a tagger for historical languages based on existing resources for their modern varieties. Tags from Present Day English source text are projected to Middle English text using alignments on parallel Biblical text. We explore the use of multiple alignment approaches and a bigram tagger to reduce the noise in the projected tags. Finally, we train a maximum entropy tagger on the output of the bigram tagger on the target Biblical text and test it on tagged Middle English text. This leads to tagging accuracy in the low 80s on Biblical test material and in the 60s on other Middle English material. Our results suggest that our bootstrapping methods have considerable potential, and could be used to semi-automate an approach based on incremental manual annotation.
Keywords
multiple alignment, maximum entropy
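The projection step described in the abstract can be illustrated with a minimal sketch. It assumes word alignments are given as (source index, target index) pairs and pools votes from several aligners by simple majority. The function name `project_tags`, the voting scheme, and the `UNK` placeholder are assumptions made for illustration, not the paper's actual procedure, and the toy data is invented.

```python
from collections import Counter


def project_tags(source_tags, alignments, target_len, unknown="UNK"):
    """Project POS tags from source tokens onto target tokens.

    source_tags: list of tags, one per source (Present Day English) token.
    alignments:  list of alignments, each a list of (src_idx, tgt_idx) pairs,
                 e.g. produced by different aligners (hypothetical input).
    target_len:  number of target (Middle English) tokens.
    Returns one tag per target token; unaligned tokens get `unknown`.
    """
    # One vote counter per target token.
    votes = [Counter() for _ in range(target_len)]
    for alignment in alignments:
        for src_idx, tgt_idx in alignment:
            votes[tgt_idx][source_tags[src_idx]] += 1

    projected = []
    for counter in votes:
        if counter:
            # Keep the tag that received the most votes.
            projected.append(counter.most_common(1)[0][0])
        else:
            projected.append(unknown)
    return projected


# Toy data, invented for illustration only.
source_tags = ["IN", "DT", "NN"]
aligner_a = [(0, 0), (1, 1), (2, 2)]
aligner_b = [(0, 0), (1, 1), (2, 3)]
aligner_c = [(0, 0), (1, 1), (2, 2)]

projected = project_tags(source_tags, [aligner_a, aligner_b, aligner_c], target_len=5)
print(projected)
```

In a pipeline like the one the abstract outlines, noisy output of this kind would then be smoothed by a bigram tagger before training the final maximum entropy tagger; those stages are not shown here.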