Using latent information for natural language processing tasks (2013)
Abstract
In a broad sense, latent information in natural language processing refers to any information that is not directly observable in raw data. Such latent information abounds in many natural language processing tasks. Learning it may itself be the goal of a task, or it may be learned and then used to improve a related task. For example, in unsupervised learning of word alignment from parallel corpora, learning the latent information is the task itself. Learning latent annotations for a context-free grammar falls into the latter category, since latent annotation leads to better parsing accuracy. Depending on the availability of data, latent information may be learned in a supervised or an unsupervised manner. This dissertation presents three different types of latent information that are learned and used to improve various natural language processing tasks, focusing mainly on different stages of machine translation. First, we discuss unsupervised learning of tokenization from parallel corpora, using the alignment between a bilingual sentence pair as latent information. Second, we examine using empty categories to improve parsing and machine translation; in these tasks, empty categories are latent information that is learned from raw text and applied to the respective tasks. Finally, we look at learning latent annotations for synchronous context-free grammar, which leads to more accurate and faster string-to-tree machine translation.
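To make the word-alignment example concrete, the following is a minimal sketch of EM training in the style of IBM Model 1, where the alignment of each target word to a source word is the latent variable. The toy corpus and all names here are illustrative assumptions, not taken from the dissertation.

```python
from collections import defaultdict

# Toy parallel corpus; each pair is (source tokens, target tokens).
# These sentence pairs are made up for illustration.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Uniform initialization of translation probabilities t(e | f).
src_vocab = {f for fs, _ in corpus for f in fs}
tgt_vocab = {e for _, es in corpus for e in es}
t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(e, f)
    total = defaultdict(float)   # expected counts c(f)
    for fs, es in corpus:
        for e in es:
            # E-step: posterior over the latent alignment of e to each f.
            norm = sum(t[(e, f)] for f in fs)
            for f in fs:
                p = t[(e, f)] / norm
                count[(e, f)] += p
                total[f] += p
    # M-step: re-estimate t(e | f) from the expected counts.
    t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
```

After a few iterations the posterior alignments sharpen, e.g. "house" comes to align more strongly to "haus" than "the" does, even though the alignments were never observed.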
Keywords
unsupervised learning, latent annotation, latent information, parallel corpora, empty categories, natural language processing, machine translation, string-to-tree machine translation