Unsupervised Word Discovery from Phonetic Input Using Nested Pitman-Yor Language Modeling
International Conference on Robotics and Automation (2013)
Abstract
In this paper we consider unsupervised word discovery from phonetic input. We employ a word segmentation algorithm that simultaneously develops a lexicon (i.e., the transcription of each word as a phone sequence), learns an n-gram language model describing word and word-sequence probabilities, and carries out the segmentation itself. The underlying statistical model is a Pitman-Yor process, a concept from Bayesian nonparametrics that allows for an a priori unknown and unlimited number of different words. A hierarchy of Pitman-Yor processes allows language models of different orders to be employed, and nesting it with a second hierarchy of Pitman-Yor processes on the phone level allows unknown word unigrams to back off to phone m-grams. We present results on a large-vocabulary task, assuming that an error-free phone sequence is given. We conclude by discussing options for coping with noisy phone sequences.
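To make the back-off idea concrete, the following is a minimal sketch of a single Pitman-Yor "restaurant" over words whose base distribution is defined on phone sequences. It is not the authors' implementation: the paper nests full hierarchies on both the word and the phone level, whereas this sketch uses one word-level process and replaces the phone m-gram hierarchy with an assumed uniform phone model with geometric word length. The names `PitmanYorRestaurant` and `phone_base_prob`, the inventory size of 40 phones, and all parameter values are hypothetical choices made for illustration.

```python
import random


def phone_base_prob(word, n_phones=40, p_stop=0.5):
    """Assumed stand-in for the phone-level model: phones are uniform over
    an inventory of n_phones and word length is geometric with p_stop."""
    length = len(word)
    return p_stop * (1.0 - p_stop) ** (length - 1) * (1.0 / n_phones) ** length


class PitmanYorRestaurant:
    """Chinese-restaurant representation of one Pitman-Yor process.

    Requires 0 <= discount < 1 and strength > 0.
    """

    def __init__(self, discount, strength, base_prob, seed=0):
        self.d = discount
        self.theta = strength
        self.base_prob = base_prob
        self.tables = {}        # word -> list of customer counts, one per table
        self.n_customers = 0
        self.n_tables = 0
        self.rng = random.Random(seed)

    def prob(self, word):
        """Predictive probability of `word` given the current seating."""
        counts = self.tables.get(word, [])
        denom = self.theta + self.n_customers
        # Mass from tables already serving this word, discounted per table.
        seen = (sum(counts) - self.d * len(counts)) / denom
        # Mass reserved for new tables, spread according to the base model.
        backoff = (self.theta + self.d * self.n_tables) / denom
        return seen + backoff * self.base_prob(word)

    def add_customer(self, word):
        """Seat one occurrence of `word`, sampling an old or a new table."""
        counts = self.tables.setdefault(word, [])
        weights = [n - self.d for n in counts]
        weights.append((self.theta + self.d * self.n_tables) * self.base_prob(word))
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)    # open a new table for this word
            self.n_tables += 1
        else:
            counts[k] += 1
        self.n_customers += 1


# Words are tuples of phone symbols (the symbols themselves are illustrative).
restaurant = PitmanYorRestaurant(discount=0.5, strength=1.0,
                                 base_prob=phone_base_prob)
for _ in range(3):
    restaurant.add_customer(("k", "ae", "t"))

print(restaurant.prob(("k", "ae", "t")))   # word observed three times
print(restaurant.prob(("d", "ao", "g")))   # unseen word, scored via phones
```

An unseen word still receives nonzero probability because the reserved back-off mass is distributed by the phone-level model, which is the role the nested phone hierarchy plays in the abstract's description.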
Keywords
Text segmentation, Language model, Lexicon, Phone, Transcription (linguistics), Statistical model, Natural language processing, Speech recognition, Computer science, Artificial intelligence