Web data harvesting for speech understanding grammar induction

openalex(2013)

Abstract
The development of a grammar for a spoken dialogue system can be greatly accelerated by using a corpus describing the application. However, the development of such a corpus is a slow and expensive process. This paper proposes unsupervised methods for finding relevant corpora on the Web and mining their most informative parts. We show that by utilizing perplexity we are able to increase the in-domainness (precision) of the mined corpora, while by utilizing the rank of the web search engine we can increase the generalizability (recall). The results show that, using only unsupervised and language-independent methods, we can compete with corpora created manually with expert knowledge.
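
As a rough illustration of perplexity-based selection, the sketch below scores harvested sentences against a small in-domain seed corpus and keeps the lowest-perplexity ones. It is not the paper's implementation: the add-one smoothed unigram model, the function names (`train_unigram`, `perplexity`, `select_in_domain`), and the toy sentences are all assumptions made for the example, since the abstract does not specify the language model used.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word frequencies in a small in-domain seed corpus (assumed input)."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Per-word perplexity under an add-one smoothed unigram model."""
    words = sentence.lower().split()
    if not words:
        return float("inf")
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


def select_in_domain(candidates, counts, total, keep):
    """Keep the `keep` candidate sentences with the lowest perplexity."""
    ranked = sorted(candidates, key=lambda s: perplexity(s, counts, total))
    return ranked[:keep]


# Toy data, invented for illustration only.
seed = [
    "i want to fly to boston",
    "book a flight to denver",
    "show flights from boston to denver",
]
harvested = [
    "cheap flights to boston",
    "the recipe needs two eggs",
    "a flight from denver",
]

counts, total = train_unigram(seed)
selected = select_in_domain(harvested, counts, total, keep=2)
print(selected)
```

Lower perplexity means a sentence looks more like the seed corpus, so a tighter cutoff favors precision; admitting more sentences trades some of that precision for broader coverage.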
Keywords
speech, web, data