Web data harvesting for speech understanding grammar induction

Ioannis Klasinas, Alexandros Potamianos, Elias Iosif, Spiros Georgiladakis, Gianluca Mameli

OpenAlex (2013)

Abstract

The development of a grammar for a spoken dialogue system can be greatly accelerated by using a corpus that describes the application. However, developing such a corpus is a slow and expensive process. This paper proposes unsupervised methods for finding relevant corpora on the Web and mining their most informative parts. We show that using perplexity increases the in-domainness (precision) of the mined corpora, while using the rank assigned by the web search engine increases their generalizability (recall). The results show that, using only unsupervised and language-independent methods, we can compete with corpora created manually with expert knowledge.
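
The perplexity-based selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the language model, the seed data, or the selection threshold, so the add-one smoothed unigram model, the toy seed sentences, and the function names `train_unigram`, `perplexity`, and `filter_by_perplexity` below are all assumptions made for illustration. The idea shown is only that candidate web sentences scoring a lower perplexity under an in-domain model are more likely to be in-domain.

```python
import math
from collections import Counter


def train_unigram(seed_sentences):
    """Count word frequencies in a small in-domain seed corpus (assumed input)."""
    counts = Counter(w for s in seed_sentences for w in s.lower().split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Per-word perplexity under an add-one smoothed unigram model."""
    words = sentence.lower().split()
    if not words:
        return float("inf")
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


def filter_by_perplexity(candidates, counts, total, keep):
    """Keep the `keep` candidates with the lowest perplexity."""
    ranked = sorted(candidates, key=lambda s: perplexity(s, counts, total))
    return ranked[:keep]


# Toy data, invented for this example only.
seed = [
    "i want to book a flight to athens",
    "show me flights from rome to paris",
]
web = [
    "book a flight from paris to rome",
    "the recipe needs two cups of flour",
]

counts, total = train_unigram(seed)
selected = filter_by_perplexity(web, counts, total, keep=1)
print(selected)
```

On this toy data the travel sentence is kept, because every word in it appears in the seed, while the cooking sentence consists entirely of unseen words. A realistic setup would likely use a higher-order n-gram model and a tuned cutoff; those choices are left open here.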

Keywords

speech, web, data
