Did You Say U2 Or Youtube? Inferring Implicit Transcripts From Voice Search Logs

Milad Shokouhi, Umut Ozertem, Nick Craswell

WWW '16: 25th International World Wide Web Conference, Montréal, Québec, Canada, April 2016

Abstract

Web search via voice is becoming increasingly popular, taking advantage of recent advances in automatic speech recognition. Speech recognition systems are trained using audio transcripts, which can be generated by a paid annotator who listens to the audio and transcribes it manually. This paper considers an alternative source of training data for speech recognition, called implicit transcription. It is based on Web search clicks and reformulations, which can be interpreted as validating or correcting the recognition performed during a real Web search. This approach can yield a large amount of free training data that matches the exact characteristics of real incoming voice searches. The implicit transcriptions can also better reflect the needs of real users, because they come from the user who generated the audio. Overall, we demonstrate that the new training data has value in improving speech recognition. We further show that in-context feedback from real users allows the speech recognizer to exploit contextual signals and reduce the recognition error rate further, by up to 23%.
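
The idea of reading clicks as validation and reformulations as correction can be illustrated with a minimal sketch. This is not the authors' method: the log schema (`VoiceSearchEvent`), the function name `infer_implicit_transcript`, and the simple two-rule heuristic are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSearchEvent:
    """One voice query from a search log (hypothetical schema)."""
    recognized_text: str                 # what the recognizer output
    clicked_result: bool                 # did the user click a result?
    reformulation: Optional[str] = None  # follow-up query, if any


def infer_implicit_transcript(event: VoiceSearchEvent) -> Optional[str]:
    """Return a training label for the audio, or None if the signal is unclear."""
    # A reformulation without a click is read as the user correcting the recognizer.
    if event.reformulation and not event.clicked_result:
        return event.reformulation
    # A click with no follow-up is read as the user validating the recognition.
    if event.clicked_result and not event.reformulation:
        return event.recognized_text
    # Anything else is ambiguous under this heuristic, so no label is produced.
    return None


# Toy log entries (invented for illustration, not data from the paper).
events = [
    VoiceSearchEvent("youtube", clicked_result=True),
    VoiceSearchEvent("youtube", clicked_result=False, reformulation="u2"),
    VoiceSearchEvent("weather", clicked_result=False),
]
labels = [infer_implicit_transcript(e) for e in events]
```

Under these assumptions, the first event keeps the recognizer output as its label, the second takes the reformulated query, and the third is left unlabeled rather than guessed.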

Keywords

Speech retrieval, speech recognition, personalized search
