Experiments with Cross-Language Speech Retrieval for Lower-Resource Languages.

AIRS (2019)

Abstract
Cross-language speech retrieval systems face a cascade of errors due to transcription and translation ambiguity. Using 1-best speech recognition and 1-best translation in such a scenario could adversely affect recall if those 1-best system guesses are not correct. Accurately representing transcription and translation probabilities could therefore improve recall, although possibly at some cost in precision. The difficulty of the task is exacerbated when working with languages for which limited resources are available, since both recognition and translation probabilities may be less accurate in such cases. This paper explores the combination of expected term counts from recognition with expected term counts from translation to perform cross-language speech retrieval in which the queries are in English and the spoken content to be retrieved is in Tagalog or Swahili. Experiments were conducted using two query types, one focused on term presence and the other focused on topical retrieval. Overall, the results show that modeling transcription and translation ambiguity yields significant improvements in ranking quality, even in lower-resource settings, and that adapting the ranking model to specific query types can yield further improvements.
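To make "combining expected term counts from recognition with expected term counts from translation" concrete, here is a minimal sketch under stated assumptions; it is not the paper's actual implementation. It assumes recognition output is an n-best list of hypotheses with posterior probabilities (rather than a lattice), and that translation is given as a table of probabilities P(english | source) for individual source terms. The function names, the toy Tagalog data, and the bare sum-of-expected-counts score (no IDF or length normalization) are all hypothetical illustrations. The expected source-term count is E[c(f)] = sum over hypotheses h of P(h) * count(f, h), and the expected English count is E[c(e)] = sum over f of E[c(f)] * P(e | f).

```python
from collections import defaultdict


def expected_source_counts(nbest):
    """Expected count of each source-language term over an n-best ASR list.

    nbest: list of (posterior, tokens) pairs (hypothetical input format).
    Posteriors are renormalized so they sum to 1 over the list.
    """
    total = sum(p for p, _ in nbest)
    counts = defaultdict(float)
    for posterior, tokens in nbest:
        for tok in tokens:
            counts[tok] += posterior / total
    return dict(counts)


def expected_english_counts(source_counts, trans_table):
    """Project expected source counts into English: E[c(e)] = sum_f E[c(f)] * P(e|f).

    trans_table: dict mapping source term -> {english term: probability} (assumed).
    """
    counts = defaultdict(float)
    for f, c_f in source_counts.items():
        for e, p in trans_table.get(f, {}).items():
            counts[e] += c_f * p
    return dict(counts)


def score(query_terms, english_counts):
    """Toy ranking score: sum of expected counts of the English query terms."""
    return sum(english_counts.get(q, 0.0) for q in query_terms)


if __name__ == "__main__":
    # Toy data: the ASR is unsure whether it heard "bahay" (house) or "baha" (flood).
    nbest = [(0.6, ["bahay", "malaki"]), (0.4, ["baha", "malaki"])]
    table = {
        "bahay": {"house": 0.9, "home": 0.1},
        "baha": {"flood": 1.0},
        "malaki": {"big": 0.7, "large": 0.3},
    }
    eng = expected_english_counts(expected_source_counts(nbest), table)
    # Using only the 1-best hypothesis loses the "flood" reading entirely.
    one_best = expected_english_counts(expected_source_counts(nbest[:1]), table)
    print(score(["flood"], eng), score(["flood"], one_best))
```

In the toy run, the query "flood" receives a nonzero score only when the second ASR hypothesis is kept, which illustrates the abstract's point that 1-best output can hurt recall. Admitting less likely readings also raises the scores of spurious matches, which is the possible precision cost the abstract mentions.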
Keywords
languages, retrieval, speech, cross-language, lower-resource