Creating a ground truth multilingual dataset of news and talk show transcriptions through crowdsourcing
Language Resources and Evaluation, Volume 51, Issue 2, 2017, Pages 283-317.
This paper describes the development of a multilingual and multigenre manually annotated speech dataset, freely available to the research community as ground truth for the evaluation of automatic transcription systems and spoken language translation systems. The dataset includes two video genres--television broadcast news and talk-shows--...