Creating a ground truth multilingual dataset of news and talk show transcriptions through crowdsourcing

Language Resources and Evaluation, Volume 51, Issue 2, 2017, Pages 283-317.

Abstract:

This paper describes the development of a multilingual and multigenre manually annotated speech dataset, freely available to the research community as ground truth for the evaluation of automatic transcription systems and spoken language translation systems. The dataset includes two video genres--television broadcast news and talk-shows--...
