The JHU Speaker Recognition System for the VOiCES 2019 Challenge

INTERSPEECH (2019)

Abstract
This paper describes the systems developed by the JHU team for the speaker recognition track of the 2019 VOiCES from a Distance Challenge. On this far-field task, we achieved good performance using systems based on state-of-the-art deep neural network (DNN) embeddings. In this paradigm, a DNN maps variable-length speech segments to speaker embeddings, called x-vectors, that are then classified using probabilistic linear discriminant analysis (PLDA). Our submissions were composed of three x-vector-based systems that differed primarily in the DNN architecture, temporal pooling mechanism, and training objective function. On the evaluation set, our best single-system submission used an extended time-delay architecture, and achieved 0.435 in actual DCF, the primary evaluation metric. A fusion of all three x-vector systems was our primary submission, and it obtained an actual DCF of 0.362.
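The abstract's x-vector paradigm can be illustrated with a minimal sketch: a frame-level network processes a variable-length feature sequence, temporal statistics pooling (mean and standard deviation over frames) collapses it to a fixed-size vector, and a segment-level layer produces the speaker embedding. All weights, dimensions, and the cosine back-end below are illustrative assumptions, not the JHU system's actual architecture or its PLDA classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def frame_level(feats, w):
    # feats: (T, D) variable-length acoustic features -> (T, H) frame activations
    return relu(feats @ w)

def stats_pool(h):
    # Temporal pooling: concatenate per-dimension mean and std over the T frames,
    # yielding a fixed-size (2H,) vector regardless of segment length
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])

def x_vector(feats, w_frame, w_seg):
    # Segment-level affine layer maps pooled statistics to the speaker embedding
    return stats_pool(frame_level(feats, w_frame)) @ w_seg

# Illustrative layer sizes (hypothetical, far smaller than a real system)
D, H, E = 24, 64, 32
w_frame = rng.standard_normal((D, H)) * 0.1
w_seg = rng.standard_normal((2 * H, E)) * 0.1

# Two segments of different lengths map to embeddings of identical size
emb_a = x_vector(rng.standard_normal((200, D)), w_frame, w_seg)
emb_b = x_vector(rng.standard_normal((350, D)), w_frame, w_seg)

def cosine(u, v):
    # Cosine similarity stands in for the PLDA scoring back-end in this sketch
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

score = cosine(emb_a, emb_b)
```

In the actual systems described here, the embeddings would instead be centered, length-normalized, and scored with PLDA; the three submissions varied the DNN architecture, the pooling mechanism, and the training objective around this same skeleton.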
Keywords
speaker recognition, VOiCES Challenge 2019