The JHU Speaker Recognition System for the VOiCES 2019 Challenge

David Snyder, Jesús Villalba, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur

INTERSPEECH (2019)

Abstract

This paper describes the systems developed by the JHU team for the speaker recognition track of the 2019 VOiCES from a Distance Challenge. On this far-field task, we achieved good performance using systems based on state-of-the-art deep neural network (DNN) embeddings. In this paradigm, a DNN maps variable-length speech segments to speaker embeddings, called x-vectors, that are then classified using probabilistic linear discriminant analysis (PLDA). Our submissions were composed of three x-vector-based systems that differed primarily in the DNN architecture, temporal pooling mechanism, and training objective function. On the evaluation set, our best single-system submission used an extended time-delay architecture, and achieved 0.435 in actual DCF, the primary evaluation metric. A fusion of all three x-vector systems was our primary submission, and it obtained an actual DCF of 0.362.
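As a rough illustration of two ideas in the abstract, the Python sketch below shows (a) statistics pooling, the mean-plus-standard-deviation temporal pooling from the original x-vector recipe, which turns a variable number of frame-level vectors into one fixed-length vector, and (b) a normalized actual DCF computed at the Bayes decision threshold. This is not the authors' code. The abstract says the three systems differed in their pooling mechanism, so this pooling is not necessarily what any particular submission used. The operating point (P_target = 0.01, C_miss = C_fa = 1) and the assumption that scores are calibrated log-likelihood ratios are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool T frame-level vectors (T may vary) into one vector of length 2*D.

    Per dimension, concatenate the mean and the population standard deviation
    over time. Hypothetical helper for illustration only.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def actual_dcf(
    scores: Sequence[float],
    is_target: Sequence[bool],
    p_target: float = 0.01,  # assumed operating point, not taken from the paper
    c_miss: float = 1.0,
    c_fa: float = 1.0,
) -> float:
    """Normalized detection cost at the Bayes threshold, assuming LLR scores."""
    # Accept a trial when its LLR exceeds log(C_fa * (1 - P_tar) / (C_miss * P_tar)).
    threshold = math.log(c_fa * (1 - p_target) / (c_miss * p_target))
    targets = [s for s, t in zip(scores, is_target) if t]
    nontargets = [s for s, t in zip(scores, is_target) if not t]
    p_miss = sum(s <= threshold for s in targets) / len(targets)
    p_fa = sum(s > threshold for s in nontargets) / len(nontargets)
    cost = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
    # Normalize by the cost of the best trivial system (always accept or always reject).
    return cost / min(c_miss * p_target, c_fa * (1 - p_target))


# Segments of different lengths map to vectors of the same length.
emb_a = statistics_pooling([[1.0, 2.0], [3.0, 4.0]])              # 2 frames
emb_b = statistics_pooling([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])  # 3 frames
print(emb_a)  # [2.0, 3.0, 1.0, 1.0]
print(len(emb_b))  # 4

# One missed target and one false alarm out of three trials each.
print(actual_dcf([10, 8, 1, -5, -3, 6], [True] * 3 + [False] * 3))  # about 33.33
```

Because actual DCF uses a fixed threshold derived from the cost parameters rather than the best threshold chosen in hindsight, it also reflects how well the scores are calibrated, not only how well they separate targets from non-targets.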

Keywords

speaker recognition, VOiCES Challenge 2019
