General Sequence Teacher-Student Learning.

IEEE Transactions on Audio, Speech, and Language Processing (2019)

Abstract
In automatic speech recognition, performance gains can often be obtained by combining an ensemble of multiple models. However, this can be computationally expensive when performing recognition. Teacher–student learning alleviates this cost by training a single student model to emulate the combined ensemble behaviour. Only this student needs to be used for recognition. Previously investigated teacher–student criteria often limit the forms of diversity allowed in the ensemble, and only propagate information from the teachers to the student at the frame level. This paper addresses both of these issues by examining teacher–student learning within a sequence-level framework, and assessing the flexibility that these approaches offer. Various sequence-level teacher–student criteria are examined in this work, to propagate sequence posterior information. A training criterion based on the Kullback–Leibler (KL) divergence between context-dependent state sequence posteriors is proposed that allows for a diversity of state cluster sets to be present in the ensemble. This criterion is shown to be an upper bound to a more general KL-divergence between word sequence posteriors, which places even fewer restrictions on the ensemble diversity, but whose gradient can be expensive to compute. These methods are evaluated on the Augmented Multi-party Interaction (AMI) meeting transcription and MGB-3 television broadcast audio tasks.
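
To make the relationship between these criteria concrete, the LaTeX sketch below writes out plausible forms of the frame-level, state-sequence, and word-sequence KL criteria, together with the upper bound the abstract mentions. The notation (observations O, student parameters Theta, teacher posteriors P^T, state sequence s, word sequence omega) is assumed for illustration and need not match the paper's own.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Frame-level criterion: match per-frame state posteriors (assumed form).
% Sequence-level criteria: match posteriors over whole state or word sequences.
\begin{align*}
\mathcal{F}_{\text{frame}}(\Theta) &= \sum_{t=1}^{T}
  \mathrm{KL}\left( P^{\mathrm{T}}(s_t \mid \mathcal{O})
  \,\middle\|\, P(s_t \mid \mathcal{O}; \Theta) \right) \\
\mathcal{F}_{\text{state}}(\Theta) &=
  \mathrm{KL}\left( P^{\mathrm{T}}(\mathbf{s} \mid \mathcal{O})
  \,\middle\|\, P(\mathbf{s} \mid \mathcal{O}; \Theta) \right) \\
\mathcal{F}_{\text{word}}(\Theta) &=
  \mathrm{KL}\left( P^{\mathrm{T}}(\omega \mid \mathcal{O})
  \,\middle\|\, P(\omega \mid \mathcal{O}; \Theta) \right)
\end{align*}
% Word posteriors marginalise over the state sequences consistent with
% omega, so the log-sum (data-processing) inequality yields the bound
% stated in the abstract:
\begin{equation*}
\mathcal{F}_{\text{word}}(\Theta) \le \mathcal{F}_{\text{state}}(\Theta)
\end{equation*}
\end{document}

Minimising the state-sequence divergence therefore also drives down the word-sequence divergence, while remaining cheaper to differentiate, which is the trade-off the abstract describes.

For the frame-level baseline that the paper contrasts against, a minimal PyTorch-style sketch follows, assuming all teachers share the student's state cluster set. The function name, the posterior-averaging ensemble combination, and the tensor shapes are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

def frame_level_kl_loss(student_logits, teacher_log_probs_list, weights=None):
    """Frame-level teacher-student loss (illustrative sketch): KL from the
    weighted ensemble average of teacher per-frame state posteriors to the
    student posteriors.

    student_logits:         (T, S) unnormalised student scores per frame
    teacher_log_probs_list: list of (T, S) teacher log-posteriors
    weights:                optional per-teacher interpolation weights
    """
    if weights is None:
        weights = [1.0 / len(teacher_log_probs_list)] * len(teacher_log_probs_list)
    # Combine the ensemble by averaging posteriors (one common choice;
    # other combination schemes are possible).
    teacher_probs = sum(w * lp.exp() for w, lp in zip(weights, teacher_log_probs_list))
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as
    # target; 'batchmean' averages the divergence over the T frames.
    return F.kl_div(student_log_probs, teacher_probs, reduction='batchmean')

Note that this frame-level form requires every teacher to produce posteriors over the same state set as the student, which is exactly the restriction on ensemble diversity that the proposed sequence-level criterion relaxes.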
Keywords
Diversity reception, Computational modeling, Hidden Markov models, Training, Acoustics, Topology, Speech processing