Speaker diarization using an end-to-end model

Quan Wang, Yash Sheth, Ignacio Lopez Moreno, Li Wan

(2020)

Abstract

Techniques are described for training and/or using an end-to-end speaker diarization model. In various implementations, the model is a recurrent neural network (RNN) model, such as one that includes at least one memory layer, for example a long short-term memory (LSTM) layer. Audio features of audio data can be applied as input to an end-to-end speaker diarization model trained according to the implementations disclosed herein, and the model processes those features to generate speaker diarization results as its direct output. Further, the end-to-end speaker diarization model can be a sequence-to-sequence model in which the sequence can have variable length. Accordingly, the model can be used to generate speaker diarization results for audio segments of any of various lengths.
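
To make the sequence-to-sequence idea concrete, the following is a minimal sketch, not the architecture or training procedure described by the authors. It assumes a single LSTM layer without bias terms, untrained random weights, and a fixed maximum number of speakers; the names `TinyLSTMDiarizer`, `diarize`, and `make_frames` are hypothetical and introduced only for illustration. It shows how one recurrent model can map a variable-length sequence of per-frame audio features directly to a per-frame speaker label sequence of the same length.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMDiarizer:
    """Hypothetical single-layer LSTM that emits one speaker index per frame."""

    def __init__(self, feature_dim, hidden_dim, max_speakers, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate).
        # Each acts on the concatenation [frame features, previous hidden state].
        # Assumption: biases are omitted and weights are random, not trained.
        self.gates = [
            random_matrix(hidden_dim, feature_dim + hidden_dim, rng)
            for _ in range(4)
        ]
        # Projection from the hidden state to one score per speaker slot.
        self.out = random_matrix(max_speakers, hidden_dim, rng)

    def diarize(self, frames):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # memory cell state
        labels = []
        for frame in frames:
            x = list(frame) + h
            i_gate = [sigmoid(v) for v in matvec(self.gates[0], x)]
            f_gate = [sigmoid(v) for v in matvec(self.gates[1], x)]
            o_gate = [sigmoid(v) for v in matvec(self.gates[2], x)]
            cand = [math.tanh(v) for v in matvec(self.gates[3], x)]
            # Standard LSTM update: forget old memory, add gated candidate.
            c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
            scores = matvec(self.out, h)
            # The speaker label is read directly from the model output.
            labels.append(max(range(len(scores)), key=scores.__getitem__))
        return labels


def make_frames(n, dim, seed):
    # Stand-in for real audio features; values are random placeholders.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


model = TinyLSTMDiarizer(feature_dim=8, hidden_dim=16, max_speakers=4)
short_labels = model.diarize(make_frames(5, 8, seed=1))
long_labels = model.diarize(make_frames(37, 8, seed=2))
print(len(short_labels), len(long_labels))  # prints: 5 37
```

Because the weights are untrained, the labels themselves are meaningless here; the point is only that the same model handles a 5-frame and a 37-frame input without any change, and that no separate clustering step sits between the network and the speaker labels.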

Keywords

Speaker diarisation, Recurrent neural network, Speech recognition, End-to-end principle, Computer science, Variable length
