Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Applied Acoustics (2024)

Abstract
The emergence of self-supervised representations (i.e., wav2vec 2.0) allows speaker recognition approaches to process speech signals through foundation models built on speech data. Nevertheless, effective fusion of these representations requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Although improved strategies incorporate graph learning and graph attention, their aggregation remains non-injective, which may degrade speaker recognition performance. We therefore propose a speaker recognition approach that applies an Isomorphic Graph ATtention network (IsoGAT) to self-supervised representations. The proposed approach comprises three modules (representation learning, graph attention, and aggregation), which jointly consider learning on the self-supervised representation and the IsoGAT. We perform speaker recognition experiments on the VoxCeleb1 and VoxCeleb2 datasets, and the results demonstrate the recognition performance of the proposed approach compared with existing pooling approaches on self-supervised representations.
Keywords
Speaker recognition, Self-supervised representation, Isomorphic graph attention network, Pooling
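
The non-injective aggregation mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's IsoGAT. It assumes that frame-level features are pooled as an unordered collection, and the two "utterances", the helper names, and the scoring vector `w` are all hypothetical. It shows that mean, max, and softmax-attention pooling can map two different collections of frames to the same vector, while sum pooling separates them in this case.

```python
import numpy as np

def mean_pool(frames):
    # Average over the time axis (axis 0).
    return frames.mean(axis=0)

def max_pool(frames):
    # Element-wise maximum over the time axis.
    return frames.max(axis=0)

def sum_pool(frames):
    # Element-wise sum over the time axis.
    return frames.sum(axis=0)

def attention_pool(frames, w):
    # Softmax-normalised attention weights from a hypothetical scoring vector w,
    # followed by a weighted average of the frames.
    scores = frames @ w
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ frames

# Two toy frame-level feature vectors (assumed values, for illustration only).
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Two different collections of frames: {a, b} and {a, a, b, b}.
utt_x = np.stack([a, b])
utt_y = np.stack([a, a, b, b])

w = np.array([0.3, -0.7])  # hypothetical attention scoring vector

mean_collides = np.allclose(mean_pool(utt_x), mean_pool(utt_y))
max_collides = np.allclose(max_pool(utt_x), max_pool(utt_y))
attn_collides = np.allclose(attention_pool(utt_x, w), attention_pool(utt_y, w))
sum_collides = np.allclose(sum_pool(utt_x), sum_pool(utt_y))

print(mean_collides, max_collides, attn_collides, sum_collides)
```

Here the normalised poolings return identical vectors for both inputs because duplicating every frame leaves the averages and maxima unchanged, whereas the sums differ by a factor of two. This is only a toy demonstration of why an injective aggregator can matter; how IsoGAT achieves this is described in the paper itself.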