Sabr: Sparse, Anchor-Based Representation Of The Speech Signal

Christopher Liberatore,Sandesh Aryal,Zelun Wang,Seth Polsley,Ricardo Gutierrez-Osuna

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5（2015）

引用 25|浏览30

暂无评分

摘要

We present SABR (Sparse, Anchor-Based Representation), an analysis technique to decompose the speech signal into speaker-dependent and speaker-independent components. Given a collection of utterances for a particular speaker, SABR uses the centroid for each phoneme as an acoustic "anchor," then applies Lasso regularization to represent each speech frame as a sparse non-negative combination of the anchors. We illustrate the performance of the method on a speaker-independent phoneme recognition task and a voice conversion task. Using a linear classifier, SABR weights achieve significantly higher phoneme recognition rates than Mel frequency Cepstral coefficients. SABR weights can also be used directly to perform accent conversion without the need to train a speaker-to-speaker regression model.

查看译文

关键词

speech analysis, voice conversion, speaker independent representation, auditory phonetics, sparse coding

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要