Automatic Recognition of Sound Categories from Their Vocal Imitation Using Audio Primitives Automatically Found by SI-PLCA and HMM.

CMMR (2017)

Abstract

In this paper we study the automatic recognition of sound categories (such as fridge, mixer or sawing sounds) from their vocal imitations. A vocal imitation is a succession over time of sounds produced using vocal mechanisms that can differ considerably from those used in speech. We develop a recognition approach inspired by automatic speech recognition systems, with an acoustic model (which maps the audio signal to a set of probabilities over “phonemes”) and a language model (which represents the expected succession of “phonemes” for each sound category). Since we do not know what the underlying “phonemes” of vocal imitations are, we propose to estimate them automatically using Shift-Invariant Probabilistic Latent Component Analysis (SI-PLCA) applied to a dataset of vocal imitations. The kernel distributions of the SI-PLCA are taken as the “phonemes” of vocal imitation, and its impulse distributions are used to compute the emission probabilities of the states of a set of Hidden Markov Models (HMMs). To evaluate our proposal, we test it on the task of automatically recognizing 12 sound categories from their vocal imitations.
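The recognition step described in the abstract can be illustrated with a minimal sketch. It assumes that a per-frame probability over primitives ("phonemes") is already available, standing in for quantities derived from the SI-PLCA impulse distributions, and that each sound category is modeled by a small left-to-right HMM whose states each point to one primitive. The function names, the toy category models, the self-transition probability and the example values are all hypothetical and are not taken from the paper; the paper's actual model topology and training procedure are not specified here.

```python
import numpy as np

def forward_loglik(frame_probs, state_primitives, trans, init):
    """Scaled forward algorithm: log-likelihood of the frames under one HMM.

    frame_probs: (T, K) array, per-frame probability of each primitive
                 (assumed input, e.g. derived from SI-PLCA activations).
    state_primitives: list of primitive indices, one per HMM state.
    """
    emis = frame_probs[:, state_primitives]  # (T, S) emission scores
    alpha = init * emis[0]
    loglik = 0.0
    for t in range(len(emis)):
        if t > 0:
            alpha = (alpha @ trans) * emis[t]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # sequence impossible under this model
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return loglik

def left_to_right(n_states, stay=0.7):
    """Hypothetical left-to-right topology: stay, or move to the next state."""
    trans = np.zeros((n_states, n_states))
    for s in range(n_states):
        if s + 1 < n_states:
            trans[s, s] = stay
            trans[s, s + 1] = 1.0 - stay
        else:
            trans[s, s] = 1.0  # last state absorbs
    init = np.zeros(n_states)
    init[0] = 1.0  # always start in the first state
    return trans, init

def classify(frame_probs, models):
    """Return the category whose HMM gives the highest log-likelihood."""
    scores = {}
    for name, prims in models.items():
        trans, init = left_to_right(len(prims))
        scores[name] = forward_loglik(frame_probs, prims, trans, init)
    return max(scores, key=scores.get), scores

# Toy example: 3 primitives, 2 categories, 4 frames (all values made up).
models = {"fridge": [0, 1], "sawing": [2, 0]}
frame_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
])
label, scores = classify(frame_probs, models)
print(label, scores)
```

In this toy setup the state sequence plays the role of the "language model" (the expected succession of primitives per category), while the per-frame primitive probabilities play the role of the "acoustic model" output.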

Keywords

Vocal imitation, Sound design, Sound recognition, Shift-invariant probabilistic latent component analysis, Hidden Markov model
