Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio
CoRR (2024)
Abstract
We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model builds on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. In contrast, we propose to decouple the two training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. For interpretability, APNet reconstructs prototypes to waveforms by relying on the nearest training data samples; we instead explore a diffusion decoder that enables reconstruction without such a dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not previously addressed with prototypical networks. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes aids understanding of the classifier's behavior.
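The abstract does not include code, but the core idea of prototype learning over frozen embeddings can be illustrated with a minimal PyTorch sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: class names, the embedding dimension, and the `prototypes_per_class` parameter are all hypothetical, and the frozen EnCodecMAE encoder is stood in for by random tensors.

```python
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """Hypothetical sketch of a prototype-based classifier head.

    Learns prototype vectors in the embedding space of a frozen,
    pre-trained encoder; inputs are classified by their (negative
    squared) distance to the nearest prototype of each class.
    """

    def __init__(self, embed_dim: int, num_classes: int, prototypes_per_class: int = 1):
        super().__init__()
        self.num_classes = num_classes
        self.prototypes_per_class = prototypes_per_class
        # Prototypes live in the same space as the encoder embeddings,
        # so they could later be decoded back to audio for sonification.
        self.prototypes = nn.Parameter(
            torch.randn(num_classes * prototypes_per_class, embed_dim)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, embed_dim) embeddings from the frozen autoencoder.
        dists = torch.cdist(z, self.prototypes).pow(2)        # (batch, P)
        # Similarity = negative distance; pool over each class's prototypes.
        sims = (-dists).view(-1, self.num_classes, self.prototypes_per_class)
        return sims.max(dim=-1).values                        # (batch, num_classes)

# Usage sketch: embeddings would come from a frozen pre-trained
# autoencoder such as EnCodecMAE (encoder loading omitted / assumed).
model = PrototypeClassifier(embed_dim=768, num_classes=10, prototypes_per_class=2)
z = torch.randn(4, 768)                     # stand-in for encoder embeddings
logits = model(z)                           # (4, 10)
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 3]))
```

Because the prototypes are ordinary vectors in the encoder's latent space rather than pointers to training samples, a decoder (in the paper, a diffusion decoder) can map them back to waveforms for sonification without depending on the training set.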