Incremental Dataset Definition for Large Scale Musicological Research

DLfM '14: Proceedings of the 1st International Workshop on Digital Libraries for Musicology(2014)

引用 0|浏览1
暂无评分
摘要
Conducting experiments on large scale musical datasets often requires the definition of a dataset as a first step in the analysis process. This is a classification task, but metadata providing the relevant information is not always available or reliable and manual annotation can be prohibitively expensive. In this study we aim to automate the annotation process using a machine learning approach for classification. We evaluate the effectiveness and the trade-off between accuracy and required number of annotated samples. We present an interactive incremental method based on active learning with uncertainty sampling. The music is represented by features extracted from audio and textual metadata and we evaluate logistic regression, support vector machines and Bayesian classification. Labelled training examples can be iteratively produced with a web-based interface, selecting the samples with lowest classification confidence in each iteration. We apply our method to address the problem of instrumentation identification, a particular case of dataset definition, which is a critical first step in a variety of experiments and potentially also plays a significant role in the curation of digital audio collections. We have used the CHARM dataset to evaluate the effectiveness of our method and focused on a particular case of instrumentation recognition, namely on the detection of piano solo pieces. We found that uncertainty sampling led to quick improvement of the classification, which converged after ca. 100 samples to values above 98%. In our test the textual metadata yield better results than our audio features and results depend on the learning methods. The results show that effective training of a classifier is possible with our method which greatly reduces the effort of labelling where a residual error rate is acceptable.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要