SiPTH: Singing Transcription Based on Hysteresis Defined on the Pitch-Time Curve

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2015)

Abstract
In this paper, we present a method for monophonic singing transcription based on hysteresis defined on the pitch-time curve. The method is designed to perform note segmentation even when the pitch evolves unstably within a single note, as is common with untrained singers. The selected approach first estimates the regions in which the chroma is stable; these regions are then classified as voiced or unvoiced by a decision tree classifier that uses two descriptors based on aperiodicity and power. Next, a note segmentation stage based on the pitch intervals of the sung signal is carried out. To this end, once the beginning of a note is detected, a dynamic average of the pitch curve is computed to roughly estimate the note's pitch. Deviations of the actual pitch curve from this average are measured to determine the next note change according to a hysteresis process defined on the pitch-time curve. Finally, each note is labeled with three values: pitch (rounded to the nearest semitone), duration, and volume. In addition, a complete evaluation methodology is presented, including definitions of the relevant error types, the evaluation measures, and a method for computing them. The proposed system significantly improves on the performance of the baseline approach and attains results similar to those of previous approaches.
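
The segmentation idea described in the abstract (a running pitch average per note, with a note change declared from deviations against that average) can be illustrated with a minimal sketch. This is not the paper's procedure: the function names, the 0.5-semitone threshold, and the 3-frame persistence rule are assumptions chosen for illustration, and the input is assumed to be an already voiced pitch curve expressed in fractional MIDI semitones.

```python
import math


def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def segment_notes(pitch_midi, threshold=0.5, min_frames=3):
    """Split a voiced pitch curve into notes (illustrative sketch only).

    A note change is declared only when the pitch stays more than
    `threshold` semitones away from the running average of the current
    note for `min_frames` consecutive frames. Both parameters are
    hypothetical, not values taken from the paper.

    Returns a list of (start_frame, end_frame_exclusive, rounded_pitch).
    """
    notes = []
    if not pitch_midi:
        return notes

    start = 0               # first frame of the current note
    total = pitch_midi[0]   # sum of in-band frames of the current note
    count = 1               # number of in-band frames
    dev_run = 0             # consecutive frames outside the band

    for i in range(1, len(pitch_midi)):
        avg = total / count
        if abs(pitch_midi[i] - avg) > threshold:
            dev_run += 1
            if dev_run >= min_frames:
                # The deviation persisted: close the note where it began.
                change = i - dev_run + 1
                notes.append((start, change, round(avg)))
                start = change
                total = sum(pitch_midi[change:i + 1])
                count = i + 1 - change
                dev_run = 0
        else:
            # Back inside the band: treat earlier outliers as a transient.
            dev_run = 0
            total += pitch_midi[i]
            count += 1

    notes.append((start, len(pitch_midi), round(total / count)))
    return notes
```

For example, `segment_notes([60.1, 59.9, 60.2, 60.0, 62.1, 61.9, 62.0, 62.1])` yields two notes, one rounded to MIDI pitch 60 and one to 62, while a single outlier frame inside a steady note does not trigger a split. Note duration follows from the frame indices once a frame hop size is chosen.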
Keywords
acoustic signal processing, decision trees, hysteresis, SiPTH, decision tree classifier, hysteresis process, monophonic singing transcription, note segmentation, pitch curve, pitch evolution, pitch intervals, pitch-time curve, fundamental frequency, pitch, singing transcription, singing voice analysis