A CRNN-GCN Piano Transcription Model Based on Audio and Skeleton Features

Yuqing Li, Xianke Wang, Ruimin Wu, Wei Xu, Wenqing Cheng

ICASSP Workshops (2023)

Abstract

Piano transcription is a fundamental problem in music information retrieval that aims to infer the note sequence from multimedia recordings of piano performances. This paper proposes a piano transcription model named CRNN-GCN, which fuses the audio transcription model Onsets and Frames (CRNN) with a visual transcription model based on a Graph Convolutional Network (GCN). The CRNN extracts features from the audio, and the GCN extracts features from hand skeletons rather than from video frames, which effectively reduces video memory usage and computational complexity. The features from both modalities are then integrated to obtain better transcription results. On our self-built dataset OMAPS2, the single-modal CRNN and GCN achieve F1-scores of 89.98% and 61.63%, respectively, while the multi-modal CRNN-GCN reaches an F1-score of 92.06%, which is the best result to date.
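The fusion idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-joint toy skeleton, the row-normalized adjacency, the single graph-convolution layer, the mean pooling over joints, and fusion by concatenation. The helper names (`normalize_adjacency`, `graph_conv`, `fuse`) are hypothetical.

```python
# Illustrative sketch only: shapes, names, and concatenation-based fusion
# are assumptions, not the authors' implementation.

def normalize_adjacency(edges, num_joints):
    """Build A + I (self-loops added) and row-normalize it."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def graph_conv(adj_norm, joint_feats, weight):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, joint_feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def fuse(audio_feat, skeleton_feat):
    """Fuse two per-frame feature vectors by concatenation (assumed)."""
    return audio_feat + skeleton_feat

# Toy hand skeleton for one frame: 3 joints in a chain, 2-D coordinates.
edges = [(0, 1), (1, 2)]
joints = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for readability

adj_norm = normalize_adjacency(edges, num_joints=3)
hidden = graph_conv(adj_norm, joints, weight)

# Mean-pool over joints to get one skeleton feature vector per frame.
skeleton_feat = [sum(col) / len(hidden) for col in zip(*hidden)]

# Stand-in for the CRNN's per-frame audio features (made-up values).
audio_feat = [0.2, 0.7, 0.1]
fused = fuse(audio_feat, skeleton_feat)
```

Because the graph convolution operates on a handful of joint coordinates per frame instead of full image pixels, the visual branch input is far smaller than a video frame, which is consistent with the memory and computation savings the abstract describes.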

Keywords

GCN, Multimodal, Piano Transcription
