Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
arXiv (2024)
Abstract
In recent years, advancements in neural network designs and the availability
of large-scale labeled datasets have led to significant improvements in the
accuracy of piano transcription models. However, most previous work focused on
high-performance offline transcription, neglecting deliberate consideration of
model size. The goal of this work is to implement real-time inference for piano
transcription while keeping the model both accurate and lightweight. To this
end, we propose novel architectures for convolutional recurrent neural
networks, redesigning an existing autoregressive piano transcription model.
First, we extend the acoustic module by adding a frequency-conditioned FiLM
layer to the CNN module to adapt the convolutional filters on the frequency
axis. Second, we improve note-state sequence modeling by using a pitchwise LSTM
that focuses on note-state transitions within a note. In addition, we augment
the autoregressive connection with an enhanced recursive context. Using these
components, we propose two types of models: one for high performance and the
other for high compactness. Through extensive experiments, we show that the
proposed models are comparable to state-of-the-art models in terms of note
accuracy on the MAESTRO dataset. We also investigate the effective model size
and real-time inference latency by gradually streamlining the architecture.
Finally, we conduct cross-data evaluation on unseen piano datasets and in-depth
analysis to elucidate the effect of the proposed components with respect to note
length and pitch range.
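
As a rough illustration of the frequency-conditioned FiLM idea mentioned above, the sketch below scales and shifts each feature-map row according to its position on the frequency axis. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function name `frequency_film`, the nested-list layout `[channel][frequency][time]`, and the conditioning functions `example_gamma` and `example_beta` are all hypothetical. In a trained model the scale and shift would be learned rather than hand-written.

```python
def frequency_film(features, gamma_fn, beta_fn):
    """Apply y[c][f][t] = gamma(c, pos_f) * x[c][f][t] + beta(c, pos_f).

    features: nested list indexed [channel][frequency][time].
    gamma_fn, beta_fn: callables (channel, pos) -> float, where pos is the
    frequency bin position normalized to the range [0, 1].
    """
    n_freq = len(features[0])
    out = []
    for c, channel in enumerate(features):
        rows = []
        for f, row in enumerate(channel):
            # Normalized frequency position used as the conditioning signal.
            pos = f / (n_freq - 1) if n_freq > 1 else 0.0
            g, b = gamma_fn(c, pos), beta_fn(c, pos)
            rows.append([g * x + b for x in row])
        out.append(rows)
    return out


# Hypothetical conditioning functions (stand-ins for learned parameters).
def example_gamma(c, pos):
    return 1.0 + 0.5 * pos


def example_beta(c, pos):
    return -0.1 * c


# One channel, three frequency bins, two time frames.
x = [[[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]]
y = frequency_film(x, example_gamma, example_beta)
```

Here identical input rows come out different per frequency bin, which is the point of conditioning on frequency: the same convolutional features can be modulated differently in low and high pitch regions.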