High Fidelity Song Identification via Audio Decomposition and Fingerprint Reconstruction by CNN and LSTM Networks

Semantic Scholar (2020)

Abstract
The task of generic audio identification is addressed by numerous robust techniques, one of which is audio source separation, which isolates a particular source from a mixture of audio recordings. While these techniques achieve consistently high performance on produced studio audio, inevitable discrepancies arise when the same algorithms are applied to human-generated audio, owing to its unpredictable composition and inherent lack of regularity. Software such as Shazam and other audio analysis algorithms has attained relatively successful results on processed audio, but extensions of these mechanisms to recognize unfiltered human audio are absent from the field. In this project we consider the identification of musical samples, motivated by the familiar situation in which an individual hears a fragment of a song and would like to identify it. We propose a mechanism for robust audio recognition and filtering: a signal analysis and identification model that transforms distorted human audio into frequency-domain signals, feeds the continuously sequenced frequency and pitch points to a recurrent neural network with LSTM units, and applies softmax classification to identify the correct song.
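The pipeline the abstract describes — waveform to frequency representation, an LSTM over the frame sequence, then a softmax over candidate songs — can be sketched minimally as below. This is an illustrative NumPy sketch, not the authors' implementation: the frame sizes, hidden dimension, and `TinyLSTM` class are hypothetical choices, and a real system would learn the weights by training rather than using random initialization.

```python
import numpy as np

def frames_to_spectrogram(signal, frame_len=256, hop=128):
    # Slice the waveform into overlapping windowed frames and take the
    # magnitude FFT of each, yielding one frequency vector per time step.
    n = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])
    return np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyLSTM:
    """Single-layer LSTM followed by a softmax classifier over songs.

    Weights are randomly initialized here purely for illustration; in the
    described system they would be learned from labeled song samples.
    """

    def __init__(self, d_in, d_hidden, n_songs, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, output gates.
        self.W = rng.standard_normal((4 * d_hidden, d_in + d_hidden)) * 0.1
        self.b = np.zeros(4 * d_hidden)
        self.W_out = rng.standard_normal((n_songs, d_hidden)) * 0.1
        self.d_hidden = d_hidden

    def forward(self, frames):
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
        h = np.zeros(self.d_hidden)
        c = np.zeros(self.d_hidden)
        for x in frames:  # one frequency vector per time step
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # Final hidden state summarizes the sequence; softmax gives a
        # probability distribution over the candidate songs.
        return softmax(self.W_out @ h)
```

A usage sketch: compute the spectrogram of a recorded snippet, run it through the model, and take the most probable song index.

```python
snippet = np.sin(np.linspace(0, 100, 2048))          # stand-in for recorded audio
spec = frames_to_spectrogram(snippet)                # (n_frames, n_freq_bins)
model = TinyLSTM(d_in=spec.shape[1], d_hidden=16, n_songs=5)
probs = model.forward(spec)
predicted_song = int(np.argmax(probs))
```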