Generalized Time-Series Active Search With Kullback-Leibler Distance For Audio Fingerprinting

IEEE Signal Processing Letters, no. 8 (2006): 465-468


Abstract

In this letter, a new audio fingerprinting approach is presented. We aim to improve robustness through more precise statistical fingerprint modeling with common component Gaussian mixture models (CCGMMs) and the Kullback-Leibler (KL) distance, which is better suited to measuring the dissimilarity between two probabilistic models. To address ...

Introduction
  • Audio fingerprinting has a wide variety of applications and has received a lot of attention recently.
  • Various audio fingerprinting algorithms have been proposed [1]–[3].
  • An ideal fingerprinting system should be able to identify different versions of the same audio content consistently, regardless of the distortions due to compression, transmission, and so on.
  • It should also be computationally efficient.
  • Typical applications include analysis of broadcast music and commercials, copyright management over the Internet, and finding metadata for unlabeled audio.
Highlights
  • Audio fingerprinting has a wide variety of applications and has received a lot of attention recently.
  • The testbed for studying these issues is the task of finding given audio clips in an audio stream that may be corrupted by distortions.
  • While previous studies explored various features that were robust to distortions [2], [3], we focus on improving robustness through more precise statistical fingerprint modeling and a better distance measure.
  • In one approach, the input fingerprints are separately compared with the database to find a match.
  • In the other, the reference fingerprints in the database are separately compared against the input audio stream.
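With common components, every fingerprint shares the same Gaussians and differs only in its mixture weights, so one simple way to compare two fingerprints is a KL distance on the weight vectors. The Python sketch below illustrates that idea only; it is not the paper's exact formula, which this summary does not reproduce. The function names, and the reading of `bias` as a constant added to every weight before renormalizing, are assumptions made for illustration.

```python
import math


def kl_divergence(p, q, bias=0.0):
    """Discrete KL divergence D(p || q) between two weight vectors.

    `bias` (an assumed smoothing constant) is added to every weight
    and both vectors are renormalized before the divergence is taken.
    """
    if len(p) != len(q):
        raise ValueError("weight vectors must have the same length")
    ps = [x + bias for x in p]
    qs = [x + bias for x in q]
    zp, zq = sum(ps), sum(qs)
    total = 0.0
    for a, b in zip(ps, qs):
        a, b = a / zp, b / zq
        if a == 0.0:
            continue  # the term 0 * log(0 / b) is taken as 0
        if b == 0.0:
            return math.inf  # p has mass where q has none
        total += a * math.log(a / b)
    return total


def symmetric_kl(p, q, bias=0.0):
    """Symmetrized KL distance: D(p || q) + D(q || p)."""
    return kl_divergence(p, q, bias) + kl_divergence(q, p, bias)
```

For mixtures that share their components, the KL divergence between the weight vectors is an upper bound on the KL divergence between the mixtures themselves (a consequence of the log-sum inequality), which is why the weights alone can serve as a proxy. Without a bias, two fingerprints with disjoint support are infinitely far apart; a small positive bias keeps the distance finite.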
Methods
  • The accuracy value is the precision rate when the precision rate equals the recall rate; the efficiency value is the percentage of the number of matching calculations that the active search skips compared with exhaustive search.
  • On average over the three distortions, using KL distance with 0.005 bias could skip 93.47% of the exhaustive matches, take 3.04 s to search a 3-s clip in the 10-h stream, and reduce the error rate by 31.7% compared with using distance.
  • By adjusting the bias for KL distance, the authors can achieve a trade-off between accuracy and efficiency
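The two figures of merit defined above can be computed as in the following Python sketch. It assumes each candidate detection comes with a distance score and a ground-truth label; the function names and the input layout are hypothetical. It relies on the fact that precision and recall coincide when the number of detections returned equals the number of true occurrences.

```python
def efficiency_percent(exhaustive_matches, computed_matches):
    """Percentage of matching calculations skipped relative to exhaustive search."""
    if exhaustive_matches <= 0 or not 0 <= computed_matches <= exhaustive_matches:
        raise ValueError("need 0 <= computed_matches <= exhaustive_matches, exhaustive > 0")
    return 100.0 * (exhaustive_matches - computed_matches) / exhaustive_matches


def break_even_accuracy(distances, is_match):
    """Precision at the operating point where precision equals recall.

    Keeps the R candidates with the smallest distances, where R is the
    number of true occurrences, and returns the fraction that are true.
    """
    relevant = sum(is_match)
    if relevant == 0:
        raise ValueError("at least one true occurrence is required")
    ranked = sorted(zip(distances, is_match), key=lambda pair: pair[0])
    true_positives = sum(1 for _, match in ranked[:relevant] if match)
    return true_positives / relevant
```

For instance, a search that evaluates 65 of 1000 candidate positions has an efficiency of 93.5 under this definition.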
Conclusion
  • A new audio search approach is proposed. The main feature is its joint use of refined statistical modeling (CCGMMs), KL distance measure, and generalized active search.
  • The search, while still exhaustive, can be executed more efficiently.
  • Another approach, which is presented here, employs the time-series active search.
  • The reference fingerprints in the database are separately compared against the input audio stream.
  • While the indexing method makes use of the redundancy of the distances between an input fingerprint and the stored reference fingerprints to achieve efficient search [8], the active search makes use of the continuity of the distances between a reference fingerprint
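The continuity idea behind active search can be sketched as follows in Python. This is a simplified illustration that uses an L1 distance between histograms of quantized frames, in the spirit of histogram-pruning search [6], and not the KL-based generalization the paper proposes. Frames are assumed to be integer codebook indices in `range(num_bins)`, and all names are hypothetical. Sliding the window by one frame replaces one of its n frames, so the L1 distance changes by at most 2/n per step; when the current distance d exceeds the threshold, any position fewer than n(d - threshold)/2 steps ahead cannot match and is skipped.

```python
import math
from collections import Counter


def histogram(frames, num_bins):
    """Normalized histogram of integer frame labels."""
    counts = Counter(frames)
    n = len(frames)
    return [counts.get(b, 0) / n for b in range(num_bins)]


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def active_search(stream, reference, num_bins, threshold):
    """Return (match positions, number of distance evaluations)."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference must not be empty")
    ref_hist = histogram(reference, num_bins)
    last_start = len(stream) - n
    matches, evaluated, t = [], 0, 0
    while t <= last_start:
        d = l1_distance(ref_hist, histogram(stream[t:t + n], num_bins))
        evaluated += 1
        if d <= threshold:
            matches.append(t)
            t += 1
        else:
            # The distance drops by at most 2/n per step, so these
            # positions are guaranteed to stay above the threshold.
            t += max(1, math.floor(n * (d - threshold) / 2))
    return matches, evaluated
```

Taking the floor keeps the skip conservative, so the sketch returns the same positions as an exhaustive scan while evaluating fewer windows whenever the stream is mostly dissimilar to the reference.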
Tables
  • Table 1: Accuracy and efficiency
Funding
  • This work was supported by the China Ministry of Information Industry
Reference
  • [1] P. Cano, E. Batlle, T. Kalker, and J. Haitsma, “A review of algorithms for audio fingerprinting,” in Proc. Int. Workshop Multimedia Signal Processing, 2002.
  • [2] C. J. C. Burges, J. C. Platt, and S. Jana, “Distortion discriminant analysis for audio fingerprinting,” IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 165–174, May 2003.
  • [3] S. Sukittanon, L. E. Atlas, and J. Pitton, “Modulation-scale analysis for content identification,” IEEE Trans. Signal Process., vol. 52, no. 10, pp. 3023–3035, Oct. 2004.
  • [4] Y. Wang and C. Huang, “Speaker-and-environment change detection in broadcast news using the common component GMM-based divergence measure,” in Proc. INTERSPEECH, 2004, pp. 1069–1072.
  • [5] S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book Version 3.0. Cambridge, U.K.: Cambridge Univ. Press, 2000.
  • [6] K. Kashino, T. Kurozumi, and H. Murase, “A quick search method for audio and video signals based on histogram pruning,” IEEE Trans. Multimedia, vol. 5, no. 3, pp. 348–357, Sep. 2003.
  • [7] V. Venkatachalam, L. Cazzanti, N. Dhillon, and M. Wells, “Automatic identification of sound recordings,” IEEE Signal Process. Mag., vol. 21, no. 2, pp. 92–99, Mar. 2004.
  • [8] J. Goldstein, J. C. Platt, and C. J. C. Burges, “Redundant bit vectors for quickly searching high-dimensional regions,” in Deterministic and Statistical Methods in Machine Learning, J. Winkler, M. Niranjan, and N. Lawrence, Eds., Springer Lecture Notes in Computer Science, vol. 3635, pp. 137–158, 2005.