Generalized Time-Series Active Search With Kullback-Leibler Distance For Audio Fingerprinting
IEEE SIGNAL PROCESSING LETTERS, no. 8 (2006): 465-468
In this letter, a new audio fingerprinting approach is presented. We investigate improving robustness through more precise statistical fingerprint modeling with common component Gaussian mixture models (CCGMMs) and the Kullback-Leibler (KL) distance, which is better suited to measuring the dissimilarity between two probabilistic models. To address ...
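To make the KL distance mentioned in the abstract concrete, here is a minimal sketch. It assumes each fingerprint is summarized as a vector of mixture weights over one shared set of Gaussian components, which is how a common-component model is usually compared; the abstract itself does not give the formula. Under that assumption, the KL divergence between the two weight vectors upper-bounds the KL divergence between the two mixtures (by the log-sum inequality). The function names `kl_divergence` and `symmetric_kl` and the `eps` floor are illustrative choices, not values or names from the letter.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    p, q: sequences of non-negative weights that each sum to 1.
    eps:  hypothetical floor on q to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i == 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q, eps=1e-12):
    """Symmetrized KL: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q, eps) + kl_divergence(q, p, eps)


# Toy mixture-weight vectors (made up for illustration).
reference = [0.5, 0.3, 0.2]
identical = [0.5, 0.3, 0.2]
different = [0.1, 0.2, 0.7]

print(symmetric_kl(reference, identical))  # 0.0
print(symmetric_kl(reference, different) > 0.0)  # True
```

Unlike a plain vector distance, this measure penalizes a component that is prominent in one model but nearly absent in the other much more heavily, which is one reason it can be a better fit for comparing probabilistic models.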
- Audio fingerprinting has a wide variety of applications and has received a lot of attention recently.
- Various audio fingerprinting algorithms have been proposed.
- An ideal fingerprinting system should be able to identify different versions of the same audio content consistently, regardless of the distortions due to compression, transmission, and so on.
- It should be computationally efficient.
- Typical applications include analysis of broadcast music/commercials, copyright management over the Internet, or finding metadata for unlabeled audio
- The task of finding given audio clips in an audio stream that may be corrupted by distortions is used as a testbed for studying these issues
- While previous studies explored various features that are robust to distortions, we focus on improving robustness through more precise statistical fingerprint modeling and a better distance measure
- In one approach, the input fingerprints are compared separately with the database to find a match
- Accuracy is the precision rate at the operating point where precision equals recall; efficiency is the percentage of matching calculations that the active search skips relative to exhaustive search.
- On average over the three distortions, using KL distance with 0.005 bias could skip 93.47% of the exhaustive matches, take 3.04 s to search a 3-s clip in the 10-h stream, and reduce the error rate by 31.7% compared with using distance.
- By adjusting the bias for KL distance, the authors can achieve a trade-off between accuracy and efficiency
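The two reported measures can be computed as in the following sketch. It assumes a set of scored candidate positions with ground-truth labels; the helper names `skip_percentage` and `break_even_accuracy` and all numbers are hypothetical and are not taken from the letter's experiments. The break-even point uses the fact that precision equals recall exactly when the number of detections equals the number of true targets.

```python
def skip_percentage(performed, exhaustive):
    """Percentage of matching calculations skipped relative to exhaustive search."""
    if exhaustive <= 0:
        raise ValueError("exhaustive must be positive")
    return 100.0 * (exhaustive - performed) / exhaustive


def break_even_accuracy(distances, is_target):
    """Precision at the point where precision equals recall.

    distances: one distance per candidate position (smaller is a better match).
    is_target: parallel booleans marking the true target positions.
    Ties in distance are broken by input order, a simplification.
    """
    n_targets = sum(is_target)
    if n_targets == 0:
        raise ValueError("at least one true target is required")
    ranked = sorted(zip(distances, is_target), key=lambda pair: pair[0])
    # Accept as many detections as there are true targets.
    hits = sum(flag for _, flag in ranked[:n_targets])
    return hits / n_targets


# Toy numbers for illustration only.
print(skip_percentage(performed=25, exhaustive=100))  # 75.0
print(break_even_accuracy([0.1, 0.2, 0.3, 0.9],
                          [True, False, True, False]))  # 0.5
```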
- A new audio search approach is proposed. The main feature is its joint use of refined statistical modeling (CCGMMs), KL distance measure, and generalized active search.
- The first approach, while still exhaustive, can be executed more efficiently
- Another approach, which is presented here, employs the time-series active search.
- The reference fingerprints in the database are separately compared against the input audio stream.
- While the indexing method makes use of the redundancy of the distances between an input fingerprint and the stored reference fingerprints to achieve efficient search, the active search makes use of the continuity of the distances between a reference fingerprint
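The skipping idea behind active search can be sketched as follows. This is a simplified illustration that swaps in an L1 distance on histograms of quantized frame labels, not the letter's generalized KL-distance version, whose skip bound is not reproduced here. Because consecutive windows share all but one frame, the distance changes by at most 2 per one-frame shift, so a large distance guarantees that the next few windows cannot match. The names `l1_count_distance` and `active_search` and the integer `threshold` are assumptions for this sketch.

```python
from collections import Counter


def l1_count_distance(a, b):
    """L1 distance between two label-count histograms (Counters)."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def active_search(stream, query, threshold):
    """Return (match_positions, number_of_distance_evaluations).

    stream, query: sequences of hashable frame labels (e.g. quantized features).
    threshold: largest integer L1 count distance that still counts as a match.
    Histograms ignore frame order, so reordered windows also match.
    """
    n = len(query)
    ref = Counter(query)
    matches, evaluations = [], 0
    t = 0
    while t + n <= len(stream):
        d = l1_count_distance(ref, Counter(stream[t:t + n]))
        evaluations += 1
        if d <= threshold:
            matches.append(t)
            t += 1
        else:
            # Shifting by k frames lowers the distance by at most 2k, so the
            # windows before t + ceil((d - threshold) / 2) cannot match.
            t += max(1, (d - threshold + 1) // 2)
    return matches, evaluations


# Toy stream: the 4-frame query is embedded at position 20.
stream = [0] * 20 + [1, 2, 3, 4] + [0] * 20
print(active_search(stream, [1, 2, 3, 4], threshold=0))  # ([20], 13)
```

In this toy case an exhaustive scan would evaluate all 41 window positions, while the skipping version evaluates 13 and finds the same match.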
- Table 1: Accuracy and efficiency
- This work was supported by the China Ministry of Information Industry