Efficient Global String Kernel with Random Features: Beyond Counting Substructures

KDD 2019, pp. 520–528.

DOI: https://doi.org/10.1145/3292500.3330923
We present a new family of positive-definite string kernels that take into account the global properties hidden in the data strings through global alignments measured by edit distance.

Abstract:

Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the string, which hardly capture long discriminative patterns, (ii) sum over too many substructures, such as a...

Introduction
  • String classification is a core learning task and has drawn considerable interest in many applications such as computational biology [20, 21], text categorization [26, 44], and music classification [9].
  • Over the last two decades, a number of string kernel methods [7, 19, 21, 22, 24, 36] have been proposed, among which the k-spectrum kernel [21] and the (k, m)-mismatch kernel with its fruitful variants [22, 23, 24] have gained much popularity due to their strong empirical performance.
  • These kernels decompose the original strings into substructures, e.g., short length-k subsequences known as k-mers, and count the occurrences of k-mers in the original sequence to define a feature map and its associated string kernels (see the sketch after this list).
  • These methods, however, only consider the local properties of short substructures in the strings, failing to capture global properties that are highly related to discriminative features of strings, e.g., relatively long subsequences.
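A minimal sketch of the k-mer counting idea behind these local kernels, assuming a toy DNA alphabet; the function names and the explicit enumeration of all |Σ|^k possible k-mers are illustrative choices, not the paper's implementation:

```python
from collections import Counter
from itertools import product

def spectrum_features(s, k=3, alphabet="ACGT"):
    """Feature vector of the k-spectrum kernel: one coordinate per possible
    k-mer over the alphabet, valued by its occurrence count in s."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return [counts[''.join(kmer)] for kmer in product(alphabet, repeat=k)]

def spectrum_kernel(s1, s2, k=3, alphabet="ACGT"):
    """k-spectrum kernel value: inner product of the two count vectors."""
    f1 = spectrum_features(s1, k, alphabet)
    f2 = spectrum_features(s2, k, alphabet)
    return sum(a * b for a, b in zip(f1, f2))

# Example: the two strings share the 3-mers "ACG", "CGT", and "TAC".
print(spectrum_kernel("ACGTACG", "TACGT", k=3))
```

Practical implementations count only the k-mers that actually occur (e.g., via hashing or suffix structures) rather than enumerating the full |Σ|^k feature space.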
Highlights
  • String classification is a core learning task and has drawn considerable interest in many applications such as computational biology [20, 21], text categorization [26, 44], and music classification [9].
  • The first interesting observation is that our method performs substantially better than SSK and ASK, often by a large margin; e.g., Random String Embedding (RSE) achieves 25%–33% higher accuracy than SSK and ASK on three protein datasets.
  • RSE achieves much better performance than KSVM on all datasets, highlighting the importance of a truly positive-definite (p.d.) kernel compared to an indefinite kernel, even in the Krein space.
  • When increasing R to larger numbers, all variants converge to the optimal performance of the exact kernel. This confirms our analysis in Theorem 1 that the RSE approximation converges quickly to the exact kernel. Another important observation is that all variants of RSE scale linearly with the size R of the random string embedding.
  • We present a new family of positive-definite string kernels that take into account the global properties hidden in the data strings through global alignments measured by edit distance.
  • Since we can generate as many random samples from the distribution as needed, we can achieve performance close to that of the exact kernel (a code sketch of this random-feature construction follows this list).
  • Several interesting future directions are listed below: i) our method could be further exploited with other distance measures that consider global or local alignments; ii) other non-linear solvers could be applied to potentially improve the classification performance of our embedding compared to the currently used linear Support Vector Machine (SVM) solver; iii) our method could be applied in application domains such as computational biology to domain-specific problems.
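A minimal sketch of this construction, in the spirit of the RSE described above: sample R random strings and use a function of the edit distance to each of them as one feature coordinate. The choice of exp(-gamma * distance) as the feature, the uniform random-string generator, and all names below are illustrative assumptions, not the paper's exact feature map or sampling distribution:

```python
import math
import random

def edit_distance(a, b):
    """Standard Levenshtein (edit) distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion from a
                            curr[j - 1] + 1,             # insertion into a
                            prev[j - 1] + (ca != cb)))   # substitution / match
        prev = curr
    return prev[-1]

def random_string_embedding(strings, R=128, d_max=10, gamma=0.1,
                            alphabet="ACGT", seed=0):
    """Embed each string into R dimensions via distances to random strings."""
    rng = random.Random(seed)
    # Draw R random strings of random length up to d_max over the alphabet.
    omegas = [''.join(rng.choice(alphabet) for _ in range(rng.randint(1, d_max)))
              for _ in range(R)]
    # One coordinate per random string: a decaying function of the edit distance
    # (a "DF"-style feature, using the distance itself as the raw measurement).
    return [[math.exp(-gamma * edit_distance(s, w)) / math.sqrt(R)
             for w in omegas] for s in strings]

# The dot product of two embeddings approximates the underlying string kernel,
# and the approximation tightens as R grows (cf. the convergence claim above).
emb = random_string_embedding(["ACGTACG", "TACGT", "GGGGCCC"], R=64)
approx_kernel = sum(x * y for x, y in zip(emb[0], emb[1]))
print(round(approx_kernel, 4))
```

A linear kernel (dot product) on these embeddings then stands in for the string kernel, so a cheap linear SVM trained on the embeddings can replace an expensive kernel SVM.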
Results
  • Fig. 2b empirically corroborates that RSE achieves linear scalability in the length L of the strings.
  • These empirical results provide strong evidence that RSE, derived from the newly proposed global string kernel, scales linearly in both the number of strings and the string length (a rough cost estimate follows this list).
  • The authors' method opens the door to developing a new family of string kernels that enjoy both higher accuracy and linear scalability on real-world string data.
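As a rough, illustrative cost estimate (an assumption for intuition, not a figure from the paper): if each of the R random features requires one edit-distance evaluation between an input string of length L and a random string of length at most D, then embedding N strings costs on the order of

    N · R · O(L · D)

character-level operations, which is linear in both N and L for fixed R and D, consistent with the scalability reported above.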
Conclusion
  • The authors present a new family of positive-definite string kernels that take into account the global properties hidden in the data strings through global alignments measured by edit distance.

    The authors' Random String Embedding, derived from the proposed kernel through Random Feature approximation, enjoys the double benefit of higher classification accuracy and linear scaling in both the number of strings and the length of each string.
  • The authors' newly defined global string kernels pave a simple yet effective way to handle large-scale real-world string data.
  • Several interesting future directions are listed below: i) the method could be further exploited with other distance measures that consider global or local alignments; ii) other non-linear solvers could be applied to potentially improve classification of the embedding compared to the currently used linear SVM solver (a minimal usage sketch with a linear SVM follows this list); iii) the method could be applied in application domains such as computational biology to domain-specific problems.
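A minimal end-to-end usage sketch under stated assumptions: it reuses the hypothetical random_string_embedding function from the earlier sketch, uses scikit-learn's LIBLINEAR-backed LinearSVC as a stand-in for the paper's linear SVM solver, and the strings and labels are toy placeholders rather than data from the paper:

```python
from sklearn.svm import LinearSVC  # linear SVM solved with LIBLINEAR

# Toy strings and labels; the real experiments use the datasets in Table 1.
train_strings = ["ACGTACG", "TACGT", "GGGGCCC", "GGCCGGCC"]
train_labels = [0, 0, 1, 1]

# Embed strings with the (hypothetical) random_string_embedding sketched earlier;
# the fixed seed keeps the random strings identical at train and test time.
X_train = random_string_embedding(train_strings, R=64)

clf = LinearSVC(C=1.0)
clf.fit(X_train, train_labels)
print(clf.predict(random_string_embedding(["ACGGACG"], R=64)))
```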
Tables
  • Table 1: Statistical properties of the datasets.
  • Table 2: Comparison among eight variants of RSE in terms of classification accuracy. Each sampling strategy is paired with either DF (the direct Levenshtein distance (LD) used as features in string kernel (6)) or SF (a soft version of the LD used as features in string kernel (7)).
  • Table 3: Comparison of RSE against other state-of-the-art methods in terms of classification accuracy and computational time (seconds). The symbol "–" indicates either running out of memory (256 GB in total) or a runtime greater than 36 hours.
References
  • [1] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27.
  • [2] Jie Chen, Lingfei Wu, Kartik Audhkhasi, Brian Kingsbury, and Bhuvana Ramabhadran. 2016. Efficient one-vs-one kernel ridge regression for speech recognition. In ICASSP. IEEE, 2454–2458.
  • [3] Yihua Chen, Eric K Garcia, Maya R Gupta, Ali Rahimi, and Luca Cazzanti. 2009. Similarity-based classification: Concepts and algorithms. Journal of Machine Learning Research 10, Mar (2009), 747–776.
  • [4] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv:1409.1259 (2014).
  • [5] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555 (2014).
  • [6] Corinna Cortes, Patrick Haffner, and Mehryar Mohri. 2004. Rational kernels: Theory and algorithms. Journal of Machine Learning Research 5, Aug (2004), 1035–1062.
  • [7] Nello Cristianini, John Shawe-Taylor, et al. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.
  • [8] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9, Aug (2008), 1871–1874.
  • [9] Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, and Imdad Ullah Khan. 2017. Efficient approximation algorithms for strings kernel based sequence classification. In NIPS. 6938–6948.
  • [10] Andrew Frank and Arthur Asuncion. 2010. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science 213 (2010).
  • [11] Leo Gordon, Alexey Ya Chervonenkis, Alex J Gammerman, Ilham A Shahmuradov, and Victor V Solovyev. 2003. Sequence alignment kernel for recognition of promoter regions. Bioinformatics 19, 15 (2003), 1964–1971.
  • [12] Derek Greene and Pádraig Cunningham. 2006. Practical solutions to the problem of diagonal dominance in kernel document clustering. In ICML. ACM, 377–384.
  • [13] Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28, 10 (2017), 2222–2232.
  • [14] Bernard Haasdonk and Claus Bahlmann. 2004. Learning with distance substitution kernels. In Joint Pattern Recognition Symposium. Springer, 220–227.
  • [15] David Haussler. 1999. Convolution kernels on discrete structures. Technical Report, Department of Computer Science, University of California at Santa Cruz.
  • [16] Po-Sen Huang, Haim Avron, Tara N Sainath, Vikas Sindhwani, and Bhuvana Ramabhadran. 2014. Kernel methods match deep neural networks on TIMIT. In ICASSP. 205–209.
  • [17] Catalin Ionescu, Alin Popa, and Cristian Sminchisescu. 2017. Large-scale data-dependent kernel approximation. In AISTATS. 19–27.
  • [18] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980 (2014).
  • [19] Rui Kuang, Eugene Ie, Ke Wang, Kai Wang, Mahira Siddiqi, Yoav Freund, and Christina Leslie. 2005. Profile-based string kernels for remote homology detection and motif extraction. Journal of Bioinformatics and Computational Biology 3, 03 (2005), 527–550.
  • [20] Pavel P Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. 2009. Scalable algorithms for string kernels with inexact matching. In NIPS. 881–888.
  • [21] Christina Leslie, Eleazar Eskin, and William Stafford Noble. 2001. The spectrum kernel: A string kernel for SVM protein classification. In Biocomputing 2002. World Scientific, 564–575.
  • [22] Christina Leslie, Eleazar Eskin, Jason Weston, and William Stafford Noble. 2003. Mismatch string kernels for SVM protein classification. In NIPS.
  • [23] Christina Leslie and Rui Kuang. 2004. Fast string kernels using inexact matching for protein sequences. Journal of Machine Learning Research 5, Nov (2004), 1435–1455.
  • [24] Christina S Leslie, Eleazar Eskin, Adiel Cohen, Jason Weston, and William Stafford Noble. 2004. Mismatch string kernels for discriminative protein classification. Bioinformatics 20, 4 (2004), 467–476.
  • [25] Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10 (1966), 707–710.
  • [26] Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. 2002. Text classification using string kernels. Journal of Machine Learning Research 2, Feb (2002), 419–444.
  • [27] Gaëlle Loosli, Stéphane Canu, and Cheng Soon Ong. 2016. Learning SVM in Krein spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 6 (2016), 1204–1216.
  • [28] Saul B Needleman and Christian D Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 3 (1970), 443–453.
  • [29] Michel Neuhaus and Horst Bunke. 2006. Edit distance-based kernel functions for structural pattern classification. Pattern Recognition 39, 10 (2006), 1852–1863.
  • [30] Ali Rahimi and Benjamin Recht. 2008. Random features for large-scale kernel machines. In NIPS. 1177–1184.
  • [31] Alessandro Rudi and Lorenzo Rosasco. 2017. Generalization properties of learning with random features. In NIPS. 3218–3228.
  • [32] Hiroto Saigo, Jean-Philippe Vert, Nobuhisa Ueda, and Tatsuya Akutsu. 2004. Protein homology detection using string alignment kernels. Bioinformatics 20, 11 (2004), 1682–1689.
  • [33] Bernhard Schölkopf, Koji Tsuda, Jean-Philippe Vert, et al. 2004. Kernel Methods in Computational Biology. MIT Press.
  • [34] Si Si, Cho-Jui Hsieh, and Inderjit S Dhillon. 2017. Memory efficient kernel approximation. Journal of Machine Learning Research 18, 1 (2017), 682–713.
  • [35] Temple F Smith and Michael S Waterman. 1981. Comparison of biosequences. Advances in Applied Mathematics 2, 4 (1981), 482–489.
  • [36] Alex J Smola and S. V. N. Vishwanathan. 2003. Fast kernels for string and tree matching. In NIPS. 585–592.
  • [37] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
  • [38] Michael S Waterman, Jana Joyce, and Mark Eggert. 1991. Computer alignment of sequences. Phylogenetic Analysis of DNA Sequences (1991), 59–72.
  • [39] Chris Watkins. 1999. Dynamic alignment kernels. In NIPS. 39–50.
  • [40] Jason Weston, Bernhard Schölkopf, Eleazar Eskin, Christina Leslie, and William Stafford Noble. 2003. Dealing with large diagonals in kernel matrices. Annals of the Institute of Statistical Mathematics 55, 2 (2003), 391–408.
  • [41] Christopher K. I. Williams and Matthias Seeger. 2001. Using the Nyström method to speed up kernel machines. In NIPS. 682–688.
  • [42] Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, and Charu Aggarwal. 2018. Scalable spectral clustering using random binning features. In KDD. ACM, 2506–2515.
  • [43] Lingfei Wu, Ian E. H. Yen, Jie Chen, and Rui Yan. 2016. Revisiting random binning features: Fast convergence and strong parallelizability. In KDD. ACM, 1265–1274.
  • [44] Lingfei Wu, Ian E. H. Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, and Michael J Witbrock. 2018. Word Mover's Embedding: From Word2Vec to document embedding. In EMNLP. 4524–4534.
  • [45] Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikumar, and Michael Witbrock. 2018. D2KE: From distance to kernel and embedding. arXiv:1802.04956 (2018).
  • [46] Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, and Michael Witbrock. 2018. Random Warping Series: A random features method for time-series embedding. In AISTATS. 793–802.
  • [47] Zhengzheng Xing, Jian Pei, and Eamonn Keogh. 2010. A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter 12, 1 (2010), 40–48.
  • [48] Li Yujian and Liu Bo. 2007. A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 6 (2007), 1091–1095.