Learning SVM Classifiers with Indefinite Kernels

AAAI, pp. 942-948, 2012.

We presented a novel joint optimization model over SVM classification and kernel principal component analysis to conduct SVM training with indefinite kernels.

Abstract:

Recently, training support vector machines with indefinite kernels has attracted great attention in the machine learning community. In this paper, we tackle this problem by formulating a joint optimization model over SVM classifications and kernel principal component analysis. We first reformulate the kernel principal component analysis […]

Introduction
  • Support vector machines (SVMs) with kernels have attracted a lot of attention due to their good generalization performance.
  • The kernel function in a standard SVM produces a similarity kernel matrix over samples, which is required to be positive semi-definite.
  • This positive semi-definite property of the kernel matrix ensures that the SVM can be efficiently solved using convex quadratic programming.
  • In many applications, however, the underlying similarity functions do not produce positive semi-definite kernels (Chen et al. 2009).
  • Training SVMs with indefinite kernels poses a challenging optimization problem, since convex solutions for standard SVMs are not valid in this learning scenario.
Highlights
  • Support vector machines (SVMs) with kernels have attracted a lot of attention due to their good generalization performance
  • We propose a novel joint optimization model over SVM classifications and kernel principal component analysis to address the problem of learning with indefinite kernels
  • We show that the spectrum modification methods reviewed in the previous section can be equivalently re-expressed as kernel transformations in the form of Eq. (9) with proper V matrices (a minimal illustrative sketch follows this list)
  • We investigated the problem of training SVMs with indefinite kernels
  • We presented a novel joint optimization model over SVM classification and kernel principal component analysis to conduct SVM training with indefinite kernels
  • The proposed model can be used for both binary and multi-class classification
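
To make the kernel-transformation view in the third highlight concrete, the sketch below is one plausible reading (my own illustration, not necessarily the paper's exact Eq. (9)) of KPCA as a transformation of the form K_new = K0 V V^T K0: choosing V as the top-d eigenvectors of K0 scaled by |λ|^{-1/2} yields the positive semi-definite matrix U_d |Λ_d| U_d^T, i.e., a flip-like surrogate restricted to the d leading components.

```python
# Minimal sketch (my own illustration, not the paper's exact Eq. (9)) of a
# KPCA-style kernel transformation K_new = K0 V V^T K0.  With V built from the
# top-d eigenvectors of K0 scaled by |lambda|^{-1/2}, the result equals
# U_d |Lambda_d| U_d^T, a flip-like PSD surrogate on the d leading components.
import numpy as np


def kpca_transform_kernel(K0, d):
    """Transform a (possibly indefinite) symmetric kernel K0 via K0 V V^T K0."""
    K0 = (K0 + K0.T) / 2.0                      # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(K0)       # K0 = U diag(eigvals) U^T
    idx = np.argsort(-np.abs(eigvals))[:d]      # d eigenvalues of largest magnitude
    U_d, lam_d = eigvecs[:, idx], eigvals[idx]
    V = U_d / np.sqrt(np.abs(lam_d))            # V = U_d |Lambda_d|^{-1/2}
    K_new = K0 @ V @ V.T @ K0                   # = U_d |Lambda_d| U_d^T (PSD)
    return K_new, V


# usage on a small indefinite similarity matrix
K0 = np.array([[1.0, 0.9, -0.3],
               [0.9, 1.0, 0.2],
               [-0.3, 0.2, 1.0]])
K_new, V = kpca_transform_kernel(K0, d=2)
print(np.linalg.eigvalsh(K_new))                # all eigenvalues >= 0 (up to numerical error)
```

The highlights above indicate that the proposed SVM-CA model optimizes the SVM classifier jointly with such a component analysis, rather than fixing the transformation in advance.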
Methods
  • The authors conducted experiments on both synthetic data sets and real world data sets to compare the proposed method, denoted as SVM-CA, with a few spectrum modification methods, the robust SVM (Luss and d’Aspremont 2007), and the kernel Fisher’s discriminant on indefinite kernels (IKFD) (Pekalska and Haasdonk 2008).
  • The authors conducted experiments on several real world data sets used for learning with indefinite kernels, including a few data sets used in (Chen et al. 2009), i.e., yeast, amazon, aural sonar, voting, patrol and protein, and a data set collected in (Pekalska and Haasdonk 2008), i.e., catcortex.
  • These data sets are represented by similarity matrices produced using different similarity measures.
  • The authors assumed a symmetric similarity kernel matrix K0 in the proposed model.
Conclusion
  • The authors first reformulated kernel principal component analysis (KPCA) as a kernel transformation model and demonstrated its connections to spectrum modification methods for indefinite kernels.
  • The authors presented a novel joint optimization model over SVM classification and kernel principal component analysis to conduct SVM training with indefinite kernels.
  • The authors' experimental results on both synthetic data sets and real world data sets demonstrated that the proposed approach can significantly outperform the spectrum modification methods, the robust SVM, and the kernel Fisher’s discriminant on indefinite kernels (IKFD).
Tables
  • Table 1: Characteristics of the four synthetic data sets and the average classification errors (%) of the six comparison methods
  • Table 2: Comparison results in terms of classification error rates (%) on binary classification data sets. The means and standard deviations of the error rates over 50 random repeats are reported
  • Table 3: Comparison results in terms of classification error rates (%) on multi-class classification data sets. The means and standard deviations of the error rates over 50 random repeats are reported
Related work
  • The dual formulation of standard SVMs is a linearly constrained quadratic program, which provides a natural form to address nonlinear classification using kernels:

    max_α   α^T e − (1/2) α^T Y K0 Y α                                   (1)
    s.t.    α^T diag(Y) = 0,   0 ≤ α ≤ C

    where Y is a diagonal matrix of the labels, and K0 is a kernel matrix. The positive semi-definite property of K0 ensures that problem (1) is a convex optimization problem, and thus a globally optimal solution can be computed efficiently. However, when K0 is indefinite, one loses the underlying theoretical support for the kernel methods and the optimization problem (1) is no longer convex. For the nonconvex optimization problem (1) with indefinite kernels, a sequential minimal optimization (SMO) algorithm with a simple modification can still converge to a stationary point, but not necessarily a global maximum (Lin and Lin 2003).
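
As a concrete illustration (not code from the paper), the following minimal sketch hands problem (1) to a generic convex QP solver, assuming a precomputed positive semi-definite kernel K0 and labels y in {−1, +1}; the function name svm_dual and the use of cvxopt are my own choices. With an indefinite K0, the quadratic term Y K0 Y is not PSD and such a convex solver no longer applies, which is exactly the difficulty described above.

```python
# Minimal sketch, assuming a PSD precomputed kernel K0 and a numpy label vector
# y in {-1, +1}.  Problem (1) is rewritten as the equivalent minimization
#   min_a (1/2) a^T (Y K0 Y) a - e^T a   s.t.  y^T a = 0,  0 <= a <= C.
import numpy as np
from cvxopt import matrix, solvers


def svm_dual(K0, y, C=1.0):
    n = len(y)
    Y = np.diag(y.astype(float))
    P = matrix(Y @ K0 @ Y)                              # quadratic term Y K0 Y
    q = matrix(-np.ones(n))                             # linear term -e
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))      # 0 <= a and a <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))          # equality: y^T a = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)                  # typically fails if P is not PSD
    return np.array(sol["x"]).ravel()
```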

    Instead of solving the quadratic optimization problem (1) with indefinite kernels directly, many approaches are focused on deriving a surrogate positive semi-definite kernel matrix K from the indefinite kernel K0. A simple and popular way to obtain such a surrogate kernel matrix is to modify the spectrum of K0 using methods such as clip, flip, and shift (Wu, Chang, and Zhang 2005). Let K0 = U Λ U^T, where Λ = diag(λ1, . . . , λN) is the diagonal matrix of the eigenvalues, and U is the orthogonal matrix of corresponding eigenvectors. The clip method produces an approximate positive semi-definite kernel Kclip by clipping all negative eigenvalues to zero:

    Kclip = U diag(max(λ1, 0), · · · , max(λN, 0)) U^T.                   (2)
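
The sketch below (my own, based on Eq. (2) above) builds the clip surrogate from the eigendecomposition of K0, together with flip and shift variants under their usual conventions (flip takes absolute values of the eigenvalues; shift translates the spectrum so its minimum is zero), since only clip is spelled out in the text. Any standard SVM solver that accepts a precomputed kernel can then be trained on the resulting PSD matrix.

```python
# Hedged sketch of the clip / flip / shift spectrum modifications for an
# indefinite symmetric kernel K0 (clip follows Eq. (2); flip and shift follow
# the standard conventions, which the excerpt above does not spell out).
import numpy as np


def spectrum_modify(K0, method="clip"):
    K0 = (K0 + K0.T) / 2.0                       # enforce symmetry
    lam, U = np.linalg.eigh(K0)                  # K0 = U diag(lam) U^T
    if method == "clip":                         # zero out negative eigenvalues
        lam_new = np.maximum(lam, 0.0)
    elif method == "flip":                       # take absolute values
        lam_new = np.abs(lam)
    elif method == "shift":                      # shift spectrum to be nonnegative
        lam_new = lam + max(0.0, -lam.min())
    else:
        raise ValueError(method)
    return U @ np.diag(lam_new) @ U.T


# The surrogate kernel is PSD and can be plugged into any standard SVM solver
# that accepts a precomputed kernel, e.g. scikit-learn's SVC:
# from sklearn.svm import SVC
# clf = SVC(C=1.0, kernel="precomputed").fit(spectrum_modify(K0, "clip"), y)
```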
Reference
  • Boyd, S., and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
  • Chen, J., and Ye, J. 2008. Training SVM with indefinite kernels. In Proceedings of the International Conference on Machine Learning (ICML).
  • Chen, Y.; Garcia, E.; Gupta, M.; Rahimi, A.; and Cazzanti, L. 2009. Similarity-based classification: Concepts and algorithms. Journal of Machine Learning Research 10:747–776.
  • Chen, Y.; Gupta, M.; and Recht, B. 2009. Learning kernels from indefinite similarities. In Proceedings of the International Conference on Machine Learning (ICML).
  • Golub, G., and Van Loan, C. 1996. Matrix Computations. Johns Hopkins University Press.
  • Graepel, T.; Herbrich, R.; Bollmann-Sdorra, P.; and Obermayer, K. 1999. Classification on pairwise proximity data. In Advances in Neural Information Processing Systems (NIPS).
  • Guo, Y., and Schuurmans, D. 2009. A reformulation of support vector machines for general confidence functions. In Proceedings of the Asian Conference on Machine Learning (ACML).
  • Hsu, C., and Lin, C. 2002. A comparison of methods for multi-class support vector machines. IEEE transact. on Neural Networks 13(2):415–425.
  • Lin, H., and Lin, C. 2003. A study on sigmoid kernel for SVM and the training of non-PSD kernels by SMO-type methods. Technical report.
  • Luss, R., and d’Aspremont, A. 2007. Support vector machine classification with indefinite kernels. In Advances in Neural Information Processing Systems (NIPS).
  • Newman, D.; Hettich, S.; Blake, C.; and Merz, C. 1998. UCI repository of machine learning datasets.
  • Ong, C.; Mary, X.; Canu, S.; and Smola, A. 2004. Learning with non-positive kernels. In Proceedings of International conference on Machine Learning (ICML).
  • Pekalska, E., and Haasdonk, B. 2008. Kernel discriminant analysis for positive definite and indefinite kernels. IEEE Trans. on Pattern Analysis and Machine Intelligence 31(6):1017–1032.
  • Pekalska, E.; Paclik, P.; and Duin, R. 2001. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research 2:175–211.
  • Roth, V.; Laub, J.; Kawanabe, M.; and Buhmann, J. 2003. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Trans. on Pattern Analysis and Machine Intelligence 25(12):1540–1551.
  • Saigo, H.; Vert, J.; Ueda, N.; and Akutsu, T. 2004. Protein homology detection using string alignment kernels. Bioinformatics 20(11):1682–1689.
  • Scholkopf, B.; Smola, A.; and Muller, K. 1999. Kernel principal component analysis. In Advances in Kernel Methods: Support Vector Learning, 327–352.
  • Smola, A.; Ovari, Z.; and Williamson, R. C. 2000. Regularization with dot-product kernels. In Advances in Neural Information Processing Systems (NIPS).
  • Wu, G.; Chang, E.; and Zhang, Z. 2005. An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. In Proceedings of International conference on Machine Learning (ICML).
  • Ying, Y.; Campbelly, C.; and Girolami, M. 2009. Analysis of SVM with indefinite kernels. In Advances in Neural Information Processing Systems (NIPS).
Best Paper
Best Paper of AAAI, 2012