
Support-Vector Networks

Machine Learning 20, no. 3 (1995): 273-297

Abstract

The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine.

Introduction
  • Fisher (1936) suggested the first algorithm for pattern recognition.
  • He considered a model of two normally distributed populations, N(m1, Σ1) and N(m2, Σ2), of n-dimensional vectors x with mean vectors m1 and m2 and covariance matrices Σ1 and Σ2, and showed that the optimal (Bayesian) solution is a quadratic decision function. In the case where Σ1 = Σ2 = Σ, the quadratic decision function degenerates to a linear function. To estimate the quadratic decision function one has to determine n(n+3)/2 free parameters. (Both decision rules are written out in the sketch below.)
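For reference, the two decision rules just mentioned can be sketched in their standard form. This is the textbook Bayes-optimal discriminant for two Gaussian populations, reconstructed here rather than quoted from the page; the paper's equations (1) and (2) have this shape up to notation:

```latex
% Quadratic (Bayes-optimal) decision rule for N(m_1, \Sigma_1) vs. N(m_2, \Sigma_2):
F_{\mathrm{sq}}(x) = \operatorname{sign}\!\left[
    (x - m_2)^{\top} \Sigma_2^{-1} (x - m_2)
  - (x - m_1)^{\top} \Sigma_1^{-1} (x - m_1)
  + \ln \frac{\lvert \Sigma_2 \rvert}{\lvert \Sigma_1 \rvert} \right]

% When \Sigma_1 = \Sigma_2 = \Sigma it degenerates to the linear rule:
F_{\mathrm{lin}}(x) = \operatorname{sign}\!\left[
    (m_1 - m_2)^{\top} \Sigma^{-1} x
  - \tfrac{1}{2} \left( m_1^{\top} \Sigma^{-1} m_1 - m_2^{\top} \Sigma^{-1} m_2 \right) \right]
```

One consistent reading of the n(n+3)/2 parameter count is the n(n+1)/2 coefficients of the symmetric quadratic term plus the n coefficients of the linear term, with the threshold not counted.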
Highlights
  • According to the properties of the soft-margin classifier, the weight vector w can be written as a linear combination of support vectors (see the sketch after this list).
  • The convolution of the dot-product in feature space can be given by any function satisfying Mercer's condition; in particular, to construct a polynomial classifier of degree d in n-dimensional input space one can use the function K(u, v) = (u · v + 1)^d.
  • This paper introduces the support-vector network as a new learning machine for two-group classification problems
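As a minimal sketch of the support-vector expansion noted in the first highlight, the dual-form decision rule needs nothing beyond the support vectors and their coefficients. Everything below (the function name, the toy support vectors, coefficients, and the linear kernel) is an illustrative assumption, not values from the paper:

```python
# Sketch of the dual-form decision rule f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b),
# which uses the expansion w = sum_i alpha_i * y_i * z_i over support vectors only.
# Every value below is hypothetical and only illustrates the shape of the computation.
import numpy as np

def decision(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate sign(sum_i alpha_i * y_i * K(sv_i, x) + b)."""
    scores = np.array([kernel(sv, x) for sv in support_vectors])
    return np.sign(np.dot(alphas * labels, scores) + b)

linear = lambda u, v: float(np.dot(u, v))             # linear kernel: ordinary dot product

support_vectors = np.array([[0.0, 1.0], [2.0, 0.5]])  # hypothetical support vectors
alphas = np.array([0.3, 0.7])                          # hypothetical dual coefficients
labels = np.array([1.0, -1.0])                         # their class labels (+1 / -1)
x_new = np.array([1.0, 2.0])
print(decision(x_new, support_vectors, alphas, labels, b=0.1, kernel=linear))  # -> -1.0
```

Because every non-support vector has a zero dual coefficient, the sum, and hence w, is determined entirely by the support vectors.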
Methods
  • The Method of Convolution of the Dot-Product in Feature Space: the algorithms described in the previous sections construct hyperplanes in the input space.
  • The convolution of the dot-product in feature space can be given by any function satisfying Mercer's condition; in particular, to construct a polynomial classifier of degree d in n-dimensional input space one can use the function K(u, v) = (u · v + 1)^d (see the code sketch below).
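A minimal sketch of that construction, assuming NumPy and scikit-learn are available: the degree-d convolution K(u, v) = (u · v + 1)^d is supplied as a callable kernel to a soft-margin SVM. The dataset, the degree, and the value of C are illustrative, not the paper's experiments:

```python
# Soft-margin SVM with the polynomial convolution K(u, v) = (u . v + 1)^d as kernel.
# Data and hyperparameters are synthetic/illustrative, not the paper's experiments.
import numpy as np
from sklearn.svm import SVC

def poly_kernel(U, V, degree=3):
    """Gram matrix of the degree-d polynomial kernel K(u, v) = (u . v + 1)^d."""
    return (U @ V.T + 1.0) ** degree

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)   # XOR-like labels: not linearly separable

clf = SVC(kernel=lambda U, V: poly_kernel(U, V, degree=3), C=10.0)
clf.fit(X, y)
print("number of support vectors:", int(clf.n_support_.sum()))
print("training accuracy:", clf.score(X, y))
```

Equivalently, SVC(kernel='poly', degree=3, gamma=1, coef0=1) computes the same (u · v + 1)^3 convolution without a custom callable.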
Results
  • The 7th-degree polynomial has only 30% more support vectors than the 3rd-degree polynomial, and even fewer than the 1st-degree polynomial.
Conclusion
  • This paper introduces the support-vector network as a new learning machine for two-group classification problems.

    The support-vector network combines three ideas: the solution technique from optimal hyperplanes, the convolution of the dot-product, and the notion of soft margins.

    The algorithm has been tested and compared to the performance of other classical algorithms.
  • Despite the simplicity of its decision-surface design, the new algorithm exhibits very good performance in the comparison study.
  • Other characteristics like capacity control and ease of changing the implemented decision surface render the support-vector network an extremely powerful and universal learning machine.
  • In Appendix A (Constructing Separating Hyperplanes) the authors derive both the method for constructing optimal hyperplanes and the soft-margin hyperplanes (a sketch of the standard soft-margin objective follows this list).
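For orientation, the soft-margin construction referred to above can be written in its now-standard form. This is a sketch: the paper's own objective lets the slack term enter through a monotone convex function of the slacks, of which this is the most common special case:

```latex
\min_{w,\; b,\; \xi}\ \ \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{\ell} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w \cdot x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, \ell .
```

The parameter C trades margin width against total slack; the training points that end up on the margin or inside it are exactly the support vectors, i.e. those with nonzero dual coefficients.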
Tables
  • Table 1: Performance of various classifiers, collected from publications and the authors' own experiments. For references see text.
  • Table 2: Results obtained for dot-products of polynomials of various degrees. The number of support vectors is a mean value per classifier.
  • Table 3: Results obtained for a 4th-degree polynomial classifier on the NIST database. The size of the training set is 60,000 patterns and the size of the test set is 10,000 patterns.
References
  • Aizerman, M., Braverman, E., & Rozonoer, L. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837.
  • Anderson, T.W., & Bahadur, R.R. (1966). Classification into two multivariate normal distributions with different covariance matrices. Ann. Math. Stat., 33:420-431.
  • Boser, B.E., Guyon, I., & Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 5, 144-152, Pittsburgh, ACM.
  • Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Jackel, L.D., LeCun, Y., Sackinger, E., Simard, P., Vapnik, V., & Müller, U.A. (1994). Comparison of classifier methods: A case study in handwritten digit recognition. Proceedings of the 12th International Conference on Pattern Recognition and Neural Network.
  • Bromley, J., & Sackinger, E. (1991). Neural-network and k-nearest-neighbor classifiers. Technical Report 11359910819-16TM, AT&T.
  • Courant, R., & Hilbert, D. (1953). Methods of Mathematical Physics, Interscience, New York.
  • Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7:111-132.
  • LeCun, Y. (1985). Une procédure d'apprentissage pour réseau à seuil asymétrique. Cognitiva 85: À la Frontière de l'Intelligence Artificielle des Sciences de la Connaissance des Neurosciences, 599-604, Paris.
  • LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., & Jackel, L.D. (1990). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 2, 396-404, Morgan Kaufmann.
  • Parker, D.B. (1985). Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, MA.
  • Rosenblatt, F. (1962). Principles of Neurodynamics, Spartan Books, New York.
  • Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323:533-536.
  • Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1987). Learning internal representations by error propagation. In James L. McClelland & David E. Rumelhart (Eds.), Parallel Distributed Processing, 1, 318-362, MIT Press.
  • Vapnik, V.N. (1982). Estimation of Dependences Based on Empirical Data, Addendum 1, New York: Springer-Verlag.