An Introduction to Principal Component Analysis with Examples in R

Semantic Scholar (2017)

Abstract
Principal component analysis (PCA) is a series of mathematical steps for reducing the dimensionality of data. In practical terms, it can reduce the number of features in a data set by a large factor (for example, from thousands of features to tens of features) if the features are correlated. This type of “feature compression” is often used for two purposes. First, if high-dimensional data is to be visualized by plotting it on a 2-D surface (such as a computer monitor or a piece of paper), then PCA can be used to reduce the data to 2-D or 3-D; in this context, PCA can be considered a complete, standalone unsupervised machine learning algorithm. Second, if a different machine learning training algorithm is taking too long to run, then PCA can be used to reduce the number of features, which in turn reduces the amount of training data and the time needed to train a model; here, PCA is a pre-processing step within a larger workflow. In this paper we discuss PCA largely for the first purpose: visualizing and exploring patterns in data. It is important to note that PCA does not reduce features by selecting a subset of the original features (as wrapper feature selection algorithms do when they perform feature-by-feature forward or backward search [6]). Instead, PCA creates new, uncorrelated features that are linear combinations of the original features. For a given data instance, its features are transformed via a dot product with a numeric vector to create a new feature; this vector is a principal component, and it serves as the direction of an axis onto which the data instance is projected. The new features are thus the projections of the original features into a new coordinate space defined by the principal components. To perform the actual dimensionality reduction, the user can follow a well-defined methodology to select the fewest new features that …
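The dot-product projection described in the abstract can be illustrated with a minimal sketch. It is not the paper's own R code: the four-row data set, the helper names (`mean_center`, `covariance`, `first_principal_component`), and the use of power iteration to find the leading direction are all assumptions made for illustration, chosen so that the example needs only the Python standard library.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean_center(rows):
    """Subtract each feature's mean so every column averages to zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(cov, iterations=500):
    """Approximate the leading eigenvector of cov by power iteration.

    Assumption: the start vector is not orthogonal to the leading
    eigenvector, which holds for the positively correlated toy data below.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

# Hypothetical data: four instances with two strongly correlated features.
data = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8]]

centered = mean_center(data)
cov = covariance(centered)
pc1 = first_principal_component(cov)

# Each new feature value is the dot product of a centered instance with pc1,
# i.e. the instance's projection onto the first principal axis.
scores = [dot(row, pc1) for row in centered]
print("first principal component:", pc1)
print("projected values:", scores)
```

Power iteration is used here only to keep the sketch dependency-free; in practice an eigendecomposition or singular value decomposition routine from a numerical library would normally be used instead.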