PCA

Kinder Chen
2 min read · Sep 28, 2021


Principal Component Analysis (PCA) is by far the most popular dimensionality reduction algorithm. It first identifies the hyperplane that lies closest to the data, and then it projects the data onto it. Before we can project the training set onto a lower-dimensional hyperplane, we need to choose the right hyperplane. It seems reasonable to select the axis that preserves the maximum amount of variance, as it will most likely lose less information than the other projections. Another way to justify this choice is that it is the axis that minimizes the mean squared distance between the original dataset and its projection onto that axis. This is the basic idea behind PCA.

PCA identifies the axis that accounts for the largest amount of variance in the training set. It also finds a second axis, orthogonal to the first one, that accounts for the largest amount of the remaining variance. In a higher-dimensional dataset, PCA would also find a third axis, orthogonal to both previous axes, then a fourth, a fifth, and so on. The ith axis is called the ith principal component (PC) of the data. For each principal component, PCA finds a zero-centered unit vector pointing in the direction of the PC. Since two opposing unit vectors lie on the same axis, the direction of the unit vectors returned by PCA is not stable.
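As a minimal sketch of this idea, here is how scikit-learn's PCA exposes the principal components. The 3D dataset and variable names below are my own illustrative assumptions, not from the original post:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 3D dataset: most of the variance lies along the first axis
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

pca = PCA(n_components=2)
pca.fit(X)

print(pca.components_)                # rows are the unit vectors of the first two PCs
print(pca.explained_variance_ratio_)  # share of variance captured by each PC
```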

So how can we find the principal components of a training set? There is a standard matrix factorization technique called Singular Value Decomposition (SVD) that decomposes the training set matrix X into the matrix multiplication of three matrices UΣV⊺, where V contains the unit vectors that define all the principal components we are looking for.
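Using the same hypothetical X as above, a sketch of obtaining the principal components with NumPy's SVD might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])  # same hypothetical data as above

# SVD works on centered data, so subtract the mean of each feature first
X_centered = X - X.mean(axis=0)

# X_centered = U @ np.diag(s) @ Vt; the rows of Vt are the principal component unit vectors
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
c1, c2 = Vt[0], Vt[1]  # first and second principal components
```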

Note that PCA assumes the dataset is centered around the origin, so the training set should be centered before applying SVD. Once we have identified all the principal components, we can reduce the dimensionality of the dataset down to d dimensions by projecting it onto the hyperplane defined by the first d principal components. Selecting this hyperplane ensures that the projection preserves as much variance as possible.
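Continuing the SVD sketch above, the projection itself is just a matrix multiplication; with scikit-learn, PCA(n_components=2).fit_transform(X) performs the centering and projection in one step:

```python
# Project the centered data onto the hyperplane spanned by the first d principal components
d = 2
W_d = Vt.T[:, :d]       # columns are the first d principal component unit vectors
X_d = X_centered @ W_d  # reduced dataset, shape (200, 2)
```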

It is also possible to decompress the reduced dataset back to the original dimensions by applying the inverse transformation of the PCA projection. This won’t give back the original data, since the projection lost a bit of information, but it will likely be close to it. The mean squared distance between the original data and the reconstructed data (compressed and then decompressed) is called the reconstruction error.
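A sketch of the inverse transformation and the reconstruction error, still using the hypothetical data and variables from the snippets above (with scikit-learn, pca.inverse_transform does the same mapping back to the original space):

```python
# Map the reduced data back to the original 3D space and measure the reconstruction error
X_recovered = X_d @ W_d.T + X.mean(axis=0)  # undo the projection, then undo the centering
reconstruction_error = np.mean(np.sum((X - X_recovered) ** 2, axis=1))
print(reconstruction_error)
```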
