An application of PCA in image classification

Principal component analysis (PCA) is a statistical technique widely used for unsupervised dimensionality reduction, data compression, and high-dimensional data visualization. This article first covers the basic principles of principal component analysis, its geometric meaning, and how to use the Singular Value Decomposition (SVD) to find the principal components of a dataset. Next, an application of PCA in image classification is introduced. Finally, we discuss the implicit assumptions of the PCA approach and their implications for the results.


In signal processing, we assume that the signal has a large variance and the noise has a smaller one. The ratio of their variances, defined as the signal-to-noise ratio (SNR), is used to reflect the statistical significance of the signal.

So, whether the purpose is dimensionality reduction or data compression, the ultimate goal is to remove as much noise as possible while retaining the useful information. PCA provides a principled way to find the directions of largest variation in the data space, which highlight the differences among the data points.

Considering that the total variance of the data set is the sum of squared distances of all sample points to the origin after centering, this quantity is independent of the coordinate system. When a direction is selected as the principal component, the total variance can be split into two parts: the signal, that is, the portion along the principal component direction, and the noise, i.e., the portion orthogonal to the principal component direction. In order to improve the signal-to-noise ratio, we need to find the direction of maximum variance in the data space.
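This signal/noise split of the total variance can be checked numerically. The sketch below (synthetic data; all variable names are my own) verifies that, for any unit direction, the variance along it plus the variance orthogonal to it equals the total variance:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: large variance along one direction, small along the other.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)                    # centre the data

total_var = (X ** 2).sum() / len(X)       # total variance about the origin

w = np.array([1.0, 0.0])                  # any unit direction
along = X @ w                             # component along w ("signal")
resid = X - np.outer(along, w)            # component orthogonal to w ("noise")

var_along = (along ** 2).sum() / len(X)
var_resid = (resid ** 2).sum() / len(X)

# The split is exact for any unit direction w (Pythagoras per sample point).
assert np.isclose(total_var, var_along + var_resid)
```

Maximizing `var_along` over unit vectors `w` is exactly the maximum-variance problem PCA solves.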

The meaning of PCA can be explained in two ways. On the one hand, PCA performs an orthogonal projection transformation (a rotation in $n$-dimensional space) of the $n$-dimensional sample data so that the variance of the data is concentrated in as few dimensions (axes) as possible (the maximum-variance view). On the other hand, PCA can be understood as searching for the linear projection that minimizes the sum of squared distances between the data points and their projections (the minimum-error view).


Suppose we have a set of observations of $p$ variables, and we want to reduce the data so that each observation can be described with only $L$ variables, $L < p$. Suppose further that the data are arranged as a set of $n$ data vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n$, with each $\mathbf{x}_i$ representing a single grouped observation of the $p$ variables.

In order to make the rotation of the sample points easier to handle, we first subtract the mean from each data dimension so that the center of the data points coincides with the origin; the mean subtracted is the average across each dimension. Suppose $X$ is the data matrix after this centering, with each column vector of $X$ representing a sample point. The covariance matrix of the data is then
$$C = \frac{1}{n} X X^T.$$

Then comes the key part: how do we reduce the redundancy in the data?

Suppose the orthogonal matrix is $W$, so the projected data are $Y = W^T X$, and the covariance matrix of the projected data is given by
$$C' = \frac{1}{n} Y Y^T = \frac{1}{n} W^T X X^T W = W^T C W.$$

We want to select an appropriate rotation matrix $W$ such that $C'$ is a diagonal matrix $\Lambda$.

Multiplying $W^T C W = \Lambda$ by $W$ on the left and $W^T$ on the right gives
$$C = W \Lambda W^T.$$
Notice that the left side of this equation is the covariance matrix of the original data, while the right side has the same form as a singular value decomposition, $C = U \Sigma V^T$.

Since the covariance matrix $C$ is symmetric and positive semi-definite, its singular value decomposition coincides with its eigendecomposition: $U = V = W$ and $\Sigma = \Lambda$, so the columns of $W$ are the eigenvectors of $C$ and the diagonal entries of $\Lambda$ are its non-negative eigenvalues.

In fact, according to the geometric meaning of the SVD, the eigenvector with the largest eigenvalue is the first principal component of the data set. In general, once the eigenvectors of the covariance matrix are found, the next step is to order them by eigenvalue, from highest to lowest, which gives the components in order of significance.
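The derivation above can be sketched in a few lines of NumPy (a minimal illustration on synthetic data; the matrix names $X$, $C$, $W$, $\Lambda$ follow the text, everything else is my own):

```python
import numpy as np

rng = np.random.default_rng(1)
# n = 200 samples of p = 5 correlated variables, one sample per row.
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

X = (A - A.mean(axis=0)).T              # centre, then put samples in columns
C = X @ X.T / X.shape[1]                # covariance matrix C = (1/n) X X^T

# SVD of the symmetric PSD matrix C coincides with its eigendecomposition:
# the columns of W are eigenvectors, `s` holds eigenvalues, largest first.
W, s, _ = np.linalg.svd(C)

# W diagonalises C: W^T C W = Lambda
Lam = W.T @ C @ W
assert np.allclose(Lam, np.diag(s), atol=1e-8)

# Keep the L components with the largest eigenvalues and project.
L = 2
Y = W[:, :L].T @ X                      # reduced (L x n) representation
```

Note that `np.linalg.svd` already returns the singular values in descending order, so no explicit sorting step is needed.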

Application in image classification

In this section, we introduce an application of PCA dimensionality reduction to image classification, drawn from my research project. An image contains a great deal of information, including the color distribution, the contrast, the texture richness, the shape of the object to be detected, and so on. First, the object to be classified is separated from the background; then statistical methods are used to compute feature values for the object. Because the feature dimension is too high (more than 80 dimensions) to feed to the classifier directly, PCA is used to reduce the feature space of the image before classification, which suppresses noise and removes redundancy in the data.
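The pipeline described here can be sketched as follows. The 80-plus-dimensional features come from the text; the synthetic data, the choice of $L = 10$ components, and the nearest-centroid classifier are assumptions made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the extracted image features: n samples x p = 80 dimensions,
# two hypothetical object classes. (The real features come from colour,
# contrast, texture and shape statistics, as described above.)
n, p = 120, 80
labels = rng.integers(0, 2, size=n)
feats = rng.normal(size=(n, p)) + labels[:, None] * 0.5

# PCA: centre, eigendecompose the covariance, keep the top L components.
mu = feats.mean(axis=0)
Xc = feats - mu
C = Xc.T @ Xc / n
W, s, _ = np.linalg.svd(C)
L = 10
reduced = Xc @ W[:, :L]                 # n x L features fed to the classifier

# A simple nearest-centroid classifier on the reduced features
# (chosen only for illustration; the text does not specify the classifier).
centroids = np.array([reduced[labels == k].mean(axis=0) for k in (0, 1)])

def predict(x):
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

preds = np.array([predict(x) for x in reduced])
```

New images at prediction time must be centered with the same `mu` and projected with the same `W` learned from the training set.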

Implicit assumptions and limits

Because some assumptions were introduced during the calculation, the PCA may not be applicable or give the wrong result in some cases. In this section, we will discuss the assumptions and limits of PCA.


In the paper that first proposed the PCA method, the data points are defined in Euclidean space; that is, there is no difference in units between the dimensions. In practical problems, however, different dimensions may use different unit scales, for example millimeters versus meters, or, in harder cases, two dimensions may not be comparable at all, such as image contrast and object shape.

Normalization scales the data so that it falls into a small, specific range. It is often used when comparing and evaluating indicators, to remove the unit from the data and convert it into a dimensionless value, so that indicators with different units or magnitudes can be compared and weighted. The most typical choice is to map the data into the interval $[0, 1]$. Here are two common normalization methods:

  • Min-Max normalization
    Min-max normalization is a linear transformation of the original data whose results fall into the $[0, 1]$ interval; the transformation is
    $$x' = \frac{x - \min}{\max - \min},$$
    where $\max$ is the maximum value of the sample data and $\min$ is the minimum value. A drawback of this method is that newly added data may change $\max$ and $\min$, so the transformation must be recomputed. Also, when we used this method to normalize the color histogram in section 2, the color-distribution information in the image was lost.
  • Z-Score normalization
    Z-score normalization, also known as standardization, transforms the data so that it has the profile of a standard normal distribution, i.e., mean 0 and standard deviation 1; the transformation is
    $$x' = \frac{x - \mu}{\sigma},$$
    where $\mu$ is the mean of all sample data and $\sigma$ is the standard deviation of all sample data. This method is equivalent to replacing the covariance matrix in PCA with the correlation matrix.
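Both methods are one-liners in NumPy; a minimal sketch (function names are my own):

```python
import numpy as np

def min_max(x):
    """Map each column linearly onto [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def z_score(x):
    """Standardise each column to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(3)
data = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

mm = min_max(data)
zs = z_score(data)

assert np.all((mm >= 0) & (mm <= 1))
assert np.allclose(zs.mean(axis=0), 0) and np.allclose(zs.std(axis=0), 1)
```

As the text notes, running PCA on z-scored data is the same as diagonalizing the correlation matrix instead of the covariance matrix.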

Unsupervised method

PCA is an unsupervised method of analysis that does not distinguish between sample classes. It finds the directions of maximum variance over all samples in the data set, without regard to class labels. Therefore, for the classification problem mentioned in section 2, PCA may treat the variation within a category as the principal component and the differences among the categories as noise, in which case reducing the dimension will not help the classifier. This follows from the basic assumption, via the definition of SNR, that large variance represents information in the data.
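A small synthetic example illustrates the point: when the within-class spread is larger than the between-class separation, the first principal component follows the class-irrelevant direction (data and names are my own, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
# Two classes separated along the y-axis, but with a much larger
# within-class spread along the x-axis.
labels = rng.integers(0, 2, size=n)
x = rng.normal(scale=5.0, size=n)                  # large within-class variance
y = labels * 1.0 + rng.normal(scale=0.1, size=n)   # small between-class shift
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)

W, s, _ = np.linalg.svd(Xc.T @ Xc / n)
pc1 = W[:, 0]

# The first principal component aligns with the (class-irrelevant) x-axis,
# not with the y-direction that actually separates the classes.
assert abs(pc1[0]) > abs(pc1[1])
```

Supervised alternatives such as linear discriminant analysis use the labels to seek the separating direction instead.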


Linearity

In fact, PCA and SVD rest on the same principle: they apply only a rotation and a stretching transformation to the data set (equivalently, they choose a new basis), and therefore cannot handle nonlinear structure in the features. To address this, nonlinear kernels can be introduced: the nonlinear feature dimensions are mapped into a space where they become linear, and PCA is then used to reduce the dimensionality there (kernel PCA).
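Kernel PCA along these lines can be sketched as follows; the text mentions only "nonlinear kernels", so the RBF kernel and the concentric-rings data are assumptions made for illustration:

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with an RBF kernel (a standard construction)."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq)                      # kernel (Gram) matrix
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one   # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # pick the largest ones
    # Projections of the samples onto the leading kernel principal components.
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

rng = np.random.default_rng(5)
# Two concentric rings: not linearly separable, a classic kernel-PCA example.
t = rng.uniform(0, 2 * np.pi, size=200)
r = np.repeat([1.0, 3.0], 100)
X = np.column_stack([r * np.cos(t), r * np.sin(t)])

Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
```

Ordinary PCA leaves the rings entangled, while in the kernel-induced space the two rings become linearly separable.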

Gaussian distribution

Although PCA does not require the data to follow a Gaussian distribution, the features of non-Gaussian data are ignored, because only the mean and the covariance are considered in the computation.