PCA: Principal Component Analysis (Part I)

Nithin Santhosh
Jan 9, 2021

What is PCA? Why PCA? When should you use PCA? How does PCA work? These are the questions that may come to mind when you hear the term PCA for the first time.

PCA, or Principal Component Analysis, is a statistical procedure that uses an orthogonal transformation (one whose axes are at right angles to each other) to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. In other words, the data is projected onto new axes, the principal components (PCs), which are all orthogonal to each other. The first principal component explains the largest share of the variance in the data, the second explains the largest share of what remains, and so on.
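To make the definition concrete, here is a minimal sketch in Python with NumPy. It assumes made-up house data (the `area` and `price` values are randomly generated, not taken from any real data set) and computes the principal components through an eigen-decomposition of the covariance matrix, which is one common way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 houses whose price is strongly correlated with area.
area = rng.normal(1500, 300, size=200)
price = 100 * area + rng.normal(0, 20000, size=200)
X = np.column_stack([area, price])

# Standardize each column (zero mean, unit variance).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the covariance matrix.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Reorder so that the first component has the largest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the principal components.
scores = Z @ eigvecs

# Share of the total variance explained by each component.
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)
```

The columns of `eigvecs` are the principal components: they are orthogonal to each other, the projected `scores` are uncorrelated, and the first entry of `explained_ratio` is the largest.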

In PCA, the total information in a data set is measured by its total variance.

In statistics, all information is represented as numbers. In Figure 1, each diamond represents one house, plotted by its area in sq ft against its price. Each diamond carries the information for that particular house, and we can measure it as that point's contribution to the variance: its squared distance from the centre of the data (the mean), which coincides with the origin once the data has been centered. To get the total information, these distances have to be taken for all the diamonds, and that is why total variance = total information.

Figure 1: Area of Plot vs Price of House

Total information = var(x1) + var(x2) + var(x3) + var(x4).

Here x1, x2, x3, and x4 stand for the individual diamonds in the graph, with var(xi) read as the contribution of diamond i to the total variance.
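The idea that total variance adds up can be checked numerically. The sketch below is only an illustration on a small made-up table (5 hypothetical houses with 4 numeric columns). It shows that the total variance can be obtained in three equivalent ways: by summing the squared distances of the points from the mean, by summing the variances of the individual columns, and by summing the eigenvalues of the covariance matrix, which are the variances of the principal components.

```python
import numpy as np

# Hypothetical data: 5 houses (rows) described by 4 numeric variables (columns).
X = np.array([
    [1200.0, 3.0, 10.0, 150.0],
    [1500.0, 4.0,  5.0, 210.0],
    [ 900.0, 2.0, 20.0, 110.0],
    [2000.0, 5.0,  2.0, 320.0],
    [1700.0, 4.0,  8.0, 260.0],
])
n = X.shape[0]

# View 1: squared distance of every point from the mean, summed and averaged.
centered = X - X.mean(axis=0)
from_points = (centered ** 2).sum() / (n - 1)

# View 2: sum of the variances of the individual columns.
total_variance = X.var(axis=0, ddof=1).sum()

# View 3: sum of the eigenvalues of the covariance matrix,
# i.e. the variances along the principal components.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)

print(from_points, total_variance, eigvals.sum())
```

All three printed numbers agree, so PCA does not create or destroy variance. It only redistributes the same total across the new axes.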

  • PCA is used to overcome feature redundancy in a data set, which is why it is typically applied to high-dimensional data.
  • PCA can only be applied to datasets with numeric values.
  • PCA gives better results when data is standardized. Standardization becomes extremely important when the predictors are in different units.
  • PCA needs at least 2 dimensions (variables/columns/features) to be applied, and it is most useful when there are 3 or more.
  • PCA is a tool that helps to give a better visualization of high dimensional data.
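The point about standardization can be illustrated with a short sketch. The data here is randomly generated and purely hypothetical (an area in sq ft and a bedroom count), and `first_component` is a helper name introduced only for this example. It compares the first principal component computed on the raw columns with the one computed after standardizing them.

```python
import numpy as np

def first_component(X):
    """Return the unit-length first principal component of X (rows = observations)."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]  # eigh sorts ascending, so the last column is PC1

rng = np.random.default_rng(1)

# Hypothetical predictors in very different units.
area_sqft = rng.normal(1500, 300, size=300)             # values in the thousands
bedrooms = area_sqft / 500 + rng.normal(0, 0.5, size=300)  # values around 3
X = np.column_stack([area_sqft, bedrooms])

# PCA on the raw data: the large-valued column dominates.
pc1_raw = first_component(X)

# PCA on standardized data: both columns contribute.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc1_std = first_component(Z)

print("PC1 on raw data:         ", pc1_raw)
print("PC1 on standardized data:", pc1_std)
```

On the raw data the first component points almost entirely along the area axis simply because area has the larger numbers. After standardization both variables receive equal weight, which is usually what is wanted when the predictors are in different units.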

In this article, I have tried my best to explain what PCA is and when to use it. In Part II, I will explain how PCA works, with an example.

Thanks for reading the article.
