PCA, The math behind constrained optimisations , drawbacks and other important discussions

One of the most important mathematical concept required to understand the working of PCA is the concept of Eigenvectors and eigenvalues . If you know the algorithm , you must be aware that in order to reduce the dimensions of a certain dataset with dimension d to a smaller dimension k , PCA works by calculating the eigenvectors/values of the covariance matrix of the data set and select the top k values in descending order.

But why ? Why does the concept of eigenvalues and eigenvectors , gets involved during reduction of dimensions.

## What are Eigenvectors/eigenvalues?

For any square matrix A , its Eigenvectors and eigenvalues have the following relation:

Geometrically speaking , one can see that ,for given eigenvector-value pair , the matrix multiplication is equal to a mere scaling of the eigenvector by a factor lambda . Hence its direction is unchanged .

**Eigenvalues** and **eigenvectors** are **only for square matrices**. **Eigenvectors** are by definition nonzero. **Eigenvalues** may be equal to zero.

Now lets look at another concept used in PCA

## Constrained Optimisation

Sometimes , when one needs to find maxima/minima of any expression which in turn is following a certain constraint , you cannot equate its first derivative and equate it to zero. You must take care of the constraint given .

Such constrained optimisation problems are solved using the concept of LANGRANGE MULTIPLIERS.

Note: Langrange multipliers are always positive .

## The optimisation problem in PCA

Now lets see how the 2 concepts , Eigenvector/values and Langrangian multipliers are required to solve the PCA optimisation problem . Recall that how PCA is all about finding axes with maximum variance . Below is the optimisation equation of PCA, where S is the covariance matrix of the dataset , hence a d*d dimension square matrix (where d is the original dimension of the dataset) and u is the direction along which the variance has to be maximized.

you can see that the above equation is just a constrained optimisation problem , using langrange multiplier lambda in langrangian **L(u,lambda)** and equating** dL(u,lambda)/du** and **dL(u,lambda)/d(lambda) **to zero you can see that solving the above equation will give the solution of the form

#### S*u=lambda*u (eigenvector-eigenvalue relation)

so now you see how the solutions u are just eigenvectors of S .( And lambda the corresponding eigenvalues) Now these eigenvalues with decreasing order will give decreasing variance , hence for reducing d into k smaller dimensions we select the top k eigenvalues in descending order .

This post deals with the mathematics behind the optimisation of PCA algorithm. Other important interview questions are related to its limitations , comparision to tsne , and more!

## More posts :

Till then enjoy this